Intuition, reason, and metacognition
Highlights
► We substantiate a model of metacognitive monitoring and control in reasoning and decision-making.
► The fluency of generation determines the strength of the Feeling of Rightness for a decision.
► The strength of the Feeling of Rightness determines degree and type of subsequent analysis.
► Reasoning biases may be confidently held because they are fluently retrieved.
► We integrate literatures on metacognition, confidence, and Dual Process Theories of reasoning.
Introduction
There is much evidence to support the thesis that reasoning and decision-making are accomplished by recourse to two qualitatively different types of processes (see Evans and Frankish (2009) for a recent review), differing in terms of the degree to which they are characterized as fast and automatic (Type 1) or slow and deliberate (Type 2). A variety of Dual Process Theories (DPT) have been proposed to explain the interaction of these two processing systems (e.g., Evans, 2006, Kahneman, 2003, Sloman, 2002, Stanovich, 2004). Although they make somewhat different claims about the extent, degree, and timing of Type 2 processes, they share the basic assumption that automatic Type 1 processes give rise to a highly contextualised representation of the problem and attendant judgments that may or may not be analysed extensively by more deliberate, decontextualised Type 2 processes.
According to DPTs, the outcome of a given reasoning attempt is determined jointly by the content of the information that is retrieved by Type 1 processes (see Kahneman, 2003, Stanovich, 2004 for extensive analyses) and by the quality and depth of Type 2 processing. As such, the explanatory value of DPTs depends critically on their ability to predict the circumstances under which Type 2 processes are more or less engaged (Evans, 2009, Stanovich, 2009, Thompson, 2009, Thompson, 2010). To date, explanations have focussed on global characteristics of the reasoner, such as cognitive capacity (De Neys, 2006a, Stanovich, 1999) or aspects of the environment, such as the amount of time allotted to complete the task (Evans and Curtis-Holmes, 2005, Finucane et al., 2000), the instructions provided to the reasoner (Daniel and Klaczynski, 2006, Evans et al., 1994, Newstead et al., 1992, Vadeboncoeur and Markovits, 1999), or variables that create a global perception of difficulty, such as presenting problems in a difficult-to-read font (Alter, Oppenheimer, Epley, & Eyre, 2007).
Missing from this analysis is an account of item-specific cues that trigger Type 2 thinking. To illustrate, consider the following two items. One is taken from Frederick’s (2005) Cognitive Reflection Test (CRT) and the second is an isomorphic version of it (Thompson, 2009):
If it takes 5 machines 5 min to make 5 widgets, how long would it take 100 machines to make 100 widgets?
____ minutes
If it takes 5 machines 2 min to make 10 widgets, how long would it take 100 machines to make 100 widgets?
____ minutes
The first problem strongly cues the response “100”, which is, in fact, erroneous but often given as an answer (Frederick, 2005). From a DPT view, Type 1 processes produce an initial response to the first version of the problem (i.e., 100). This answer is then examined by Type 2 processes, determined to be satisfactory, and (incorrectly) given as the answer by a large majority of participants (Evans, 2009, Kahneman, 2003). Less clear is the explanation for why this answer is so readily deemed to be satisfactory and the subsequent Type 2 analysis is so cursory; also missing is the explanation for why the second version of the problem is more likely to suggest that mental effort (Type 2 processing) will be needed to achieve the solution.
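The normative answers to the two widget items follow from assuming a constant production rate per machine. The arithmetic can be sketched as below; the function name `minutes_needed` and its parameters are illustrative and are not part of the original materials.

```python
from fractions import Fraction

def minutes_needed(machines, minutes, widgets, target_machines, target_widgets):
    """Time for target_machines to make target_widgets,
    assuming every machine works at the same constant rate."""
    # Widgets produced by one machine in one minute.
    rate = Fraction(widgets, machines * minutes)
    # Each target machine must produce this many widgets.
    per_machine = Fraction(target_widgets, target_machines)
    return per_machine / rate

print(minutes_needed(5, 5, 5, 100, 100))   # original CRT item
print(minutes_needed(5, 2, 10, 100, 100))  # isomorphic item
```

Under this assumption the first item works out to 5 minutes rather than the cued 100, and the second to 1 minute.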
Such variability in performance across nominally equivalent problems is common (e.g. Bucciarelli and Johnson-Laird, 1999, Johnson-Laird, 1983, Marcus and Rips, 1979). The question, therefore, becomes this: For a given participant of a given cognitive capacity, operating under a given set of task instructions, in a given environment, what predicts the degree of Type 2 engagement? In the current paper, we propose an answer to this question that is grounded in basic metacognitive processes. Specifically, we posit that a third category of process monitors Type 1 outputs (Simmons and Nelson, 2006, Thompson, 2009, Thompson, 2010) and determines the extent of Type 2 engagement (see Evans, 2009, Stanovich, 2009 for related discussions).
This proposal draws heavily on the metamemory literature, which has long acknowledged the distinction between the processes responsible for retrieving information from memory and the processes responsible for monitoring that information (see Dunlosky and Bjork (2008) for an overview). Monitoring refers to the “subjective assessment of one’s own cognitive processes and knowledge” (Koriat, Ma’ayan, & Nussinson, 2006, p. 38). This assessment can be derived inferentially from implicit cues, such as the ease with which a memory comes to mind (Benjamin et al., 1998, Koriat and Ma’ayan, 2005), or based on explicit cues, such as beliefs about one’s skill at a task (e.g., Dunning et al., 2003, Prowse Turner and Thompson, 2009; see Koriat (2007), for a review).
By extension, we posit a set of processes that monitor Type 1 outputs, which, in turn, determine the depth of Type 2 thinking. Under this proposal, Type 1 processes generate two distinct outputs: The first is the content of the initial answer and the second is an accompanying sense of the correctness of that answer (Simmons and Nelson, 2006, Thompson, 2009). This Feeling of Rightness (FOR) is predicted to vary in strength across a set of problems. On problems such as the first widget problem above, the FOR should provide a compelling cue that the initial inference is the correct one. In a variety of other situations, the FOR will be less compelling. In this way, the generation of Type 1 outputs is proposed to be analogous to memory retrievals (Thompson, 2009, Thompson, 2010), which, in addition to a specific memory content, also carry an affective component that acts as a cue to the correctness of the retrieval (e.g., Fazendeiro et al., 2005, Gruppuso et al., 2007, Johnston et al., 1985, Whittlesea, 1993; see Koriat (2007) for an extensive review).
Crucially, from the point of view of the current argument, the metacognitive experiences associated with memory retrievals determine both the allocation of resources and the choice of study or problem solving strategies. For example, metacognitive experiences predict the length of time spent studying an item for a subsequent recall test (e.g., Mazzoni and Cornoldi, 1993, Nelson, 1993, Son, 2004, Son and Metcalfe, 2000), the amount of time spent searching for an item in memory (Singer & Tiede, 2008), the decision to mass or distribute practice (Benjamin and Bird, 2006, Son, 2004), as well as the decision to derive a solution by computation vs. retrieve the answer from memory (Reder & Ritter, 1992). By analogy, therefore, the FOR that accompanies Type 1 processing should signal whether the current output suffices or whether additional Type 2 processes are needed (Thompson, 2009, Thompson, 2010).
Surprisingly, there is little research on the role of these types of metacognitive processes in reasoning and decision-making. That is, the monitoring and control aspects of metacognitive function have been relatively neglected in this field, despite the large literature on the factors that produce overconfidence in judgments (e.g., Caputo and Dunning, 2005, Dunning et al., 2003, Ehrlinger et al., 2008, Hansson et al., 2008).
One exception is a study by Simmons and Nelson (2006), who provided preliminary support for a monitoring mechanism that constrains Type 2 analysis. Their task required participants to place a wager “against the spread”. This required them to decide whether the favoured team would win by a given margin, or whether the underdog would lose by less than that margin. Although the spread is constructed to equalise the probability of the two outcomes, people overwhelmingly bet on the favourite. Thus, betting on the favourite was categorized as the intuitive response. Confidence in the intuitive response was operationalised as the degree of consensus that the favourite would win, regardless of the spread. They found that the higher the degree of consensus, the more likely participants were to wager on the favourite, and the more confident they were in their bet. On this basis, they concluded that intuitive answers are often accompanied by a sense of correctness, which then determines the probability of subsequent processing.
The experiments reported in the current paper advanced on this initial work in a number of ways: First, we sought to measure the FOR explicitly, rather than by using a proxy such as consensus. This allowed us to make a direct link to the metamemory literature, where metacognitive constructs such as the Feeling of Knowing or Judgments of Learning are solicited directly from participants (see Van Overschelde (2008), for a recent overview). The direct measurement also allowed us to rule out an alternative explanation for the relationship between the consensus variable and the measure of intuitive responding, namely that both reflected a preference to choose the favourite. Importantly, we also derived measures of Type 2 engagement apart from participants’ tendency to give the intuitive response. This is because giving the intuitive answer in no way implies the absence of Type 2 thinking, given that participants may decide, after a lengthy period of Type 2 thinking, to go with the intuitive answer after all. Finally, we tested a number of hypotheses about the underlying determinants of the FOR.
Our first goal was to examine how participants monitor and regulate their performance over a series of problems. To do so, we modified Koriat and Goldsmith’s (1996) quantity-accuracy profile (QAP; see Goldsmith and Koriat (2008) for a recent review). Their procedure was designed to test, among other things, the efficacy of monitoring initial retrievals to determine whether they should be subsequently given as a response. For the QAP, participants are given two memory tests. On the first, they are required to respond to every item; they then give a confidence judgment, which is used to predict their performance under free-report conditions.
We adapted this procedure to fit the current requirements as follows: Participants gave two responses to each of a series of reasoning problems. For the first, participants were told that we were interested in studying reasoners’ intuitions and they were instructed to give the first answer that came to mind. Following this initial response, a subjective measure of the FOR was taken using a Likert scale. The format of the scale varied somewhat; in some cases they were asked to evaluate certainty and in others, rightness. They were told to give the answer that was their first instinct or gut feeling. As a manipulation check, we asked them to indicate whether or not they had, indeed, done so for each trial. This initial response presumably reflects the outcome of Type 1 processing with minimal Type 2 analysis.
This assumption is based on the findings of several studies indicating that fast responses are more likely than slow responses to reflect the output of heuristic, Type 1 processes (De Neys, 2006b, Evans and Curtis-Holmes, 2005, Finucane et al., 2000, Roberts and Newton, 2001, Tsujii and Watanabe, 2010). For example, when forced to respond quickly, reasoners are more likely to respond on the basis of conclusion believability than when allowed additional time to consider their responses (Evans and Curtis-Holmes, 2005, Tsujii and Watanabe, 2010); they are also more likely to show matching bias on Wason’s selection task (Roberts & Newton, 2001) and to make choices guided by affect (Finucane et al., 2000). Similarly, heuristic, Type 1 responses require less time to produce than their analytic counterparts (De Neys, 2006b). Finally, near-infrared spectroscopy shows that the inferior frontal cortex, which is involved in inhibiting belief-based responses, is less engaged when participants are forced to respond quickly than when they are not (Tsujii & Watanabe, 2010). Thus, requiring a fast response from participants should produce responses that are based largely on the output of Type 1 processes.
To measure Type 2 engagement, participants were allowed as much time as needed to produce a final answer to the problems. Although the instructions were tailored to the specific tasks that participants completed, they all indicated that participants should be sure at this point that they had taken their time and thought about the problem carefully.
From this, we derived three measures of Type 2 engagement. The first measure was the degree or probability of change from the first answer to the second answer. A change of answer would indicate that some additional analysis had taken place and should therefore be a reliable index of Type 2 thinking. Note that while a change of answer should reliably indicate Type 2 engagement, failure to do so is not evidence for the absence of Type 2 engagement. That is, there is reason to believe that at least some Type 2 thinking is directed at rationalising the initial response (e.g., Evans, 1996, Shynkaruk and Thompson, 2006, Stanovich, 2004, Stanovich, 2009; see also Wilson and Dunn (2004) for a related discussion). Thus, we also measured the amount of time spent re-thinking each problem. Given that Type 2 processes are assumed to be deliberate, time consuming processes, the amount of time spent engaging in a problem should be a reliable index of the extent of Type 2 processing (De Neys, 2006b).
Our third measure was a traditional measure of analytic engagement, namely whether or not the final answer was correct by a relevant normative standard. Such measures are presumed to reflect successful application of the rules of probability or logic; success by these standards is typically more likely among those of high cognitive or working memory capacity (De Neys et al., 2005a, De Neys and Verschueren, 2006, Stanovich, 1999) and thus thought to be a signature of capacity-demanding, deliberate Type 2 processes. However, given that Type 2 processes may also be engaged to produce normatively incorrect responses (Evans, 2007b, Stanovich, 2009) and that normatively correct responses may be produced by non-analytic processes (Gigerenzer et al., 1999, Oaksford and Chater, 2007), this latter measure should be considered the least reliable indicator of Type 2 engagement.
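As an illustration, the three indices of Type 2 engagement can be computed from two-response trial records along the following lines. This is a minimal sketch: the `Trial` fields, the function name, and the example values are hypothetical and do not reproduce the authors' materials or analysis.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    first_answer: str       # initial, intuitive response
    final_answer: str       # response given after rethinking
    rethink_seconds: float  # time spent before the final answer
    correct_answer: str     # normatively correct response

def type2_indices(trials):
    """Return (proportion of answers changed, mean rethinking time,
    proportion of normatively correct final answers)."""
    n = len(trials)
    if n == 0:
        raise ValueError("need at least one trial")
    changed = sum(t.first_answer != t.final_answer for t in trials) / n
    mean_time = sum(t.rethink_seconds for t in trials) / n
    correct = sum(t.final_answer == t.correct_answer for t in trials) / n
    return changed, mean_time, correct

# Made-up values for illustration only.
trials = [
    Trial("valid", "valid", 4.0, "invalid"),
    Trial("valid", "invalid", 12.0, "invalid"),
]
print(type2_indices(trials))
```

Keeping the three indices separate reflects the point made above: an unchanged answer does not rule out Type 2 engagement, so rethinking time is recorded independently of answer change and correctness.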
The hypothesis tested was that initial responses to reasoning problems are accompanied by a FOR that determines the quality and extent of Type 2 processes. On this basis, it was predicted that low FORs should lead to longer rethinking times, greater probability of answer change, and increased probability of normatively correct responding than high FORs.
There were two strategies for testing the hypothesis. In all four experiments, we measured FORs across a series of trials and used this to predict rethinking times, answer changes, and the probability of giving a normatively correct answer. In Experiments 3 and 4, we manipulated factors that were predicted to affect FOR judgments, and compared outcomes in conditions with relatively lower and higher FOR judgments.
We used four different reasoning tasks: Experiment 1 tested conditional reasoning, a form of deductive reasoning in which participants are asked to make inferences about conditional relationships (e.g., if the car runs out of gas, then it stalls). Experiment 2 tested a more complex version of this task in which participants were asked to determine what follows from two conditional statements (e.g., if something is a rose, then it has a gebber; if something has a gebber, then it is a flower). Experiment 3 used a probability judgment task, in which participants were asked to estimate the probability that an individual belonged to a category given two pieces of information: the base rate of the category and a personality description. Finally, Experiment 4 used categorical syllogisms, in which participants were asked to evaluate the validity of conclusions drawn from quantified premises.
In addition to testing the relationship between FORs and Type 2 thinking, a second goal was to investigate potential determinants of the FOR. In the metamemory literature, several variables have been demonstrated to mediate metamemory judgments; many of these variables are tied to retrieval processes, rather than to the contents of memory per se (e.g., Benjamin et al., 1998, Busey et al., 2000, Jacoby et al., 1989, Koriat, 1995, Koriat, 1997, Koriat and Levy-Sadot, 1999, Schwartz et al., 1997). For example, familiarity of the retrieval cues, as opposed to familiarity of the answer, determines Feelings of Knowing (Reder and Ritter, 1992, Schunn et al., 1997, Vernon and Usher, 2003), as does the amount of ancillary information that is brought to mind during the retrieval attempt (Koriat, 1993, Koriat, 1995, Koriat et al., 2003).
We identified three variables that might play a similar role in reasoning judgments. One was a task-independent variable called answer fluency, which was tested in all four experiments. In addition, we examined two task-specific variables, namely the probability that a conclusion is accepted as valid (Experiments 1 and 2) and the presence or absence of competing responses (Experiment 3).
Answer fluency refers to the ease with which the initial conclusion comes to mind. In the metamemory literature, there is much evidence to suggest that the fluency with which items can be retrieved from memory is a powerful determinant of the sense that they have been or will be accurately remembered (e.g., Benjamin et al., 1998, Jacoby et al., 1989, Kelley and Jacoby, 1993, Kelley and Jacoby, 1996, Matvey et al., 2001, Whittlesea and Leboe, 2003). Indeed, fluent processing can produce an illusion that an item has been previously experienced, regardless of whether it has or not (e.g., Jacoby et al., 1989, Whittlesea et al., 1990). Similarly, and critical for the current work, subjective confidence in the correctness of a memory retrieval varies as a function of the speed with which the answer comes to mind (Costermans et al., 1992, Kelley and Lindsay, 1993, Robinson et al., 1997). Consequently, the most straightforward prediction about the origins of the FOR accompanying an initial answer is that it is determined by the speed or fluency with which that answer is produced.
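One simple way to examine this prediction is to correlate the time taken to give the initial answer with the FOR rating on each trial; the prediction implies a negative relationship, with slower answers receiving lower ratings. The sketch below computes a plain Pearson correlation. The variable names and values are made up for illustration, and this is not presented as the authors' analysis.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-trial data: seconds to the first answer, FOR rating.
initial_rt = [2.0, 3.0, 5.0, 8.0]
for_rating = [7, 6, 4, 2]

r = pearson_r(initial_rt, for_rating)
print(r < 0)  # a negative r is the direction the fluency account predicts
```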
This concept of answer fluency is distinct from that of processing fluency (e.g., Koriat, 2007, Van Overschelde, 2008) that has been shown to affect other types of complex judgments (see Schwarz (2004) for a summary). Making a stimulus difficult to perceive can affect judgments of aesthetic pleasure (Reber, Schwarz, & Winkielman, 2004), truth (Reber & Schwarz, 1999), and reasoning (Alter et al., 2007). For example, Alter et al. (2007) demonstrated that reasoners were more likely to give normatively correct answers to problems like the first widget problem above when the problems were presented in a difficult-to-read font (disfluent) than in an easy-to-read one (fluent). In other words, when the experience of processing a task was fluent, there was less Type 2 engagement relative to conditions where the experience was less fluent.
Although these data offer support for the hypothesis that fluency, broadly construed, may act as a cue to Type 2 processing, they do not speak to the processes that monitor outputs on a trial-by-trial basis. That is, although answer fluency and processing fluency are sometimes treated as interchangeable constructs (Alter and Oppenheimer, 2009, Briñol et al., 2006), they are, in fact, distinct. Specifically, for a given level of processing fluency, answer fluency and the accompanying FOR might vary substantially across items for the simple reason that some answers will be produced faster than others. We propose these sources of variance are linked, such that answer fluency and the accompanying FOR can explain why some items elicit more or less Type 2 thinking than others. In all four Experiments, we measured variation in both fluency and FOR judgments for each trial, using the latter measure to predict the probability and extent of Type 2 engagement. In Experiments 3 and 4 we manipulated variables that were predicted to affect the fluency of responses, and then compared FOR judgments across conditions.
The current paradigm also afforded us the opportunity to explore two other potential determinants of the FOR. In previous studies of deductive reasoning (used as paradigms in Experiments 1 and 2), we have noticed that reasoners accept a provided conclusion as valid more often than warranted by chance (or by other task-relevant features such as the validity or believability of the conclusion; Shynkaruk & Thompson, 2006). A possible explanation for this is that the decision to accept a conclusion is accompanied by a higher FOR than the decision to reject it; we will test this hypothesis in Experiments 1 and 2.
In addition, many reasoning theories give a special status to problems for which two or more answers come into conflict (e.g., Amsel et al., 2008, Ball et al., 2006, Evans, 2006, Sloman, 1996), as for example, when a conclusion is valid but not believable. The need to resolve the conflict is posited to cue additional Type 2 processing that would not be engaged in its absence (De Neys and Glumicic, 2008, Evans, 2007a). In Experiment 3, we will test the hypothesis that FORs are sensitive to conflicting inputs, such that the presence of conflict lowers the FOR.
Section snippets
Experiment 1
The goal of this experiment was twofold. The first was to test the hypotheses generated from the Metacognitive Reasoning Theory regarding the relationships between FOR and Type 2 thinking; the second was to test the relationship between FOR and two putative determinants, namely answer fluency and probability of conclusion acceptance. These hypotheses were tested using the conditional inference task described below, modified to our new two-response procedure. We were not interested in
Experiment 2
Experiment 2 was intended as a replication and extension of Experiment 1 using different types of items and a modified conditional reasoning task. There were two primary objectives. The first was to increase the complexity of the task and allow more scope for Type 2 processes to be recruited. Thus, in contrast to Experiment 1, which employed simple conditional statements with familiar content, we shifted to a three-term format with nonsense terms used to link the premises, e.g.,
If something is
Experiment 3
The preceding two Experiments provided evidence to indicate that answer fluency mediates FOR, such that the more fluently an answer can be retrieved, the stronger the FOR that accompanies it. The procedure used allowed us to compare, within a block of trials, the consequences of relative fluency for FOR, and in turn, how the FOR impacts Type 2 thinking. An alternative test of the hypothesis would be to manipulate characteristics of the answers believed to affect FOR judgments and to then
Experiment 4
Whereas in the previous study, we sought to manipulate FOR judgments by manipulating the congruence of base rates and stereotypes, in the current study we did so by manipulating the fluency with which the initial answer could be produced. Participants in this experiment were asked to solve quantified syllogisms by indicating whether or not the conclusion provided followed validly from the premises, e.g.,
Some of the nurses are magicians.
All of the winemakers are nurses.
Therefore, some of the
General discussion
Until recently, the issue of monitoring and control processes in reasoning has received little attention. Indeed, the need to include such a mechanism into models of reasoning has only recently been acknowledged (De Neys and Glumicic, 2008, Evans, 2009, Thompson, 2009, Thompson, 2010). In the current paper, we adapted a paradigm from the metamemory literature (Koriat & Goldsmith’s quantity-accuracy profile) that allowed us to undertake a detailed analysis of monitoring in the context of
Acknowledgments
This research was funded by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada awarded to the first author, by a Post Graduate Scholarship from the same funding agency awarded to the second author, and by an Undergraduate Summer Research Award from the same agency to the third author. The authors would like to thank Jamie Campbell, Shira Elqayam, and Jonathan Evans for useful comments and feedback on an earlier draft of this manuscript.
References (125)
- et al.
A dual-process account of the development of scientific reasoning: The nature and development of metacognitive intercession skills
Cognitive Development
(2008)
- et al.
Metacognitive control of the spacing of study repetitions
Journal of Memory and Language
(2006)
- et al.
Strategies in syllogistic reasoning
Cognitive Science
(1999)
- et al.
What you don’t know: The role played by errors of omission in imperfect self-assessments
Journal of Experimental Social Psychology
(2005)
- et al.
The probability heuristics model of syllogistic reasoning
Cognitive Psychology
(1999)
- et al.
Conflict monitoring in dual process theories of thinking
Cognition
(2008)
- et al.
Why the unskilled are unaware: Further explorations of (absent) self-insight among the incompetent
Organizational Behavior and Human Decision Processes
(2008)
- et al.
Adult egocentrism: Subjective experience versus analytic bases for judgment
Journal of Memory and Language
(1996)
- et al.
Remembering mistaken for knowing: Ease of retrieval as a basis for confidence in answers to general knowledge questions
Journal of Memory and Language
(1993)
- et al.
The effects of encoding fluency and retrieval fluency on judgments of learning
Journal of Memory and Language
(2005)
Conditional reasoning
Journal of Verbal Learning & Verbal Behavior
Empirical tests of a fast and frugal heuristic: Not everyone “takes-the-best”
Organizational Behavior and Human Decision Processes
The source of belief bias effects in syllogistic reasoning
Cognition
Effects of perceptual fluency on judgments of truth
Consciousness and Cognition
Metacognitive experiences in consumer judgment and decision making
Journal of Consumer Psychology
Uniting the tribes of fluency to form a metacognitive nation
Personality and Social Psychology Review
Overcoming intuition: Metacognitive difficulty activates analytic reasoning
Journal of Experimental Psychology: General
Effects of belief and logic on syllogistic reasoning: Eye-movement evidence for selective processing models
Experimental Psychology
The mismeasure of memory: When retrieval fluency is misleading as a metamnemonic index
Journal of Experimental Psychology: General
The malleable meaning of subjective ease
Psychological Science
Accounts of the confidence-accuracy relation in recognition memory
Psychonomic Bulletin & Review
Metacognition in strategy selection
Confidence level and feeling of knowing in question answering: The weight of inferential processes
Journal of Experimental Psychology: Learning, Memory, and Cognition
Conditional reasoning and causation
Memory & Cognition
Developmental and individual differences in conditional reasoning: Effects of logic instructions and alternative antecedents
Child Development
Dual processing in reasoning: Two systems but one reasoner
Psychological Science
Automatic-heuristic and executive-analytic processing during reasoning: Chronometric and dual-task considerations
The Quarterly Journal of Experimental Psychology
Inference suppression and semantic memory retrieval: Every counterexample counts
Memory & Cognition
Working memory and everyday conditional reasoning: Retrieval and inhibition of stored counterexamples
Thinking & Reasoning
Working memory and counterexample retrieval for causal conditionals
Thinking & Reasoning
Working memory capacity and a notorious brain teaser: The case of the Monty Hall dilemma
Experimental Psychology
Handbook of metamemory and memory
Why people fail to recognize their own incompetence
Current Directions in Psychological Science
Deciding before you think: Relevance and reasoning in the selection task
British Journal of Psychology
The heuristic–analytic theory of reasoning: Extension and evaluation
Psychonomic Bulletin & Review
On the resolution of conflict in dual process theories of reasoning
Thinking & Reasoning
Hypothetical thinking: Dual processes in reasoning and judgment
How many dual process theories do we need: One, two, or many?
Rapid responding increases belief bias: Evidence for the dual-process theory of reasoning
Thinking & Reasoning
Reasoning under time pressure: A study of causal conditional inference
Experimental Psychology
Reasoning about necessity and possibility: A test of the mental model theory of deduction
Journal of Experimental Psychology: Learning, Memory, & Cognition
Debiasing by instruction: The case of belief bias
European Journal of Cognitive Psychology
If
False recognition across meaning, language, and stimulus format: Conceptual relatedness and the feeling of familiarity
Memory & Cognition
The affect heuristic in judgments of risks and benefits
Journal of Behavioral Decision Making
Automated choice heuristics
Cognitive reflection and decision making
Journal of Economic Perspectives
Rationality for mortals: How people cope with uncertainty
Homo heuristicus: Why biased minds make better inferences
Topics in Cognitive Science