## My basic issue with cognitive load theory

The idea of “working memory”, well established since the 1950s, is that the most objects someone can hold in working memory is 7 plus or minus 2. There have been some revisions to the idea since (mainly that the size of the chunks matters; for instance, speakers of languages that use fewer syllables for their numbers have an easier time memorizing number sequences).

This was extrapolated in the 1980s to educational theory via “cognitive load theory” by stating that the learner’s working memory capacity should not be exceeded; this tends to be used to justify “direct instruction” where the teacher lays out some example problems and the students repeat problems matching the examples. The theory here is that by matching examples, students incur as little cognitive load as possible.

Cognitive load theory has some well-remarked problems with a lack of falsification and a lack of connection with modern brain science. These issues likely deserve their own posts.

My issue with cognitive load theory as applied to education is more basic: the contention that direct instruction requires less working memory than any discovery-based alternative. It certainly is asserted often:

> All problem-based searching makes heavy demands on working memory. Furthermore, that working memory load does not contribute to the accumulation of knowledge in long-term memory because while working memory is being used to search for problem solutions, it is not available and cannot be used to learn.

But the assertion does not match what I see in reality.

To illustrate, here’s a straightforward example — defining convex and concave polygons — done with three discovery-type lessons and direct instruction.

### Discovery Lesson #1

Click on the image below to use an interactive application. Use what you learn to write a working definition of “convex” and “concave”.

Then draw one example each of a convex polygon and a concave polygon. Justify why your pictures are correct.

### Discovery #2

The polygons on the left are convex; the polygons on the right are concave. Give a working definition for “convex” and “concave”.

Then draw one example each of a convex polygon and a concave polygon (not copying any of the figures above). Justify why your pictures are correct.

### Discovery #3

The polygons on the left are convex; the polygons on the right are concave. Looking at the picture, try to decide what the difference between the two is.

…after discussion…

A convex polygon is a polygon with all interior angles less than 180º.
A concave polygon is a polygon with at least one interior angle greater than 180º. The polygons on the left are convex; the polygons on the right are concave.

Draw one example each of a convex polygon and a concave polygon (not copying any of the figures above). Justify why your pictures are correct.

### Direct Instruction

A convex polygon is a polygon with all interior angles less than 180º.
A concave polygon is a polygon with at least one interior angle greater than 180º. The polygons on the left are convex; the polygons on the right are concave.

Draw one example each of a convex polygon and a concave polygon (not copying any of the figures above). Justify why your pictures are correct.

### Analysis

Parsing and understanding technical words creates a demand on memory. The hardcore cognitive load theorist would claim such a demand is less than that of having the student create their own definition, but is that really the case? The student using their own words can rely on more comfortable and less technical vocabulary than the one reading the technical definition. The technical definition is easy to misunderstand and the intuitive visualization is only clear to a student if they have the subsequent examples.

Discovery #1 does not appear to have heavy cognitive load. On the contrary, being able to immediately switch between “convex” and “concave” upon passing the 180º mark is much more tactile and intuitive than either of the other lessons. Parsing technical language creates more mental demands than simply moving a visual shape.

There might be a problem of a student in Discovery #1 or Discovery #2 coming up with an incorrect definition, but that’s why discovery is hard without a teacher present.

Discovery #3 is identical to the direct lesson except that the definition and the examples have swapped places. Having a non-technical intuition built up before trying to parse the technical definition makes it easier to read; again it appears to have less cognitive demand.

### Overestimating and underestimating

One of the basic assumptions of cognitive load theorists seems to be that the mental demands of discovery are given all at once. Usually the demands involve some sort of scaffolding. For instance, in Discovery #3 the intuitive discussion of the pictures and the definition are NOT given at the same time. Only after students have settled on an idea of the difference between the shapes (essentially reducing it down to one mental object) is the definition given, which, as I already pointed out, is easier to read for a student who now has some context.

On the other hand, cognitive load theorists seem to underestimate the demands of direct instruction. While students tend not to parse definitions as exact, entire sentences (this would clearly fail the “only seven units” test), mathematical language is routinely dense and specific enough that breaking any supposed limit is quite easy. Using the direct instruction example above, taking everything in at once would require (a) parsing and accepting the new term “convex”, (b) the same for “concave”, (c) recalling the definition of “polygon”, (d) the same for “interior angles”, (e) keeping in mind the visual of greater and less than 180º, (f) keeping track of “at least one” meaning 1, 2, 3, or more, and (g) parsing the connection between (a) through (f) and the examples given below.

There are obviously counters to some of these (the definitions, for instance, should be internalized to a degree that they are easy to grab from long-term memory), but the list doesn’t look that different from a “discovery” lesson, and doesn’t possess the advantage of reducing pressure on vocabulary and language.
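One rough way to see how much the technical definition packs in is to spell it out as code. The sketch below is only an illustration, not part of any of the lessons: the function names `interior_angles` and `classify` are made up here, and it assumes a simple (non-self-intersecting) polygon given as a list of (x, y) vertices in order around the shape.

```python
import math

def interior_angles(vertices):
    """Return the interior angle (in degrees) at each vertex.

    Assumes a simple polygon whose vertices are listed in order,
    either clockwise or counterclockwise.
    """
    n = len(vertices)
    # Shoelace formula: a positive signed area means counterclockwise order.
    area2 = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    orientation = 1 if area2 > 0 else -1
    angles = []
    for i in range(n):
        ax, ay = vertices[i - 1]
        bx, by = vertices[i]
        cx, cy = vertices[(i + 1) % n]
        ux, uy = bx - ax, by - ay          # incoming edge
        vx, vy = cx - bx, cy - by          # outgoing edge
        cross = ux * vy - uy * vx
        dot = ux * vx + uy * vy
        turn = math.degrees(math.atan2(cross, dot))  # signed turn at this vertex
        angles.append(180 - orientation * turn)
    return angles

def classify(vertices):
    angles = interior_angles(vertices)
    if any(a > 180 for a in angles):       # "at least one" angle over 180º
        return "concave"
    if all(a < 180 for a in angles):       # "all" angles under 180º
        return "convex"
    return "neither (some angle is exactly 180º)"

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
dart = [(0, 0), (2, 0), (2, 2), (1, 1), (0, 2)]
print(classify(square))  # convex
print(classify(dart))    # concave
```

Even in this form the definition leans on several separate ideas (vertex order, interior versus exterior angle, the 180º threshold, “all” versus “at least one”), which is roughly the list a student has to juggle when reading it cold.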

### The overall concern

In truth, working memory is well-understood for memorizing digit sequences (called digit span) but the research gets fuzzy as processes start to include images and sounds. Any sort of declaration (including my own) that the working memory is busted by a particular task when the task involves mixed media is essentially arbitrary.

On top of that, the brain is associative to such an extent that memory feats are possible which appear to violate these conditions. For instance, there is a memory trick I used to perform for audiences where they would give me a list of 20 objects and I would repeat the list backwards. The trick works by pre-memorizing a list of 20 objects quite thoroughly (1 for pencil, 2 for swan, say) and then associating the given list with those objects. If the first object given was “yo-yo” I would imagine a yo-yo hanging off a pencil. The trick is quite doable by anyone and, given the fluency of the retrieval, suggests that association of images has a secondary status that exceeds that of standard “working memory”. (This is also how the competitors of the World Memory Championship operate, allowing them feats like memorizing 300 random words in 5 minutes.)
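The bookkeeping behind the trick can be sketched as a lookup table. This is only a toy model of the association step, not a claim about how memory works; the function names are invented for illustration, and every peg other than “pencil” and “swan” is a hypothetical stand-in.

```python
# Hypothetical peg list: only "pencil" and "swan" come from the post.
PEGS = {1: "pencil", 2: "swan", 3: "tricycle", 4: "sailboat"}

def associate(items):
    """Pair each incoming item with the peg for its position."""
    return {PEGS[position]: item for position, item in enumerate(items, start=1)}

def recall_backwards(links):
    """Walk the pegs from last to first and read off the attached items."""
    return [links[PEGS[position]] for position in range(len(links), 0, -1)]

links = associate(["yo-yo", "teapot", "ladder"])
print(links["pencil"])          # yo-yo
print(recall_backwards(links))  # ['ladder', 'teapot', 'yo-yo']
```

The point of the model is that the pegs are already fixed in long-term memory, so each new item only needs one fresh link rather than a slot in a running sequence.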

### 14 Responses

1. Thanks for this post. CLT seems true, but too atomic a theory to help much in practice. Fair?

• I’d say something like “the prefrontal cortex does more than just memory, so any theory that focuses just on that is incomplete” but “too atomic” works.

2. Jason, I’ll reply to your post more specifically later, but Dan suggested I link my comments on his recent CLT Twitter discussion to your post, so here goes. My thoughts are over at my blog.

3. This is a great articulation of the classroom issues with CLT. Here’s my approach:

I’ve been thinking a lot recently about what makes students effective mathematical thinkers, and I’m coming more and more firmly to the conclusion that mathematical thinking is all about the connections between mathematical ideas — students need to build models in their mind of mathematical ideas, and if they are constantly looking for the similarities and differences between mathematical ideas, they build the capacity to transfer knowledge and solve new problems.

Based on that premise, learning has less to do with the cognitive load on working memory — learning has to do with connecting what is in working memory with long-term memory so that it “sticks”, and adds to a model that can be used to solve similar classes of problems.

From that perspective, students exploring the applet on their own seems to do a better job activating prior knowledge about “polygon” than the definition shown — in particular for lower-skilled students who have fewer connections between the concrete (pictures) and the abstract (words). I particularly like the idea of having students decide what makes one class convex/concave — they will naturally be considering a number of features of the polygons, and, with the right guidance, making connections between the features they understand and a new feature of a polygon. There’s a wealth of research showing that the differences between novice and expert problem solvers, in all fields, are more about the ways experts attend to the critical features of a problem, while novices focus on irrelevant, surface features. This activity provides an opportunity for students to test-drive that process, and get feedback by immediately testing their theory against the examples given. We can debate about what exactly that looks like all day, but from my perspective, we’re using students’ working memory in service of learning — by using it to connect prior knowledge with a new problem.

I’ve been laser-focused on this idea of mathematical expertise as connections between mathematical ideas, and I’m curious what you think of this perspective. Would you agree with a modified conclusion from CLT:

Working memory load during mathematical learning should be used by students to make connections between a problem situation and prior knowledge.

What am I missing? Or is CLT just too theoretical for classroom application?

• Working memory is an at-the-moment feature of the mind. I wouldn’t say this is irrelevant for classroom use, although what specifically constitutes a chunk when there’s a lot more going on than just “numbers” or “syllables” is unclear.

The associativity you speak of is more long term (although sometimes more of a medium term). As I imply at the end of this post, the relationship between the two is not straightforward.

So while I think it’s useful to look at working memory, it’s a smaller part of a larger picture. Here’s another example: it is possible to cogitate about things without it passing through the conscious mind. For instance, experiments have been done with people who have lost the ability to process one side of their visual field; they are shown something, swear they have seen nothing, but are later able to identify what they have seen. Also: people who think about larger numbers look up and to the right while people who think about smaller numbers look down and to the left without necessarily being aware they are even doing this.

4. Excellent summary. However, why does the phrase “more than one” here mean “1, 2, 3 or more”? I would use the phrase “at least one”.

5. I’m surprised no one has attempted to incorporate conceptual information learned during discovery *into* direct instruction. For example, Discovery Lesson 1 is:

Discovery Lesson #1
Click on the image below to use an interactive application. Use what you learn to write a working definition of “convex” and “concave”.

The student’s journey to the working definition will expose them to conceptual information missing from the Direct Instruction lesson:

Direct Instruction
A convex polygon is a polygon with all interior angles less than 180º.
A concave polygon is a polygon with at least one interior angle greater than 180º. The polygons on the left are convex; the polygons on the right are concave.

So, the DI lesson could potentially be improved by having students read a narrative of how a student thought through the task and arrived at the working definition. This will expose the DI students to misconceptions and other information usually absent from DI content. This is similar to what we experience when we read Socratic dialogues (e.g. Plato’s Meno).

Interestingly, I don’t read much about this kind of DI on blogs or in research journals…

• Well, discovery #3 is essentially what you did (except it starts with the in-class group discussion variant rather than the computer variant) and it matches pretty closely to our school’s Carnegie Learning curriculum.

Hardcore DI folks would not tolerate such a compromise. It’s a philosophical standpoint as much as a pedagogical one. This is odd since it creates a “DI vs. everything else” binary when there is a vast number of options. It’s silly to lump what you propose with, say, (to take the most extreme possible) unschooling, where students do absolutely everything of their own volition.

6. Yep. I’m more of a discovery person myself. Direct instruction and I don’t get along well. In the classroom, to teach convex vs. concave, I’d follow a discovery lesson similar to yours prior to discussing any upfront details.

7. Firstly, the cute little program has a third option, self intersecting polygon, which is neither convex nor concave.
Secondly, the CCSSM document does not contain the words convex or concave anywhere.
Thirdly, I would hardly call it a discovery approach as presented. Better to mix up a bunch of polygons and let the kids come up with various ways of classifying them. With luck they will hit on the convexity property (they might need a bit of prodding), and only then is it sensible to introduce the jargon.
Fourthly, the most interesting difference between convex and concave is what happens when you draw a few straight lines through the polygons. The angle property is only valid where angles have measures.
Fifthly, the interest in convex polygons arose when linear programming was thought to be an important topic in school math (about 40 years ago). Same for linear inequalities. Then linear programming got booted out.

• Good points here.

In an earlier draft I had a “start from scratch and draw some polygons and then talk about them” discovery lesson, but it covered more than just convex/concave and was distracting from the main point.

You do have an interesting comment about the others not truly being discovery. The DI folks I’ve read would disagree. I’m not sure there’s good terminology to cover everything on the spectrum, because the terms seem to be more philosophical standpoints rather than point-by-point elucidations of what the lessons should or should not contain. I would say both abbreviated and elongated discovery have their place.

I agree the “cave” aspect of concave is really the interesting thing. This would undoubtedly show up in a full lesson somewhere.

> when linear programming was thought to be an important topic in school math (about 40 years ago)

I am completely fascinated by how the “standard curriculum” has evolved. It certainly can have an influence on debate today (especially when people try to compare eras — high school algebra from the 1950s is not what we’d call high school algebra).

Not everyone is 100% detached from linear programming for high school — see for instance question #9 from the SAGE sample exam, even if it isn’t common core compliant.

• I went to the SAGE sample, and remembered seeing it originally. It really is an atrocious question. No context, so no objective!
Thanks for your observations on my comment. 1950s algebra did seem more to the point than the current approach.

If you want something to get your teeth into try this:

wwc_algebra_040715.pdf

I nearly cried.