The Issue
Imagine someone attempting to teach you an unfamiliar language. If they taught you the meaning and pronunciation of two words, you would probably find relative success. If they tried to teach you the meaning of every word in a lengthy sentence in your very first lesson, however, you would likely feel overwhelmed and fail to learn any of the words in that sentence. One might be tempted to characterize this illustration with the oversimplification of “less is more,” but we can more aptly consider it through the lens of Cognitive Load Theory.
Our brains first process information by holding onto and filtering information in our working memory (Ashman, 2023). For learning to occur, however, we must connect this information to organized structures of knowledge (known as schemas) in our long-term memory (Ashman, 2023). According to Cognitive Load Theory (CLT), the process of learning is thus limited by the cognitive load our working memory is able to successfully handle, and enhanced by practices that reinforce effortless retrieval of this knowledge from our long-term memory schemas (Ashman, 2023). The principles outlined by this pedagogical framework can shape not only what but how we as educators teach our students. In fact, Cognitive Load Theory has been touted as “the single most important thing for teachers to know” by renowned educationalist Dylan Wiliam (Ashman, 2023).
Proponents of Cognitive Load Theory claim that the empirical evidence that supports it far outweighs the theoretical bases that ground competing frameworks (such as constructivism). However, most of the research that serves as the foundation for CLT is based on older learners, often college-aged, and more specifically on learners who are attempting to solve complex mathematical problems.
Therefore, although CLT is based on what we currently know to be true regarding the limitations of working memory and how we consolidate knowledge into our long-term memory, there are still theoretical assumptions we must make in terms of how these ideas translate to teaching foundational reading skills.
The Research
Cognitive Load Theory purports that learning will always be subject to the constraints inherent to working memory. Our working memory can only process about 4 elements at a time (Cowan, 2001) or 6-7 if simply holding onto the string of elements (Miller, 1956), such as remembering a telephone number for the length of time needed to dial it. However, what is considered an “element” is determined by the schemas we have already created in our long-term memory (Ashman, 2023). For example, using a familiar local area code when dialing an otherwise unfamiliar number places almost no strain on one’s working memory because it is already stored as a well-learned chunk. But if an unfamiliar area code is needed, one must actively hold those three extra digits in their working memory, which makes the task more mentally demanding. This process is similar to motor procedures as well, such as knitting. An experienced knitter learning a new technique might only be concentrating on one additional or different step in her well-established routine, while a novice attempting the same sequence would find much more difficulty, having no familiarity with any of the hand positions or movements.
Because our working memory is limited in how much information it can process at the same time (Centre for Education Statistics and Evaluation, 2017), learning is slowed or even stopped altogether when working memory is overloaded (Centre for Education Statistics and Evaluation, 2017; Deans for Impact, 2015). And, because the load on working memory varies according to the schemas a learner has established, the potential for working memory overload will depend on the individual (Ashman, 2023). For example, writing the word ‘chief’ would be a simple, relatively effortless task for most secondary students and adults. If a young learner were trying to write this word down as part of a vocabulary lesson, however, they might find it effortful to segment each phoneme, associate each phoneme with a grapheme, remember how to correctly form each letter, and choose between multiple spelling options for the long /ē/ sound. While the student may be able to spell the word correctly, CLT suggests that if doing so taxed their working memory, the student may not have grasped the main learning objective (learning the meaning of the word).
Cognitive Load Theory sits within the broader fields of cognitive science, neuroscience, and other developmental and learning sciences. This growing multi-disciplinary field of research is often referred to as the Science of Learning. While a full discussion of the science of learning is beyond the scope of this article, research from the field has shown there are practical principles that help us strengthen our capacity for learning and retrieve this knowledge effortlessly.
When we can retrieve knowledge effortlessly, it's not as taxing to our working memory, which in turn reduces our cognitive load.
One such principle is spaced practice. This principle emphasizes how retrieving knowledge or practicing a skill repeatedly over time consolidates and reinforces schemas in long-term memory (Brown et al., 2014). This consolidation reduces the demand on working memory when that knowledge or skill is later used, which in turn allows new learning to be layered on (Ashman, 2023). For example, when teaching young students that the letters <sh> represent the phoneme /sh/, teachers would find it most advantageous to practice this knowledge in relatively brief, intermittent sessions across several days, rather than drilling the concept in one longer activity on a single day.
Similarly, another principle of the science of learning that is often included in conversations about CLT is known as interleaving (i.e., mixed practice) (Brown et al., 2014). A teacher would be applying this principle if, while practicing the relatively new grapheme <ch>, she displayed the previously learned <sh> as a bit of a curveball for students to review. Spaced practice and interleaving both support a student’s retention and retrieval of knowledge from their long-term memory (Brown et al., 2014), which subsequently helps learners use this knowledge without overly taxing their working memory (Ashman, 2023), as in the ‘chief’ example above.
Not unlike the broader science of learning, Cognitive Load Theory encompasses many principles, often known as effects, and a full account of them is again outside the scope of this article. The aspects of CLT that most strongly relate to teaching foundational reading skills are consolidated below.
Teaching Implications
- Teachers should introduce a limited amount of information at one time, in order to optimize learning. But the amount of information a student can learn is dependent on its complexity, as well as the individual learner’s prior knowledge (Ashman, 2023; Centre for Education Statistics and Evaluation, 2017).
- Learning a complex concept or process (e.g., writing a paragraph) is dependent on a learner's ability to quickly and easily retrieve knowledge (e.g., how to spell words and form letters) from long-term memory. Cognitive Load Theory thus underscores the need for not only accuracy, but also automaticity with foundational literacy skills to support this effortless retrieval, whether reading or writing.
- When learning a new activity or routine, explicit instruction, modeling, scaffolds, and corrective feedback help make early learning most successful (Ashman, 2023; Centre for Education Statistics and Evaluation, 2017; Weinstein et al., 2018) by helping reduce the cognitive load. Rather than spending their working memory on the steps of the activity, students can focus it on the actual learning objectives.
- Extraneous, frivolous stimuli (such as animations on a PowerPoint slide) can distract students and tax their working memory, negatively impacting their learning (Ashman, 2023).
- Complex information can be simplified by presenting it both orally and visually (Centre for Education Statistics and Evaluation, 2017; Weinstein et al., 2018).
- Expecting students to identify patterns can tax their working memory (Ashman, 2023; Sweller et al., 1982), so teachers should plan to explicitly label these as necessary. For example, a teacher may find it beneficial to point out to students that the phoneme /oi/ is usually spelled <oi> at the beginning or middle of words, while it's usually spelled <oy> at the end of words, rather than expecting students to identify this pattern independently.
- When learning is optimally varied (in contrast to rote, predictable, and/or repetitive tasks), it leads to more durable and functional learning (Bjork & Bjork, 2009). For example, after introducing the graphemes <oi> and <oy>, a teacher may want students to practice with a list of words that includes both graphemes, so that they must choose the correct spelling pattern for each word, rather than with a list containing only words that end in <oy>.
- Solidifying prior learning encourages new learning by reducing the cognitive load on a student’s working memory. Teachers can facilitate this learning and long-term retrieval by:
  - Expecting students to generate responses, rather than selecting the correct choice (Brown et al., 2014)
  - Expecting students to practice retrieving the information repeatedly, with breaks in between (i.e., spaced practice) and intermittently reviewing previously learned information (i.e., interleaving) (Brown et al., 2014; Deans for Impact, 2015; Weinstein et al., 2018)
In Sum
Cognitive Load Theory provides a valuable framework for teachers in structuring the introduction of new material so that students are most likely to understand and retain it. When this framework is paired with the principles of spaced practice and interleaving, teachers can design opportunities to review previously learned material in ways that best support retrieval and long-term retention.
This consolidation of knowledge then allows students to acquire and integrate new knowledge more efficiently. It is important to note, however, that the complexity of a piece of information or a task is highly individualized, because it depends on each student’s unique prior knowledge and experiences. As a result, we don’t have a reliable way of predicting when an assignment overtaxes a student to the point of impeding learning and when it results in a desirable difficulty (McDaniel & Butler, 2011). For these reasons, Cognitive Load Theory should be seen not as a rigid prescription, but as a guiding framework to help teachers make decisions about when and how to scaffold material and when and how to challenge students to foster durable and effective learning.
Nitty-gritty of some of the research:
Sweller & Cooper, 1985 (Experiment 3*)
Participants
- 22 ninth-grade students
Study Design
- Students were split into two groups. One group worked through algebraic equations independently, and subsequently implemented trial-and-error procedures and unnecessary steps while attempting to solve them. This group was established to demonstrate the effects of learning algebraic solutions while students’ working memory was taxed. The second group, however, studied completed solutions of the same problems, rather than completing the problems themselves, allowing them to be more cognitively free to notice patterns and focus on the correct procedures.
Findings
- The group that had studied worked solutions made fewer errors and worked more efficiently than the group who solved the equations independently.
Notes/Limitations
- It’s important to note that none of the students in Experiment 3 were provided any explicit instruction. Students in the problem-solving group were also not provided any feedback to support them in correcting errors or inefficiencies in their work. It is therefore unclear to what extent this study demonstrates an ineffective learning procedure, rather than the impact of an overloaded working memory while trying to problem solve. Additionally, it would not be advisable to use this study as justification for omitting student practice from an instructional routine in favor of only teacher modeling, especially for younger students who may not demonstrate the attention and/or motivation that these participants demonstrated by studying the worked examples to learn the mathematical procedures.
*This study is often cited as seminal research in the field of Cognitive Load Theory. Experiment 3 (described above) is one of several different but related experiments contained within this study. The overall idea of cognitive load was shaped not by any one study alone, but by these experiments together with other research on working memory, its capacity, and learning procedures.
Rohrer & Taylor, 2006
Participants
- 116 college students
Study Design
- Students were taught a previously unfamiliar math procedure through a tutorial. They were then given 10 practice questions, with solutions to each provided before the following question was given. Half of the students solved all ten problems in one day (an example of massed practice), while the other half did 5 problems initially, and then the other 5 after a week (an example of spaced practice). Half of all students took a test one week after this second session, while the remaining students were tested 4 weeks later.
Findings
- There was not a statistically significant difference between the two groups for the students who took the test after one week. However, when tested 4 weeks after practicing, the spaced practice group performed substantially better.
Notes/Limitations
- Although similar studies have demonstrated the benefit of spaced practice in younger students, those studies largely centered around rote memorization (such as reciting times tables). In contrast, participants in this study had to both remember and implement a taught procedure, which may or may not align more closely with the expectation that students remember the sound for a particular grapheme, for example, and then use that phoneme when sounding out an unfamiliar word. While this study does show promise for the benefit of spaced practice, there remains a theoretical leap in aligning it with literacy skills. Furthermore, it does not answer the question of the optimal number of practice sessions (would three, four, or more sessions be better than two?), or of the best interval between practice sessions.
Rohrer et al., 2015
Participants
- 126 seventh-grade students
Study Design
- Students were taught two mathematical procedures (finding the slope of a line and graphing a linear equation) in two different ways. Half of the participants learned how to find the slope using blocked practice (12 consecutive practice problems) and how to graph an equation via interleaved practice (4 practice problems immediately after instruction, with the remaining 8 spread out across later assignments). For the other half of the students, the pairing of procedure and practice method was reversed. For all assignments, teachers reviewed the correct strategy and solution the following day, and students were expected to correct their own errors. After the unit was completed, all students were given a review sheet with one graph problem and one slope problem along with other math problems. Students were then given an unannounced test either the next day or 30 days after the end of the unit and subsequent review.
Findings
- Students performed better on whichever mathematical procedure they had learned via interleaved practice. When tested one day after the unit review, interleaved practice demonstrated a moderate benefit. When students were tested after 30 days, however, interleaved practice showed a large benefit, suggesting this type of practice leads to better-retained knowledge.
Notes/Limitations
- Due to the design of the study, students who had received blocked practice went a significant period of time without practicing this procedure again (at least, not in work assigned by the classroom teacher) until the single question provided in the review. Thus, the element of time may have contributed to the learning loss as much as or more than the type of instructional practice. It is unknown if shortening this window (or widening the interval separating the interleaved practice from the test) would have lessened or negated the difference between the two methods. Similarly, it is unknown whether increasing the number of problems provided in the review would have changed the outcome. Of course, there is also a theoretical leap that one must take to assume that the instructional methods that help students learn and apply mathematical procedures also apply to other learning, such as literacy development.
References
Ashman, G. (2023). A little guide for teachers: Cognitive load theory. Corwin.
Brown, P. C., Roediger, H. L. III, & McDaniel, M. A. (2014). Make it stick: The science of successful learning. The Belknap Press of Harvard University Press.
Centre for Education Statistics and Evaluation. (2017, September). Cognitive load theory: Research that teachers really need to understand. NSW Department of Education. https://www.cese.nsw.gov.au
Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24(1), 87–185. https://doi.org/10.1017/s0140525x01003922
Deans for Impact. (2015). The science of learning. Austin, TX: Deans for Impact.
McDaniel, M. A., & Butler, A. C. (2011). A contextual framework for understanding when difficulties are desirable. In A. S. Benjamin (Ed.), Successful remembering and successful forgetting: A festschrift in honor of Robert A. Bjork (pp. 175–198). Psychology Press.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81–97. https://doi.org/10.1037/h0043158
Rohrer, D., & Taylor, K. (2006). The effects of overlearning and distributed practice on the retention of mathematics knowledge. Applied Cognitive Psychology, 20, 1209–1224.
Rohrer, D., Dedrick, R. F., & Stershic, S. (2015). Interleaved practice improves mathematics learning. Journal of Educational Psychology, 107(3), 900–908. https://doi.org/10.1037/edu0000001
Sweller, J., & Cooper, G. A. (1985). The use of worked examples as a substitute for problem solving in learning algebra. Cognition and Instruction, 2(1), 59–89. https://doi.org/10.1207/s1532690xci0201_3
Sweller, J., Mawer, R. F., & Howe, W. (1982). Consequences of history-cued and means-end strategies in problem solving. American Journal of Psychology, 95, 455–483.
Weinstein, Y., Madan, C. R., & Sumeracki, M. A. (2018). Teaching the science of learning. Cognitive Research: Principles and Implications, 3(1), 2. https://doi.org/10.1186/s41235-017-0087-y