Follow Through, the largest US government educational experiment ever

I. Unconditional war

On January 8, 1964, President Lyndon B. Johnson declared an “unconditional war on poverty in America.”

The effort spawned the creation of Medicare, Medicaid, food stamps, the Economic Opportunity Act of 1964, and the Elementary and Secondary Education Act. (The ESEA has been renewed every 5 years since; when renewed in 2001, it went by the moniker “No Child Left Behind”.)

1965 also saw the launch of the Head Start Program, designed to provide early childhood education to low income children while involving the parents.

The program was designed to promote the growth and development of parents and their children. The Planning Committee for Head Start felt that children would benefit from their parents’ direct involvement in the program. They agreed that the best way for parents to learn about child development was by participating with their children in the daily activities of the program.
Sarah Merrill

At the time, parental involvement was controversial:

Although parent involvement was written into law in 1967, their role in governance was spelled out for the first time in 1970 through Part B in the Head Start Policy Manual. This policy was also known as 70.2. Policy 70.2 defined the responsibilities of Policy Councils at the program, delegate, and agency levels. At that time, many Head Start grantees—especially those in public school settings—called Washington, DC and threatened to leave Head Start because 70.2 gave so much authority to parents.
Sarah Merrill

This point is important for what’s to come.

II. 352,000

In 1967, Congress authorized funds to expand Head Start under a program called Follow Through.

Congress authorized Follow Through in 1967 under an amendment to the Economic Opportunity Act to provide comprehensive health, social, and educational services for poor children in primary grades who had experienced Head Start or an equivalent preschool program. The enabling legislation anticipated a large-scale service program, but appropriations did not match this vision. Accordingly, soon after its creation, Follow Through became a socio-educational experiment, employing educational innovators to act as sponsors of their own intervention programs in different school districts throughout the United States. This concept of different educational improvement models being tried in various situations was called “planned variation.”
Interim Evaluation of the National Follow Through Program, page 22

In other words, Congress approved a service program which had to be cut down due to lack of funds to an experimental program.

Various sponsors — 22 in all — picked particular models that would be used for a K-3 curriculum (although it should be noted that due to the social service origin not every sponsor had a curriculum right away — more on that later). Four cohorts (the first group entering in fall 1969, the last fall 1972) went through the program before it was phased out, the last being very scaled down:

cohortdata

The sponsors had classrooms spread throughout the entire country implementing curriculum as they saw fit.

mapsites

[Source.]

Note that this was not a case of them putting in their colleagues; teachers were chosen from sites and given their curriculum via trainers or handbooks. Other teachers taught “comparison groups” not using the interventions that were chosen to be as similar as possible to the experimental groups. The idea was to see if students using the sponsor’s curriculum would outperform the comparison groups.

Teachers were not always happy being forced to participate:

New Follow Through teachers sometimes resisted changing their teaching strategies to fit the Follow Through models, and they found support for their resistance among others who felt as powerless and as buffeted as they.
Follow Through Program 1975-1976, page 31

A wide swath of measures was chosen to assess quality.

testlist

[Source.]

Notice the number that says “Sponsor” — those are questions submitted by the sponsors, knowing the possibility of a mismatch between the curriculum learned and the curriculum tested.

Not all of the data above was used in the final data analysis. By the end of the experiment the main academic measure was the Metropolitan Achievement Test Form F Fourth Edition. Note the only minor use of the MAT in the chart above representing the early years (noted by SAT/MAT — MAT and Stanford word problems were mixed together). Sponsor questions, for instance, dropped by the wayside.

In 1976 the program ended and the program as a whole was analyzed — 352,000 Follow through and comparison children — resulting in a report from 1977 called Education as Experimentation: A Planned Variation Model.

The best summary of the results comes from three charts, which I present directly from the book itself. The dots are the averages, the bars represent maximums and minimums:

basicskillseffect

cognitiveskillseffect

affectiveskills

“Basic skills” represents straightforward reading and arithmetic, “cognitive skills” represent complex problem solving, and “affective skills” dealing with feelings and emotional areas.

The report makes some attempt to combine the data, but the different programs are so wildly dissimilar I don’t see any validity to the attempt. I’d first like to focus on five of them: SEDL, Parent Education, Mathemagenic Activities, Direct Instruction, and Behavior Analysis.

III. SEDL

The Southwest Educational Development Laboratory (SEDL) model is a bilingual approach first developed for classrooms in which 75 percent of the pupils are Spanish-speaking, but it can be adapted by local school staffs for other population mixes. In all cases the model emphasizes language as the main tool for dealing with environment, expressing feelings, and acquiring skills, including nonlinguistic skills. Pride in cultural background, facility and literacy in both the native language and English, and a high frequency of “success” experiences are all central objectives.
Follow Through Program Sponsors, page 31

The SEDL is a good example to show how difficult it is to compare the sponsors; in this case rather than forming a complete curriculum, the emphasis of SEDL is on helping Spanish speakers with a sensitive and multicultural approach. The goals were not to gain basic skills in arithmetic, oral skills are emphasized over written, and based on the target sample there was a higher difficulty set on improving reading.

Given these factors, the result (smaller effect on basic skills, larger effect on cognitive and affective development) seems to be not surprising at all.

IV. Parent Education

This sponsor perhaps makes it clearest that Follow Through started as a social service program, not an education program.

A fundamental principle of the BOPTA model is that parents and school personnel can, and want to, increase their ability to help their children learn. Also, parents and school personnel together can be more effective than either can alone. The sponsor’s goal is to assist both school and home to develop better child helping skills and ways to implement these skills cooperatively and systematically. These child helping skills are derived from careful study of child development, learning, and instructional theory, research, and practice. The, approach is systematically eclectic and features both diagnostic sequential instruction and child-initiated discovery learning.
Follow Through Program Sponsors, page 37

The results from the data for this program were roughly around average; basic skills did slightly better than cognitive skills. However, the idea of including home visits training makes a much different set of variables than just training the teacher.

Related but even more dissimilar was the Home School Partnership:

A parent aide program, an adult education program, and a cultural and extra-curricular program are the principal elements of this model. The model aims to change early childhood education by changing parent, teacher, administrator, and child attitudes toward their roles in the education process. It is believed this can be done by motivating the home and school to work as equal partners in creating an environment that supports and encourages learning.
Follow Through Program Sponsors, page 25

This is a program that had no educational component at all — it was comparing parent intervention versus no parent intervention, which led to confusion:

The instructional component of this program is in disarray. Since there is no in-class instructional model, teachers are on their own. Some are good, but in too many classes bored children and punitive teachers were observed.
Follow Through Program 1975-1976, page 66

Note that in both cases, however, as mentioned earlier: the idea of home parental involvement was innovative and controversial enough on its own it created a burden the other projects did not have. (To be fair, teachers as in-class aides occur in the other programs.)

V. Mathemagenic Activities

This sponsor ran what most people would consider closest to a modern “discovery” curriculum.

The MAP model emphasizes a scientific approach to learning based on teaching the child to make a coherent interpretation of reality. It adheres to the Piagetian perspective that cognitive and affective development are products of interactions between the child and the environment. It is not sufficient that the child merely copy his environment; he must be allowed to make his own interpretations in terms of his own level of development.

An activity-based curriculum is essential to this model since it postulates active manipulation, and interaction with the environment as the basis for learning. Individual and group tasks are structured to allow each child to involve himself in them at physical and social as well as intellectual levels of his being. Concrete materials are presented in a manner that permits him to experiment and discover problem solutions in a variety of ways.

The classroom is arranged to allow several groups of children to be engaged simultaneously in similar or different activities. Teachers’ manuals including both recommended teaching procedure and detailed lesson plans for eight curriculum areas (K-3) are provided in the model. Learning materials also include educational games children can use without supervision in small groups or by themselves. Art, music, and physical education are considered mathemagenic activities of equal importance to language, mathematics, science, and social studies.

Follow Through Program Sponsors, page 33

MAP did the best of all the sponsors at cognitive skills and merely over baseline on basic skills.

The term “mathemagenic” was 60s/70s term that seems not to be in use any more. A little more detail from here about the word:

In the mid-1960’s, Rothkopf (1965, 1966), investigating the effects of questions placed into text passages, coined the term mathemagenic, meaning “to give birth to learning.” His intention was to highlight the fact that it is something that learners do in processing (thinking about) learning material that causes learning and long-term retention of the learning material.

When learners are faced with learning materials, their attention to that learning material deteriorates with time. However, as Rothkopf (1982) illustrated, when the learning material is interspersed with questions on the material (even without answers), learners can maintain their attention at a relatively high level for long periods of time. The interspersed questions prompt learners to process the material in a manner that is more likely to give birth to learning.

There’s probably going to be interest in this sponsor due to the obscurity and actual performance, but I don’t have a lot of specific details other than what I’ve quoted above because. It’s likely the teacher manual that was used during Follow Through is buried in a university library somewhere.

VI. Direct Instruction

This one’s worth a bigger quote:

directinstructdescript

[Source; quotes below from the same source or here.]

This one’s often considered “the winner”, with positive outcomes on all three tests (although they did not get the top score on cognitive skills, at least it improved over baseline).

What I find perhaps most interesting is that the model does not resemble what many think of as direct instruction today.

Desired behaviors are systematically reinforced by praise and pleasurable activities, and unproductive or antisocial behavior is ignored.

The “carrot rather than stick” approach reads like what currently labeled “progressive”. The extremely consistent control is currently labeled “conservative”.

In the classroom there are three adults for every 25 to 30 children: a regular teacher and two full-time aides recruited from the Follow Through parent community. Working very closely with a group of 5 or 6 pupils at a time, each teacher and aide employs the programmed materials in combination with frequent and persistent reinforcing responses, applying remedial measures where necessary and proceeding only when the success of each child with a given instructional unit is demonstrated.

The ratio here is not 1 teacher lecturing to 30 students. It is 1 to 5.

Emphasis is placed on learning the general case, i.e., developing intelligent behavior, rather than on rote behavior.

While the teacher explains first, the teacher is not giving mute examples. They are trying to make a coherent picture of mathematics.

Before presenting the remaining addition facts, the teacher shows how the facts fit together–that they are not an unrelated set of statements. Analogies teach that sets of numbers follow rules. Fact derivation is a method for figuring out an unknown fact working from a known fact. You don’t know what 2+5 equals, but you know that 2+2 equals 4; so you count.

2 + 2 = 4
2 + 3 = 5
2 + 4 = 6
2 + 5 = 7

Then the children are taught a few facts each day so that the facts are memorized.

This is the “counting on” mentioned explicitly in (for example) Common Core and possibly the source of the most contention in all Common Core debates. This differs from those who self-identify with “direct instruction” but insist on rote-first.

Also of note: employees included

a continuous progress tester to reach 150 to 200 children whose job it is to test the children on a 6 week cycle in each core area.

Assessment happened quite frequently; it is not surprising, then, that students would do well on a standardized test compared with others when they were very used to the format.

VII. Behavior Analysis

The behavior analysis model is based on the experimental analysis of behavior, which uses a token exchange system to provide precise, positive reinforcement of desired behavior. The tokens provide an immediate reward to the child for successfully completing a learning task. He can later exchange these tokens for an activity he particularly values, such as playing with blocks or listening to stories. Initial emphasis in the behavioral analysis classroom is on developing social and classroom skills, followed by increasing emphasis on the core subjects of reading, mathematics, and handwriting. The goal is to achieve a standard but still flexible pattern of instruction and learning that is both rapid and pleasurable.

In the behavior analysis classroom, four adults work together as an instructional team. This includes a teacher who leads the team and assumes responsibility for the reading program, a full-time aide who concentrates on small group math instruction, and two project parent aides who attend to spelling, handwriting, and individual tutoring.

Follow Through Program Sponsors, page 9

I bring up this model specifically because

a.) It often gets lumped with Direct Instruction (including in the original chart you’ll notice above), but links academic progress with play in a way generally not associated with direct instruction (the modern version would be the Preferred Activity Time of Fred Jones, but that’s linked more to classroom management than academic achievement).

b.) It didn’t do very well — second to last in cognitive achievement, barely above baseline on basic skills — but I’ve seen charts claiming it had high performance. This is even given it appears to have included assessment as relentlessly as Direct Instruction.

c.) It demonstrates (4 to a class!) how the model does not resemble a standard classroom. This is true for the models that involve a lot of teacher involvement, and in fact none of them seem comparable to a modern classroom (except perhaps Bank Street, which is a model that started in 1916 and is still in use; I’ll get to that model last).

Let’s add a giant grain of salt to the proceedings —

VIII. Data issues

There was some back-and-forth criticizing the statistical methods when Education as Experimentation: A Planned Variation Model was published in 1977. Quite a few papers were written 1978 and 1981 or so, and a good summary of the critiques are at this article which claims a.) models were combined that were inappropriate to combine (I agree with that, but I’m not even considering the combined data) b.) Questionable statistics were used (particularly getting fussy about reliance on analysis of covariance) and c.) the test favored particular specific learnings (so if a class was strong in, say, handwriting, that was not accounted for).

I think the harshest of data critique came before the 1977 report even came out. The Comptroller General of the U.S. made a report to Congress in October 1975 and was blistering:

reliability

The “data analysis contractor” mentioned presenting reservations are the same Abt Publications that came out with the 1977 report.

The report also mentions part of the reason why 22 sponsors are not given in the comparison graph:

Another result of most LEAs not being restricted in their choice of approaches is that some sponsors were associated with only a few projects. The evaluation design for cohort three–the one OE plans to rely most heavily on to determine model effectiveness–requires that a sponsor be working with at least five projects where adequate testing had been done to be compared with other sponsors.

Only 7 of the 22 sponsors met that requirement.

By the end, some sponsors were omitted from the 1977 report altogether. The contractor was also dubious about analysis of covariance:

In an effort to adjust for the initial differences, the data analysis contractor used a statistical technique known as the analysis of covariance … however, the contractor reported that the Follow Through data failed to meet some requirements believed necessary for this technique to be an effective adjustment device.

Additionally:

Further, no known statistical technique can fully compensate for initial differences on such items as pretest scores and socioeconomic characteristics. Accordingly, as OE states in its June 1974 summary, “the basis for determining the effects of various Follow Through models is not perfect.” Our review of the March 1974 report indicated that, for at least four sponsors, the adjustments were rather extensive. Included among the four is the only sponsor that produced significant differences on all four academic measures and the only two sponsors that produced any academic results significantly below their non-Follow-Through counterparts.

This issue was noted as early as 1973, calling out the High Scope, Direct Instruction, and the Behavior Analysis models specifically.

Substantial analysis problems were encountered with these project data due to non-equivalence of treatment and comparison groups.

Interim Evaluation of the National Follow Through Program 1969-1971

(High Scope was one of the models on the “open framework” end of the scale; students experience objects rather than get taught lessons.)

The extreme data issues with Follow Through may be part of the reason why quasi-experiments are more popular now (taking some natural comparison between equivalent schools and adjusting for all the factors via statistics). When the National Mathematics Advisory Panel tried to locate randomized controlled studies, their report in 2008 only found 8 that matched their criteria, and most of those studies only lasted a few days (the longest lasted a few weeks).

IX. Conclusions

These days Follow Through is mostly brought up by those supporting direct instruction. While the Direct Instruction model did do well,

a.) The “Direct Instruction” does not resemble the direct instruction of today. The “I do” – “now you do” pattern is certainly there, but it occurs in small groups and with general ideas presented up front like counting on and algebraic identities. “General rather than rote” is an explicit goal of the curriculum. The original set up of a teacher only handling five students at a time with two aides is not a comparable environment to the modern classroom.

b.) The group that made the final report complained about the inadequacy of the data. They had misgivings about the very statistical method they used. The Comptroller of the United States in charge of auditing finances felt that the entire project was a disaster.

c.) Because the project was shifted from a social service project to an experimental project, not all the sponsors were able to handle a full educational program. At least one of the sponsors had no in-class curriculum at all and merely experimented with parental intervention. The University of Oregon frankly ran their program very efficiently and had no such issue; this lends to a comparison of perhaps administrative competence but not necessarily curricular outlook. For instance, the U of O’s interim report from 1973 noted that arithmetic skills were no better than average in the early cohorts, so adjusted their curriculum accordingly.

arithmeticcheck

d.) While Direct Instruction did best in basic skills, on the cognitive measures the model that did best was a discovery-related. Based on descriptions of all the models Mathemagenic perhaps the closest to what a modern teacher thinks of as an inquiry curriculum.

e.) Testing was relentless enough in Direct Instruction they had an employee specifically dedicated to that task, while some models (like Bank Street) did no formal testing at all during the year.

Of the two other models noted in the report as being in the same type as Direct Instruction, Behavior Analysis did not do well academically at all and the Southwest Educational Development Laboratory emphasis on language and “pride in cultural background” strikes a very different attitude than the controlled environment of Direct Instruction’s behaviorism.

X. A Lament from Bank Street

Before leaving, let’s hear from one of the groups that did not perform so well, but was (according to reports) well managed: Bank Street.

In this model academic skills are acquired within a broad context of planned activities that provide appropriate ways of expressing and organizing children’s interests in the themes of home and school, and gradually extend these interests to the larger community. The classroom is organized into work areas fine; with stimulating materials that allow a wide variety of motor and sensory experiences, as well as opportunities for independent investigation in cognitive areas and for interpreting experience through creative media such as dramatic play, music, and art. Teachers and paraprofessionals working as a team surround the chidren with language that they learn as a useful, pleasurable tool. Math, too,is highly functional and pervades the curriculum. The focus is on tasks that are satisfying in terms of the child’s own goals and productive for his cognitive and affective development.

Follow Through Program Sponsors, page 7

Bank Street is still around and has been for nearly 100 years. While their own performance tests came out positive, they did not do well on any of the measures from Abt’s 1977 report.

In 1981, one of the directors wrote:

The concepts of education we hold today are but variations of the fundamental questions that have been before is since the origins of consciousness. Socrates understood education as “discourse”, a guidepost in the search for wisdom. He valued inquiry and intuition. In contrast, Plato conceived of the State as the repository of wisdom and the overseer of all human affairs, including education. He was the first manager. As so has it always evolved: Dionysian or Apollonian, romanticism or classicism, humanism or behaviorism. All such concepts are aspects of one another. They contribute to evolutionary balance. They allow for alternative resolutions to the same dilemmas and they foster evolutionary change. Thus, a model is not a fixed reality immobilized in time. It is, as described above, a system, an opportunity to structure and investigate a particular modality, to be influenced by it and to change it by entering into its methods. The Bank Street model does not exist as a child-centered, humanistic, experientially-based approach standing clearly in opposition to teacher-centered, behaviorist modalities. These polarities serve more to define the perceived problem than they do to describe themselves.

Follow Through: Illusion and Paradox in Educational Experimentation

Direct instruction and the PCAP 2010 (Math Wars, continued)

So based on my last post about Canada’s math wars I had a number of people stop to comment about direct instruction in general, including Robert Craigen who kindly linked to the PCAP (Pan-Canadian Assessment Program).

(Note: I am not a Canadian. I have tried my best based on the public data, but I may be missing things. Corrections are appreciated.)

For PCAP 2010, close to 32,000 Grade 8 students from 1,600 schools across the country were tested. Math was the major focus of the assessment. Math performance levels were developed in consultation with independent experts in education and assessment, and align broadly with internationally accepted practice. Science and reading were also assessed.

The PCAP assessment is not tied to the curriculum of a particular province or territory but is instead a fair measurement of students’ abilities to use their learning skills to solve real-life situations. It measures learning outcomes; it does not attempt to assess approaches to learning.

Despite the purpose to “solve real-life situations” the samples read to me more like a calculation based test (like the TIMSS) rather than a problem solving test (like the PISA) although it is arguably somewhere in the middle. (More about this difference in one of my previous posts.)

pcapsamples

Despite the quote that “it does not attempt to assess approaches to learning”, the data analysis includes this graph:

directinstructgraph

Classrooms that used direct instruction achieved higher scores than those who did not.

One catch of note (although this is more of a general rule of thumb than an absolute):

Teachers at public schools with less poverty are more likely to use a direct instruction curriculum than those who teach at high-poverty schools, even if they are given some kind of mandate.

This happened in my own district, where we recently adopted a discovery-based textbook. There was major backlash at the school with the least poverty. This seemed to happen (based on my conversations with the department head there) because the parents are very involved and conservative about instruction, and there’s understandably less desire amongst the teachers to mess with something that appears to work just fine. Whereas with schools having more students of poverty, teachers who crave improvement are more willing to experiment.

While the PCAP data does not itemize data by individual school, there are a two proxies that are usable to assess level of poverty:

canadachart

Lots of books in the home is positively correlated to high achievement on the PCAP (and in fact is the largest positive factor related to demographics) but also positively correlated to the use of direct instruction.

Language learners are negatively correlated to achievement on the PCAP (moreso than any other factor in the entire study) but also negatively correlated to an extreme degree in the use of direct instruction.

It thus looks like there’s at least some influence of a “more poverty means less achievement” gap creating the positive correlation with direct instruction.

Now, the report still claims the instruction type is independently correlated with gains or losses (so that while the data above is a real effect, it doesn’t account for everything). However, there’s one other highly fishy thing about the chart above that makes me wonder if the data was accurately gathered at all: the first line.

It’s cryptic, but essentially: males were given direct instruction to a much higher degree than females.

Unless there’s a lot more gender segregation in Canada than I suspected, this is deeply weird data. I originally thought the use of direct instruction must have been assessed via the teacher survey:

pcapsurvey

But it appears the data instead used (or least included) how much direct instruction the students self-reported:

canadastudentsurvey

The correlation of 10.67 really ought to be close to 0; this indicates a wide error in data gathering. Hence, I’m wary of making any conclusion at all of the relative strength of different teaching styles on the basis of this report.

Robert also mentioned Project Follow Through, which is a much larger study and is going to take me a while to get through; if anyone happens to have studies (pro or con) they’d like to link to in the comments it’d be appreciated. I honestly have no disposition for the data to go one way or the other; I do believe it quite possible a rigid “teaching to the test” direct instruction assault (which is what two of the groups in the study seemed to go for) will always beat another approach with a less monolithic focus.

Canada’s math wars and bad use of the PISA

Canada went through a bit of a panic recently when the PISA 2012 scores came out.

canadascores

[Source.]

Oh no! Scores are dropping! There must be something done wrong, so it’s time to change policy:

“If you look at what’s been happening, predominantly over the last decade, there’s been an unprecedented emphasis on discovery learning,” said Donna Kotsopoulos, an associate professor in Wilfrid Laurier University’s education faculty and former teacher.

Robert Craigen, a University of Manitoba mathematics professor who advocates basic math skills and algorithms, said Canada’s downward progression in the international rankings – slipping from sixth to 13th among participating countries since 2000 – coincides with the adoption of discovery learning.

[Source.]

As I pointed out in a recent post, PISA essentially measures problem solving, and it seems strange to beef up calculation in an attempt to improve problem solving, especially considering Canada’s performance on the TIMSS which does tend to measure calculation. While Canada as a whole hadn’t participated in TIMSS since 1999 (they did in 2015 although the report isn’t out yet), some provinces did:

Ontario 8th grade: 2003 (521), 2007 (517), 2011 (512)
Ontario 4th grade: 2003 (511), 2007 (512), 2011 (518)
Quebec 8th grade: 2003 (543), 2007 (528), 2011 (532)
Quebec 4th grade: 2003 (506), 2007 (519), 2011 (533)

canadastat

So: Ontario 8th grade had a minor dip in 8th and rise in 4th grade, both nearly within statistical significance, and Quebec fluctuated down and then up in 8th grade and had an overall rise in 4th grade.

This does not sound like the sort of data to cause major shift in education policy. If anything, the rising numbers on 4th grade (where lack of drill gets decried the most) indicate that discovery curriculum has helped rather than hurt with calculation skills. (Ontario, for instance, while requiring 4th graders to be able to multiply up to 9, does not require memorizing multiplication tables.)

Let’s also lay on the table these quotes on the troubled nature of PISA scores themselves:

What if you learned that Pisa’s comparisons are not based on a common test, but on different students answering different questions? And what if switching these questions around leads to huge variations in the all- important Pisa rankings, with the UK finishing anywhere between 14th and 30th and Denmark between fifth and 37th?

… in Pisa 2006, about half the participating students were not asked any questions on reading and half were not tested at all on maths, although full rankings were produced for both subjects.

While I wouldn’t say the scores are valueless, I think using them as the sole basis of educational policy shift is troubling. Even if we take PISA scores at face value, the wide-open nature of the actual questions which mimic a discovery curriculum indicate you’d want more discovery curriculum, not less.

Unlearning mathematics

I was reading the comment thread in an old post of mine when I hit this gem by Bert Speelpenning:

Here is a short list of things that kids in math class routinely unlearn in their journey from K through 12:
* when you add something, it gets bigger
* when you see the symbol “+” you are supposed to add the numbers and come up with the answer
* the answer is the number written right after the “=” symbol
* you subtract from the bigger number
* a fraction is when you don’t have enough to make a whole
* a percentage can only go up to 100
* the axes on a graph look like an L
* straight lines fit the equation y=mx+b
* the values (labels) on the axes must be evenly spaced
* putting a “-” in front of something makes it negative
* a reciprocal is a fraction that has 1 on top.

What are some other things our students unlearn?

Which things are acceptable to teach initially in a way that will later be changed? When is unlearning problematic?

Which things are impossible to avoid having the unlearning effect? (For instance, even if the teacher avoids saying it explicitly, it’s hard for students to avoid assuming “when you add something, it gets bigger” before negative numbers get introduced.)

TIMSS, PISA, and the goals of mathematics education

It is tempting when hearing about student performance on an international or national test is to assume they measure some monolithic mathematical ability. When a country is doing well on a test mathematical teaching is doing fine, and when a country is doing worse math teaching needs to be looked at and changed.

Additionally, it is contended any countries that are doing well should have their strategies mimicked and any countries doing badly should have their strategies avoided.

One issue with these thoughts is that the two major international tests — the TIMSS and PISA — measure rather different things. Whether a country is doing well or not may depend on what you think the goals of mathematics education are.

Here are some samples from PISA:

PISA Sample #1

pisasample1

PISA Sample #2

You are asked to design a new set of coins. All coins will be circular and coloured silver, but of different diameters.

Researchers have found out that an ideal coin system meets the following requirements:

· diameters of coins should not be smaller than 15 mm and not be larger than 45 mm.

· given a coin, the diameter of the next coin must be at least 30% larger.

· the minting machinery can only produce coins with diameters of a whole number of millimetres (e.g. 17 mm is allowed, 17.3 mm is not).

Design a set of coins that satisfy the above requirements. You should start with a 15 mm coin and your set should contain as many coins as possible.

PISA Sample #3

A seal has to breathe even if it is asleep in the water. Martin observed a seal for one hour. At the start of his observation, the seal was at the surface and took a breath. It then dove to the bottom of the sea and started to sleep. From the bottom it slowly floated to the surface in 8 minutes and took a breath again. In three minutes it was back at the bottom of the sea again. Martin noticed that this whole process was a very regular one.

After one hour the seal was
a. At the Bottom
b. On its way up
c. Breathing
d. On its way down

Here are samples of TIMSS questions:

TIMSS Sample #1

Brad wanted to find three consecutive whole numbers that add up to 81. He wrote the equation

(n – 1) + n + (n + 1) = 81

What does the n stand for?

A)The least of the three whole numbers.
B)The middle whole number.
C) The greatest of the three whole numbers.
D)The difference between the least and greatest of the three whole numbers.

TIMSS Sample #2

Which of these is equal to y^3?

A) y + y + y
B) y x y x y
C) 3y
D) y^2 + y

TIMSS Sample #3

To mix a certain color of paint, Alana combines 5 liters of red paint, 2 liters of blue paint, and 2 liters of yellow paint. What is the ratio of red paint to the total amount of paint?
A) 5:2
B)9:4
C)5:4
D)5:9

The PISA tries to measure problem-solving, while the TIMSS focuses on computational skills.

This would all be a moot point if countries who did well on one test did well on the other but this is not always the case.

Possibly the most startling example is the United States, which scored below average in the 2012 PISA

pisacountrychart

but above average in the 2011 8th grade TIMSS, right next to Finland

timss2011

This is partly explained by the US having more students in than any in the world “who thought of math as a set of methods to remember and who approached math by trying to memorize steps.”

The link above chastises the US for doing badly at the PISA without mentioning the TIMSS. It’s possible to find articles with reversed priorities. Consider this letter via some Finnish educators:

The mathematics skills of new engineering students have been systematically tested during years 1999-2004 at Turku polytechnic using 20 mathematical problems. One example of poor knowledge of mathematics is the fact that only 35 percent of the 2400 tested students have been able to do an elementary problem where a fraction is subtracted from another fraction and the difference is divided by an integer.

If one does not know how to handle fractions, one is not able to know algebra, which uses the same mathematical rules. Algebra is a very important field of mathematics in engineering studies. It was not properly tested in the PISA study. Finnish basic school pupils have not done well in many comparative tests in algebra (IEA 1981, Kassel 1994-96, TIMSS 1999).

That is, despite the apparently objective measure of picking some test or another as a comparison, doing so asks the question: what’s our goal in mathematics education?

(Continued in Canada’s math wars and bad use of the PISA.)

Abstract and concrete simultaneously

In most education literature I have seen going from concrete to abstract concepts as a ladder.

abstractladder

The metaphor has always started with concrete objects before “ascending the ladder” to abstract ideas.

I was researching something for a different (as yet unrevealed) project when I came across the following.

ascentconcrete

[Source.]

That is, someone used the same metaphor but reversed the ladder.

This is from a paper on the Davydov curriculum, used in parts of Russia for the first three grades of school. It has the exotic position of teaching measuring before counting. Students compare objects with quantity but not number — strips of paper, balloons, weights:

Children then learn to use an uppercase letter to represent a quantitative property of an object, and to represent equality and inequality relationships with the signs =, ≠, >, and B, or A<B. There is no reference to numbers during this work: “A” represents the unmeasured length of the board.

[Source.]

A later exercise takes a board A and a board B which combine in length to make a board C, then have the students make statements like “A + B = C” and “C – B = A”.

ABCcompare

Number is eventually developed as a matter of comparing quantities. A small strip D might need to be used six times to make the length of a large strip E, giving the equation 6D = E and the idea that number results from repetition of a unit. This later presents a natural segue into the number line and fractional quantities.

The entire source is worth a read, because by the end of the third year students are doing complicated algebra problems. (The results are startling enough it has been called a “scam”.)

I found curious the assertion that somehow students were starting with abstract objects working their way to concrete ones. (The typical ladder metaphor is so ingrained in my head I originally typed “building their way down” in the previous sentence.) The students are, after all, handling boards; they may be simply comparing them and not attaching numbers. They give them letters like A and B, sure, but in a way that’s no less abstract than naming other things in the world.

After enough study I realized the curriculum was doing something clever without the creators being aware of it: they were presenting situations that (for the mind) were concrete and abstract at the same time.

For a mathematician’s perspective, this is impossible to do, but the world of mental models works differently. By handling a multitude of boards without numbers and sorting them as larger and smaller, an exact parallel is set up with the comparison of variables that are unknown numbers. Indeterminate lengths work functionally identical to indeterminate number.

This sort of thing doesn’t seem universally possible; it’s in this unique instance the abstract piggybacks off the concrete so nicely. Still it may be possible to hack it in: for my Q*Bert Teaches the Binomial Theorem video I used a split-screen trick of presenting concrete and abstract simultaneously.

qbertsplit

Although the sequence in the video gave the concrete example first, one could easily imagine the concrete being conjoined with an abstract example cold, without prior notice.

(For a more thorough treatment of the Davydov curriculum itself, try this article by Keith Devlin.)

My basic issue with cognitive load theory

The idea of “working memory” — well established since the 1950s — is that the most objects someone can hold in their working memory is 7 plus or minus 2. There have been some revisions to the idea since (mainly that the size of the chunks matter; for instance, learners in languages that use less syllables for their numbers have an easier time memorizing number sequences).

This was extrapolated in the 1980s to educational theory via “cognitive load theory” by stating that the learner’s working memory capacity should not be exceeded; this tends to be used to justify “direct instruction” where the teacher lays out some example problems and the students repeat problems matching the examples. The theory here is by matching examples students suffer as little cognitive load as possible.

Cognitive load theory has some well-remarked problems with a lack of falsification and a lack of connection with modern brain science. These issues likely deserve their own posts.

My issue with cognitive load theory as applied to education is more basic: the contention that direct instruction requires less working memory than any discovery-based alternative. It certainly is asserted often

All problem-based searching makes heavy demands on working memory. Furthermore, that working memory load does not contribute to the accumulation of knowledge in long-term memory because while working memory is being used to search for problem solutions, it is not available and cannot be used to learn.

but the assertion does not match what I see in reality.

To illustrate, here’s a straightforward example — defining convex and concave polygons — done with three discovery-type lessons and direct instruction.

Discovery Lesson #1

Click on the image below to use an interactive application. Use what you learn to write a working definition of “convex” and “concave”.

convexact

Then draw one example each of a convex polygon and a concave polygon. Justify why your pictures are correct.

Discovery #2

The polygons on the left are convex; the polygons on the right are concave. Give a working definition for “convex” and “concave”.

convconc

Then draw one example each of a convex polygon and a concave polygon (not copying any of the figures above). Justify why your pictures are correct.

Discovery #3

convconc

The polygons on the left are convex; the polygons on the right are concave. Try to decide looking at the picture the difference between the two.

…after discussion…

A convex polygon is a polygon with all interior angles less than 180º.
A concave polygon is a polygon with at least one interior angle greater than 180º. The polygons on the left are convex; the polygons on the right are concave.

Draw one example each of a convex polygon and a concave polygon (not copying any of the figures above). Justify why your pictures are correct.

Direct Instruction

A convex polygon is a polygon with all interior angles less than 180º.
A concave polygon is a polygon with at least one interior angle greater than 180º. The polygons on the left are convex; the polygons on the right are concave.

convconc

Draw one example each of a convex polygon and a concave polygon (not copying any of the figures above). Justify why your pictures are correct.

Analysis

Parsing and understanding technical words creates a demand on memory. The hardcore cognitive load theorist would claim such a demand is less than that of having the student create their own definition, but is that really the case? The student using their own words can rely on more comfortable and less technical vocabulary than the one reading the technical definition. The technical definition is easy to misunderstand and the intuitive visualization is only clear to a student if they have the subsequent examples.

Discovery #1 does not appear to have heavy cognitive load. On the contrary, being able to an immediately switch between “convex” and “concave” upon passing the 180º mark is much more tactile and intuitive than either of the other lessons. Parsing technical language creates more mental demands than simply moving a visual shape.

There might be a problem of a student in Discovery #1 or Discovery #2 coming up with an incorrect definition, but that’s why discovery is hard without a teacher present.

Discovery #3 is exactly identical to the direct lesson except the definition and examples are reversed places. Having a non-technical intuition built up before trying to parse the technical definition makes it easier to read; again it appears to have less cognitive demand.

Overestimating and underestimating

One of the basic assumptions of cognitive load theorists seems to be that the mental demands of discovery are given all at once. Usually the demands involve some sort of scaffolding. For instance, in Discovery #3 the intuitive discussion of the pictures and then definition are NOT given at the same time. Only after students have settled on an idea of the difference between the shapes — essentially reducing down to one mental object — is the definition given, which as I already pointed out is easier to read for a student who now has some context.

On the other hand, cognitive load theorists seem to underestimate the demands of direct instruction. While exact entire sentences tend not to be parsed by the student in definitions (this would clearly fail the “only seven units” test) mathematical language routinely has dense and specific enough language that breaking any supposed limit is quite easy. Using the direct instruction example above, taking everything in on one go would require a.) parsing and accepting the new terms “convex” b.) same for “concave” c.) recalling definitions of “polygon” d.) same for “interior angles” e.) keeping in mind the visual of greater and less than 180º f.) keeping track of “at least one” meaning 1, 2, 3, or more and g.) parsing the connection between a-f and the examples given below.

There are obviously counters to some of these — the definitions for instance should be internalized to a degree they are easy to grab from long term memory — but the list doesn’t look that different from a “discovery” lesson, and doesn’t possess the advantage of reducing pressure on vocabulary and language.

The overall concern

In truth, working memory is well-understood for memorizing digit sequences (called digit span) but the research gets fuzzy as processes start to include images and sounds. Any sort of declaration (including my own) that the working memory is busted by a particular task when the task involves mixed media is essentially arbitrary.

On top of that, the brain is associative to such an extent that memory feats are possible which appear to violate these conditions. For instance, there is a memory trick I used to perform for audiences where they would give me a list of 20 objects and I would repeat the list backwards. The trick works by pre-memorizing a list of 20 objects quite thoroughly — 1 for pencil, 2 for swan, say — and then associating the list with those objects. If the first object given was “yo-yo” I would imagine a yo-yo hanging off a pencil. The trick is quite doable by anyone and — given the fluency of the retrieval — suggests that association of images have a secondary status that exceeds that of standard “working memory”. (This is also how the competitors of the World Memory Championship operate, allowing them feats like memorizing 300 random words in 5 minutes.)

Follow

Get every new post delivered to your Inbox.

Join 148 other followers