## TIMSS, PISA, and the goals of mathematics education

When hearing about student performance on an international or national test, it is tempting to assume the test measures some monolithic mathematical ability: when a country does well on the test, its mathematics teaching is doing fine, and when a country does worse, its mathematics teaching needs to be looked at and changed.

It is also often argued that countries doing well should have their strategies imitated, and countries doing badly should have theirs avoided.

One issue with this thinking is that the two major international tests, TIMSS and PISA, measure rather different things. Whether a country is doing well may depend on what you think the goals of mathematics education are.

Here are some samples from PISA:

PISA Sample #1

PISA Sample #2

You are asked to design a new set of coins. All coins will be circular and coloured silver, but of different diameters.

Researchers have found out that an ideal coin system meets the following requirements:

· diameters of coins should not be smaller than 15 mm and not be larger than 45 mm.

· given a coin, the diameter of the next coin must be at least 30% larger.

· the minting machinery can only produce coins with diameters of a whole number of millimetres (e.g. 17 mm is allowed, 17.3 mm is not).

Design a set of coins that satisfy the above requirements. You should start with a 15 mm coin and your set should contain as many coins as possible.
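One way to see what the coin item rewards: a smaller coin always leaves more room below 45 mm for the coins that follow, so taking the smallest allowed diameter at each step (a greedy rule) gives the most coins. The sketch below is a minimal Python illustration under the reading that "at least 30% larger" means next ≥ 1.3 × previous; the function name `design_coins` is hypothetical.

```python
def design_coins(smallest: int = 15, largest: int = 45) -> list[int]:
    """Greedily build the longest list of coin diameters (in mm).

    Each next coin is the smallest whole number of millimetres that is
    at least 30% larger than the previous one. Integer arithmetic avoids
    floating-point rounding: next >= 1.3 * d  <=>  10 * next >= 13 * d.
    """
    coins = [smallest]
    while True:
        # Ceiling of 13*d/10, computed with integers only.
        nxt = (13 * coins[-1] + 9) // 10
        if nxt > largest:
            return coins
        coins.append(nxt)


print(design_coins())  # [15, 20, 26, 34, 45]
```

Under these assumptions it returns five coins: 15, 20, 26, 34 and 45 mm. The integer form `(13 * d + 9) // 10` sidesteps floating-point trouble at exact boundaries such as 20 × 1.3 = 26.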

PISA Sample #3

A seal has to breathe even if it is asleep in the water. Martin observed a seal for one hour. At the start of his observation, the seal was at the surface and took a breath. It then dove to the bottom of the sea and started to sleep. From the bottom it slowly floated to the surface in 8 minutes and took a breath again. In three minutes it was back at the bottom of the sea again. Martin noticed that this whole process was a very regular one.

After one hour the seal was
a. At the bottom
b. On its way up
c. Breathing
d. On its way down
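The seal item hides its arithmetic inside the story. A minimal sketch, assuming the cycle is exactly 3 minutes down plus 8 minutes up (11 minutes from breath to breath) and that the seal spends no time lying on the bottom; the names `seal_state`, `DIVE_MINUTES` and `RISE_MINUTES` are hypothetical:

```python
DIVE_MINUTES = 3   # surface to bottom ("in three minutes it was back at the bottom")
RISE_MINUTES = 8   # bottom to surface ("floated to the surface in 8 minutes")
CYCLE = DIVE_MINUTES + RISE_MINUTES  # 11 minutes from one breath to the next


def seal_state(minutes: int) -> str:
    """Return where the seal is after `minutes`, assuming the cycle repeats exactly."""
    t = minutes % CYCLE  # time elapsed since the most recent breath
    if t == 0:
        return "breathing"
    if t < DIVE_MINUTES:
        return "on its way down"
    if t == DIVE_MINUTES:
        # The seal touches bottom only at the instant the dive ends.
        return "at the bottom"
    return "on its way up"


print(seal_state(60))  # on its way up
```

Since 60 = 5 × 11 + 5, the seal is 5 minutes into a cycle, past the 3-minute dive, so under these assumptions the answer is b.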

Here are samples of TIMSS questions:

TIMSS Sample #1

Brad wanted to find three consecutive whole numbers that add up to 81. He wrote the equation

(n – 1) + n + (n + 1) = 81

What does the n stand for?

A) The least of the three whole numbers.
B) The middle whole number.
C) The greatest of the three whole numbers.
D) The difference between the least and greatest of the three whole numbers.
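For contrast with the multi-step PISA items, this TIMSS item reduces to one line of algebra once n is read as the middle number (answer B):

$$
(n - 1) + n + (n + 1) = 3n = 81 \quad\Longrightarrow\quad n = 27
$$

so the three consecutive numbers are 26, 27 and 28.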

TIMSS Sample #2

Which of these is equal to y^3?

A) y + y + y
B) y × y × y
C) 3y
D) y^2 + y

TIMSS Sample #3

To mix a certain color of paint, Alana combines 5 liters of red paint, 2 liters of blue paint, and 2 liters of yellow paint. What is the ratio of red paint to the total amount of paint?
A) 5:2
B) 9:4
C) 5:4
D) 5:9

The PISA tries to measure problem-solving, while the TIMSS focuses on computational skills.

This would all be moot if countries that did well on one test also did well on the other, but this is not always the case.

Possibly the most startling example is the United States, which scored below average on the 2012 PISA but above average on the 2011 8th-grade TIMSS, right next to Finland.

This is partly explained by the US having more students than any other country “who thought of math as a set of methods to remember and who approached math by trying to memorize steps.”

The article quoted above chastises the US for doing badly on PISA without mentioning TIMSS. It’s possible to find articles with the reverse priorities. Consider this letter from some Finnish educators:

The mathematics skills of new engineering students have been systematically tested during years 1999-2004 at Turku polytechnic using 20 mathematical problems. One example of poor knowledge of mathematics is the fact that only 35 percent of the 2400 tested students have been able to do an elementary problem where a fraction is subtracted from another fraction and the difference is divided by an integer.

If one does not know how to handle fractions, one is not able to know algebra, which uses the same mathematical rules. Algebra is a very important field of mathematics in engineering studies. It was not properly tested in the PISA study. Finnish basic school pupils have not done well in many comparative tests in algebra (IEA 1981, Kassel 1994-96, TIMSS 1999).
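The letter’s “elementary problem” can be made concrete with Python’s standard `fractions` module. The numbers below are hypothetical, since the letter does not give the actual test item; the final check states the general rule, which is the same one used when manipulating algebraic fractions.

```python
from fractions import Fraction

# Hypothetical numbers: the letter does not say which fractions were used.
a = Fraction(3, 4)
b = Fraction(1, 6)
divisor = 5

difference = a - b             # common denominator 12: 9/12 - 2/12 = 7/12
result = difference / divisor  # (7/12) / 5 = 7/60

print(difference, result)      # 7/12 7/60

# The same rule, written generally: (p/q - r/s) / n = (p*s - r*q) / (q*s*n)
assert result == Fraction(3 * 6 - 1 * 4, 4 * 6 * divisor)
```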

That is, although picking one test or another as a benchmark looks objective, the choice itself presupposes an answer to a prior question: what is our goal in mathematics education?

### 21 Responses

1. Excellent. Goes towards explaining why Math Wars traditionalists discount PISA and tout TIMSS.

2. >The PISA tries to measure problem-solving, while the TIMSS focuses on computational skills.

Agreed, TIMSS assesses more rudimentary mathematics ability, so where’s the contradiction?

A third test, NAEP, also shows American 8th graders improving, while 12th grade performance remains flat (a stagnant 30% proficiency rate), which aligns with US “better” TIMSS (8th grade) performance and “worse” PISA (15-16 year olds) performance.

These results seem to confirm the notion that the US focuses on teaching students how to calculate, but continues to shortchange students (badly) in instilling more complex problem solving ability.

Of course, there are those that seek to undermine teaching basic calculation also, which would leave, well, not much.

• >Agreed, TIMSS assesses more rudimentary mathematics ability, so where’s the contradiction?

I never said there was one.

I’d be careful with the long-term NAEP: the 12th-grade scores are near enough to the ceiling that it’s statistically harder for any improvement to register. Additionally, the problems at that level are a little more removed from what high school students learn in mathematics (this is related to the fact that the test was made in the 70s, is almost literally identical to the one given today, and the “standard route” through the curriculum was a little different back then).

>These results seem to confirm the notion that the US focuses on teaching students how to calculate, but continues to shortchange students (badly) in instilling more complex problem solving ability.
>Of course, there are those that seek to undermine teaching basic calculation also, which would leave, well, not much.

It’s tricky; the data certainly seems to indicate this isn’t zero-sum and you can have it both ways, but countries like Korea that do very well on both have other factors, like a mad amount of after-school tutoring.

Still, your comment seems to indicate you think the US should have more problem solving, you just want to make sure calculation isn’t sacrificed in the process?

3. First, “CCSSIMATH” is the anonymous handle of a group or individual that opposes and has been repeatedly (and not always fairly) critical of the CCSSI-Math standards. The anonymity coupled with the purposely misleading pseudonym is troubling. What’s there to hide here? It’s hardly risky to criticize the US DOE, Obama, or the Common Core, yet “CCSSIMATH” chooses to snipe from behind a mask. Not very confidence inspiring.

To be clear, I’m not a fan of specific things about the Common Core Math Standards, but my bigger concerns have been with the testing and the political and financial agendas behind the overall Common Core Initiative, the uses to which the testing is being or will be put, the fact that there are so many tie-ins to Big Publishing and Big Testing, and the likelihood that much of the support for the overall national “reform” package comes from forces interested in destroying public education and promoting private ownership and control of our allegedly democratic school systems. I’ve written critically about the Common Core for many years now and will continue to do so.

At the same time, many attacks on the Common Core seem ill-founded and politically suspect. Is there NOTHING of value in the entirety of the mathematics standards? That would be strange indeed. The last time I read criticism of mathematics standards that was entirely one-sided and negative was during the Math Wars, when members of groups like Mathematically Correct, NYC-HOLD, etc., relentlessly attacked various NCTM standards volumes, publications, and any curriculum that emerged in the ’90s from various NSF-funded development projects intended to provide districts with materials that reflected the spirit of NCTM-style reforms.

Indeed, reading various opinion pieces over the last two years, I’ve concluded that many of the usual suspects in the Math Wars are still hard at work, for good or ill, arguing as if nothing has happened of note in the last quarter century or so that should give anyone pause or lead to reevaluating assumptions and biases either for or against so-called reform and so-called traditional mathematics curricula and instruction.

Throw Tea Party lunacy into the mix and you have a lot more yelling over conspiracies to weaken the fabric of America’s youth. It turns out that both critics and advocates of the Common Core see education as part of a global competition which the US is losing. The only difference between the sides is that the critics claim we’re going to lose even worse with the Common Core, while the supporters claim that it will raise our students to (what else?) NUMBER ONE IN THE WORLD! (Sound of piccolos, drums, etc., follows a bugle fanfare and 21-gun salute.)

Pardon me for saying: a plague on both your houses. I’m not interested in mathematics education as part of a global competition. I find that view of these tests to be equally ridiculous and meaningless. The only value I see (or at least the main one) is to see what it is that a nation thinks is important for kids to be able to do when it comes to solving mathematical problems of various flavors, drawn from a reasonable range of specific content knowledge, but grounded in the understanding that there simply is no “one-size-fits-all” list of topics that will satisfy all the “experts” or be suitable for all children at each point in their development and growth as human beings.

I like this piece because it helps highlight some important differences in what these two international exams emphasize and attempt to measure. Our anonymous critic asks: “where’s the contradiction?” I don’t recall anyone saying there was a contradiction. But there is a lesson to be learned when listening to people who think that the TIMSS is meaningful and PISA is not. And such people do exist and continue to rail against PISA. They worship the nations that knock TIMSS out of the park, and utterly discount the high performance on PISA that Finnish students had. Seems a little strange, a little biased, a little contradictory, if both computational and problem-solving skills are important.

And Jason’s point about how US students view mathematics continues unanswered (at least by “CCSSIMATH”); maybe that doesn’t matter to him/her/them. It does to me. I suspect it does to many mathematics teachers, particularly those who value more than computational and procedural skills. I’m all for having children know how to compute, but that isn’t even close to enough, and historically the more we fret and fuss and obsess over “the basics,” the worse things are for most Americans when it comes to learning and valuing mathematics.

Sure, there are a few people who’ve tried to shake things up by saying things about not teaching paper-and-pencil computation anymore, but they’ve had little impact and have mostly been misinterpreted (it’s easy to grab onto a comment that sounds radical and ignore the context and the exposition; that makes for wonderful scare tactics, as we see in “CCSSIMATH’s” closing salvo). And Constance Kamii has said for decades that we need to avoid the PREMATURE teaching of algorithms to children (with research to support her views), only to have this repeated as “Never teach algorithms.” It’s hard to have a constructive debate with people who cannot resist twisting and perverting what you say into nonsense and then holding all their opponents accountable for things no one (or virtually no one) has ever said or thought.

As for radically shaking up the teaching of basic mathematics in this country, I have a question: has America gone metric yet? No? Any lessons to draw from that?

• I’m ok with people being anonymous here as long as their comments are respectful and thoughtful of others. (Unlike in a pitched mathforum battle, this is my blog; I can always close the comments.)

I don’t disagree with anything here. I mainly wish people would lay their cards on the table, because I get frustrated when people clearly disagree about something fundamental but, rather than discussing that fundamental thing (and possibly making headway), go on about some ramification (where cogent discussion isn’t possible because the participants aren’t even using the same premises).

• I’ve just gotten tired of this particular anonymous commenter because he/she/they show up on their own blog and on blogs I read elsewhere, always with an agenda that never gets stated explicitly. After several years’ exposure to CCSSIMATH, I’m waiting for someone to take responsibility for what gets posted under that name, regardless of where it’s getting posted. That’s why I prefaced my comments with that issue.

That said, the actual comment just seems typical: there’s sniping, but it’s so indirect that if I hadn’t read lots of other comments from the same source, I might walk away wondering what the agenda was. After enough exposure, however, I feel like I have a pretty good idea. It would be nice if it just got said, clearly, for once, with a real person taking ownership.

Of course, I’m not even vaguely telling you how to run your blog. I am not consistent in how I handle comments on my own blog, so how would I feel well-positioned to give you direction or advice? I really don’t. It just seems weird to me at this particular point in the fighting over Common Core that people would feel compelled to hide their identities. Unless perhaps CCSSIMATH is actually Arne Duncan or David Coleman. That I would understand. 😉

4. So, then which existing assessments are valid measures of mathematical ability? Can anyone refer me to some that emphasize both deep and superficial knowledge?

• I suppose you’re asking what’s my favorite standardized test?

Out of all the tests I’ve seen, my favorite was from Hong Kong. But that was a while ago, and there easily could have been some state test that knocked things out of the park that I haven’t seen.


7. Only recently discovered your excellent blog; a belated comment.
I think it is important to take into account the fact that TIMSS has different categories for items. Your examples seem to come from 1995 (rather old), and all three fall under ‘Performing routine procedures’. There are also ‘complex procedures’. In the current framework, items are classified as being about ‘knowing’, ‘applying’ and ‘reasoning’. The TIMSS report also has tables showing this, and they show that on average the USA scores better on knowing (relative to its own overall mean) and less well on applying and reasoning. My point is that TIMSS does show this distinction, and in my view it therefore might be more useful as an assessment, because PISA *only* looks at problem solving. TIMSS also samples classrooms, which means we can say something about teachers. See Rindermann for excellent articles on the differences between TIMSS and PISA, but also on underlying IQ (g) etc.

By the way, you reference an article by Boaler on the Hechinger report and the article references a statement on ‘memorisation’ achieving lower. I was quite surprised when I read this in ‘Fluency without Fear’, as Asian countries seem to do both (knowing and understanding) very well. The references all boil down to a yet unpublished article by Boaler & Zoido which I would love to read. Until then I remain somewhat skeptical.

• The most recently released samples of TIMSS I’ve seen are from 2007, and they still seem pretty calculation-oriented to me. I might toss a few samples on the post when I have time, though. The data-analysis artifacts are more interesting, but the actual questions related are pretty straightforward; certainly compared to an open-ended request to describe a procedure for finding the area of an irregular shape. (How do they even grade that one fairly across countries?)

I want to see the Boaler & Zoido too. Let me know if you come across a copy. I should clarify the contention seems to be along the lines of “if you did a survey, Americans are the most likely to say mathematics is a process of regurgitated memorized procedures” which is not the same thing as “in class Americans do more memorizing than anyone else”.

• True. Do read Rindermann if you haven’t already; he neatly addresses some of the differences, and also the influence of literacy. The cultural differences certainly are problematic: that is Kreiner’s big criticism of the assessments, because the statistical procedure (a Rasch model) assumes items are equally difficult in every country.

On memorization: ok. Too often, things like this get bastardized and morphed, imo.

• Jason, I’ll reiterate my comment on a different thread but in a different context here: I think it’s overly narrow to say that TIMSS is about “calculation”. TIMSS is a test designed for diagnostic value in assessing the learning of skills in an educational system. Accordingly questions must be designed to produce a minimum of ambiguity. In a PISA test a question whose mathematical content is supposedly the addition of two fractions with different denominators might be embedded in a question about truckloads of dirt, with a great deal of cognitive “noise” and possibly a complex problem to solve on top of the simple skill being “tested”. When students do poorly on the question, where does one locate the problem? What should that education system fix? Is it a reading problem? A problem-solving deficiency? A logic error? Or an arithmetical fluency glitch? Or is it possible that students have no trouble with any of the above yet collapse under the challenge of putting them all together — a task-management difficulty? Because of the noise-to-signal problem there is difficulty drawing policy-relevant specific answers to such questions. TIMSS is designed to pinpoint problems with as little noise as possible.

Consider the two TIMSS problems in the graphic I used in my Globe & Mail article in 2013:

The first question tests, rather specifically, whether students know a correct procedure for subtracting fractions with different denominators. Practically all other noise is eliminated from the problem. It is, as you say, about calculation (but of the simplest sort, and for a reason: it is not testing the ability to do a sophisticated calculation; it is only testing knowledge of a correct procedure, so the simpler the numbers, the better).

The second question also tests (fairly well) a single skill, which is clearly *not* about calculation. There is nothing to calculate, no procedure to know this time: it is testing whether or not students know the effect of multiplying positive non-integers well enough to infer the location of the result in a special case: when both numbers are less than 1. The results are dismal. And while the hiving of this skill away from conflating noise by the test designers is nothing short of heroic, I think it cannot entirely escape the noise. Could it be, for example, that many students are confused by the diagram, or the way in which the problem is posed? It’s hard to say. (NOTE: My diagram is somewhat different from the original in TIMSS; I modified it because of space restrictions, trying to keep the spirit of the original.)

I would recommend you stop speaking of TIMSS as testing “only calculation”. It does not — it is a test whose purpose appears to be to finely parse individual skills for diagnostic purposes to inform educational policy-relevant decision-making. PISA, in contrast, tries to measure the overall ability to use multiple types of skills — mathematical and nonmathematical — in concert. While such “real-world” math skills may be an educationally valuable thing to know, of what value is it in fixing problems? Is it of more value to you when you go to the doctor for stomach pains that he takes your vital signs and pronounces you “ill” or if he performs fine-tuned diagnostics and declares that he has determined your appendix is about to erupt?

• Granted there may be a better word than “calculational” here. I’ve been thinking of it as a spectrum between that and project-based.

I intentionally quoted TIMSS #1 because it is a bit conceptual, and I consider that a different axis.

>PISA, in contrast, tries to measure the overall ability to use multiple types of skills — mathematical and nonmathematical — in concert. While such “real-world” math skills may be an educationally valuable thing to know, of what value is it in fixing problems?

I think this comment gets at the heart of a lot of the disagreement going on.

Statistics in particular is good to refer to here; stats is weird to teach because it is nonsense without coherent reference to the reality it is modeling. You can’t just whip out a mean instead of a median without some justification based on the reality of what you’re doing.

Since stats necessarily involves inverse problems where there is no single right answer, the answer must be tempered by justification and argument. An AP Stats class necessarily involves a lot more writing than AP Calculus.

The other thing going on is authentic mathematical modelling, which math textbooks do horribly. I often see models that are flat-out _wrong_ to use in context, and the textbook blithely skips along plugging in values to get a 3rd-degree polynomial for, say, company profit, when the model makes no sense at all for the real situation. When I see such a thing, I know the textbook writer has not had enough exposure to applied math.

Maybe a good “litmus test” question would be: do you think the majority of students should aspire towards a.) statistics or b.) calculus?

>While such “real-world” math skills may be an educationally valuable thing to know, of what value is it in fixing problems? Is it of more value to you when you go to the doctor for stomach pains that he takes your vital signs and pronounces you “ill” or if he performs fine-tuned diagnostics and declares that he has determined your appendix is about to erupt?

Not sure what you mean here — are you claiming TIMSS is superior because mistakes on the PISA may have more to do with the real-life linkages than with mathematics, so it is hard to diagnose if math is really the issue?

• No, I’m saying that PISA provides summative data; TIMSS provides formative. I thought that was obvious. PISA takes (let us say) 12 skills and bundles them into a single question. TIMSS makes a single question for each skill. Where PISA says, your system is performing at a 0.6 (say out of 1) on these skills but we have no #\$% idea which skills are not being serviced properly, TIMSS will say your system is performing at 0.8 or above on 7 of them but at 0.2, 0.2 and 0.1 on three of them — AND we can tell you which ones.

The medical metaphor was supposed to bring this out; I thought it was obvious. If someone sticks a thermometer up your butt and says, hmm, you’ve got a high temperature, is that more or less valuable than that person doing three tests to see specific indicators of specific problems and saying, it looks very likely that you have low-grade spinal meningitis?

TIMSS parses skills at a fine mesh, thus providing relatively accurate information about what needs fixing. PISA lumps skills together into compound aggregate tasks and simply says how good they are taken together.

This is reading both assessments largely according to their intentions. As you point out elsewhere, PISA does not do a particularly good job at being “real world”. However, as the Freudenthal Institute has written at length, it is wrong for English-speaking people to interpret RME simply as based on “real-world problem solving”: it means something more like a “realistic” way of putting skills together and seeing them in aggregate, which may involve a lot of word problems, but is not obsessed with making everything “true to life”. Apparently the meaning is somewhat clearer in Dutch. In any case, as a professional mathematician, I don’t think RME-based material is even slightly more realistic (if one means really reflecting professional practice in or application of the discipline) than conventional instruction.

And … I think it’s a fool’s errand to try to structure all, most or even a large part of mathematics education in service of such an end. That’s why we have applied math programs. For the most part the power of mathematical education lies in the subject matter itself. And the breadth of its applicability lies in its abstraction, not in any one, two or ten carefully chosen “real world” versions of it.

As I often say, 5+5=10 has a million applications. But \$5+\$5=\$10 is a piece of accounting trivia.

That’s intriguing rhetoric, Robert. If only things were so cut-and-dried. I’m getting the feeling that you use educational terminology a heck of a lot more loosely than you’d tolerate non-mathematicians using mathematics terminology. Whatever do you mean when you say “formative” and “summative” assessment? It doesn’t appear, from the example of how you’ve used it, that your definitions correspond to what experts in the field intend. You seem to have redefined “summative” assessment to mean “questions that lump two or more skills or ‘bits’ of mathematical knowledge together, thus making it impossible to determine from the student’s erroneous response what s/he doesn’t understand.” That’s a useful idea, but it isn’t what educators mean by “summative assessment.” And then suggesting that FORMATIVE assessment consists of single-skill questions, while a logical continuation of your other definition, just compounds the confusion for knowledgeable readers and the likelihood of misleading non-experts. If you’re going to make up your own terminology, at least mention in passing what you’re up to, and give a nod to the standard definitions.

I’m in basic agreement that there is always a problem with test items that lump together multiple skills, if all you look at is which answer is selected on a multiple-choice test. However, teachers give questions like that all the time in math classes: just not in multiple-choice format. High-stakes and standardized tests: well, there are cost factors to consider if free-response or short-answer questions are given, and who knows WHAT the scorer’s qualifications or level of competence really are for any given test paper or item? Chaos ensues.

Unless, of course, we’re talking about actual formative assessment, a la experts like Dylan Wiliam and Paul Black. That’s assessment centered on teachers giving students specific constructive individual feedback. No letter or numerical grades. Just information on what was done well, what needs improvement to get to the next level (whatever exactly that might mean), and some ideas about what might be done to make such improvements.

Summative assessment just means a grade that “sums up” work over a given time frame, be it a lesson, a unit, half-term, or whole course. The grade is supposed to be everything the student needs to know and, as Black and Wiliam and other researchers have shown, that grade obviates any other feedback that might accompany it, like those aforementioned constructive, specific comments. Students just see the grade and either ignore or forget or fail to process the comments.

From that perspective, TIMSS, PISA, the ACT, SAT, and countless other tests are equally useful (or, in my view, useless) for students, parents, educators, et al. Your attempt to distinguish between “good” summative and “bad” summative tests by changing the meaning of summative is, as I said, a nice rhetorical ploy, but it’s ultimately not honest and not actually useful.

Part of the problem with what you attempt is that there really are few teachers who make their summative tests the way you apparently think they should be made (so as to be “formative,” in your sense). And of course the political reality is that the people who write legislation forcing the high stakes tests onto everyone couldn’t care less about what you seem to think is the focus: giving useful feedback for the cycle of learning to teachers, students, and other stakeholders. If that actually is produced (which is rare indeed), it’s almost a complete coincidence or the result of tons of work by the individual teachers, who would have to go over every answer on every item of every student’s test, then probably spend some time asking individual students for verbal responses about what they were thinking and why they picked the answer they did (assuming coin-flipping wasn’t going on or the equivalent). Teachers don’t do much of that analysis. They’d love to get that information, but Pearson, ETS, the ACT, et al., do NOT provide it or try to glean it themselves. There’s no formative assessment going on at all.

And the politicians are fine with that, because if they actually wanted to see testing lead to meaningful improvement and change, they wouldn’t advocate the sort of profit-focused testing that is the stock in trade of Big Publishing/Big Testing. They’d demand something better. For about 5 minutes, I thought that was going on with the two big testing consortia hired to write Common Core tests, but that was illusory, to put it mildly. Again, meaningful assessment is really costly, both in terms of time and money. And it also is useful for improving instruction and learning. Which isn’t what’s been going on in the US and many other countries when it comes to testing. Not by a long shot.

• Not sure MPG’s comment is worth replying to here.

• Right, because it just knocked some of the pins out from your fantastical commentary here. You had no problems yesterday suggesting that Jo Boaler pulled an idea out of her . . . [insert ‘clever’ rhetorical device here plucked straight from the playground], but you don’t think that you need to address your own invented definitions of various types of assessment.

I understand, Robert, really. When people get caught doing something less than professional, silence is one way to go, hoping no one else will notice. But as long as you’re spouting off here, I’ll be reading your “stuff,” and on occasion will probably point out some of the more “inventive” things you say. I couldn’t possibly go through every creation of yours here, as they’ve already piled up impressively beyond my willingness to invest time, but some of your howlers, wherever they’re “pulled from,” cannot be allowed to stand unchallenged. Jason is too polite. I’m not.

8. I read this claim of Boaler in a chapter of her book she had published as a magazine article a few years back. To support the statement about memorization she references a single source: a particular article on the OECD site. I downloaded and skimmed the article in question. Not finding anything pertaining to her claim, I text-searched the word “memorization”, then the substring “memor”. This does not occur anywhere in the very large document. I am guessing that this idea was pulled out of Boaler’s … ah … active imagination, and the OECD document was voluminous enough to be a credible place to bury a reference and expect to maintain plausible deniability.