The Sour Grapes of Pisa

Still standing.


The new Pisa 2012 results will be released on Tuesday. For those who are unfamiliar with it, Pisa is a recurring survey of the performance of schoolchildren from all over the world. The winners in this survey tend to be the same over the years: various Chinese populations (Shanghai, Hong Kong and Singapore), Finland, Canada, Australia, Japan and South Korea.

A high rank is generally interpreted as the result of a good policy and a low rank will usually create headlines demanding reforms.

The Pisa Hall of Shame

At the bottom of the order we find poor and often Muslim countries like Kyrgyzstan, Azerbaijan, Peru, Panama, Qatar and Albania. But besides rich countries at the top and poor at the bottom, there is also the phenomenon of over- and underachievers, poor countries whose children perform well and vice versa. This measure is more interesting since it indicates a failed education policy or other factors that may have been overlooked.

Some of the worst underachievers (excluding tax havens and small oil countries) are the USA, the UK, Austria, Germany, Denmark and Sweden. How do people in these countries respond to the results of the survey?

America: Self-Criticism and Fear of China

When commenting on the results, American media have mainly compared the country with China and have been surprisingly self-critical, as this passage from Stacie Nevadomski in the Huff Post shows,

“The truth, the real news, is that there is no news here. These results should be no surprise. The long slide in American student performance relative to global peers has been a constant drumbeat, paralleling the domestic failures of our schools shown in Waiting for ‘Superman’.”

Or education secretary Arne Duncan,

“The findings, I have to admit, show that the United States needs to urgently accelerate student learning to try to remain competitive in the knowledge economy of the 21st century.”

James Fallows in The Atlantic agrees but adds that Shanghai, the winner of the Pisa 2009, isn’t representative of the whole of China – which is correct; neither are Hong Kong or Singapore, which also rank at the very top. These are all elite populations. America scored better against the other Chinese regions of Macao and Taiwan and it would probably do even better compared to all of China. Although those who are familiar with unpublished results from other parts of China claim they are very respectable.

Regardless of how well America compares to China, it’s still a fact that 13 countries score better than America and all have significantly lower GDP per capita. Maybe that would be a more constructive focus.

European Skepticism

A more disturbing reaction has come from some of the European underachievers. Recently, the largest newspaper in Sweden, Dagens Nyheter (Today’s News), featured an article about the upcoming Pisa 2012, with the headline “Several Countries Cheated with School Results”, suggesting that countries like Italy, Slovenia and the United Arab Emirates have falsified their results. The article is based on an unpublished study by German and Canadian sociology professors Jörg Blasius and Victor Thiessen. “The result means that the credibility of the Pisa survey can be questioned,” says Blasius.

This story is also getting attention in Denmark, another underachiever, where one of the major papers, Berlingske Tidende, has an article about it. The article includes other criticism as well, mainly that of Svend Kreiner, a statistics professor at the University of Copenhagen. Kreiner has analysed earlier results. He is critical of how a lot of questions are omitted for some countries but included for others. He claims the methods of scoring are so arbitrary that Denmark could be ranked second or 42nd depending on small tweaks in the evaluation. In the article, the president of the Danish Teachers Association, Anders Bondo Christensen, says it’s time to scrap the survey altogether.

In the UK (also an underachiever), there is a similar discussion on the TES educational community. In an article, TES’s William Stewart writes,

“Politicians worldwide, such as England’s education secretary Michael Gove, have based their case for sweeping, controversial reforms on the fact that their countries’ Pisa rankings have “plummeted”. Meanwhile, top-ranked success stories such as Finland have become international bywords for educational excellence, with other ambitious countries queuing up to see how they have managed it.”


“But what if there are “serious problems” with the Pisa data? What if the statistical techniques used to compile it are “utterly wrong” and based on a ‘profound conceptual error’? Suppose the whole idea of being able to accurately rank such diverse education systems is ‘meaningless’, ‘madness’?”

Sour Grapes?

However, the fact is that the alleged cheating only concerns follow-up questions to principals that have been found to be largely identical in many cases. It doesn’t concern the performance of the schoolchildren. It hasn’t even been established whether it is actual fraud designed to make the countries in question look better, or just a matter of laziness, or even the fact that some principals are heads of more than one school.

As for Kreiner’s analysis, Pisa’s own statistician, Andreas Schleicher, questions it on the grounds that Kreiner is using a very small part of the data in spite of having access to all of it. He also questions the methods Kreiner used and suggests that they are outdated. In response to the alleged cherry picking, Kreiner accuses Pisa/Schleicher of doing similar things. To me, that sort of rhetoric doesn’t exactly increase his credibility.

It’s not easy for a non-expert to make any sense of this, but I have to say that there is something disconcerting about the fact that Svend Kreiner is being awarded a prize for his critique while no one in the Danish press is asking the questions that Schleicher’s comment raises. Is everyone in Denmark so familiar with statistics that it’s a non-issue? And why the big headlines about cheating even though it hasn’t been established?

Alternative Explanations

Rather than blaming the statistics, there could be other things behind why some countries underachieve. The most obvious thing would be changes in national IQs.

The Pisa survey (and similar tests) correlates strongly with intelligence tests; so much, in fact, that it actually is an intelligence test although it’s rarely referred to as such. This explains a lot of the rank order, because we know that intelligence is highly heritable and resistant to external forces – like education policies. Smart people like the Chinese are going to rank at the top and less smart people like Ugandans are going to be somewhere at the bottom. This is also a reason to be skeptical of the European sour grapes skepticism I mentioned earlier. If there were something seriously wrong with the Pisa it wouldn’t correlate so much with similar tests.

But intelligence alone can’t explain under- and overachievers. If we look at the latest national IQ estimates, the underachievers score like this,

Austria 99, UK 99.1, USA 97.5, Germany 98.8, Denmark 97.2, Sweden 98.6,

and, the three overachievers score like this,

Finland 100.9, Estonia 99.7 and Poland 96.1.
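For anyone who wants to check the arithmetic on the two lists above, the group means can be recomputed in a few lines. This is only a rough sketch in Python: the variable names are made up for illustration, and the numbers are simply the national IQ estimates quoted in this post, not independently verified data.

```python
from statistics import mean

# National IQ estimates exactly as quoted in the post (assumed inputs,
# copied from the text above rather than taken from any dataset).
underachievers = {
    "Austria": 99.0,
    "UK": 99.1,
    "USA": 97.5,
    "Germany": 98.8,
    "Denmark": 97.2,
    "Sweden": 98.6,
}
overachievers = {
    "Finland": 100.9,
    "Estonia": 99.7,
    "Poland": 96.1,
}

# Unweighted group means, rounded to one decimal like the figures in the text.
under_mean = round(mean(underachievers.values()), 1)
over_mean = round(mean(overachievers.values()), 1)

print(f"underachievers: {under_mean}, overachievers: {over_mean}")
```

Note that these are unweighted means over very small groups (six and three countries), so a difference of half a point says little on its own.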

There is not much difference; the averages for these groups are 98.4 and 98.9. But maybe this snapshot disguises a trend in which underachievers are on the way down and vice versa?


I would suggest that this is the case, and that the reason for this is immigration. East Asian countries don’t have much immigration to speak of, but in Europe there has been a varying influx of people in recent years, especially from Muslim countries. The national IQs in these countries are usually around 85 so Western countries that receive a lot of these immigrants should see a larger decline in national IQ averages than other countries. If we look at Pew’s survey of Muslims in Europe, we can make a comparison between over- and underachievers. The most striking overachievers are Estonia, Poland and Finland, countries that all have extremely small Muslim minorities making up 0.1, 0.1 and 0.8 percent of the population respectively. Compare that with the figures for the underachievers Austria 5.7, UK 4.6, Denmark 4.1, Germany 5.0 and Sweden 4.9. Many immigrants are very young children who will take the Pisa survey in years to come or are taking it now but have yet to become adults and have an effect on the economy. Since the Pisa survey is just an intelligence test for children, it simply reflects the influx of young and low IQ people. Underachievers have a larger influx so they score worse than you’d expect from the current national IQs and wealth because the effects on these metrics will kick in some years in the future. And overachievers are just maintaining their national IQs and consequently rising in rank since the rank order is relative.

So the way to improve the scores is not to reform the education system but to change the immigration policy.

So, Any Bets for Tuesday?

If I were to guess I would base it solely on national IQs, immigration and introversion scores, although that last one is a bit speculative. This would lead me to the safe bet that East Asians will stay at the top and no real low IQ countries will surprise anyone with a high rank. Judging by the immigration projections from Pew, Eastern Europe looks like it could be on the rise, or at least maintaining positions, although Russia and Bulgaria look problematic. The real winners here are probably small to medium sized countries that are relatively stable, like Estonia, Poland, Slovenia, Croatia and Hungary. Western Europe will show a downward trend, especially for countries that are increasing their share of the Muslim population from an already high level, like the UK, Austria, Sweden, Belgium and France.

But whatever happens, you can be certain that many people in the underachieving countries will keep blaming the test. Because changing your view on human nature and society is hard work and shooting the messenger is easy.

For more details about the Pisa survey, check out Steve Sailer’s blog, which features several interesting posts on this subject.

85 Responses to The Sour Grapes of Pisa

  1. SP says:


    To further your prediction, you would expect the small countries such as New Zealand and Canada, with comparatively more Chinese immigrants than their OECD counterparts, to have scored relatively well (it seems that they were the leading Western countries in 2009 apart from Finland), and to keep scoring well in 2012 and probably 2015, until the point where the influx of much larger numbers of low IQ Indian immigrants takes over.

    For a slightly larger Anglo-Saxon country like Australia, the 2012 PISA results would be relatively stable, since its PISA-age population currently has similar numbers of high IQ Chinese and low IQ Indians. It will gradually go downwards in the near future, though, because just like in New Zealand and Canada, Indians (with avg IQ 1sd-2sd above India’s mean) are fast becoming the largest immigrant group in Australia. The law of mean reversal, starting in the coming decade/s, will ensure the 2nd and 3rd generations of these 1st gen smarter Indians (relative to India’s average) start to become a drag on their respective national averages even though credited with the newfound Flynn Effect points.

    • Staffan says:

      That sounds about right; unless these countries will rethink their policies. I get the feeling that Australia and New Zealand may wake up, but Canada looks worse with immigrants just passing through Europe on their way there.

      Regression to the mean, as you mention, is another hidden problem. I saw a documentary on TV recently where an intellectual Palestinian man here (in Sweden) didn’t understand why his daughter wanted to wear a veil and was so interested in Islam.

  2. pauljaminet says:

    I am very interested in your suggestion that introversion scores differ among nations and affect PISA scores. Have you written about this?

    • Staffan says:

      No, I thought about including it here but it’s problematic. The Big Five measure is pretty useless in this regard (or perhaps it’s better now) and Eysenck’s measure (more biologically based) is highly correlated with intelligence so it’s hard to tell if it adds anything beyond that. Sadly, I don’t have the statistical skills to control for intelligence, but Finland is certainly conspicuous in this regard. Estonians are known to be very introverted as well and neither they nor the Finns have spectacular IQ averages like those found in East Asian countries.

      For what it’s worth, here is Richard Lynn’s data,

      LynnMartin1995.pdf

      There may be something to be found in Peter Rentfrow’s Big Five statewide data too if you have statewide stats for Pisa scores.

  3. SP says:

    Regression to the mean is in my view the single biggest problem which has been largely ignored by the HBD community.

    The immigration policy of the fast-aging West (notably the US, the UK, Canada, Australia) is largely based on the assumption that it can and will constantly absorb the best and the brightest of the third world, particularly from huge countries such as India. It looks like a perfect cycle on the surface, where new waves of smarter immigrants would be drawn in whenever the previous waves of immigrants become pensioners…and so on.

    However, the immigrant-exporting nations usually have very low average IQs (except several temporary exceptions from East Asia such as China or Korea – this source will dry up after the economies of China and Korea converge with advanced countries in the near future) and therefore have a very shallow source pool on the right side of the curve. Not long ago the WSJ featured a story that India is fast burning through its shallow right-side talent pool. So the “perfect plan” is only true up to a tipping point, where the net positive economic contribution of 1st gen smarter immigrants, e.g. Indians, is negated by the numerically far larger chain immigration, who usually have much lower average IQ, together with the IQ mean reversal phenomenon associated with the offspring of all of them. This is where catastrophic and irreversible consequences will become evident to everyone.

    • Staffan says:

      I don’t think the HBD community is ignoring this, I hear it regularly on various blogs. But you’re absolutely right about the scenario. The policy is blank slatist and views intelligence as a kind of renewable resource – a skill that anyone can acquire.

      California is at the forefront of this process and hopefully their situation will become clear soon and make others snap out of their Enlightenment nonsense. If Los Angeles becomes the next Detroit then (sadly) that could do it.

  4. JayMan says:

    Great work!

    James Fallows in The Atlantic agrees but adds that Shanghai, the winner of the Pisa 2009, isn’t representative of the whole of China – which is correct; neither are Hong Kong or Singapore, which also rank at the very top. These are all elite populations. America scored better against the other Chinese regions of Macao and Taiwan and it would probably do even better compared to all of China. Although those who are familiar with unpublished results from other parts of China claim they are very respectable.

    That information is available. Indeed, it’s a key part of the argument against poverty being the cause of low IQ:

    Welcome Readers from Portugal! | JayMan’s Blog

    Evo and Proud: Too darn hot?

    • Staffan says:


      Regarding the poverty argument, it is a strong argument. Sometimes I wonder if even mild malnutrition would make any big difference.

      Americans have a lot of respect for, and fear of, the Chinese. I suspect the reason could be that the greatest Chinese elite of all is the one residing in America.

  5. K says:

    Immigration certainly has had some effect on the poor performance of e.g. Scandinavian countries on the PISA tests, but it’s not the whole story. The natives do rather poorly, too.

    I’m interested in seeing how Finland fares in the new PISA study. A recently published national study showed that today’s 15-year-olds score about 0.5 SDs below the 15-year-olds of 2001 on reasoning, math, and reading comprehension tests. The researchers suggested that the increasing influence of “socio-constructivist pedagogy” (I’m not sure what that is, but it sure sounds like nonsense) is one of the culprits, pointing out that it spread to Finland from Scandinavia where it had previously wreaked havoc. Probably relatedly, several Nordic countries (including Finland) have witnessed a negative Flynn effect on IQ scores since the late 1990s or so.

    • Staffan says:

      There may be other factors too, no argument there. Weird and slack liberal pedagogy is a likely candidate.

      The internet is another possible factor. Recent figures from Statistics Sweden (which in spite of the informal name is the national bureau of statistics here in Sweden) show that 21 percent of boys 13-15 years old (Pisa age) are playing computer or video games at least three hours a day. Only 6 percent of the girls do that. This seems to be mirrored in their school performance.

  6. […] The Sour Grapes of Pisa – from staffan. […]

  7. If one breaks the results down by race, the underachievers UK and USA don’t look so bad anymore. People unconsciously picture the majority population of a country, so Turks in Germany don’t come to mind. But they are in the numbers.

    In the American media interpretation, it is always those other countries who are doing something so brilliant in their educational systems that stupid Americans just won’t. Even if that is contradictory, such as the hard-driving South Korean system versus the laid-back Finnish one. Any stick is good enough to beat the schools with.

    I have heard, though it was just in suggestion mode in Smithsonian, that the Finns make it a point to let the children know that national honor is at stake and to try their best. That would not lift the top scores (they are already doing that), but it might lift the floor quite a bit. That’s what our local school district does to look best in the state.

  8. Staffan says:

    Yes, the results reflect a bad immigration policy more than anything else. You have to wonder what Korean children think when they notice how the Finnish kids are taking it easy with little homework and frequent holidays – and still being at the absolute top.

    Another thing people tend to forget is that the wealth of a society is also built on personality. While introversion probably will promote school performance, America has that entrepreneurial personality profile that is essential for long-term growth. I believe the Finns have a bit of that too, but their introversion is probably holding them back in this regard.

    I haven’t heard about the Finnish kids being taught that it’s about national honor, not sure that would work either. We have a lot of Finns here in Sweden, I grew up with several of them and they don’t express much patriotism. I get the feeling it’s more like they’re suspicious of foreigners. I think the big key is introversion. They are like a nation of aspies; you don’t have to ask them to read a book.

    • SP says:

      “You have to wonder what Korean children think when they notice how the Finnish kids are taking it easy with little homework and frequent holidays – and still being at the absolute top”.

      Methinks perhaps you get this wrong, Staffan.

      Koreans have an avg IQ well above the Finns and they are working at least 50% harder. It makes no logical sense that there’s little gap between the 2 groups. There is a big gap between the Koreans and the Finns actually, make no mistake about it. You don’t see the gap because the PISA test is still quite easy at that level, and lots of rote knowledge, very useful btw, is not tested.

      The world’s toughest (to the extent of being inhuman) and hence most competitive secondary school system is China’s, followed by S Korea’s; both are designed to compete at the highest level for selection in their respective national university entrance exams, pushing both the intellect and the physical condition of the Chinese and Korean kids to the very limit. A regular school day starts at 7am and ends at 11pm or even later, with perhaps 5 days of real holiday per year. When you have a free day, the homework and extra schooling alone will make it even tougher than a regular school day. You only have to look at suicide rates in that age group in China and Korea to gauge the picture.

      • Staffan says:

        It depends on what you think I got wrong : ) Clearly, the Koreans have higher IQs than the Finns but they also score very similarly on the Pisa survey, which suggests that the difference isn’t that big. I don’t agree that Pisa is an easy test because in that case we’d see less variation. It may be that the Koreans are suffering from a case of diminishing returns in that the test is too easy for them to shine – but there is still a gap between them and Shanghai so they haven’t hit the ceiling. There are probably other small groups who score even higher but aren’t featured in the ranking, like Ashkenazi Jews.

        Rote learning may be useful for some purposes but is it related to intelligence? A matter of definition but my guess is that it is antagonistic to critical thinking and unrelated to abstract thinking in general.

  9. Gottlieb says:

    As a student, I took a test related to the national math olympics. Since we gained no points for taking part, most of my classmates and I despised the test and did it carelessly so we could leave class early.
    One possible explanation for the catastrophic results of Brazil and Argentina in the last PISA may be just the complete lack of zeal that students give this type of testing, which usually is not worth anything in terms of points on their individual report cards.

    Many factors related to intelligence are neglected by these types of tests, mainly because they are built on the idea of universality of human cognitive characteristics, which indeed does not exactly seem real.
    People of different nationalities, for reasons of micro-evolutionary selection for cognitive abilities, tend to present, as a result, different cognitive styles and therefore different ways to learn. Likewise, there are different cognitive realities within national sub-segments, e.g. in the case of white Americans, where we have an inner diversity, both in quantity and in quality.
    These tests seem to fit perfectly for Asians, precisely because they relate to a number of personality characteristics like subservience to authority and a learning style based on the emphasis of sequential memorization. Large heads can fit more information.

    The Scandinavian societies, always at the forefront, are “slowly” internalizing the idea of a school where you can find and polish talents, not only rewarding technical skills, and concurrently changing the left-brain-oriented structure that most schools have, trying to propose new ideas to transform schools from factories of technically exploitable individuals into greenhouses for the cultivation of individual talents. The old emphasis in the left without old-Marxist tendencies in the individual and not on statistically related groups.

    • Staffan says:

      But why would Argentinians and Brazilians ignore the test while others don’t?

      The tests may suit Asians but so does reality, judging by national GDP and economic growth. You sound a little bit like a Dane here ; )

      • Gottlieb says:

        Much also depends on the structure of both the school environment and the cultural environment. Schools in both countries, and I speak more specifically of Brazil, are very bad, less because of the structure itself and more because of the relationship between student and environment; students in 70% of cases are sufferable both for long-term learning and for behavior. Besides this factor, a depressed environment (with lousy teacher salaries and dysfunctional students), two other factors also appear: genetics and cultural dominance.
        Brazilians are world known for their outgoing personality, overly extroverted, a genetic factor. But Brazil’s current culture is real trash. Combine low technical intelligence, excessive extroversion and a permissive society equal to modern Sweden. What do we have?
        In the Argentine case, I speculate that recent neo-Marxist tendencies, aiming to restructure the schools’ structural dynamics, may have resulted in their contempt towards the test. Honestly, it is unlikely that our neighbors have an average IQ of 80; impossible. They are almost European in behavior.
        These PISA tests should have specific reports on a number of environmental variables that may affect the final results, such as the ones I mentioned: the level of motivation of the students, whether the students studied the material before the exams (which would be categorized as fraud, since it is a quasi-IQ test), etc.
        In fact, when you create a certain type of society for a certain segment of the population, you have an almost perfect fit. In a Western-style society where memorization, sequential learning and docile personality are valued, including through a bureaucratic structure, it is well known that those populations with plenty of this type will show their superiority; it is like putting a volleyball player in a basketball game.
        I am not a premature-ejaculation relativist, but it is necessary to evaluate every detail so that a leftist psycho does not use these methodological vacuums to his advantage, because rest assured that he will.
        About Finland, what you say is very interesting. Could it be that they are a nation of aspies??
        I think not, because it would be necessary to analyze all their characteristics for consistency with the neuro-condition under discussion.
        I think European Jews seem to be closer to this possibility, not entirely, but due to a number of features that have been speculated about and they had told him earlier.

  10. Staffan says:

    Brazilian extraversion is likely one explanation. It does seem that underachievers like Denmark, UK and USA have the same problem – although at the same time this trait adds to entrepreneurial competence. Everyone knows that Danes are business savvy.

    Argentina is a bit of a mystery. They measure around 96 in IQ but estimates that combine this with Pisa and other tests end up around 93. Judging by their demographics I suspect that 96 is closer to the truth. Argentina is also an economic disaster, not sure why.

    Finns are not exactly aspies but their introversion is striking. But they are not into rote learning; they are much more independent thinkers. In terms of the MBTI, I would say they are INTP.

    • Gottlieb says:

      You can have no idea how extroverted the Brazilian people are. To an unbearable level. I can not let it lie. The people here are in a constant state of delusion of “happiness” and positive view of the (Brazilian) world. I am not saying that these traits are bad, but in Brazil?
      As I said some time ago here, introversion is an essential trait for high intelligence and, as I read recently, the achievements of the extroverted, real achievements and not socio-subjective ones, are only achieved when they are in a state of introspection.
      Estimates from PISA for Argentina are misleading, for the possible reasons I have outlined. There is no possibility that the average Argentine is similar in intelligence to the average Brazilian. I clearly see an important difference between the two groups. Argentines actually must have an IQ around 96.
      Well, you know that in the past, Argentina was the fifth world economic power (50’s), driven by its commercial monopoly on the export of meat.
      The successive incompetent governments may help explain the Argentine situation, but seeing the country even after the severe economic crisis, I do not think they have worsened so considerably as to reach a level of social gravity like Brazil’s.
      If we could compare them, I would suggest that Argentina, through its predominantly southern European genetic heritage, has the overall performance expected of southern European societies, which were not randomly engulfed (hypothetically) by the euro zone; in other words, modern Argentina and Uruguay are just like southern European countries without financial aid from the “rich and blonde” north.
      The endemic corruption can also be expected to be relatively natural given the different socio-biological traits they inherited from their most important ancestors, Italians and Spaniards.
      Interesting, I also fit into this category. In the part where it says that people in this category tend to have two types of personality, the inner and the outer, I recognized myself completely. In fact, I’m fairly introverted, but internally I have a histrionic bipolar personality with schizotypal tendencies. But my outer self is ashamed of my interior. A case of self-slavery and self-oppression.

  11. SP says:

    A separate note on Shanghai scores:

    The western mainstream papers, TVs, columnists and blogs mostly get it very wrong. Intentionally or not, Andreas Schleicher of PISA is making a very popular name like Shanghai a star while ignoring the much bigger stars – various anonymous (to the ears of the West) Chinese provinces. China’s govt doesn’t want to “lose face” (i.e. in case they are not #1) so they ask PISA not to publish the results for the provinces prior to 2015, because they want to make it look even better once published.

    1. Shanghai is not the top scorer of China, not in maths, not in science, not in reading, far from it actually if Chinese Gaokao – the one and only THE test in China – is a guide. Shanghai has been scoring average or even slightly below average most of the time in Gaokao Science and Arts (both include maths) for decades.

    2. Shanghai’s counterpart in the US is not Massachusetts State, but New York State, and Shanghai scores quite similar within China to what NY State would usually score like in American league – just about the average in ranking.

    3. China’s IQ map in Jayman’s quote is not accurate enough, but generally right. Make no mistake, Shanghai is amongst China’s highest avg IQ population – the Wu Chinese. The real reason why Shanghai couldn’t score top in China’s most important test Gaokao is because it enjoys a separate university entrance criterion from the rest of China, so that Shanghai kids don’t need to study hard and score high at all in order to get to pretty good local universities (hence much higher university acceptance ratio than China’s average) that require sometimes, e.g. 10 or 20% more scores on math, from students of other provinces to get the same university place. Precisely because of this, many Chinese provinces ( off my heads I know at least 7 or 8)could easily outscore Shanghai in PISA maths, science, not sure about the reading though as it appears to me that high reading scores is also somewhat very correlated to per cap wealth/edu investment that put Chinese poor provinces into comparative disadvantage. So majority students from China’s relatively poorer provinces are working much harder and are much hungrier than Shanghai’s and usually score higher.

    4. China’s top scores, China’s “Massachusetts” or “bay area LA” that is to say, is either Jiangsu province or Zhejiang province. Both are about equally strong. 2009 PISA tested 12 Chinese provinces, a very representative sample I have to say, covering total 621 schools from allover China, including Gaokao top powerhouses, the mid scorers and the lowest scoring provinces that are largely Western Muslin and Southern ones bordering Vietnam and Laos. The general result was leaked. The average score of China was 520, still higher than the US average 496 ( see here: If standalone, these 2 would be the world’s top 2, outpacing Shanghai by a mile.

    5. Jiangsu’s scores are not known up to now, but Zhejiang’s scores were leaked in a local Chinese newspaper: Science 567 (No. 2 in the world, 8 points below Shanghai), Reading 525 (No. 7 in the world, 1 point behind Singapore), Maths 598 (No. 2 in the world, only 2 points behind Shanghai). So why did Zhejiang score badly compared to Shanghai? It is because Andreas Schleicher didn’t randomly select schools in Zhejiang as he did in other countries, but deliberately took 80% of the scores from the poorest rural areas of Zhejiang (embedded in a tiny footnote of the PISA release). I suspect something similar happened in nearby Jiangsu province. Without this manoeuvre, Zhejiang’s and Jiangsu’s statistically valid real Science and Maths scores would have made Shanghai’s look like a regular Joe – as is usually reflected in China’s Gaokao. Yet hey, the name of Shanghai is much sexier than Zhejiang for PISA’s campaigns.

    • Staffan says:

      Very interesting. You wouldn’t know if the people of Zhejiang are more introverted or differ in other traits, like conscientiousness?

      It’s odd that Pisa would play along like this, and much more damning than the Danish critique – although it doesn’t change the overall picture much for Denmark and the UK. But Canada and New Zealand are actually ahead of China which isn’t exactly what the official ranking looks like. Although Canada will not keep that position for long.

      • SP says:

        Wu Chinese is a branch of Han Chinese. Dunno if they are more introverted or not…nonetheless you can go check related info here:

        This area is called “Jiang Nan” (meaning South of the River, referring to the Yangtze river), covering both Jiangsu and Zhejiang provinces. The place has traditionally been amongst the top in Imperial China’s Exams for 1000+ years. Personally I believe they have an average IQ (after adding several Flynn Effect points to their existing ones) higher than Ashkenazi Jews.

        There’re countless famous Wu Chinese in Chinese history… For names familiar in the West, perhaps you’ve heard of Chiang Kai-shek? Or, just off the top of my head, Chinese Americans such as An Wang (founder of Wang Laboratories)? Jerry Yang (of Yahoo)? Steve Hsu (the IQ blog physicist)? Jeremy Lin (NBA basketball player)?… They are all quite typical Wu Chinese.

        Second thing, I’d be rather careful about claiming that Canada and New Zealand were ahead of China in the 2009 PISA. Reasons:

        1. Population-wise small countries like Canada and New Zealand have disproportionately big ethnic Chinese populations (most come from mainland China for New Zealand, from both HK and mainland China for Canada, all with about > average IQ level). It is well known in both New Zealand and Canada that Chinese pupils are at the top of the rankings in most of their school exams. Honestly, these countries’ being ranked very high in PISA has a lot to do with the sheer numbers of ethnic Chinese there.

        2. Chinese provinces, including Shanghai, have per capita education spending of 1/10th, or even far lower than 1/10th, of that of Canada and New Zealand. So logically, with even 1/5th in the future, the Chinese average performance could easily improve to some extent.

        3. Most importantly, we only know that PISA did random sampling in Shanghai Province. We know that PISA discriminated against Zhejiang Province in sampling, as my previous post explained. We know that PISA tested a wide range of Chinese provinces – from the very top to the middle to the very bottom. Yet we don’t know the sampling criteria and methodology used on them, which are complicated further by the drastically different population densities of many of the provinces tested. As Zhejiang’s case demonstrated, China’s “520” average PISA score is just a very, very rough indication to show the world that “Even in rural areas and in disadvantaged environments, you see a remarkable performance” (to quote Schleicher’s own words) and could easily be misleading (particularly towards the downside, in my view). So unless we can get detailed explanations on the above questions from Schleicher through PISA footnotes, we just don’t know if Canada, or anyone, got better scores than China at this stage. As the saying goes, “without data, you are just another person with an opinion”.

        Furthermore, James Fallows is dead wrong in saying that Hong Kong and Singapore are exceptions because they are all elite populations. Hong Kong (mostly Cantonese) and Singapore (mostly Fujianese and Cantonese) are in fact very average sub-populations, in terms of average IQ, within the Han Chinese. By Chinese standards, they are not dumb, but certainly not amongst the best. That’s also why you see Taiwan, with its Fujianese majority, despite some elites, not scoring at the very top in some areas. Shanghai on the other hand, despite its dismal historical Gaokao performance in China vis-à-vis many other provinces, is actually an elite population average IQ-wise, simply because they are Wu Chinese. This is the fundamental root reason why Shanghai City will replace Singapore, Hong Kong and Tokyo as the most important financial centre in the Far East in the near future. The raw power – the average IQ of Shanghai – is there. Publicly admit it or not, people in the Far East know that. Even firing on half a cylinder due to China’s rigid one-party restriction and all the corruption and lack of proper innovation it entails, Shanghai is still catching up with them at a blistering speed.

  12. SP says:

    Contrary to what James Fallows claims, what Singapore’s and Hong Kong’s performance actually shows is the level that an average (IQ-wise, by Chinese standards) Han Chinese sub-population could achieve if given the OECD level of education spending per capita and environment.

  13. SP says:

    An analogy on the East and the West:

    Jiangsu province+Shanghai+ Zhejiang province are almost like Germany+Benelux of Europe.

    China’s North Eastern provinces (Manchuria) and Korea are like a “Scandinavia” and a “Denmark”.

    The rest of China has its separate share of “Italy, France, Spain, Poland, Switzerland, Hungary, Greece, Czech, Russia…”, and even a small “Turkey” – China’s far west province.

    Japan is like Britain.

    Singapore, Hong Kong are like Monaco.

    Striking similarities on both sides of Eurasia!

    • Staffan says:

      The idea that Canada and New Zealand are doing well because of their Chinese minorities is of course true, but the Pisa ranking is about countries, not ethnic groups. And if we make this comparison we’d have to adjust for other minorities that are lowering the White scores.

      As for education expenditure, keep in mind that intelligence has proven itself to be very unaffected by environmental factors. And if anything order and silence in the class room is probably more important than any other factor when it comes to scholastic performance.

      Sure the data may be flawed but it’s the best data we have to make a comparison with. It may be that China would score much better but it may also be that the test favors East Asians. After all, this is a test which is close to IQ, a metric correlated with material wealth (and the capacity to spend money on education in the first place), but the very high scores of this region don’t come with similar wealth. There is growth, yes, but China is a long, long way from Canada.

      Is the Gaokao a better proxy for IQ than the Pisa? We’d have to know the correlations to say that. It could be a great test but rely more on conscientiousness, a trait that also affects school performance but which isn’t intelligence.

      • SP says:

        Intelligence is highly correlated with PISA scores, and you’re right that “it has proven itself to be very unaffected by environmental factors”. However, the correlation is not 1. Other things such as average income, nutrition level, and education spending per capita account for the rest of it as well. In a comparison at that very high level, a big difference in these “other things” does count a lot in the final ranking. Case in point, think about North vs. South Korea, bearing in mind that, on average, North Korea had a MUCH larger share of the elite (IQ-wise) Korean population than the South prior to the Korean War in 1950, and logically still has it today. Yet it’s very likely that the North would lose to the South Koreans in the PISA test by a country mile, if the North took part in it.

        On Gaokao: perhaps it is, no, it IS THE most difficult test on planet earth for pre-university age groups. 🙂 But seriously, it is, because its difficulty level is the only guarantee that China’s educators can possibly and fairly separate tens of millions of students every year into different levels to fit into thousands of different unis and colleges in the country. PISA is nothing compared to Gaokao. So talking about correlations with IQ, I believe that Gaokao is the most correlated test as far as I know. You can actually google some of the Gaokao Maths and science exams. Some of them have been partially translated into English.

        Rule of thumb on Gaokao: If Singapore, Hong Kong , South Korea, or Finland take Chinese Gaokao on Maths and Sciences, they would be ranked as China’s national average, or even below average, within China’s 23 or so provinces, almost guaranteed. If Canada takes Gaokao Maths, it would be ranked as even lower. Shanghai has been consistently scoring about average in Gaokao Maths and Sciences for decades. There is absolutely no reason, other than some patriotism, for Shanghai kids to dedicate more of their efforts into some nameless PISA tests than Gaokao, which will hugely affect their immediate life. That’s why when Shanghai scored the best in PISA, there was barely any news report on it in Chinese major newspapers or blogs, as the general Chinese public just thought it’s nothing worth mentioning.

        Thing is that the West (incl. Rushton and Lynn) has been consistently underestimating the Chinese average IQ. Food for thought:

        – Hong Kong Cantonese are average in Gaokao terms (Canton province of China, the blood-brothers of HK Cantonese, scores average in Gaokao), yet have an avg IQ of 108… ditto Singapore’s 107 (Cantonese + Fujianese + Malays + Indians)… Korea and Japan had been completely dominated by China technologically for thousands of years, except for the last 100 years or so, only about 3 generations starting from the time of your grandpa… And Koreans have a 107 avg IQ, Japanese have 105… So what should the Chinese avg logically be, even before reasonable Flynn points?

    • Staffan says:

      Hehe, I wouldn’t know but that certainly looks like you have an associative mind : ) I get the feeling Koreans are a bit Western with a consumption culture and being a bit vain.

  14. SP says:

    “You wouldn’t know if the people of Zhejiang are more introverted or differ in other traits, like conscientiousness?”

    Another answer: from the recent (the last 20 years or so) Chinese immigrants to Europe –

    The Chinese in the UK and Netherlands are mostly 2nd, 3rd and even 4th generation Cantonese. However, the vast majority of the Chinese (>80%, in some cases >90%) in Italy, Spain, Portugal, Germany, Austria, Belgium, Greece, Southern France, and probably Sweden as well (I am not sure on this one though) are in fact from Zhejiang. They are mostly from impoverished villages in remote mountainous areas of the province. Most of them and their immediate relatives were farmers (growing vegetables and raising pigs & chickens, you know) before coming to Europe. The majority of them got no more than a high school or junior high school education in China. They came to Europe by means of chain immigration/family reunion. In spite of the fact that many of them may appear to lack manners & education, often dress oddly, and even sometimes smell like the sweet & sour chicken of the Chinese restaurants where most of them work, they belong to one of the most elite population groups IQ-wise in China (as a group definitely much more elite than Hong Kong Chinese and Singaporean Chinese, in spite of being much poorer and much less educated), and hence certainly by world standards as well, make no mistake about it.

    Want to know about the general characteristics of Zhejiang people, Staffan? Bearing in mind their average education level, you can have first-hand experience right now in Western Europe, and watch out for the academic performance of their 2nd and 3rd generations in the local schools in the near future.

    • Staffan says:

      I’ll do that, although there aren’t that many of them, but I have noticed a lot at a medical university called Karolinska Institutet. Allegedly several of their Chinese students have returned to China and are now holding top positions at various universities. But they seem less prone to make academic careers of that kind in Sweden. Not sure why, could be the language.

    • LJ says:

      SP, I don’t know how I happened to stumble onto this blog but I was really intrigued and impressed by your comments. You display an understanding of China that seems extremely unlikely for a westerner to have. If I had to guess, I would guess that you are a Chinese Singaporean. Only a Chinese in Singapore is likely to be sufficiently well versed in both Chinese and Western culture to write like you did. Am I correct?

  15. Gottlieb says:

    Not so much off-topic

    And this now????

    PS… I memorized nine numbers, hohoho

    • Staffan says:

      The article is overstating its case. It’s an ongoing debate and the difference between the measures is not very large. But it would be nice to get away from the problem with cultural bias.

      • Gottlieb says:

        After searching a bit I ended up finding it here.
        Especially within the HBD community, there is a belief that IQ tests are perfect. (Even those who make it clear they do not believe this make no effort to dispel that impression.) I think refining the search for a trait, primarily biologically based, that can serve as a parameter of intelligence would be the natural and logical next step in this matter: from the classical tests to Raven’s tests, to increasingly simple but objective exercises that cover much of human ability and serve as a holistic model, and in the near future to psychometric assessment through computerized imaging of the brain in action.

  16. Matt says:

    There has been some work I think I can recall (Roe?) that shows extroversion up as a personality factor for effective teachers, while introversion is linked to effective learning.

    That’s interesting to me because it suggests the possibility of educational model of “parasitic introversion” – where introverts test well, due to a combination of their introversion and good teaching, so win academic competitions and end up in teaching positions where they then proceed to offer teaching of a poorer quality than they themselves received (if non-people person introverts tend to make worse teachers).

    You might see this in the performance of introverted nations – somehow they don’t do as well as their co-ethnics in countries where people are more extroverted, because their diasporas get the benefits of relatively more extroverted teachers.

    (I don’t think this has much currency with the Finns, who don’t have a more successful diaspora, but for some other nations it might hold weight).

    Another point is that extroversion also tends to link up with better mental health, at least on an intra-population level – if you’re more knowledgeable but also more crazy and depressed, you may be relatively less effective. That knowledge may just end up translating into crazy, wrong theories or be limited by dysfunctional behavior.

    Again, where extroverts have an advantage in offering mental health promoting social support, this could be enhancing to introverts in a way that is effectively “parasitic” on the introverts’ part.

    • Staffan says:

      Interesting and plausible theory. Extraverted teachers should almost by definition be better than their introverted colleagues. It’s hard to tell if this is shown in a diaspora effect though because these minorities may be elites.

      There is perhaps a link to mental health too. If you look at the WHO epidemiology of schizophrenia it is much more common in the East Asian countries than in the more extraverted West:

      • Staffan says:

        I meant to say a link on the inter-populational level.

      • Matt says:

        Thanks. Btw, for folks like Gottlieb, my reference to “Parasitic Introversion” was really just an analogy to the idea of “Parasitic Liberalism”, an idea where promoting the most liberal people and ideas in society is theorized to eat away at the conditions that make a liberal society possible at all.

        In a similar way, the analogy was that a strategy of promoting the most individually academically able, who would tend to be introverts, as teachers without controlling for their introversion would eat away at the teaching quality, which would tend to be higher among the less academically successful extroverts, leaving everyone less academically successful.

        I certainly don’t think of introverts as “parasites” in society!

        Good point re diasporas Staffan, although rather than diasporas being generally elite, I think it’s more the case that the odds ratios for, for instance, masters degree holders to migrate are much higher than average. This doesn’t affect the average much, because masters degree holders are a pretty small percentage of the population, but because they’re very smart and even with regression their kids are very smart, you can get big differences at the high end tails, especially when this compounds an already extant mean population difference.

        E.g. when you consider Terence Tao, a Chinese Australian, he’s not probable at all from a population of 866,205 (the Australian demographic) average Chinese with a normal SD, of which probably far less than half are kids. But he does become probable once you consider how smart his parents are and that regression of IQ is a lesser factor in the Chinese.

        The freaky high standard deviations found in Asian Americans, but not their native countries, still seem to persist when single Asian ethnic groups are considered alone, so I think this seems plausible.

        On reflection I do think my introversion-teaching model may hold in reality to some extent, but thinking about it carefully it’s hard to get out of the thicket – for instance East Asians’ educational obsession, perhaps driven in part by their introversion, might motivate them to give higher status to teachers (as is reported in China and Korea) and get better candidates, even though their introverted personality types would, all things equal, be more suited to study than teaching.

  17. Gottlieb says:

    Matt, I did not quite understand your theory; the translation through the translator was imperfect. One way or another, I totally disagree, if what you tried to say is that extroverts are more intelligent and that introverts are prone to a kind of parasitic behavior.
    In fact, among teachers, journalists and psychologists there is a positive relationship between extraversion and professional success. Indeed, this happens in many professions, but it does not necessarily mean that there is some major fault in introverts. It happens for obvious reasons: the ability to socialize.
    I agree with Staffan that there may be a relationship between physical health and personality, and that one likely explanation for introverted, low-risk behavior is precisely that the group with the most fragile health adopts a strategy of low microbial exposure. However, I believe that at the highest levels of intelligence both relatively fragile health and introversion are extremely common. I’ve read about studies that found a relationship between allergies, myopia and high convergent, technical intelligence, i.e. IQ.
    High-risk behaviors are almost always related to low intelligence, and most of them have a high chance of resulting in crime.
    As a contribution from my personal experience so far, though I admit I am not neutral on this: I live in the most extroverted country in the world (I believe), and I have no doubt that most of these people are far from bright.

    Extraversion is very common among ordinary people, and the higher the intellectual level of a social group, the lower the presence of extroverts. It’s not difficult to see this: compare, for example, a classical music concert with a Rihanna show.
    Also, for very obvious reasons, introverted people tend to avoid much exposure in socio-subjective dynamics and replace the time they would spend socializing with intellectual engagement.
    Contrary to what you seemed to suggest, it is extroverts who often end up adopting parasitic behavior. This pattern is also not very difficult to recognize. The nerds and the very talented invent the world, while extraverts, besides demeaning their values, still use all the geeks’ technology to keep propagating their genes through superficial friendship.
    Most of mankind’s major cultural, social and scientific achievements were not only made by introverts but also made by imposing their way.
    Extroverts are overrepresented among tribal cultural groups such as the politically correct leftists. Much of the extreme human oppression of deviant groups has come precisely from extroverts.
    I’m not being balanced towards them, in theory, because I totally believe that the next evolutionary step of mankind will make this behavior a minority or eliminate it, by favoring its opposite type, i.e. introversion.
    I do not see evolution in the world without more depth and seriousness, just as I see no future in my country.
    Staffan, where does it say that schizophrenia is more common in Asia?

    • Staffan says:

      Just click the Wikipedia-link in my previous comment.

      As for parasitic relationships, this is in regard to education. Clearly extraverts will benefit from introverts in other fields, as you suggest.

      • Gottlieb says:

        Yes, I know I only need to click the link you left, but I read the other day that, at least in the U.S., schizophrenia is more common among African Americans. Interesting that Indonesia comes first. However, it is important to understand how the diagnoses are made and what criteria are taken into consideration, and whether this is a local biological phenomenon or whether strong cultural factors produce this picture. I am startled by the possibility that a demographically large, geographically dispersed, Muslim and poor country could have reliable data on the incidence of any type of mental disorder. It may be that cousin marriage, plus the racial mixing factor, combined to produce this result, or that diagnostic errors are being made again and again.
        I did not understand: introverts tend to adopt parasitic behavior in relation to education? Education?
        I think I need to show you, from my point of view, what actually happens in our societies. There are numerous cold wars between opposites going on all the time: blacks against whites (or vice versa), tall people against short people, beautiful people against ugly people, smart against stupid, etc. … introverts against extroverts.
        Most human populations, except in Asia, are extroverted. In the West there is a better balance and greater variation between countries: Finland vs. Italy, for example. For the most part, extroverts are more successful, both in dominating most nations and in imposing their model of behavior. Supposedly this would be a good thing, but it is not. It is good according to the theory of balanced polymorphism: without extroverts, positive traits such as perseverance and good humor would not be distributed through the population.
        However, it seems clear to me, especially with the advent of modern psychology, that extroversion is trying to eliminate introversion from the natural competition.
        That is, the oppression exercised by those who can socialize and who place great emphasis on this activity continues unchecked, now not only casting the introvert as a pathological being but also making every aspect of introverts’ lives rather negative.
        We know that most of the children who suffer bullying in schools are introverted, while many of the children who cause it are extroverted.
        In no time, much of the introverts, adopt parasitic behaviors and make them, it is not necessary because these are of the same nature, but because extroverts take account of key positions because of their ability to charismatic leadership and because his great awareness, introverted people are put in positions that reflected in low utilization of its capacities. From this vision, we can interpret this as laziness, which is not at all true.
        I’m currently nearing the situation that needs to be demonstrated socio-subjective and artificial charm skills, I have not, to get good jobs. I’m socio-objective, as most introverts and so a good deal of smart people.

  18. Gottlieb says:

    I do not think the West is an extroverted place, but a mix of introverts and extroverts, just as creative people tend to be. Much more than among blacks and Asians, among whites it is not uncommon to find people who have experienced periods of intense joy and afterwards experienced more introspection and even tendencies towards depression.
    The explanation for huge success of Western civilization and its people can be summarized by the philosophy of the middle path, will always be the best. However, I still have the idea that the next evolutionary step of mankind is to make the majority of humans like as introverts, but not in plumb level as in Asian nations. So far the greatest prudence and dismay at the next to become a reality, we’ll be a cyber-anthropological level, of course, inventions mostly carried out by the losers of glasses, spine in our face.

  19. paceni says:

    William Stewart had a copy of Dr Morrison’s paper in June 2012. Neither the TES nor OECD Pisa has issued a response pointing out where it is wrong, yet you cite William Stewart, a reporter, as some sort of authority on OECD Pisa. He is simply a vehicle for their propaganda. If you have a mathematical, not statistical, response to this paper please present it. Otherwise your article, like Andreas Schleicher’s entire flawed project, is just another opinion.

    A fundamental conundrum in psychology’s standard model of measurement and its consequences for PISA global rankings.

    Dr. Hugh Morrison
    (Formerly) The Queen’s University of Belfast


    This paper is concerned with current approaches to measurement in psychology and their use by organisations like the Organisation for Economic Co-operation and Development (OECD) to hold the education systems of nation states to “global” standards. The OECD’s league table – the Programme for International Student Assessment (PISA) – has the potential to throw a country’s education system into crisis. For example, Ertl (2006) documents the effects of so-called “PISA-shock” in Germany, and Takayama (2008) describes a similar reaction in Japan. Given that a country’s PISA ranking can play a role in decisions concerning foreign direct investment, it is important to confirm that the measurement model which produces the ranks is sound. Moreover, the OECD has already spread its remit beyond the PISA league table to include teacher evaluation through its Teaching and Learning International Survey (TALIS). The OECD is currently developing PISA-like tests to facilitate global comparisons of the education on offer in universities through its Assessment of Higher Education Learning Outcomes (AHELO) programme: “Governments and individuals have never invested more in higher education. No reliable international data exists on the outcomes of learning: the few studies that exist are nationally focused” (Rinne & Ozga, 2013, p. 99). Given the sheer global reach of the OECD project, it is important to investigate the coherence of the measurement model which underpins its data.

    At the heart of 21st century approaches to measurement in psychology is the Generalised Linear Item Response Theory (GLIRT) approach (Borsboom, Mellenbergh and Van Heerden, 2003, p. 204) and the OECD uses Item Response Theory (IRT) to generate its PISA ranks. A particular attraction of IRT for the OECD is its claim that estimates of examinee ability are item-independent. This is vital to PISA’s notion of “plausible values” because each examinee only takes a subset of items from the whole “item battery.” Without the Rasch model’s claim to item-independent ability measures, PISA’s assertion that student performance can be reported on common scales, even when these students have taken different subsets of items, would be invalid.

    This paper will focus on the particular IRT model used by OECD, the so-called Rasch model, but the arguments generalise to all IRT models. Proponents of the model portray Rasch as closing the gap between psychological measurement and measurement in the physical sciences. Elliot, Murray and Pearson (1978, pp. 25-26) claim that “Rasch ability scores have many similar characteristics to physical measurement” and Wright (1997, p. 44) argues that the arrival of the Rasch model means that “there is no methodical reason why social science cannot become as stable, as reproducible, and hence as useful as physics.” This paper highlights the incoherence of the model.

    The Rasch model and its paradox

    The Rasch model is defined as follows:

    $$P(X_{is} = 1 \mid \theta_s, \beta_i) = \frac{e^{(\theta_s - \beta_i)}}{1 + e^{(\theta_s - \beta_i)}}$$

    $X_{is}$ is the response ($X$) made by subject $s$ to item $i$;

    $\theta_s$ is the trait level of subject $s$;

    $\beta_i$ is the difficulty of item $i$; and

    $X_{is} = 1$ indicates a correct response to the item.

    On the face of it, the model uses a mathematical function to allow the psychometrician to compute the probability that a randomly selected individual of ability θ will provide the correct response to an item of difficulty β. A particular ability and difficulty value will be chosen for illustration, but the analysis which follows has universal application. When the values θ = 1 and β = 2, for example, are substituted in the Rasch model, a scientific calculator will quickly confirm that the probability that an individual of ability θ = 1 will respond correctly to an item of difficulty β = 2 is given as 0.27 approximately. It follows that if a large sample of individuals, all with this same ability, respond to this item, 27% will give the correct response.
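
    The arithmetic in the preceding paragraph can be checked with a few lines of Python. This is only a sketch: the function name rasch_probability is a hypothetical helper chosen for illustration and is not taken from the paper or from any PISA software.

```python
import math


def rasch_probability(theta: float, beta: float) -> float:
    """Probability of a correct response under the Rasch model.

    theta is the trait (ability) level of the subject and beta is the
    difficulty of the item, as defined in the text.
    """
    return math.exp(theta - beta) / (1.0 + math.exp(theta - beta))


# The worked example from the text: ability 1, difficulty 2.
p = rasch_probability(1.0, 2.0)
print(round(p, 2))  # 0.27
```

    When ability equals difficulty the exponent is zero and the function returns exactly 0.5, which is the usual way of reading the two parameters on a common scale.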

    In the Rasch model “the abilities specified in the model are the only factors influencing examinees’ responses to test items” (Hambleton, Swaminathan & Rogers, 1991, p. 10). This results in a paradox. If a large sample of individuals of exactly the same ability respond to the same item, designed to measure that ability, why would 27% get it right and 73% get it wrong? If the item measures ability and the individuals are all of equal ability, then surely the model must indicate that they all get it right, or they all get it wrong?

    Does the Rasch model really represent an advance on classical test theory?

    The Rasch model is portrayed as a radical advance on what went before – classical test theory (CTT). In classical test theory, “[p]erhaps the most important shortcoming is that examinee characteristics and test characteristics cannot be separated: each can be interpreted only in the context of the other. The examinee characteristic we are interested in is the ‘ability’ measured by the test” (Hambleton, Swaminathan & Rogers, 1991, p. 2).

    An examinee’s ability is defined only in terms of a particular test. When the test is “hard,” the examinee will appear to have low ability; when the test is “easy,” the examinee will appear to have higher ability. What do we mean by “hard” and “easy” tests? The difficulty of a test item is defined as ‘the proportion of examinees in a group of interest who answer the item correctly.’ Whether an item is hard or easy depends on the ability of the examinees being measured, and the ability of the examinees depends on whether the items are hard or easy! (Hambleton, Swaminathan & Rogers, 1991, pp. 2-3)

    Measures of ability in the Rasch model, on the other hand, are claimed to be completely independent of the items used to measure such abilities. This is vital to the computation of plausible values because no student answers more than a fraction of the totality of PISA items.

    A puzzle emerges immediately: if the Rasch model treats as separable what classical test theory treats as profoundly entangled – with Rasch regarded as a significant advance on classical test theory – why does the empirical data not reflect two radically different measurement frameworks? Based on large-scale comparisons of item and person statistics, Fan (1998) notes: “These very high correlations indicate that CTT- and IRT-based person ability estimates are very comparable with each other. In other words, regardless of which measurement framework we rely on, the same or very similar conclusions will be drawn regarding the ability levels of individual examinees” (p. 8), and concludes: “the results here would suggest that the Rasch model might not offer any empirical advantage over the much simpler CTT framework” (p. 9). Fan (1998) confirms Thorndike’s (1982, p. 12) pessimism concerning the likely impact of IRT: “For the large bulk of testing, both with locally developed and standardized tests, I doubt that there will be a great deal of change. The items that we select for a test will not be much different, and the resulting tests will have much the same properties.”
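
    One reason such high correlations are unsurprising can be sketched numerically. Under the Rasch model, when every examinee answers the same items, the maximum-likelihood ability estimate depends on the responses only through the raw score, so it is a monotone transformation of the CTT total. The sketch below assumes the standard dichotomous Rasch form and an invented set of item difficulties that are treated as known; the function names are illustrative, and this is not the procedure used by Fan (1998).

```python
import math

def rasch_probability(theta, beta):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def ml_ability(raw_score, betas, iterations=50):
    """Newton-Raphson solution of sum_i P_i(theta) = raw_score."""
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_probability(theta, b) for b in betas]
        expected = sum(probs)                            # expected raw score at theta
        information = sum(q * (1.0 - q) for q in probs)  # slope of the expected score
        theta += (raw_score - expected) / information
    return theta

# Hypothetical difficulties for a 10-item test (an assumption for illustration).
betas = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.5, 2.0]

# Raw scores of 0 and 10 have no finite estimate, so they are skipped.
estimates = {r: ml_ability(r, betas) for r in range(1, len(betas))}
for r, theta_hat in estimates.items():
    print(r, round(theta_hat, 2))
```

    Because the estimate rises strictly with the raw score, ranking these examinees by total score or by Rasch estimate gives the same order. In this complete-data setting the two frameworks differ only in the nonlinear spacing of the scale.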

    In what follows, the case is made that in the Rasch model, just as in Classical Test Theory, ability cannot be separated from the item used to measure it. Rasch’s model is shown to be incoherent and this has clear consequences for the entire OECD project. Moreover, the arguments presented here undermine psychology’s “standard measurement model” (Borsboom, Mellenbergh & van Heerden, 2003) with implications for all IRT models and Structural Equation Modelling.

    The Rasch model: early indications of incoherence

    The first hints of Rasch’s confusion appear in the early pages of his 1960 treatise which sets out the Rasch model, Probabilistic Models for Some Intelligence and Attainment Tests. Rasch’s lifelong obsession – captured in his closely associated notions of “models of measurement” and “specific objectivity” – with measurement models capable of application to the social and natural sciences can be recognized in his portrayal of the Rasch model. In constructing his model Rasch (1960, p. 10) rejects deterministic Newtonian measurement for the indeterminism of quantum mechanics:

    For the construction of the models referred to I shall take recourse to some points of view … of a more general character. Into the system of classical physics enter a number of fundamental laws, e.g. the Newtonian laws. … A characteristic property of these laws is that they are deterministic. … None the less it should not be overlooked that the laws do not give an accurate picture of nature. … In modern physics … the deterministic view has been abandoned. No deterministic description for e.g. radioactive emission seems within reach, but for the description of such irregularities the theory of probability has proved an extremely valuable tool.

    Rasch (1960, p. 11) likens the unmeasured individual to a radioactive nuclide about to decay. Quantum mechanics teaches that, unlike Newtonian mechanics, if one had complete information about the nuclide, one still couldn’t predict the moment of decay with accuracy. Indeterminism is a constitutive feature of quantum mechanics: one cannot know, even if one had complete knowledge of the universe, what will happen next to a quantum system. Irreducible uncertainty applies. For Rasch (1960, p. 11): “Where it is a question of human beings and their actions, it appears quite hopeless to construct models which will be useful for purposes of prediction in separate cases. On the contrary, what a human being actually does seems quite haphazard, none less than radioactive emission.” Rasch (1960, p. 11) makes clear his rejection of deterministic Newtonian models: “This way of speaking points to the possibility of mapping upon models of a kind different from those used in classical physics, more like the models in modern physics – models that are indeterministic.”

    Quantum indeterminism has implications for Rasch’s “models of measurement.” In quantum mechanics, measurement doesn’t simply produce information about some pre-existing state. Rather, measurement transforms the indeterminate to the determinate. Measurement causes what is indeterminate to take on a determinate value. In the classical model which Rasch rejects, measurement is simply a process of checking up on what pre-existed the act of measurement, while quantum measurement causes the previously indeterminate to take on a definite value. However, latent variable theorists in general, and Rasch in particular, treat “ability” as an intrinsic attribute of the person, and they view measurement as an act of checking up on that attribute.

    The early pages of Rasch’s (1960) text raise doubts about his understanding of the central mathematical conceit of his model: probability. One gets the clear impression that Rasch associates probability with indeterminism. But completely determinate situations can involve probability. The outcome of the toss of a coin is completely determined from the moment the coin leaves the thrower’s hand. If one had knowledge of the initial speed of projection, the angle of inclination of the initial motion to the horizontal, the initial angular momentum, the local acceleration of gravity, and so on, one could use Newtonian mechanics to predict the outcome. Probability is invoked because of the coin-thrower’s ignorance of these parameters. Such probabilities are referred to as subjective probabilities.

    In modern physics, uncertainty is constitutive and not a consequence of the limitations of human beings or their measuring instruments. Quantum physicists deal in objective probability. Finally, the notion of separability, or “specific objectivity” as Rasch labelled it, is absolutely central to his thinking: “Rasch’s demand for specific objective measurement means that the measure of a person’s ability must be independent of which items were used” (Rost, 2001, p. 28). However, quantum mechanics is founded on non-separability; one cannot break the conceptual link between what is measured and the measuring instrument. The mathematics of the early pages of Rasch (1960) do not augur well for the mathematical coherence of his model, but it is important to set out the case against the model with greater rigour.

    Bohr and Wittgenstein: indeterminism in psychological measurement

    A possible source of Rasch’s efforts to find “models of measurement” which would apply equally to both psychometric measurement and measurement in physics was the writings of Rasch’s famous countryman, Niels Bohr. (Indeed, Rasch attended lecture courses in mathematics given by the great physicist’s brother.) Bohr argued for all of his professional life that there existed a structural similarity between psychological predicates and the attributes of interest to quantum physicists. Although he never published the details, he believed he had identified an “epistemological argument common to both fields” (Bohr, 1958, p. 27). For Bohr, no psychologist has direct access to mind just as no physicist has direct access to the atom. Both disciplines use descriptive language which was developed to make sense of the world of direct experience, to describe what cannot be available to direct experience. Bohr summarized this common challenge in the question, “How does one use concepts acquired through direct experience of the world to describe features of reality beyond direct experience?”

    Given the central preoccupation of this paper, Bohr’s words are particularly striking: “I want to emphasize that what we have learned in physics arose from a situation where we could not neglect the interaction between the measuring instrument and the object. In psychology, we meet the quite similar situation” (Favrholdt, 1999, p. 203). Also, prominent psychologists echo Bohr’s thinking: “The study of the human mind is so difficult, so caught in the dilemma of being both the object and the agent of its own study, that it cannot limit its inquiries to ways of thinking that grew out of yesterday’s physics” (Bruner, 1990, p. xiii). Given that Bohr never developed his ideas for the epistemological argument common to both fields, what follows also addresses en passant a lacuna in Bohr scholarship.

    If all this sounds fanciful (after all, what possible parallels can be drawn between Rasch’s radionuclide on the point of decaying and an individual on the point of answering a question?) it is instructive to return to Rasch’s (1960, p. 11) claim that “what a human being does seems quite haphazard, none less than radioactive emission.” In fact there are striking parallels between the experimenter’s futile attempts to predict the moment of decay and the psychometrician’s attempts to predict the child’s response to a (hitherto unseen) addition problem such as “68 + 57 = ?”

    If one restricts oneself to all of the facts about the nuclide, the outcome is completely indeterminate. Similarly, Wittgenstein’s celebrated rule-following argument (central to his philosophies of mind, mathematics and language), set out in his Philosophical Investigations, makes clear that if one restricts oneself to the totality of facts (inner and outer) about the child, these facts are in accord with the right answer (68 + 57 = 125) and an infinity of wrong answers. Mathematics will be used for illustration but the reasoning applies to all rule-following. The reader interested in an accessible exposition of this claim is directed to the second chapter of Kripke’s (1982) Wittgenstein on Rules and Private Language. (The reader should come to appreciate the power of the rule-following reasoning without being troubled by Kripke’s questionable take on the so-called skeptical argument.) The author will now attempt the barest outlines of Wittgenstein’s writing on rule-following.

    By their nature, human beings are destined to complete only a finite number of arithmetical problems over a lifetime. The child who is about to answer the question “68 + 57 = ?” for the first time has, of necessity, a finite computational history in respect of addition. Through mathematical reasoning which dates back to Leibniz, this finite number of completed addition problems can be brought under an infinite number of different rules, only one of which is the rule for addition. In short, any answer the child gives to the problem can be demonstrated to be in accord with a rule which generates that answer and all of the answers the child gave to all of the problems he or she has tackled to date. If one had access to the totality of facts about the child’s achievements in arithmetic, one couldn’t use these facts to predict the answer the child will give to the novel problem “68 + 57 = ?” because one can always derive a rule which generates the child’s entire past problem-solving history and any particular answer to “68 + 57 = ?”
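
    The claim can be made concrete with Kripke’s (1982) own example, the “quus” function, which agrees with addition whenever both arguments are less than 57 and returns 5 otherwise. The sketch below assumes a hypothetical child whose entire computational history involves only numbers below 57; the history and the helper rule_yielding are invented for illustration.

```python
def plus(x, y):
    return x + y

def quus(x, y):
    # Kripke's deviant rule: addition for small arguments, 5 otherwise.
    return x + y if x < 57 and y < 57 else 5

def rule_yielding(answer):
    """Build a rule that matches addition everywhere except 68 + 57,
    where it returns `answer` instead."""
    def rule(x, y):
        return answer if (x, y) == (68, 57) else x + y
    return rule

# Hypothetical finite history: every problem the child has completed so far.
history = [(2, 3), (10, 15), (21, 34), (40, 16), (56, 56)]

candidates = {"plus": plus, "quus": quus, "yields 7": rule_yielding(7)}
for name, rule in candidates.items():
    fits_history = all(rule(x, y) == x + y for x, y in history)
    print(name, fits_history, rule(68, 57))
```

    Each rule reproduces the whole history, yet the three give 125, 5 and 7 for the new problem. Since rule_yielding accepts any number, the same construction supplies a rule for every possible answer.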

    Now what of facts concerned with the contents of the child’s mind? Surely an all-seeing God could peer into the child’s mind and determine which rule was guiding the child’s problem-solving? By substituting the numbers 68 and 57 into the rule, God could predict with certainty the child’s response. Alas, having access to inner facts (about the mind or brain) won’t help because having a rule in mind is neither sufficient nor necessary for responding correctly to mathematical problems. Is having a rule in mind sufficient? Clearly not since all pupils taking GCSE mathematics, for example, have access to the quadratic formula and yet only a fraction of these pupils will provide the correct answer to the examination question requiring the application of that formula. Is having the rule in mind necessary? Once again, clearly not because one can be entirely ignorant of the quadratic formula and yet produce the correct answers to algebraic problems involving quadratics using alternative procedures like “completing the square,” graphical methods, the Newton-Raphson procedure, and so on.

    It is important to be clear what is being said here. If one could identify an addition problem beyond the set of problems Einstein had completed during his lifetime, is the claim that one couldn’t predict with certainty Einstein’s response to that problem? Obviously not. But the correct answer and an infinity of incorrect answers are in keeping with all the facts (inner and outer) about Einstein. When one is restricted to these facts, Einstein’s ability to respond correctly is indeterminate. In summary, before the child answers the question “68 + 57 = ?” his or her ability with respect to this question is indeterminate. The moment he or she answers, the child’s ability is determinate with respect to the question (125 is pronounced correct, and all other answers are deemed incorrect). One might portray this as follows: before responding the child is right and wrong and, at the moment of response, he or she is right or wrong.

    The problem with the Rasch model

    Ability only becomes determinate in the context of a measurement; it’s indeterminate before the act of measurement. The conclusion is inescapable – ability is a relational property rather than something intrinsic to the individual, as psychology’s standard measurement model would have it. A definite ability cannot be ascribed to an individual prior to measurement. Ability is a joint property of the individual and the measurement instrument; take away the instrument and ability becomes indeterminate. It is difficult to escape the conclusion that ability (and intelligence, and self-concept, and so on) is a property of the interaction between individual and measuring instrument rather than an intrinsic property of the individual. If psychological constructs were viewed as joint properties of individuals and measuring instruments, then intractable questions such as “what is intelligence?” and “what is memory?” need no longer trouble the discipline.

    What can be concluded in respect of Rasch? It is clear that the Rasch model is no more capable of separating ability from the item used to measure it than was its predecessor, classical test theory. Pick up any textbook on IRT and one finds the same assumption stated again and again in model development: individuals carry a determinate ability with them from moment to moment and measurement involves checking up on that ability. The ideas of Bohr and Wittgenstein can be used to reject this; for them, measurement effects a “jump” from the indeterminate to the determinate, transforming a potentiality to an actuality.

    In simple terms it can be argued that ability has two facets; it is indeterminate before measurement and determinate immediately afterwards. The single description of the standard measurement model is replaced by two mutually exclusive descriptions. Ability is indeterminate before measurement and only determinate with respect to a measurement context. Neither of these descriptions can be dispensed with. The indeterminate and the determinate are mutually exclusive facets of one and the same ability.

    Returning to the child who has been taught to add but hasn’t yet encountered the question “68 + 57 = ?” what can be said of his or her ability with respect to this question? When one ponders ability as a thing-in-itself, it’s tempting to think of it as something inner, something that resides in the child prior to being expressed when the child answers. If ability is to be found anywhere, surely it’s to the unmeasured mind one should look? Isn’t it tempting to think of it as something the child “carries” in his or her mind? When the focus is on ability as a thing-in-itself, it seems the child’s eventual answer to the question is somehow inferior; it’s the mere application of the child’s ability rather than the ability itself.

    The concept of causality in classical physics is replaced by the notion of “complementarity” in quantum mechanics. Complementarity treats pre-measurement indeterminism and the determinate outcome of measurement as non-separable. Whitaker (1996, p. 184) portrays complementarity as “mutual exclusion but joint completion.” One cannot meaningfully separate the pre-measurement facet of ability from its measurement-determined counterpart. The analogue of Bohr’s complementarity is what Wittgensteinians refer to as first-person/third-person asymmetry. The first-person facet of ability (characterised by indeterminism) and the third-person measurement perspective cannot be meaningfully separated. Suter (1989, pp. 152-153) distinguished the first-person/third-person symmetry of Newtonian attributes from the first-person/third-person asymmetry of psychological predicates: “This asymmetry in the use of psychological and mental predicates – between the first-person present-tense and second- and third-person present-tense – we may take as one of the special features of the mental.” Nagel (1986, p. 22) notes: “the conditions of first-person and third-person ascription of an experience are inextricably bound together in a single public concept.”

    This non-separability of first-person and third-person perspectives obviates the need to conclude, with Rasch, that the individual’s response need be “haphazard.” The first-person indeterminism detailed earlier seems to indicate that individuals offer responses entirely at random. After all, the totality of facts is in keeping with an infinity of answers, only one of which is correct. But one need only infer “random variation located within the person” (Borsboom, 2005, p. 55) if one mistakenly treats the first-person facet as separable from the third-person. (The author’s earlier practice of stressing the restriction to the totality of facts about the individual was intended to highlight this taken-for-granted separability.) Lord’s (1980) admonition that item response theorists eschew the “stochastic subject” interpretation for the “repeated sampling” interpretation led IRT practitioners astray by purging entirely the first-person facet from an indivisible whole. One only arrives at conclusions that are “absurd in practice” (p. 227) if one follows Lord (1980) and divorces ability from the item which measures it. Like Rasch, Lord failed to grasp that the within-subject and the between-subject aspects of psychological measurement are profoundly entangled.

    Holland, Lord and the ensemble interpretation as the route out of paradox

    Holland (1990) repeats Lord’s error by eschewing the stochastic subject interpretation for the random sampling interpretation, despite acknowledging “that most users think intuitively about IRT models in terms of stochastic subjects” (p. 584). The stochastic subject rationale traces the probabilities of the Rasch model to randomness in the individual subject:

    Even if we know a person to be very capable, we cannot be sure that he will solve a certain difficult problem, not even a much easier one. There is always a possibility that he fails – he may be tired or his attention is led astray, or some other excuse may be given. And a person of slight ability may hit upon the correct solution to a difficult problem. Furthermore, if the problem is neither “too easy” nor “too difficult” for a certain person, the outcome is quite unpredictable. (Rasch, 1960, p. 73)

    Rasch is proposing what quantum physicists call a “local hidden variables” measurement model. While Wittgenstein argues that ability is indefinite before the act of measurement (an act which effects a “jump” from indefinite to definite), psychometricians in general, and Rasch in particular, treat ability as definite before measurement. The local hidden variables of the Rasch model are variables such as examinee fatigue, degree of distraction, and any other influence militating against his or her capacity to provide a correct answer. Rasch is suggesting that if one had complete information concerning the examinee’s ability, his or her level of fatigue, propensity for distraction, and so on, one could predict, in principle, the examinee’s response with a high degree of confidence. It is the absence of variables capable of capturing fatigue, attention, and so on, from the Rasch algorithm, that makes its probabilistic nature inevitable. In this local hidden variable model, probability is being invoked because of the measurer’s ignorance of the effects of fatigue, attention loss, and so on.

    But Bell (1964) proved beyond doubt that local hidden variables models are impossible in quantum measurement. One can avoid the difficulties thrown up by Bell’s celebrated inequalities by treating unmeasured predicates as indefinite (Fuchs, 2011). This would have profound implications for how one conceives of latent variables in the Rasch model. If local hidden variables are ruled out, latent variables could not be assigned investigation-independent values. Ability only takes on a definite value in a measurement context. IRT can no more separate these two entities (ability and the item used to measure it) than could classical test theory. The “random sampling” approach that Holland (1990) recommends is a so-called “ensemble” interpretation. The definitive text on ensembles – Home and Whitaker (1992) – finds ensembles illegitimate because they mistakenly replace “superpositions” by “mixtures” (Whitaker, 2012, p. 279).

    One gets the distinct impression from the IRT literature that the random sampling method is being urged on the field because of embarrassments that lurk in the stochastic subject model. For example, Lord (1980, p. 228) refers to the latter as “unsuitable”:

    The trouble comes from an unsuitable interpretation of the practical meaning of the item response function … If we try to interpret P_i(A) as the probability that a particular examinee A will answer a particular item i correctly, we are likely to reach absurd conclusions. (Lord, 1980, p. 228)

    Lord (1980) and Holland (1990) both attempt to avoid embarrassment by taking the simple step of ignoring the stochastic subject for the comfort of an ensemble interpretation. Home and Whitaker (1992) close their text with the words: “[W]e see the ensemble interpretation as the “comfortable” option, creating the illusion that all difficulties may be removed by taking one simple step” (p. 311).

    What of the paradox identified earlier?

    It is now possible to address the paradox presented earlier. Here is a restatement: If a large sample of individuals of exactly the same ability respond to the same item, designed to measure that ability, why would 27% get it right and 73% get it wrong? Suppose a large number of individuals answer a question (labelled Q1), and, of those who give the correct answer, 100 individuals, say, are posed a second question (labelled Q2). When these 100 individuals respond to Q2, 27% give the correct answer and 73% respond with the wrong answer. What can be said about the ability of each individual immediately after answering Q1 but before answering Q2? Given the natural tendency to think of ability as an attribute of mind, it seems reasonable to focus on the individual’s ability “between questions” as it were.

    Poised between questions, each individual’s ability with respect to Q1 is determinate; they have answered Q1 correctly moments before. What of their ability with respect to Q2, the question they have yet to encounter? According to the reasoning presented above, all the facts are in keeping with both a correct and an incorrect answer. The individual’s ability relative to Q2 is indeterminate. Quantum mechanics portrays such states as “superpositions” – the individuals all have the same indefinite ability characterised as: “correct with probability 27% and incorrect with probability 73%.” It is easy to see why 100 individuals each with an ability characterised in this way could be portrayed as subsequently producing 27 correct responses and 73 incorrect responses to Q2.
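
    The two readings of the 27% figure can be set side by side in a small simulation. In the first, every individual is in the same indefinite state and answers correctly with probability 0.27; in the second (a mixture), 27% of individuals are fixed in advance as answering correctly and 73% as answering incorrectly. This is only a sketch under those assumptions, with an invented sample size and an arbitrary seed. It suggests that the aggregate proportion from a single administration cannot tell the two descriptions apart, so the choice between them has to be argued on conceptual grounds.

```python
import random

def stochastic_subjects(n, p, rng):
    """Every individual answers correctly with the same probability p."""
    return [rng.random() < p for _ in range(n)]

def mixture(n, p):
    """A fixed fraction p of individuals is predetermined to answer correctly."""
    n_correct = round(n * p)
    return [True] * n_correct + [False] * (n - n_correct)

rng = random.Random(2012)  # arbitrary seed, for reproducibility only
n, p = 100_000, 0.27       # hypothetical sample size; p taken from the text

prop_stochastic = sum(stochastic_subjects(n, p, rng)) / n
prop_mixture = sum(mixture(n, p)) / n
print(round(prop_stochastic, 3), round(prop_mixture, 3))
```

    Both proportions come out at roughly 0.27, although the individuals in the first list share one probabilistic state while those in the second fall into two fixed groups.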

    In this approach the paradox dissolves. All 100 individuals have definite abilities (as measured by Q1), but only 27% go on to answer Q2 correctly. But note the crucial step in the logic required to dissolve the paradox: each individual’s ability is simultaneously determinate with respect to Q1 and indeterminate with respect to Q2. A change in question (from Q1 to Q2) effects a radical change from determinate to indeterminate. It is therefore only meaningful to talk about a definite ability in relation to a measurement context. Ability is a joint property of the individual and the item; pace Rasch, they cannot be construed as separable! It follows therefore that the examiner (the person who selects the item) participates in the ability manifest in a response to that item. Pace Rasch, measurement in education and psychology is a more dynamic affair than measurement in classical physics. The former is dynamic while the latter is merely a matter of checking up on what’s already there. Because that which is measured is inseparable from the question posed, the measurer participates in what he or she “sees.” Newtonian detachment is as unattainable in psychology and education as it is in quantum theory.


    Returning to the real life consequences of this refutation of latent variable modelling in general and Rasch modelling in particular, one cannot escape the conclusion that the OECD’s claims in respect of its PISA project have scant validity given the central dependence of these claims on the clear separability of ability from the items designed to measure that ability.


    Bell, J.S. (1964). On the Einstein-Podolsky-Rosen paradox. Physics, 1, 195-200.
    Bohr, N. (1929/1987). The philosophical writings of Niels Bohr: Volume 1 – Atomic theory and the description of nature. Woodbridge: Ox Bow Press.
    Bohr, N. (1958/1987). The philosophical writings of Niels Bohr: Volume 2 – Essays 1933 – 1957 on atomic physics and human knowledge. Woodbridge: Ox Bow Press.
    Borsboom, D. (2005). Measuring the mind: conceptual issues in contemporary psychometrics. Cambridge: Cambridge University Press.
    Borsboom, D., Mellenbergh, G.J., & van Heerden, J. (2003). The theoretical status of latent variables. Psychological Review, 110(2), 203-219.
    Bruner, J.S. (1990). Acts of meaning. Cambridge, MA: Harvard University Press.
    Davies, E.B. (2003). Science in the looking glass. Oxford: Oxford University Press.
    Davies, E.B. (2010). Why beliefs matter. Oxford: Oxford University Press.
    Elliot, C.D., Murray, D., & Pearson, L.S. (1978). The British ability scales. Windsor: National Foundation for Educational Research.
    Ertl, H. (2006). Educational standards and the changing discourse on education: the reception and consequences of the PISA study in Germany. Oxford Review of Education, 32(5), 619-634.
    Fan, X. (1998). Item response theory and classical test theory: an empirical comparison of their item/person statistics. Educational and Psychological Measurement, 58(3), 357-381.
    Favrholdt, D. (Ed.). (1999). Niels Bohr collected works (Volume 10). Amsterdam: Elsevier Science B.V.
    Fuchs, C.A. (2011). Coming of age with quantum information: Notes on a Paulian idea. Cambridge: Cambridge University Press.
    Hacker, P.M.S. (1993). Wittgenstein, mind and meaning – Part 1 Essays. Oxford: Blackwell.
    Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Publications.
    Hark, M.R.M. ter (1990). Beyond the inner and the outer. Dordrecht: Kluwer Academic Publishers.
    Holland, P.W. (1990). On the sampling theory foundations of item response theory models. Psychometrika, 55(4), 577-601.
    Home, D., & Whitaker, M.A.B. (1992). Ensemble interpretation of quantum mechanics. A modern perspective. Physics Reports (Review section of Physics Letters), 210(4), 223-317.
    Jöreskog, K.G., & Sörbom, D. (1993). LISREL 8 user’s reference guide. Chicago: Scientific Software International.
    Kalckar, J. (Ed.). (1985). Niels Bohr collected works (Volume 6). Amsterdam: Elsevier Science B.V.
    Kripke, S.A. (1982). Wittgenstein on rules and private language. Oxford: Blackwell.
    Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.
    Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British Journal of Psychology, 88, 355-383.
    Nagel, T. (1986). The view from nowhere. New York: Oxford University Press.
    Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Paedagogiske Institut.
    Rinne, R., & Ozga, J. (2013). The OECD and the global re-regulation of teacher’s work: Knowledge-based regulation tools and teachers in Finland. In T. Seddon & J.S. Levin (Eds.), World yearbook of education (pp. 97-116). London: Routledge.
    Rost, J. (2001). The growing family of Rasch models. In A. Boomsma, M.A.J. van Duijn, & T.A.B. Snijders (Eds.), Essays on item response theory (pp. 25-42). New York: Springer.
    Sobel, M.E. (1994). Causal inference in latent variable models. In A. von Eye & C.C. Clogg (Eds.), Latent variable analysis (pp. 3-35). Thousand Oaks: Sage.
    Suter, R. (1989). Interpreting Wittgenstein: A cloud of philosophy, a drop of grammar. Philadelphia: Temple University Press.
    Takayama, K. (2008). The politics of international league tables: PISA in Japan’s achievement crisis debate. Comparative Education, 44(4), 387-407.
    Thorndike, R.L. (1982). Educational measurement: Theory and practice. In D. Spearritt (Ed.), The improvement of measurement in education and psychology: Contributions of latent trait theory (pp. 3-13). Melbourne: Australian Council for Educational Research.
    Whitaker, A. (1996). Einstein, Bohr and the quantum dilemma. Cambridge: Cambridge University Press.
    Whitaker, A. (2012). The new quantum age. Oxford: Oxford University Press.
    Wittgenstein, L. (1953). Philosophical Investigations. G.E.M. Anscombe, & R. Rhees (Eds.), G.E.M. Anscombe (Tr.). Oxford: Blackwell.
    Wittgenstein, L. (1980a). Remarks on the philosophy of psychology Volume 1 (Edited by G.E.M. Anscombe & G.H. von Wright; translated by G.E.M. Anscombe). Oxford: Basil Blackwell.
    Wittgenstein, L. (1980b). Remarks on the philosophy of psychology Volume 2 (Edited by G.H. von Wright & H. Nyman; translated by C.G. Luckhardt & M.A.E. Aue). Oxford: Basil Blackwell.
    Wright, B.D. (1997). A history of social science measurement. Educational Measurement: Issues and Practice, 16(4), 33-52.
    Wright, C. (2001). Rails to infinity. Cambridge, MA: Harvard University Press.

    • Staffan says:

      “William Stewart had a copy of Dr Morrison’s paper in June 2012. Neither the TES nor OECD Pisa have issued a response pointing out where it is wrong, yet you cite William Stewart, a reporter, as some sort of authority on OECD Pisa. He is simply a vehicle for their propaganda. If you have a mathematical, not statistical, response to this paper please present it. Otherwise your article, like Andreas Schleicher’s entire flawed project, is just another opinion.”

      You seem to have misunderstood what my post is about. It’s not a discussion about statistics per se, but about the reaction from underachieving countries – questioning the entire survey based on answers from principals that may or may not be correct and that have nothing to do with how the children did on the test. And how everyone agrees with one researcher who slams Pisa and give him an award, but without discussing what Schleicher and others think. It seems like sour grapes to me.

      Also, William Stewart is criticizing the Pisa survey so he is hardly a vehicle of OECD propaganda. I mention him as an example of the sour grapes attitude.

      • paceni says:

        Andreas Schleicher is Pisa.
        It is clear that you do not read the Times Educational Supplement and have not read the subsequent articles on Pisa written by William Stewart of TES since July 2013. His latest article, published on November 29th 2013, characterises those who withdrew from Pisa as underachievers. Remember to read the latest post from paceni on measurement.

    • SP says:

      While the author of the essay is right to point out the logical incoherence of the Rasch model, there is no clear-cut alternative at all, because the alternatives suffer a similar fate.

      Even though I agree with the author that, fundamentally speaking, the Rasch model cannot claim to be clear progress over the classical model, and hence cannot be completely independent of the items used to measure such abilities, the classical deterministic models, at least as far as I am aware, offer no decent approach either. The author’s argument itself also contains ambiguous claims based on some not exactly proven assumptions (e.g. the assumption that Q1 and Q2 must be absolutely identical, which is hardly the case).

      In short, both sides are perfectly entitled to point fingers at each other, as they are doing, and argue from Newton to Einstein, back to physics, then to psychometrics, followed by Bohr or Bell… until the cows come home. Yet both the ensemble interpretation and the stochastic process they represent suck BIG TIME, both in theory and in practice, to an extremely high standard. I say that as a quant-fund guy who has been playing around with all these models for years.

      No perfect approach is available under the sun as the sciences currently stand!

      Rasch is not good enough? Fine. What model does the author suggest instead, then? Name it. I am sure it will be ridiculed in just the same way as Rasch’s by the quantum entanglement camp. Even I can do that without being a mathematician by training. Fortunately, they give about the same results in the case of the PISA rankings.

      BTW, if I am forced to choose one of stochastic and random sampling for general selections such as PISA, I’d go for the latter any day of the week.

      • Steve Sailer says:

        My vague hunch is that modern Item Response Theory testing, of which the Rasch Model is an example, allows testers to say, much like movie directors of sloppy productions: “We’ll fix it in post.”

        For example, how can the PISA people be sure ahead of time that their Portuguese translations are just as accurate as their Spanish translations?

        Well, they can’t. But, when they see the results come in, they can notice that, say, smart kids in both Brazil and Portugal did relatively badly on question 11, which suggests the translation is ambiguous, so we’ll drop #11 from the scoring in those two countries. But, in the Spanish-speaking countries, this anomaly doesn’t show up in the results, so maybe we’ll count Question 11 for those countries.

        This kind of post-hoc flexibility allows PISA to wring a lot out of their data. On the other hand, it’s also a little scary.

    • SP says:

      Oh, paceni, why don’t you post the same essay to Steve Hsu’s blog and see what he has to say?

  20. paceni says:

    Why Michael Gove should follow India’s lead and detach himself from PISA

    Just ahead of the publication of PISA league tables on 3rd December, India has withdrawn from the list of countries which will feature in the tables. The Education Secretary, Michael Gove, on the other hand, seems determined to stick with PISA despite recent concerns about the global league table published in the Times Educational Supplement in July of this year.

    Mr Gove’s Department reiterated its support for PISA in a recently aired Radio 4 programme entitled “PISA – Global Education Tables Tested.” That programme illustrated the dangers inherent in critiquing PISA in statistical terms. Statistical modellers have made life too easy for PISA because they simply accept the PISA interpretation of the construct “ability.” PISA lays claim to measure the relative qualities of education systems around the world, and it is only when the focus moves to measurement that the profound difficulties inherent in Pisa become clear.

    Niels Bohr is ranked among the ten greatest physicists of all time. The father of quantum measurement taught that “unambiguous communication” was the hallmark of measurement in physics. Importantly, Bohr traced measurement in quantum mechanics and measurement in psychology to a common source, which he referred to as “subject/object holism.” The physicist cannot have direct experience of the atom, just as the teacher cannot have direct experience of the child’s mind. Both are forced to describe what is beyond direct experience using the language of everyday experience. Bohr demonstrated that measurement in quantum physics and in psychology share a common inescapable constraint, namely, one cannot communicate unambiguously about measurement in either realm without factoring in the measuring instrument. Wittgenstein’s writings also support this argument.

    The lesson we learn from Bohr is that in all psychological measurement, the entity measured cannot be divorced from the measuring instrument. When this central tenet of measurement is broken, nonsense always ensues. The so-called Rasch model, which produces the PISA ranks, offends against this central measurement principle and therefore the ranks it generates are suspect at best. (The Rasch model is a member of a family of models which all treat what is measured as independent of the measuring tool. Given that these models underpin both computer adaptive testing and the navigation systems of the newly developed MOOCs of higher education, the implications of Bohr’s thinking are clearly far-reaching.)

    The following simple illustration will help make Bohr’s point. Suppose Einstein and a GCSE pupil both produce a perfect score on a GCSE paper. Surely to claim that the pupil has the same mathematical ability as Einstein is to communicate ambiguously? However, unambiguous communication can be restored if we simply take account of the measuring instrument and say, “Einstein and the pupil have the same mathematical ability relative to this particular GCSE paper.” Mathematical ability, indeed any ability, is not an intrinsic property of the individual; rather, it’s a joint property of the individual and the measuring instrument.

    In short, ability isn’t a property of the person being measured; it’s a property of the interaction of the person with the measuring instrument. One is concerned with the between rather than the within. It’s hard to imagine a more stark contrast. Statistical modelling critiques of PISA, however, have missed this conceptual error entirely. My bookshelves are groaning with books concerned with the wide-ranging debates around the notion of intelligence. All of these debates dissolve away when one eschews the twin notions that intelligence is either a property of the person or is an ensemble property, for the simple definition that intelligence is a property of the interaction between person and intelligence test. To say “John has an IQ of 104” is to communicate ambiguously. Furthermore, clarification of the nature of measurement in psychology and education has implications for the UK’s approach to school inspection and serves as a challenge for those who reject, out of hand, “teaching to the test.”

    In closing, when the PISA critique shifts from statistical modelling to measurement, the profound nature of PISA’s error becomes clear. I trust this essay will be a comfort to those responsible for removing India from PISA, and hope it will prompt a similar decision in the UK.

    Dr Hugh Morrison
    Belfast 9
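
    For readers unfamiliar with the Rasch model discussed in the essay above, here is a minimal Python sketch of its standard form, in which the probability of a correct answer depends only on the difference between a person's ability (theta) and an item's difficulty (b). The function names, ability values and item difficulties are hypothetical, chosen only to mirror the Einstein and GCSE pupil illustration; nothing here reproduces PISA's actual scaling procedure.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Rasch model: P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def perfect_score_probability(theta: float, difficulties: list) -> float:
    """Probability of answering every item correctly, assuming independent items."""
    p = 1.0
    for b in difficulties:
        p *= rasch_probability(theta, b)
    return p


# Hypothetical values on the logit scale (not taken from PISA or the essay).
easy_paper = [-2.0, -1.5, -1.0, -0.5, 0.0]  # item difficulties of an "easy" paper
strong_pupil = 3.0                           # assumed ability of a strong pupil
einstein = 8.0                               # assumed ability of a far stronger test-taker

for name, theta in [("pupil", strong_pupil), ("Einstein", einstein)]:
    print(name, round(perfect_score_probability(theta, easy_paper), 3))
```

    With these made-up numbers both test-takers are very likely to get every item right, so a perfect score on this paper alone cannot separate them. Whether that supports or undermines the essay's argument is for the thread to decide.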

    • SP says:

      “The following simple illustration will help make Bohr’s point. Suppose Einstein and a GCSE pupil both produce a perfect score on a GCSE paper. Surely to claim that the pupil has the same mathematical ability as Einstein is to communicate ambiguously? However, unambiguous communication can be restored if we simply take account of the measuring instrument and say, “Einstein and the pupil have the same mathematical ability relative to this particular GCSE paper.” Mathematical ability, indeed any ability, is not an intrinsic property of the individual; rather, it’s a joint property of the individual and the measuring instrument.” (Dr Hugh Morrison)

      Let’s take a look at this “simple” illustration:

      Doesn’t the above statement automatically imply that a person’s intrinsic property may vary depending on external properties of different types, qualities, timing, etc.?

      Therefore, after some steps of linear logic from there, won’t we eventually arrive at the claim that a person’s intrinsic property cannot be measured at all?

      In light of this, Dr Morrison’s statement that “Einstein and the pupil have the same mathematical ability relative to this particular GCSE paper” is not entirely true, even if it were said by Bohr himself. Why so? Because what if the day Einstein took the exam was a hot, humid 30-degree summer day, while the GCSE student took it on a dry winter day, having eaten an extra portion of his usual breakfast on top of it… and so on and so forth? As long as you put quantum entanglement into the statement, there’ll be no end and no clear cut, even in your own logic. Isn’t this also a very common logical fallacy on Dr Morrison’s part?

      To go one step further, in that “what is 68 + 57” example, what makes Dr Morrison assume that Q1 and Q2 are questions of the “same” quality that should have been answered identically correctly in the first place? What if they are not as much the “same” as they “look” to him, from the point of view of either the intrinsic quality of Q1 and Q2, or the possibly different sets of “external properties” that each of the 100 participants experienced before answering, while answering Q1 and Q2, and in the timespan between answering Q1 and Q2? Obviously Dr Morrison has no control over, or prior knowledge of, any of those. Why is he so sure that the scores are relative only to the questions themselves, and nothing else?

      Furthermore, Dr Morrison assumes that “if one had access to the totality of facts about the child’s achievements in arithmetic, one couldn’t use these facts to predict the answer the child will give to the novel problem ‘68 + 57 = ?’” This assumption contains a multitude of problems:

      1. He doesn’t have a definition of what “totality” is, or how to measure that it is the “totality”.
      2. Even if he had one, the child’s achievement in arithmetic cannot, by his own logic, be verified, purely because of the unverifiable and unknown external properties surrounding each of the tests the child took previously, on the test dates and/or prior to them.
      3. He assumes, well, again, that “one couldn’t predict the answer the child will give to the novel problem ‘68 + 57 = ?’”. This is actually the entire purpose of all the exercises here. That one can’t predict it today with one model doesn’t mean one cannot do it with another model. Even if we cannot predict it today with all the models within our knowledge, that doesn’t mean one or all of the models are completely wrong. Nor does it mean that we cannot predict it in the future, when current knowledge has expanded.

      I can go on…

      The end result would appear to be the same thing I posted earlier: Dr Morrison is right to point out some imperfections of Rasch, yet the alternatives are not fallacy-free either.

      So, it seems that this “simple” illustration is not that “simple” at all, is it?

      • paceni says:

        So you accept that PISA’s use of Rasch is flawed and offer no counter to the useless value of their rankings based upon “plausible values” yet continue to refer to OECD Pisa as if their rankings are reliable. Are you joking?

      • SP says:

        Paceni, your entries confuse me. Are you Dr Hugh Morrison yourself, or are you just Paceni quoting Dr Morrison?

        You asked if I am joking? No, Mister. I don’t know which field Dr Morrison specialises in. Nonetheless, even though I don’t have a doctorate in maths, I do have an advanced postgraduate degree. So I happen to be a Dr, too 🙂 and a darn good one in my field, I can assure you. Have you actually read, even for a second, what I wrote in response to Dr Morrison?

        Theoretically speaking, any standard IQ test would be just as flawed as Rasch’s PISA ranking. However, that doesn’t mean that the general world IQ map we have should be thrown into a dustbin. Flawed doesn’t necessarily mean largely wrong, or completely wrong, or misleading results beyond tolerance level.

        My personal arguments against Rasch used in PISA are

        i. It overstates its quality by claiming to be huge progress over the classical models. It is not. And
        ii. PISA discriminated against Zhejiang Province (and likely some other provinces, too) in its sampling in 2009. Yet Schleicher could argue back that, since those results are unofficial and undisclosed, he can play around with them in whatever way he likes. Fair enough.

        The PISA ranking using Rasch, flawed as it is (as Morrison rightly and easily points out) and as any model will be, is actually quite accurate in terms of results (since it correlates well with general IQ), even though it doesn’t have pinpoint precision. No model does.

        Theoretically and practically speaking, no alternative model that Dr Morrison could possibly bring to the table would do noticeably better, let alone drastically change the PISA rankings, as far as I am aware. FYI, I am a quant hedge fund manager and I’ve been playing with many kinds of models for most of my career.

    • SP says:

      “Just ahead of the publication of PISA league tables on 3rd December, India has withdrawn from the list of countries which will feature in the tables”.

      No, India’s embarrassing chickening out happened before the 2012 test. It had nothing to do with any implied “fury” at the “unfairness” of PISA tests. Go check Indiatimes. Both Steve Sailer and Anatoly Karlin had an entry on it as well.

  21. Staffan says:

    “Andreas Schleicher is Pisa.
    It is clear that you do not read the Times Educational Supplement and have not read the subsequent articles on Pisa written by William Stewart of TES since July 2013. His latest article, published on November 29th 2013, characterises those who withdrew from Pisa as underachievers. Remember to read the latest post from paceni on measurement.”

    I quoted Stewart as an example of the sour grapes attitude and it illustrates this attitude well. Him changing his mind (if that is what happened) doesn’t retroactively invalidate the quote. That’s like saying Bill Clinton never had sex with that woman because now he isn’t.

    (But I would be interested in that article – not that it changes anything I said here – so a link would be great. I searched “pisa” on TES and sorted them by most recent and the latest by Stewart was from the 14th.)

  22. SP says:

    Some info on the upcoming 2012 PISA results has been trailed by the UK’s shadow education minister Tristram Hunt in today’s Sunday press:

    - UK did badly: about the average

    - New Zealand went down considerably in the rankings

    I wonder what the ethnic composition of New Zealand students was in 2012.

    Furthermore, a Guardian columnist let loose today and wrote a piece in CiF openly accusing East Asian countries such as China and Korea (he conveniently left Japan alone somehow) of “gaming” and “cheating” on PISA scores.

    • Staffan says:

      I can’t find any outright accusations in the article, although we may be talking about different articles.

      But it’s full of the usual stuff about how dubious Pisa is – in spite of correlating to similar tests and reflecting the economic wealth and/or growth of the countries in question. The author also contradicts himself in saying Asian kids get good results because they are being pushed too hard. So then the scores reflect some kind of reality after all?

      • SP says:

        It’s the same article.

        “Problems also arise from different cultures, and different attitudes to education in general and tests in particular. For example, French students won’t guess the answers to multiple-choice questions; they decline to answer, though a guess gives at least a 25% chance of being right. East Asian countries always do well; critics argue that’s not because their schools are brilliant but because of deference to authority and an anxiety for success that leads parents to seek intensive out-of-school tutoring. There’s scope for gaming, even cheating, since the tests are supervised by research institutes in each country. Some countries are suspected of excluding their weaker performers.”

        Peter Wilby doesn’t need to put a clear sticker that “East Asians are cheaters” as the title though. Let’s see the key phrases here:

        “Problems also arise from…different attitudes…tests in particular, for example…East Asian countries always do well; critics argue…there’s scope for gaming, even cheating…suspected of excluding their weaker performers”

        1. I dunno why “East Asian countries always do well” can be categorised automatically as a “problem” in Peter Wilby’s lead statement. Anyway, it’s clear that, as he frames it, when East Asians always do well, “problems also arise”.

        2. The typical British courtesy that soon follows is the added “critics argue”. However, in its immediate context these are obviously just explanations that Peter Wilby, by quoting critics, uses to support his argument for why he thinks it is a problem.

        3. Finland, Canada, etc. always do pretty well, too. Yet Peter Wilby’s view is clearly targeted only at the East Asians, as reflected in the fact that whenever the East Asians are mentioned, “problems arise”, while whenever Finland, Canada or New Zealand are mentioned, there is no problem whatsoever.

        I mean wow, if this is not a blatant open accusation, I don’t know what is.

      • paceni says:

        Has anyone on this thread actually read the paper critical of OECD Pisa and if they have considered the argument, do they have an informed response?
        Pisa is flawed and useless until proven otherwise. No amount of rhetoric will dissolve the problem for any jurisdiction or country absent a proven rebuttal of the attack on the Rasch model used by the OECD.

      • Staffan says:

        Like I said earlier, this thread is mainly about the reactions from countries that are under-performing on the test, not about statistics per se. The sour grapes attitude is telling of how the West is coping with societal problems in general, and it’s not based on expertise since there are statisticians on both sides of the fence.

        But anyone can see how this test correlates to similar measures, IQ tests and economic wealth and growth. So it’s hardly a useless test. A more fair critique would be to say that it is imperfect or derivative.

      • Matt says:

        Ethnicity decomposed scores for PISA 2009 New Zealand groups are here – maths – science – reading

        No overall Asian group advantage in this country, but a wider distribution.

        Not decomposed by ethnicity, but Asian ethnicity in NZ breaks down to around 47% Chinese, 33% Indian, 10% Korean, 4% Japanese and 6% Filipino (and negligible smaller groups).

        You might argue Indians are dragging down the NZ Asian score, but they seem to be a high-skilled stream, judging by foreign qualifications (and most media reports about them). Anecdotally, Indians don’t show much regression to a common Indian mean, due to caste structures. People could argue Indian qualifications aren’t worth much, but I think it would be bad practice to argue that they cheat.

        Indian performance in the British achievement tests is intermediate between White British and Chinese. It seems likely to me to be similar in New Zealand (lots of data on this here). The migrant stream to New Zealand may be similar.

        The TIMSS does in fact have an Asian New Zealand Maths advantage, but more noticeable there is the very wide distribution. Again, the wide distribution may be due to ethnic structure, but it seems more likely to me to be due to migrants not being an even sample of their home nation, even though they tend towards similar means.

      • Staffan says:

        The fact that Indian children in the UK perform better than White British children indicates that it’s an elite, given India’s national IQ. But is the same Indian elite moving to New Zealand too? It seems New Zealand counts a lot of people with partly African and Middle East origin as Indian. Could still be an elite but the groups look different.

        It would have been really interesting to see stats on each specific group.

  23. SP says:

    The real “problem arising” for Wilby is why the East Asians in the US, Canada, New Zealand, Australia, the Netherlands, the UK, Belgium… etc. also game and cheat PISA, lifting the average scores of these countries as a by-product, because obviously they are always doing pretty well there, too. And are the governments of these countries involved in facilitating such gaming and cheating by the East Asians there as well? If not, why not?

    And I wonder what Wilby’s comment is on the same “problem arising” in Singapore, Hong Kong, Taiwan and Japan in particular…

  24. SP says:

    Seems that there’s a new global campaign going on to discredit PISA. Is it becoming political? China vs. US? The latest by Valerie Strauss:

  25. Gottlieb says:

    Matt, I think I had already made it clear that he did not understand his theory. At the most, I can see two possibilities. First, we know that in the U.S. at least, the states where the introverted personality predominates, i.e. the green coast (California, Oregon and Washington) and the northeast (Yankeedom and New England), are clearly where liberalism, and not only the socio-cultural kind, is prevalent, and they also export it to other American regions. Meanwhile (somewhat to my surprise) the states of Red America are those where extroverts dominate.
    Introverts tend to be represented, for example, by philosophers, scientists, poets; in short, creative people. As we all know, these bio-cultural phenotypes are strongly prone to abstract thinking as well as severe existential crises, especially in contrast with the normative comfort that religion and a homogeneous community offer. Highly intelligent people can thus tend towards types of thought that are totally logical in conceptual-abstract terms but do not really make sense to the majority of ordinary people.
    The ideas that the government is corrupt, mankind is miserable or America is a vampire that has claimed millions of lives are typical of introverts and, I would add, make sense, but not in a practical sense.
    Therefore we will usually find, among very smart people, this whole framework of behavior, which in many cases can result in a form of (I do not know if the term parasitism would be more correct) complacency or despair. From there we get everything from careerist geeks to neo-hippies.
    The idea of parasitism is not a pleasant one for me to apply to this situation, despite its logic. But, for example, seeing the behavior of the artistic media class, who are predominantly outgoing and communicative, I see much more parasitism than need for a larger liberal state.
    I think liberalism is much better liked by introverts than collectivism, precisely because the latter requires a large stake in the community, extensive socialization and suppression of individuality, the famous sacrifice for the group.
    In this case, introverts among Americans and Chinese, for example, are probably dealing with distinct cultural and biological realities. Probably more for cultural reasons than genetic ones, Chinese introverts, who represent the majority of the population, chose, whether by coercion or group pressure, a collectivist society. In contrast, American and Canadian introverts probably opt for individualistic societies. Taking into account that the final stage of socialism is precisely ideological collectivism, it is, I think, much less through direct causality and more through indirect causality or coincidence that introverts tend to choose liberalism as culture and politics. They choose it more in opposition to cultural collectivism than because they necessarily like liberalism in itself. Well, that is what I think.
    To end on introversion in East and West: I think that, not only for cultural reasons but also for biological ones, Eastern introverts should be relatively different from their Western namesakes. I believe there is a set of fixed genes for personality and cognitive styles that were selected at the very beginning of the formation of the two racial groups. These fixed genes, fixed not necessarily by natural or sexual selection but also by the size of their respective populations, are responsible for the behavioral differences that make a Chinese introvert relatively different from a white, introverted American from the city of Portland.
    If much of the Eastern population is actually introverted, then there was a distinct combination of traits among them, compared to Westerners, that results in collectivist societies. It is also important to question how far the genetic similarity of the landscape will interfere with the formation of cultures. I mean, if you’re inserted into a bio-cultural group and therefore have similar personality traits, it may be that a simple coincidence of clustering of phenotypically similar individuals can produce a monocultural society.
    Modern liberal societies seem to me similar to Eastern societies in many ways. I remember being sent, via HBD Chick, a video on bio-cultural differences between Eastern and Western perspectives, and among the Orientals the philosophy of the interrelationship between people and events was defined as their primary trait. For me, it’s clear that this philosophy is the same as that employed by liberals: the idea that environmental events have the ability to cause changes in people (poverty breeds violence) and that everything is directly related to everything else, the idea of interdependence. This centuries-old Eastern philosophy may have resulted precisely in the collectivist and egalitarian societies we see among the Orientals. And that could explain why, overall, the per capita income of these societies is lower than in the West. The Orientals are currently viewed as competitive, but it is interesting to think that the number of historical conflicts among them may have been lower than those recorded in Europe, which has always been in a constant state of war. The population size that Chinese society in particular attained also suggests a probably lower incidence of conflicts that could eventually result in genocide.
    Anyway, my thoughts are jumbled; sorry for the huge comment.

  26. Staffan says:


    Forgive me for missing that in Wilby’s article. I have the flu as my excuse, not a bias against East Asia : ) Although it isn’t outright, his insinuations are much worse than my sleep-deprived brain registered last night.

    Judging by Strauss’ article, the sour grapes attitude is strong in America too. Her experts’ main objection is that results should be adjusted for social background. Although I agree with you that there may be some environmental factor in this, such an adjustment would mainly be a matter of adjusting for differences in intelligence. Like giving everybody who scores 85 on an IQ test an extra 15 points. The old Blank Slate.

    • SP says:

      It’s OK, Staffan.

      The environmental factors I meant apply only within a similar IQ bracket, in order to level the playing field. E.g. you and I both have the same IQ, but if you’re from the upper middle class in a city while I am dirt poor, and so is my school, with its backward teaching materials, and I have to work in the fields growing vegetables every day after classes to help my family (as many poor rural Chinese kids do), then I find it logical that you could score, on average, several points more than I do in tests. Or, to put it another way, I am comparatively disadvantaged and should therefore be “compensated” with several points (only) in tests, compared to you only, to make it fair.

      Down with flu? I hope you get better soon! Cheers!

  27. SP says:

    I wonder what the ethnic compositions of the students in Canada, Australia and New Zealand (perhaps also Sweden) were in 2012 vs 2009. I guess the impact of the Indians was larger than I expected…

    Vietnam has bested Australia in maths! Surprise? No. Northern Vietnam has a considerable amount of mingled Chinese genes for historical reasons. Vietnam has also scored pretty decently in the IMO.

    If Shanghai scored 613 in maths, I’d expect Jiangsu Province and Zhejiang Province to score in the 630 to 650 range (that’s the difference between them in the Gaokao), if PISA uses the same sampling method in these two provinces as in the rest.

    Nonetheless, the US is still sleeping at the moment. I am expecting a tsunami of sour grapes from the US media to discredit PISA in the coming hours, days and weeks… :-)

    • Staffan says:

      The US is a mix of sour grapes and genuine concern it seems. And a weird fear of China, but I guess it’s their competitive nature. It would be great if the West could realize their own immigration policy is the enemy, not China or other East Asian countries.

      Sweden fell dramatically but living in this country I’m not that surprised. A relaxed school policy and plenty of recent immigrants from the Middle East will do that.

      The high-ranking countries of the Anglosphere have also been demoted by Chinese and other East Asian populations, although less dramatically.

      Vietnam, as you mentioned, is the best low IQ nation, no doubt benefitting from improved conditions in recent times. On a similar IQ level Ireland continues to do pretty well. Ireland has always been a bit of a mystery in this regard. Perhaps they are introverted?

      UK and Denmark held up surprisingly well. Perhaps their immigration has slowed down.

      • SP says:

        Vietnam is not a low IQ country by global standards. Its avg IQ is about 95, with the northern part even higher. So, everything else being equal, it should be competitive with Australia (a 100 white majority + 105 East Asians + 85 Indians/South Asians, depending on the mix), bearing in mind that the avg income and education spending in Vietnam are only a tiny fraction of Australia’s.

        On Sweden, the UK and Denmark: perhaps, for the 2012 vintage, Sweden had more middle-aged family immigrants than the latter two, where the immigrants at the time were more often young people/singles who didn’t have 15-year-old kids yet.

      • Staffan says:

        I’m sorry, I didn’t mean low IQ globally, just low in comparison to how well they performed on the test.

        Sweden and Norway both dropped dramatically, especially Sweden. I don’t have any stats on the demographics of the immigrants but it’s possible that you’re right. But I’m guessing they have tried to restrict their immigration. The Swedish government recently declared that everyone from Syria – no exceptions – will be granted asylum. We have the intelligence but judgment is a different thing altogether.

  28. Staffan says:

    As I predicted, small countries from Eastern Europe continue to do well, except for Croatia, which disappointed.

    Another finding is that America has been dropping a lot. Again, most likely immigration. I’m betting California did really badly.

    Just looked at the Swedish media. Some blame the test, some the government. Not a single word about immigration, as if a person with an IQ of 85 could be transmogrified by just crossing the border.

  29. SP says:

    Staffan, the scores of China’s Zhejiang province were just leaked!

    If standalone, Zhejiang Province of China would be the world’s new #1 so far (still waiting for the scores of the other 10 provinces of China. Jiangsu and Shandong provinces have a good chance of beating Zhejiang according to the Gaokao record).

    Maths: Shanghai 613; Zhejiang 623

    Reading: Shanghai 570; Zhejiang 570

    Science: Shanghai 580; Zhejiang 582

    Zhejiang Province: population 54 million. GDP about 10,000 USD per cap, a lot poorer than Shanghai.

    However, 2 PISA discriminations against Zhejiang’s scores:

    A, just like 2009, 80% of Zhejiang’s 2012 scores have been taken out of its poor rural regions (Zhejiang’s urbanisation rate is 62% however).

    B, it was just confirmed that all elite schools of Zhejiang were excluded from PISA selection.

    Similar to what I predicted, if PISA had randomly selected students from Zhejiang (instead of 80% rural) and from all its schools (bad, regular, and elite alike) like it did with all other non-Chinese countries/regions, Zhejiang’s Maths scores could have been in the ballpark of 650, to be a bit conservative, and Zhejiang’s overall scores would have made Shanghai look like an amateur, as it always does in China’s Gaokao.

    • Staffan says:


      You have so much interesting information I’m beginning to think you should start a blog : )

      I wonder if the Chinese government wanted to keep a lid on this because it indicates the importance of genes. The government can’t take credit for the DNA of their people and the Pisa folks might play along because those results diminish the survey as an indicator of good school policy.

      • SP says:

        Thanks Staffan, but blogging is not my thing, too much time involved…

        Zhejiang has traditionally been one of the most elite areas of China, along with the Jiangsu and Shandong regions.

        What the West generally assumes is that China’s inland poorer (compared to the East coast) provinces must score much lower than Shanghai, because it fits right into the Western liberal dogma that low scores are mainly due to a poorer environment. False! China’s provincial PISA scores refute such nonsense. Poorer kids score even better! … now I guess that this is just another reason that gives the OECD PISA organisers a nightmare, because it’s non-PC. 🙂

        The Maths and Science scores of Shanghai’s middle schools and high schools usually rank average or slightly above average within China, even though Shanghai people are one of the elites (I’ve explained the reasons in my previous posts here). If Shanghai scores #1 in PISA, you can bet that usually there are at least 7 or 8 inland provinces (each having a much larger population) that beat Shanghai, some by a considerable margin.

  30. SP says:

    Re-post of my comment left in Steve Sailer’s blog:

    Below is the 2012 score scaling of the Chinese national Gaokao exam. Since almost every student gets tested, it is far more accurate than the picture PISA paints.

    On a scale where the full score in the university entrance test is 100, Shanghai students, for example, got 67.5 to enter a university, ranking only 17th of 33 provinces within China in the important Science and Maths.

    Note that Gaokao separates into 2 big categories, Science and Liberal Arts. Both test maths, but the maths tested in Science is far more difficult.

    The West thinks Shanghai is the best of China? ROFL. Have fun!

    For Science(as full score=100)

    1.Guangdong Province 76.5 (a big surprise that the Cantonese did exceptionally well last year)
    2.Zhejiang Province 76.2
    3.Sichuan Province 74.9
    4.Shandong Province 73.9
    5.Beijing 73.3
    6.Hebei Province 71.7
    7.Liaoning Province 71.7

    8.Jilin Province 71.3 (with a sizeable ethnic Korean population)

    9.Jiangsu Province 70.4
    10.Hubei Province 70.3
    11. Heilongjiang Province 70.3
    12.Tianjin 69.5
    13.Chongqing 69.3
    14.Jiangxi Province 68.9
    15.Guangxi (Minorities + Han) region 68.0
    16.Hainan Province 67.6

    17.Shanghai 67.5

    18.Henan Province 67.3

    19.Fujian Province 66.8 (a proxy for Taiwan, and Singapore. Shanghai beats them both in PISA. Surprise?)

    20.Hunan Province 66.0
    21.Yunnan (Minority region) 66.0
    22.Shanxi Province 65.7
    23.Anhui Province 65.3
    24.Gansu Province 65.2 (sparsely populated minority region)
    25.Shanxi Province 64.7
    26.Inner Mongolia 64.3
    27.Tibet Autonomous Region 62.7
    28.Ningxia (Muslim Region) 60.7
    29.Guizhou Province (heavily minority region) 59.9
    30.Xinjiang (Muslim region) 59.1
    31.Qinghai (sparsely populated. largely Tibetans)51.1

    For Liberal arts(as full score=100)

    1.Guangdong 79.2 (a very big surprise as well)
    2.Zhejiang 76.4
    3.Shandong 76.0
    4.Sichuan 75.6
    5.Hebei 74.8

    6.Shanghai 74.7

    7.Hunan 74.3
    8.Chongqing 74.1
    9.Hainan 74.1
    10.Liaoning 73.9
    11.Beijing 73.2
    12.Guangxi 72.1
    13.Anhui 72.0
    14.Shanxi 72.0
    15.Tianjin 71.1
    16.Jiangxi 70.9
    17.Hubei 70.8
    18.Guizhou 69.6
    19.Yunnan 69.3
    20.Henan 69.2
    21.Fujian 68.4
    22.Jiangsu 68.3
    23.Jilin 68.0
    24.Shanxi 67.6
    25.Heilongjiang 67.2
    26.Gansu 67.1 (minority region)
    27.Ningxia 64.5 (muslim region)
    28.Tibet 64.0 (minority region)
    29.Inner Mongolia 63.2 (minority region)
    30.Xinjiang 61.3 (muslim region)
    31.Qinghai 58.0 (minority region)

    • Staffan says:


      It seems to correlate fairly well with the Slitty Eye’s IQ data. It’s interesting to see the sharp contrast between the neighboring provinces of Gansu and Qinghai. Perhaps the Uyghur presence in Qinghai…

      • SP says:

        Gansu Province is in remote western China right next to the deserts. There is nothing one can do there, apart from some remote farming counties around. It lies at the western end of the Great Wall. Its population is Northern Han + a considerable number of Hui Chinese Muslims (a cross btw Chinese and historical Persian/Arabian merchants on the Silk Road). So Slitty Eye’s map is wrong right here. The province is generally considered to have a lower average IQ rather than a high one. Its capital city is Lanzhou, the HQ of 1 of China’s 7 military areas. The most famous dish of Lanzhou is called “Beef Lamen”, i.e. Beef Noodles.

        There’s an American guy teaching English in a university there who has recently been uploading many videos of Lanzhou to YouTube, including a beef noodles section. Quite fun. You can go check out the area:

        No, there are no Uyghurs in Qinghai. Qinghai can basically be called “Tibet 2.0”, but with lower altitudes, hence more Han Chinese living there than in Tibet.

        Uyghurs live exclusively in the Southern part of Xinjiang Autonomous Region (also called Xinjiang; in Chinese it means “New Territory”, named by PR of China in 1949 to show some face value “Communist solidarity” with Uyghurs). However, old Imperial China since the Han Dynasty 2,000 years ago has called this region its province, named “Xi Yu” (meaning Western Border).

        Since Xinjiang is next to Stanland, for its own geo-political goals, CIA-inspired Western media has long been drumming up the “Uyghur Independence” line to destabilise China from the Western border, accompanied by false propaganda that Uyghurs are the indigenous people of Xinjiang while Han Chinese are new immigrants there trying to suppress them. LOL. Regular Joes in the streets in the West have no clue but believe these lies. The truth is just the opposite, can you believe it?

        Xinjiang has always been separated into 2 parts historically: the Northern part and the Southern part. Since the Han Dynasty (about 2,000 years ago) the Han Chinese have been settled in the Northern part to protect the Silk Road. This part has been inside imperial China’s national map ever since, for > 2,000 years.

        The indigenous tribe of the Southern part was killed off by the Mongol invasion 500 years ago. Uyghurs were first brought into the Southern part of Xinjiang from Mongolia about 400 years ago after the collapse of the Mongol Empire. They are in fact relatively new immigrants to the region compared to the Han Chinese, who, with 1,500 years more history under their belt, are relatively speaking one of the indigenous peoples there!

        After the Uyghurs moved into Southern Xinjiang they multiplied their population much faster than the Han Chinese in the Northern part, the same legendary story you know everywhere from Germany to Sweden today… CNN and BBC are just lying through their teeth calling it the opposite. It is the same as calling the Swedes new immigrants to Scandinavia and calling Muslim immigrants in Sweden its indigenous people, just because perhaps 50 years down the line Muslims are more numerous than Swedes in Sweden. Total madness.

        Back to Xinjiang Gaokao results, as you can see now, it’s always at the bottom of the ranking, because Xinjiang’s score = (scores of Han Chinese at the Northern part + scores of numerically more Uyghurs at the Southern part) / 2.
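
        As an aside on the arithmetic: dividing by 2 gives a simple average of two subgroup means, whereas a combined score for groups of unequal size is normally a count-weighted mean. A minimal sketch of the difference, assuming made-up subgroup means and test-taker counts (none of these numbers come from the Gaokao tables above, and the function name is only illustrative):

```python
def weighted_mean(groups):
    """Combine subgroup means into one overall mean.

    groups: list of (mean_score, number_of_test_takers) pairs.
    """
    total = sum(n for _, n in groups)
    if total == 0:
        raise ValueError("at least one test taker is required")
    return sum(mean * n for mean, n in groups) / total


# Hypothetical values for illustration only (not from the source).
groups = [(70.0, 40_000), (55.0, 60_000)]

# Simple average of the two subgroup means, i.e. the "/ 2" shorthand.
simple = sum(mean for mean, _ in groups) / len(groups)

# Count-weighted mean: the larger subgroup pulls the result toward its mean.
weighted = weighted_mean(groups)

print(simple, weighted)
```

        With unequal group sizes the weighted figure sits closer to the larger group’s mean than the simple average does.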

  31. Staffan says:

    Thanks for clarifying, although the significant gap between Gansu and Qinghai suggests that there is a sharp difference in IQ between the two neighbors. The difference between Zhejiang and Shanghai is also pretty hefty. You know any studies of correlation between this Gaokao and IQ scores? Perhaps the former test relies more on conscientiousness than intelligence. This wouldn’t be all that surprising given how East Asian countries put more emphasis on rote learning.

    Regarding the Uyghurs, I got Qinghai and Xinjiang mixed up for some reason.

    As for the politicized media, people here love to talk about how the Chinese government is manipulative – and I’m sure they are – but our own media call Gypsies “Romanians”, “Travellers” or even “Campers.” Anything but the truth. Like the Bible says, it’s easier to see the speck of dust in your brother’s eye than the log of wood in your own.

  32. SP says:

    You’re right the difference btw Gansu and Qinghai is big: the former is primarily Chinese (Hui) Muslim while the latter is largely Tibetan.

    The Zhejiang-Shanghai difference, I think, is more due to a different environment with different opportunities and mindsets. Gaokao is the main means for Zhejiang rural kids to get a better job/life and they study hard for that, while their genetic counterparts in Shanghai are on average more spoiled brats due to the big-apple-more-other-chances environment. E.g. Bill Gates didn’t fancy studying hard at all, in fact he dropped out, because he knew that, as an even bigger “Shanghai”, the US could offer him plenty of other means and chances to get rich. Shanghai opens a big world full of alternatives for those city kids, so relatively they don’t care that much about Gaokao or PISA.

    I don’t have any Gaokao-IQ stats. I think that the correlation should be quite high, like any other standard exam such as the SAT. At the end of the day, even good rote learning requires high IQ, since it at least directly tests both long-term and short-term memory, which are a part of IQ, no? Higher IQ could also help in finding better tricks, better short-cuts etc. while doing rote learning. So even assuming that Gaokao is entirely rote learning, if Gaokao is the only way out for a high IQ kid and a low IQ kid, I bet that the high IQ kid will do better, on average, in Gaokao and hence in rote learning, perhaps even with less time used (higher efficiency). In reality Gaokao Science and Maths, for instance, are quite similar to PISA, but 10X harder.

  33. SP says:

    The newest:

    A Zhejiang newspaper says that if standalone Zhejiang would outscore Shanghai and get No 2 in PISA ranking.

    So another participating Chinese province in 2012 beats even Zhejiang to No 1.

    It is highly likely that that province is Jiangsu. Detailed scores are not yet known.

  34. EC says:

    Very interesting to see this discussion about PISA that looks at it from several angles that I hadn’t given much thought to (IQ, immigration, etc.). For what it’s worth, the big factor that usually goes undiscussed in articles about PISA results, especially in the US, is poverty/inequality. PISA breaks its scores down in interesting ways, and if you look at the scores coming out of US schools with low poverty rates (using FRPL as a proxy), those scores are really quite high, while the scores of students in schools with lots of poor kids are quite lousy. My default assumption has been that if the US could significantly decrease its poverty rate, it would significantly increase its PISA ranking.

    • Staffan says:

      Keep in mind that Pisa is very similar to an IQ test, although it must never be called that out loud : ) Anyway, these kinds of tests usually have a heritability of around 0.75 and aren’t much affected by environmental influence. But given what poverty looks like in some parts of America it may be a factor independent of inherited intelligence. At least, I wouldn’t rule it out.
