Hidden Variables: Part III

What does data mean? Well, the rules have changed. OFSTED does not expect a standard method of reporting on pupil progress in all schools, and the infamous key stage levels have been replaced with… well, for many schools, the old KS3 levels, because it’s too hard to come up with better measures. OFSTED does, however, use RAISE online data to track pupil progress from KS2 to GCSE in secondary schools, and this in turn can raise problems for data-based management within schools if the implications are not properly thought through.

The RAISE data deals in statistical averages and expectations across the country. Across the thousands of children in a cohort we expect all of those who achieved level 4 in English at KS2 to achieve C or better at GCSE, and those who achieved level 5 or above to achieve B or better at GCSE. Across 700,000 children, the expectations are fulfilled by a substantial majority. Clearly it is possible to identify those schools whose candidates rise above the expectation and those whose candidates fall below it. From this we find preliminary grounds for judgment about the quality of education provided in each institution.

The problem is that a story that may be statistically or probabilistically true for 700,000 people has a decreasing chance of being true as the numbers considered diminish through 100,000 to 200 to 30 and to 1. This is because statistics do not immediately tell us anything about causes. There is no direct causal link between scoring 4 at KS2 and C at GCSE. There will be multiple factors contributing to an individual educational result, some of which (like ‘quality of teaching’) will be common to all individuals, some of which will be unique to each individual, and some of which will be common to a smaller group of individuals. When numbers are very large, we assume that the unique and limited causes pushing candidates up or down cancel each other out, allowing us to tell a story around the most obvious common cause – the quality of teaching. The story might run like this:

“Nationwide, the majority of students achieved their expected grade or better. This shows that for the majority of students, the quality of teaching that they received was at least reasonable.   For some the quality of teaching was inadequate.”

Such statements have a reasonable probability of being true.

We can now start using the national statistic to adapt this narrative to, say, regions of 100,000. We may find that some regions are below the national average and some are above. We can tell the story that in some regions the quality of teaching was less adequate than in others. Already we might start to have concerns that other, local causalities may be starting to have an impact.   This can be partially balanced out by looking at schools within a region.   Again we work by analogy and observe so many schools that are above the national average and so many below.   But by this stage, other very local factors (like school location, neighbourhood and parent profile) come into play. By now the probability of ‘the teaching was inadequate’ being the whole truth or even the most useful bit of the truth is already reduced. The only self-evident truth is that learning has not matched statistical expectations.
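The shrinking-numbers argument can be illustrated with a small simulation. This is only a sketch under stated assumptions: every simulated pupil is given the same chance of meeting the expected grade (the figure of 0.7 is a made-up stand-in for ‘a substantial majority’, not a RAISE statistic), and the function names are hypothetical. Because every group is ‘taught’ identically, any difference between groups is pure chance.

```python
import random
import statistics


def simulate_group_rates(group_size, n_groups=500, p_expected=0.7, seed=0):
    """Return the share of pupils meeting the expected grade in each group.

    Assumption: every pupil has the same chance p_expected (an illustrative
    value, not taken from RAISE), so differences between groups are chance
    alone and say nothing about teaching quality.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_groups):
        met = sum(rng.random() < p_expected for _ in range(group_size))
        rates.append(met / group_size)
    return rates


def spread_by_group_size(sizes=(1, 30, 200, 2000)):
    """Standard deviation of group success rates for each group size."""
    return {n: statistics.pstdev(simulate_group_rates(n)) for n in sizes}


if __name__ == "__main__":
    for n, sd in spread_by_group_size().items():
        print(f"group size {n:>5}: spread of success rates = {sd:.3f}")
```

The spread of results widens sharply as the group shrinks from a large cohort to a class of 30 and then to a single pupil, even though nothing about the ‘teaching’ differs between the simulated groups. Real cohorts will of course differ in more than chance; the point is only that small groups look different from the average before any cause is involved.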

When we start looking at individual classes and individual students, we move into the area of real histories, personal development, the accidents of life and the struggles of individual teachers to channel the raw adolescent energy in their care in productive directions.   Clearly the most likely explanation for things going wrong with a class will have something to do with the teacher, but there will often be deeper causes that have to do with the support systems in the school and the character and histories of the individuals in the group. Even more so when evaluating why any one individual falls below the line.

So the big statistical story is useful as a tool for raising the question ‘why’, and this is well and good, so long as it does not already presume a simple answer. The investigator must include in her examination the local and individual factors that are ironed out in the national statistic.

When schools make use of the RAISE data, the benign purpose of this is to anticipate problems with teaching and learning before GCSE results day. When a student is falling below his or her expected level of attainment, we ask the question: why? We then attempt to address the issue once we have an answer. This is a perfectly sound procedure in principle. However, considerable subtlety is required the moment we start applying it to individual cases.

In the RAISE framework it is assumed (at least in all the schools where I have worked) that all students receiving a reasonable level of teaching should improve linearly in each subject up to GCSE. But what is a reasonable assumption averaged out over 200 students becomes deeply questionable when considering an individual student. Wise learning leaders know this. Unwise ones, who fall in love too readily with the magic of statistics and charts, often fail to understand it – and this can lead to unhappiness and misunderstanding all round when it comes to putting the boot in over an individual teacher’s (or student’s) performance.

The main reason for the problem (I suggest) is that the subject learning curve, as with language learning, is nonlinear. It will include long, flat periods, where the learner adjusts to new material; it will include spurts of growth, where familiarity with a set of building blocks allows new levels of creative thought; and it may include downward slopes, where a sudden onset of complex material confuses, undermines memory and reduces motivation (probably the key factor in exam success). It seems likely (this may be controversial) that each person’s curve may have an asymptotic limit (as in sport or music, I can always get a bit better, but I am unlikely to get to that level, at least at this point). Over time, if someone is good at 11, then they are probably going to do all right at 16, but it is not rational to expect everyone to move up a sub-level each term as the standard linear model supposes.
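To make the contrast concrete, here is a rough sketch comparing the ‘one sub-level each term’ line with an S-shaped curve that starts and finishes in exactly the same place. Everything in it is an assumption for illustration: the 15 terms, the abstract sub-level units, the logistic shape and its parameters, and the one-sub-level tolerance. It models only the flat period and the spurt, not the downward slopes or the asymptotic limit.

```python
import math

TERMS = 15  # assumed: five school years of three terms each


def linear_expectation(term):
    """The linear model: one sub-level gained every term."""
    return float(term)


def s_shaped_progress(term, midpoint=9.0, steepness=0.6):
    """An illustrative S-shaped curve: slow start, later spurt.

    It is rescaled so that it begins at 0 and ends at TERMS sub-levels,
    i.e. the pupil finishes exactly where the linear model predicts.
    The midpoint and steepness values are arbitrary choices.
    """
    def logistic(t):
        return 1.0 / (1.0 + math.exp(-steepness * (t - midpoint)))

    low, high = logistic(0), logistic(TERMS)
    return TERMS * (logistic(term) - low) / (high - low)


def terms_flagged_as_behind(tolerance=1.0):
    """Terms in which the pupil is more than `tolerance` sub-levels
    below the linear expectation."""
    return [
        t for t in range(TERMS + 1)
        if linear_expectation(t) - s_shaped_progress(t) > tolerance
    ]


if __name__ == "__main__":
    print("flagged as 'behind' in terms:", terms_flagged_as_behind())
    print("final gap:", linear_expectation(TERMS) - s_shaped_progress(TERMS))
```

Under these assumptions the pupil would be flagged as ‘behind’ for a long stretch in the middle of the course and yet arrive at the expected end point. The flags describe the shape of the curve, not a failure of teaching or learning.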

Related to the problem of learning curves is the problem of criteria for measuring progress. The level descriptions that have been ingeniously developed in agonizing detail for each subject give generic descriptions of the skills that each student should be able to demonstrate if they are working to a particular level. They can be very useful tools for guiding students in how to improve their work and helpful motivators during KS3. However, they are given as a global assessment of quality of work, rather than as an exam mark, and so do not actually fit on the line between KS2 and GCSE, which are sudden-death exams artificially welded to the levelling framework. This may be one of the reasons why levels have been abolished for official purposes.

A second (and more subtle) problem with the generic descriptions as measures of progress is that achieving a level is context dependent. A task appropriate to 11-year-olds may enable half the class to achieve a deserved level 5, according to the level description. Two years later, those achievers may find themselves still scoring level 5, because the material for the task appropriate to their age is exponentially (rather than linearly) more challenging. In other words, scoring a level 5 at age 13 means something different from scoring a level 5 at age 11. The problem is compounded in subjects like languages, English, music and art, where the skill sets across a level are often only loosely related and development in one set of skills does not necessarily go with development in another. This takes us back to the theme of learning curves.

Unless learning leaders have thought through the complexities (not all do) there will be many unnecessary tears – as well as (and this happens) a lot of ultimately unhelpful dishonesty all round.   If the system is to be data driven, let the data at least be intelligent.
