Robin Upton's Research: Some Problems of Thinking by Numbers


Today's paper informed me that "Family doctors [in UK] make up to 2.8 million medical errors every year" but didn't explain what actually constitutes a 'medical error'. This journalistic sloppiness exemplifies a general rise in the use of statistics, often for their own sake. The UK government, for example, has been making systematic use of them at many levels. So-called 'league tables' are now compiled about many aspects of public life, especially high-visibility services such as health, education and law and order. The government attaches great weight to them, but rarely admits that they are any less objective than their sporting counterparts. This essay will examine some of the problems of the increasingly casual use of statistical performance measures as part of the decision-making process.

The advent of computers has been accompanied by a sharp rise in both the supply of and the demand for statistics. Computer users have almost unwittingly adopted a 'computer mentality' of focusing on simple, abstracted, numerical models instead of on real-world problems. One of the most appealing features of substituting a statistical problem for a real one is that the latter can be completely solved. Statistical rules can effectively automate decision-making in areas such as quality control of industrial processes. However, most real-life situations are rather less tidy. The difference is becoming increasingly important as statistics are applied to more and more complex, human-based situations.

Let us compare a factory and a school. A factory's output is sold at a rate that, ultimately, reflects the success of the manufacturing process, permitting a relatively straightforward statistical model. A statistically based approach to 'optimising' the school would, until recently, have been deemed at best questionable, at worst futile or even unethical. Nowadays, though, such modelling is seen not only as feasible but as almost inevitable. Modern ideals of 'efficiency', 'transparency' and 'replicability' mean that more and more time and effort are spent on quantifying criteria that were formerly assessed qualitatively. Each UK schoolchild, for example, is tested annually with nationally standardized tests so that education can be fitted into the marketplace model. This model has schools competing with each other for position on a 'value-added' table that purports to show the relative value of the education they provide.

A Formula for Beauty

Another story in today's paper concerned a 3-year research project carried out by a Welsh university. Its Environmental Studies department devised a formula to assess a coastline's beauty. The professor who led the project reports, "the formula will tell you whether a beach is good, bad or ugly, and in which areas it is deficient... It struck me that all studies of coasts and beach were subjective - they lacked a scientific base". It is a sign of our times that he felt such a base could be provided by a formula that "breaks down the landscape into 28 elements, from the colour of the sand to the height of the cliffs".

The issues here are not merely technical. To reduce a real-world problem to statistics is to ignore any elements not expressed numerically. Much factory work is well suited to statistical treatment, but what about the work done in a school? Inculcation of a set of facts might be measurable, but is this the true meaning of education? Dare we put a value on human personality, or on a child's creativity, friendliness and love of life? The hard fact of today's 'scientific' systems is that whilst any teacher can testify to the importance of these things, the statistically based assessments assume that they are of no value. National targets have unfortunately had such a high profile that they are coming to dominate the teaching process, sometimes overshadowing even personal relationships. With funding dependent on schools' positions in league tables, this is hardly surprising.

Tacitly ignoring what is not measured should be of great concern, as statistics cannot model many important aspects of our lives, and are likely to remain unable to do so, at least for the foreseeable future. Since statistical methods arose out of modelling people en masse, it is hardly surprising that they have little power when applied to the individual. If we structure our public services around statistics, we cannot reasonably expect them to respect people's individuality.

Weaknesses of Summary Statistics

Statistics summarizes reality by reducing complex happenings to simple numbers. Its power comes at a price, since focusing attention on a few areas is often misleading. Consider, for example, how you might feel on reading that the cure rate for a particular disease had jumped from 33% to 67%. The rate itself is only part of the picture: it makes a great difference whether the number of cases was 3 or 3000! Rumours abound of health service practices that affect patients but are invisible to the current statistical summaries. For example, if patient waiting times are categorized as '<6 months', '6-12 months' and '>12 months', then we might expect to see a lot of patients being treated at 25 and 51 weeks, just before each category boundary is crossed. If funding depends on efficiency as gauged by such statistical indicators, then this practice is simple common sense for hospital managers trying to improve services to their patients by maximizing the funds available.
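The 3-versus-3000 point can be made concrete with a confidence interval for each cure rate. The sketch below uses the Wilson score interval, which is one standard interval for a proportion. The essay itself names no method, so this is a swapped-in illustration. The patient counts are the hypothetical ones from the example above, not real data. With 3 patients, the "before" and "after" intervals overlap heavily, so the apparent jump is quite consistent with chance. With 3000 patients, the two intervals are well separated.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion successes/n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical figures: the same 33% -> 67% "jump" at two different sample sizes.
for n in (3, 3000):
    before = wilson_interval(n // 3, n)      # 1 of 3, or 1000 of 3000, cured
    after = wilson_interval(2 * n // 3, n)   # 2 of 3, or 2000 of 3000, cured
    overlap = before[1] >= after[0]
    print(f"n={n}: before {before[0]:.2f}-{before[1]:.2f}, "
          f"after {after[0]:.2f}-{after[1]:.2f}, intervals overlap: {overlap}")
```

The headline percentages are identical in both cases. Only the interval widths reveal how little the 3-patient figures can support.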

Senior managers may feel that the complexity and range of the tasks they have to manage leave no feasible alternative to the modern style of 'management by statistics', and I suspect that this will be even truer of the next generation of managers. Another possibility is that they are so aloof as to be genuinely unaware of the abuse potential of statistical targets, in which case they are deceiving themselves with statistics. A more critical interpretation is that the emphasis on statistical targets is an attempt by senior management both to underline the importance of statistics in the public's mind and simultaneously to manipulate lower management into engineering apparently positive statistics for political gain.

Disingenuous policies are by no means confined to numbers games - waiting time statistics can be improved in all sorts of ways, such as encouraging patients to cancel by giving them inconvenient appointment times at short notice, letting them wait in ambulances, or keeping an unofficial 'waiting list to get on the waiting list'. The possibilities are endless, and whether or not these practices stay within the letter of the law, their net effect is the same: the 'accountability culture' of statistics means that doctors' views about clinical priorities end up being compromised.

Once any set of statistics is chosen, wily practitioners can and do evolve methods to subvert it, resulting in a continual - and exhausting - game of cat and mouse. Unless they are tempered with other methods, statistics can never be expected to give a reliable picture of complex goings-on.

Data Collection and the Hawthorne Effect

Data collection is another potentially difficult issue. It is not like a scientist taking readings in a lab: the collection process often influences what it is attempting to monitor. The Hawthorne effect was identified by a research study in the 1920s, which found that factory output increased when lighting was improved, but also that when lighting was reduced to its original level, productivity increased still further. The study showed that personal factors were important influences in the factory: people, unlike machines, behave differently when they are being watched. For work of a more personal nature, factors such as motivation, appreciation and morale are clearly of greater importance. This may seem obvious, but it does not seem to have deterred UK politicians from using phrases such as 'tracking down rogue teachers' and 'naming and shaming bad schools'.

Peer review could provide a much better understanding of local factors and a cheaper, less intrusive method of data collection, but it runs counter to the current 'one-size-fits-all' culture of nationally imposed solutions. More seriously still, in the current adversarial culture it could not be expected to provide a credible assessment.

The solution used in the UK is to have schools laboriously examining children according to nationally chosen criteria, while external regulatory bodies control these assessments as well as rating the schools. Requirements for national standardization allow schools little ownership of the process. The regulation process entails losses that are more than financial; it is seldom without acrimony and the associated stress has been linked to the current crisis in the teaching profession.

Rejection of Subjective in Favour of 'Objective' Criteria

The difficulties of data collection are no less real for being qualitative rather than quantitative. However, to the numerically focused managerial mind they are almost invisible, as they are hard to model and effectively unauditable. Overuse of statistics arises from, and contributes to, the present social climate of distrust, resulting in a strong bias towards the systematic and the impersonal. Any decisions that rest on subjective interpretations are seen as risky, since they may be considered 'unprofessional' by others. If organizations are to continue to grow in size and to involve individuals with an ever wider range of skills and beliefs, such a trend of 'depersonalisation' may seem necessary to ensure their smooth running. I do not, however, believe such a trend is inevitable or even desirable: organizations that eschew the qualitative will tend to exhibit increasingly mechanistic behaviour and become less responsive to human influence, either from within or without.

The objectivity of statistical methods should not be allowed to go unexamined either. The beach beauty formula was derived from interviews with 1000 holidaymakers in Wales, Turkey and Malta, so its application to other beaches, or to other people's perceptions of beauty for that matter, would have to be carefully justified. The potential statistical pitfalls of such studies are manifold but are likely to be ignored in the quest for a method of making easy comparisons between beaches across the world.

The eminent scientist Lord Kelvin said that "When you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind". This could be a motto for our times, inspiring research into such notions as 'objective beauty' and the numerical value of an education. Mathematical judgments may be more convenient than aesthetic or personal ones, but they are neither inherently superior nor universally suitable. We must remember that there are contrasting approaches, recalling Wordsworth's line that "High Heaven rejects the lore of nicely calculated less or more."

When cerulean blue glaze became available to Victorian potters, it became very popular and was used more for its novelty than for its aesthetic value. The same 'magpie effect' applies today to 3-D graphs, numerical methods and statistics, since computer software has made them easily accessible only in the last decade or so. In the minds of many they are still associated with the academic establishment and, more recently, with the world of big business. The veneer of professionalism attached to statistics may actually be a more important cause of their overuse than either a love of tidiness or laziness of thought.

Thoughtfully used, statistics are powerful tools for abstracting information from complex situations. However, they have become increasingly prominent in many areas of life, including ones for which they were formerly considered unhelpful and in which their use is highly questionable. Often unwittingly, they serve to obscure the very topics they purport to illuminate, in many cases covering over a lack of methodology or real understanding. Their collection and interpretation raise serious and poorly addressed issues. Policy makers in the UK have shown themselves unaware of the invidious nature of statistically based systems, for example by forcing through national league tables against bitter opposition from the professionals involved. Shifting attention away from the qualitative towards the quantitative may prove to have long-term social consequences that, though subtle, are nevertheless grave. Powerful though they can be, statistics have serious limitations: as abstractions, they inevitably take us away from the realities of individual circumstances and particular cases. They are no substitute for sensitive judgments made by caring people, and statistics should never be given the attention that is due to people.