The key issue of the size and nature of the ‘subject instances’ that will underpin the Subject Level TEF has not been given enough attention. In fact the viability of the entire exercise turns on this apparently technical question.
Subject instances large enough to produce reliable results will usually be too heterogeneous in nature for those results to be useful, while instances homogeneous enough to be meaningful will be too small to yield statistically reliable data.
A short but necessary lesson in statistical logic
Random variation plagues the measurement of small groups, and is compounded when we want to make comparisons between these groups. This is because our real interest isn’t the groups themselves, but what they might tell us about future similar groups. Thus, if I evaluate a hospital unit by counting patient outcomes, my interest is in predicting the prospects of future patients. The more past patients facing similar conditions I can measure, the more likely they are to be representative of future patients. Indeed, this number is the only guide I have, since I cannot ever know just what ‘representativeness’ might comprise.
This is where the paradox of randomness strikes. If I can measure 1,000 people, I can make estimates with a margin of error of only a few percentage points. The ‘noise’ of variation is small compared to the strength of any ‘signal’ about their experience. Indeed almost everything we know about modern economy and society comes from this logic, perfected in Britain a century ago.
Conversely, measurements of small numbers of people, no matter how carefully made, and no matter how ‘typical’ we might hope these people to be, tell us disappointingly little, because random variation swamps any signal. With 100 people my margin of error rises to around +/- 10 percentage points. With 15 people the noise becomes as strong as the signal. With fewer, noise is about all we get.
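These figures follow from the standard formula for the 95% margin of error of a sample proportion, 1.96 × √(p(1 − p)/n). A minimal sketch, taking the worst case of p = 0.5:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error, in percentage points, for a sample proportion."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

for n in (1000, 100, 15):
    print(f"n = {n:>4}: +/- {margin_of_error(n):.1f} percentage points")
# n = 1000 gives roughly +/- 3 points; n = 100 roughly +/- 10;
# n = 15 roughly +/- 25, i.e. noise as strong as most plausible signals
```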
Why this matters for the subject level TEF
UK universities currently offer about 37,000 undergraduate degree programmes. Over half a million students enter annually, giving a mean course cohort of around 15 students.
The Subject Level TEF therefore proposes to measure 35 CAH2 ‘subject instances’ rather than degree courses. This raises the average size of units to about 115 students, and sidesteps the problem that universities draw disciplinary boundaries in diverse ways.
Four key problems
Unfortunately this will not fix the problem, for four reasons. First, we know that for many of the NSS metrics on which TEF relies, the variance of student opinion and behaviour within the same department and provider is large compared to that between departments and providers. Analysing early NSS satisfaction data, Marsh and Cheung (2008) found that most of the variance in the results occurs at the individual level, with only a little over 10% attributable to some combination of provider and department. This raises the size of unit needed to distinguish underlying unit performance from random noise.
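The consequence of such a low between-unit share of variance can be made concrete with the standard Spearman–Brown relation for the reliability of a group mean, n·ICC / (1 + (n − 1)·ICC). A sketch, assuming an intraclass correlation of about 0.10 in line with the Marsh and Cheung figure (the cohort sizes are the illustrative ones used in this post):

```python
def group_mean_reliability(n, icc):
    """Reliability of a unit's observed mean as an estimate of its true mean,
    given cohort size n and intraclass correlation icc."""
    return n * icc / (1 + (n - 1) * icc)

for n in (15, 115, 500):
    print(f"n = {n:>3}: reliability = {group_mean_reliability(n, 0.10):.2f}")
# reliability rises with cohort size; at the mean course cohort of about 15,
# more than a third of the observed variation in unit means is still noise
```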
Second, ‘subject instances’ will vary in size. Many will still be smaller than any threshold needed to find any signal amongst the noise. But if the aim of the TEF is to inform stakeholders, there will be a perverse incentive to report some signal, lest providers disappear from league tables or other commentaries. Thus a highly likely unintended consequence of the subject level TEF will be the rolling up of specialist departments or niche degrees into larger ‘one size fits all’ programmes.
Third, random variation also hobbles the business of making comparisons between units. The larger the number of potential comparisons, the greater the risk that any individual one is merely the result of random variation. With a couple of hundred universities this can be managed, although even here it is hard to do more than identify a handful of providers that do better or worse than the average. However this has not stopped the widespread abuse of such metrics to construct league tables or ‘identify’ failing units by assuming the numbers to be far more reliable than they in fact are.
But what comparisons might students make when choosing between several thousand degree courses? We do not know. Although statistical techniques exist to mitigate this problem, it is difficult to envisage any system that could avoid conflating random variation with real differences on the one hand, but do more than distinguish the brilliant from the abysmal on the other.
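The scale of the multiple-comparison problem can be illustrated by simulation. The sketch below uses hypothetical figures (4,000 subject instances of 115 students each, all with an identical underlying satisfaction rate of 80%) and counts how many units would nevertheless be flagged as ‘significantly’ different from that rate at the conventional 5% level, purely by chance:

```python
import math
import random

random.seed(1)
TRUE_RATE, N_UNITS, COHORT = 0.80, 4000, 115

# standard error of a cohort's satisfaction rate around the true rate
se = math.sqrt(TRUE_RATE * (1 - TRUE_RATE) / COHORT)

flagged = 0
for _ in range(N_UNITS):
    # every unit has identical underlying performance; only sampling varies
    rate = sum(random.random() < TRUE_RATE for _ in range(COHORT)) / COHORT
    if abs(rate - TRUE_RATE) > 1.96 * se:
        flagged += 1

print(f"{flagged} of {N_UNITS} identical units flagged as 'different'")
# a few percent of units are flagged despite identical performance
```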
Finally, subject instances are a queer fish. The DfE consultation document asserts that not only the 35 CAH2 subjects but also the seven ‘broad’ subject groupings into which they will be sorted ‘are likely to have similar teaching practices, teaching quality and student outcomes.’ No evidence is offered to support this questionable claim. Is teaching in Computer Science and Civil Engineering similar, or Maths and Agriculture, or Architecture and Politics, or Archaeology and French language? The unit that is of interest to most students is the degree course, or the subject area or department teaching it. This is also typically the lowest unit through which university governance and compliance processes operate, and for good reason: the demands of teaching organisation, delivery and assessment are typically subject specific.
What next?
In order to judge the potential viability of the Subject level TEF, DfE ought therefore to supply some basic information, including:
- the distribution of subject instance sizes
- NSS and DLHE metric variance between and within subject instances
- the associated standard errors and their means of calculation
- the mitigation strategy for dealing with multiple comparisons.
The Office for National Statistics asked for an independent review of benchmarking. None has yet taken place. We also need an account of benchmarking that stakeholders can understand. Without one, results based on benchmarking are likely to be abused by appraisers in the same way as previous performance indicators.
The Scylla and Charybdis of the subject level TEF is that aggregating students into groups large enough to make meaningful statistical analysis possible debases the validity of that analysis: it treats disparate groups of students, with a variety of educational experiences, studying different subjects, located in disparate units of university governance, as if they were homogeneous. Randomness is not something the Office for Students or the Department for Education can change. Without a robust account of how they intend to deal with it, the prospects for a viable subject level TEF look poor.
‘With a couple of hundred universities this can be managed, although even here it is hard to do more than identify a handful of providers that do better or worse than the average.’
This is the nub, isn’t it? For years the published data have shown that a small group of institutions consistently do worse than the average; but rather than acting on those data we have chosen to prioritise finding a way to rank the great majority which are about average.
Very informative post, thanks John.
Excellent assessment. DfE and OfS now need to respond to this. My guess is they won’t, because they can’t without losing face.