Subject TEF is no more.
After two year-long pilots, the government has decided that:
we do not want the OfS to proceed with any form of subject-level assessments as part of TEF at this time.
Instead, OfS is asked to develop a “revised and invigorated” provider-level TEF which will run every four to five years, with the first group of assessments completed and published by 2022.
There will be four levels, with the existing Bronze, Silver, and Gold awards scrapped. And there will be yet another “strategic guidance letter” from the Secretary of State, to be issued “shortly”, expanding on the government views set out in a document released today – spanning just three substantive pages – that purports to respond to the newly released Pearce Review of TEF.
What’s TEF for?
The Pearce review is excellent – genuinely so. The government “mostly agrees” with her high-level recommendations – but as I’m focused on what the 2022 new TEF will look like I’m going to look particularly at areas where the government diverges.
Interestingly, the first of these is the very premise of TEF. Pearce begins with an examination of what TEF ratings might be used for – in the past the government has oscillated between informing student choice, raising the esteem of teaching at university level, recognising excellent teaching, and meeting the needs of employers and industry. Drawing on surveys of stakeholder groups, the review makes it very clear that neither students nor employers have any interest in using TEF ratings for anything. Neither of these is a surprising finding – various reports have reached similar conclusions, and frankly anyone with an understanding of higher education policy could have predicted as much. Damningly, even the small proportion of students who knew about and claimed to understand provider TEF ratings ranked them at the bottom of 15 potential sources of information.
For Pearce, the TEF points to what excellent teaching looks like, and this spotlight drives enhancement. To be clear, this view is not widely shared by academics (fewer than half of the 85 who responded to the review supported TEF). The availability and use of data on groups of students within larger provision were seen as helpful, as was the benchmarking of this data.
The government position is that the primary purpose of TEF is the enhancement of quality, and to that end “it should be more clearly part of OfS’s regulatory framework” because regulation is a great way to run enhancement programmes. But the assertion that “a secondary purpose” of TEF is to inform student choice flies in the face of all the assembled evidence. So we start from a place where one of the main aims of an intervention is to do something we already know it won’t do.
Subject to debate
I imagine few mourn the passing of subject TEF. The government’s stated reason for this decision is burden – a similar argument is the impetus for making TEF a REF-like quinquennial engagement. And this is unarguable – data released alongside the report and response suggests a mid-point estimate for the cost to the sector of a four-year cycle of subject TEF is an eye-watering £110m. However, the mid-point total sector cost of provider-level TEF run every five years still works out at £65m – £20m spent by the OfS and £40m by institutions in the sector. We also have an evaluation of the second Subject TEF pilot out today – it is written in characteristically upbeat style, but the takeaways remain that it is expensive, complex, and the data quality is very variable.
However, the government goes further than Pearce in saying that “we do not want the OfS to proceed with any form of subject-level assessments as part of TEF at this time”. Pearce actually found that the subject-level metrics were incredibly useful to providers – while she is clear that ratings should not be awarded at subject level, she does call for the publication of subject TEF metrics as official statistics – benchmarked and absolute, with splits. The report suggests that these metrics should be available to and used by an assessment panel in a similar way to the other split metrics for groups of students, and that “failure to demonstrate that the institution is delivering enhancement actions which are suffciently (sic) addressing poor performing subjects should act as a limiting factor in the ratings”.
This is an elegant and straightforward way of incorporating quality concerns at a subject level in the main exercise. If an assessor is concerned about a group of students – be they part-time students, Asian students, or nursing students – they can drill down into the data, which can be caveated appropriately where small numbers are involved. Giving providers access to this data ahead of the assessment allows – vitally – problems to be addressed early, and details of these mitigations shared in narrative form with the panel. DfE has very much dropped the ball on this one.
Qual and Quant
When we think of TEF we think of the metrics, flags, benchmarks, z-scores, absolute markers and other statistical flotsam. But TEF has always had at least one qualitative input, in the form of the much-derided provider submissions. In the practice of TEF these became vestigial organs – existing merely as a whispered caveat to the orchestral roar of the data. This goes right back to the original green paper vision – where DfE recognised:
that these metrics are largely proxies rather than direct measures of quality and learning gain and there are issues around how robust they are. To balance this we propose that the TEF assessment will consider institutional evidence, setting out their evidence for their excellent teaching.
Noting at length the statistical weaknesses in TEF, Pearce proposes two dimensions of excellence, each with two aspects. A pair of these – student satisfaction (read NSS!) and graduate outcomes (read, well, Graduate Outcomes – along with LEO, continuation, and degree attainment) – are backed by a combination of nationally available metrics and provider-submitted evidence; the remaining two – educational gains and the teaching and learning environment – are backed by provider-submitted evidence alone.
This pattern of assessment would mean it would no longer be possible to calculate a likely award from metrics alone. As Pearce notes, we are bringing back peer review. And we lose grade inflation as a supplementary metric – “as we are concerned that this does not meet the principle of relevance” in a framework focused on teaching quality enhancement. In other words, the TEF should not be a dumping ground for fashionable concerns that could best be investigated elsewhere.
The government, continuing its utterly nonsensical war on the NSS, is unconvinced by “student satisfaction” as a measure of excellence – suggesting instead “student academic experience”. While recognising there is a “place for students’ feedback on the quality of their teaching and learning experience”, it is not clear that the NSS, after its promised “radical root and branch review”, will be the instrument that provides it. We also get the addition of “limiting factors” (remember, Pearce suggested these for groups of students, particularly in poorly performing subject areas) based entirely on student outcomes.
In deep with the data
There may be another multi-year, multi-mode attempt to demonstrate that “learning gain” is not a reliably measurable facet of the higher education experience – clearly things may have changed since the OfS reported on this in 2019 following a four-year, £4m programme of research.
But the big data “ask” for the OfS is to take into account the concerns of the ONS in reviewing the use of metrics and data in TEF. Today saw the release of a 129-page monster of a review from the national statistics body, with 33 recommendations – enough detail to keep quantitative wonks busy for weeks. I’m not going to be able to do the full thing justice in this overview article, but just to give you an idea:
- The metrics are historical in nature – they relate to different students from different periods.
- The benchmarking of statistics is necessarily limited by the definitions of the benchmarking groups – there are potentially many other non-teaching-related aspects of the student experience that would have an impact on the statistics.
- There are many suggestions for addressing the running sore that is the use of significance flags for differences very close to the materiality and statistical significance thresholds (see the sketch after this list).
- ONS recommends that all modes of study be included in the overall metrics used in developing an initial hypothesis.
- There are concerns about the defensibility of the use of “high and low absolute values” (very small providers are less likely to show either).
- Split metrics involve very small sample sizes.
- There is a fundamental concern about how well the existing metrics actually show differences under the control of providers.
- Changes to underlying classifications (for example the move between SOC2010 and SOC2020, or between JACS and CAH) make comparisons over time difficult.
- There’s a strong recommendation that KEF-like benchmarking by provider size or similarity is considered.
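To make the significance flag concern concrete, here’s a minimal sketch – in Python, with thresholds and a z-score construction that are my own illustrative assumptions, loosely modelled on the published TEF flagging approach rather than the OfS specification – of how two providers with near-identical performance can receive categorically different flags:

```python
# Illustrative only: the 2.0pp materiality threshold, the 1.96 z threshold,
# and the simple z-score construction are assumptions for this example,
# not the official OfS methodology.

def tef_flag(indicator: float, benchmark: float, std_error: float,
             materiality_pp: float = 2.0, z_threshold: float = 1.96) -> str:
    """Return '+', '-' or '=' for one provider metric against its benchmark."""
    diff_pp = (indicator - benchmark) * 100   # difference in percentage points
    z = (indicator - benchmark) / std_error   # crude z-score for that difference
    if diff_pp >= materiality_pp and z >= z_threshold:
        return "+"   # materially and significantly above benchmark
    if diff_pp <= -materiality_pp and z <= -z_threshold:
        return "-"   # materially and significantly below benchmark
    return "="       # no flag

# Two providers a hair's breadth apart straddle the line:
print(tef_flag(0.921, 0.900, 0.010))  # '+' (2.1 points above, z = 2.1)
print(tef_flag(0.919, 0.900, 0.010))  # '=' (1.9 points above, z = 1.9)
```

A provider on 92.1 per cent gets a positive flag; one on 91.9 per cent gets nothing – exactly the cliff edge ONS is worried about.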
The ONS report adds up to a lot of work – I don’t feel it will be easy to do all of it and consult on a new framework in time for a 2022 TEF (assuming similar timeframes).
Gold command
One immediately visible change is the demise of the Gold, Silver, Bronze and Provisional ratings. Instead we will see a new four-level set of descriptors – Pearce suggests “pending” makes more sense than a Provisional award, and the Olympics-derived main ranking shifts to (from the bottom):
- Meets UK quality requirements (and just to note Pearce is clear on the need for UK quality requirements throughout)
- Commended
- Highly Commended
- Outstanding
She suggests ratings will exist at institutional level, backed by a rating linked to each of the four aspects above. It feels a bit Ofsted-like; in the same way a school can be “Good” but have “Outstanding” elements, a university could be “Commended” overall but have “Outstanding” student satisfaction. She also recommends that the clearly-written-by-algorithm narrative feedback should offer more detail of value to providers.
The government agrees on scrapping the medal table, and that the top three of four new ratings should be “signifiers of excellence to various degrees”. However, the new “bottom category” should “capture those providers failing to show sufficient evidence of excellence”.
Evoking the dreaded Ofsted “requires improvement” rating, this fourth category flies in the face of the original purpose of (and stated new vision for) TEF, but potentially makes sense in the context of OfS’s plans to pay attention to providers whose student outcomes are hovering only just above the approved benchmarks, while not being officially beyond the pale. It looks like a way of de facto raising the baseline quality threshold and disincentivising the provision of some courses without actively restricting them.
What might new TEF look like?
TEF isn’t going away, but what we do get in 2022 will not look much like what we have now. Pearce suggests a structure, and the government response immediately adapts it. I’m also not convinced that Pearce’s approach to qualitative, peer-review assessment is one that will last through development by a regulator that seems determined to use metrics (and unbenchmarked metrics at that) as the basis for pretty much anything quality-related.
Pearce’s two dimensions and four aspects probably underweight graduate outcomes for the current regulatory climate. The incorporation of provider evidence on the learning and teaching environment feels like a hangover from another era, and the use of provider evidence on educational gain looks more likely to end up as something like the Guardian league tables’ “value added” metric.
As regards the metrics themselves, the NSS is clearly in flux, but it was cheering to see DfE recognition that some kind of student experience measure based on asking students about their actual experience should remain. The regional benchmarking in LEO looks like a reasonable way forward on graduate outcomes, provided we use the highly-skilled marker (with appropriate caveats) rather than raw salary data – and I’d like to see the reflective questions in Graduate Outcomes play a part too. Continuation will remain as is (though, as has been pointed out, the current high weighting of this could offer a perverse recruitment incentive – students from disadvantaged backgrounds are more likely to leave their course early, and for non-academic reasons).
On graduate attainment, this makes sense if we do the “value added” thing – less so if we look at degree classifications alone (and the removal of the “grade inflation” contextual metric will probably happen as the issue moves into regulation).
I think the overall and aspect-level ratings idea is a sensible one and will probably remain – the challenge will be communicating the meaning of these ratings.
So, my prediction:
Educational Experience will cover:
- Academic Experience, as measured by an NSS replacement and possibly the reflective questions from Graduate Outcomes
- Continuation – the existing HESA KPI
Educational Outcomes will cover:
- Graduate outcomes – highly skilled employment or further study from LEO, benchmarked to regions of graduate domicile.
- Educational gains – something a bit like the Guardian value added metric, using entry qualifications against graduate outcomes (see the sketch below)
Data will be made available by all the usual splits, plus subject (CAH top level) where possible. The provider statement will continue to play a limited role in assessment. And prospective students still won’t use TEF.
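To be concrete about educational gains, here’s a toy sketch of the kind of value added calculation I mean – in Python, with made-up data; the tariff bands, numbers, and method are illustrative assumptions and emphatically not the Guardian’s actual model. The idea is simply to compare each student’s outcome with the sector-wide rate for students with the same entry qualifications, and average the differences:

```python
from collections import defaultdict

# Each student is (entry_tariff_band, good_outcome), where good_outcome is 1
# for a "good" result (a 1st/2:1, say, or highly skilled employment) and 0
# otherwise. The sector data here is invented for illustration.
sector = [("low", 0), ("low", 1), ("low", 0),
          ("mid", 1), ("mid", 0), ("mid", 1),
          ("high", 1), ("high", 1), ("high", 0)]

# Expected "good outcome" rate for each entry band, estimated sector-wide.
totals = defaultdict(lambda: [0, 0])  # band -> [good outcomes, students]
for band, outcome in sector:
    totals[band][0] += outcome
    totals[band][1] += 1
expected = {band: good / n for band, (good, n) in totals.items()}

def value_added(provider_students):
    """Mean of (actual - expected): positive means this provider's students
    do better than students with the same entry profile do sector-wide."""
    residuals = [outcome - expected[band] for band, outcome in provider_students]
    return sum(residuals) / len(residuals)

# A provider recruiting mostly low-tariff students who nonetheless do well:
print(round(value_added([("low", 1), ("low", 1), ("mid", 1)]), 2))  # 0.56
```

The point of benchmarking by entry band is that a provider is rewarded for what it adds, not for who it recruits.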
Isn’t highly skilled employment/further study from DLHE not TEF? Don’t think there is a distinction in the LEO data, or am I missing something?
Sorry I meant “DLHE not LEO” (too many acronyms!)
Hi Connor – the distinction has been available through LEO in the past, though I don’t know if it has ever been used in regulation. It’s arguably a better use of LEO than the salary data. Previously, data on this has come from DLHE – clearly this wouldn’t be useable in a new TEF as it has now been replaced by Graduate Outcomes. And there won’t be enough usable years of that by January 2022.
How far do you have to go back to find the distinction in LEO? Doesn’t appear to be in any of the most recent public data.