The Department for Education is seeking volunteers for the pilot of its subject-level TEF ratings. On Thursday, the government published the specification for the trial, which gives us a first glimpse of how a subject-level TEF could work if, as planned, it is rolled out in future years.
The trial will involve 15 institutions willing to model a ‘by exception’ approach, which looks in more detail at subject areas deviating from the institution’s average (model A), and a further 15 willing to look across all subjects to produce a ‘bottom up’ aggregated score (model B). And then it wants another ten – call them the gluttons for punishment – to trial both. No results will be made public, as the pilot is purely developmental. The prize for participants is getting ahead of the game, though the burden of participation isn’t likely to be small.
The underlying design for subject-level TEF is the same as TEF2’s. The core metrics are the same, as are the splits. TEF’s Deputy Chair, Janice Kay, Provost at the University of Exeter, will lead the subject-level panel, though she’s the only person to work on both the subject pilots and the main TEF3 exercise. As in TEF2, there will be assessors and panels, covering the two pilot models and grouped into seven subject areas.
One of the most interesting additions is the trialling of a measure of ‘teaching intensity’, based on a provider’s self-report and a student survey. In contrast to the TEF’s measures so far, which have focused on assessing ‘outcomes’, this is designed to measure an ‘input’. Quite how this distinction will be reconciled needs careful thought. What happens when there’s ‘poor quality’ input, but great outcomes?
Teaching intensity
As has been common in recent DfE documents, there’s reliance on the HEA-HEPI survey to provide justification for the policy. The key to the TEF subject-level approach is Graham Gibbs’s ‘Dimensions of quality’, which found that large class sizes had a negative impact on contact intensity. Teaching intensity will therefore be measured across two dimensions: number of hours and size of group. A provider will declare contact hours, which will be weighted by student:staff ratios. A new student survey will ask students about contact hours and self-directed study. This element will be piloted in the following subjects: nursing; physics and astronomy; creative arts and design; history and archaeology; and law.
The teaching intensity measure will be a ‘supplementary metric’ and not, we’re told, used in the ‘initial hypothesis’ – the first part of the TEF judgement-making process. Teaching intensity will be derived from declared contact hours, weighted by the number of students per member of teaching staff: small groups with lots of hours do well, large groups with few hours score worse. While it seems simple, and the DfE guidance goes to great lengths to explain why this is an important measure, it isn’t a perfect measure of the quality of those interactions.
And in a bold statement likely to raise the hackles of fans of institutional autonomy, the document states that the government considers “…that excellent teaching is likely to demand a sufficient level of teaching intensity to provide a high-quality experience for the student. Providers should, therefore, be investing resources into teaching, measurable through the volume of contact time provided, the small sizes of the classes in which teaching is delivered, or a combination of the two.” You’ve been told.
There’s even a new three-letter acronym: the Gross Teaching Quotient (GTQ). In the calculation proposed for provider-submitted responses, the weighting is done by band rather than by the precise number of students in a class, so there are cliff-edges where tripping over a boundary effectively incurs a penalty: for the purposes of this calculation, a class of 21 might as well be a class of 40, as both will be weighted equally. Teaching intensity will be presented to TEF assessors for three years of study (it’s not clear what happens if the programme isn’t three years in length…).
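To see how the cliff-edge works, here’s a minimal sketch in Python. The band boundaries and weights below are hypothetical placeholders – the real values sit in the pilot specification – but the shape of the calculation follows the description above: declared contact hours are multiplied by a weight that depends on which class-size band each session falls into.

```python
# Illustrative sketch of a banded Gross Teaching Quotient (GTQ) calculation.
# The band boundaries and weights below are hypothetical placeholders -
# the real values are set out in the pilot specification.

# (upper class-size limit, weight) - weight falls as the band gets bigger
BANDS = [(5, 1.0), (20, 0.75), (40, 0.5), (float("inf"), 0.25)]

def band_weight(class_size: int) -> float:
    """Return the weight for the band this class size falls into."""
    for upper, weight in BANDS:
        if class_size <= upper:
            return weight
    raise ValueError("unreachable")

def gtq(sessions: list[tuple[int, float]]) -> float:
    """Sum of contact hours, each weighted by its class-size band.

    `sessions` is a list of (class_size, contact_hours) pairs.
    """
    return sum(hours * band_weight(size) for size, hours in sessions)

# The cliff edge: a class of 21 is weighted identically to a class of 40,
# while a class of 20 sits in a more generous band.
print(gtq([(20, 10.0)]))  # 7.5
print(gtq([(21, 10.0)]))  # 5.0 - the same as...
print(gtq([(40, 10.0)]))  # 5.0
```

Run as-is, ten hours delivered to 20 students outscores the same ten hours delivered to 21 – exactly the boundary penalty described above, whatever the real bands turn out to be.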
The survey of students on teaching intensity also throws up another interesting point about the data in TEF. For most of the metrics, the data used are historical – past students’ NSS scores and their employment outcomes. This has led to questions about the currency of the results. This looks to be the first time that current students’ views will feed into the exercise. That’s likely to cause plenty of debate, and we’ll need to see what the pilots throw up on teaching intensity, how the measure develops – and how institutions’ gaming of it develops in turn.
Model A or B?
For many, how the provider rating is decided in model B’s ‘bottom up’ approach will be the main point of interest. Many will be pleased to hear it’s a mix of provider-level and subject-level outcomes. The subject-level ratings, weighted by the number of students in each subject, will form the initial hypothesis. The provider-level panel will then consider this alongside the provider-level metrics and submissions.
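To make that weighting concrete, here’s a minimal sketch – with the caveat that the Bronze/Silver/Gold-to-number mapping and the rounding rule are our illustrative assumptions, not anything in the specification:

```python
# Illustrative sketch of a 'bottom up' (model B) initial hypothesis.
# Scoring Bronze/Silver/Gold as 1/2/3 and rounding the student-weighted
# average are our assumptions for illustration, not the specification's.

RATING_SCORE = {"Bronze": 1, "Silver": 2, "Gold": 3}
SCORE_RATING = {1: "Bronze", 2: "Silver", 3: "Gold"}

def initial_hypothesis(subjects: dict[str, tuple[str, int]]) -> str:
    """Weight each subject's rating by its student numbers.

    `subjects` maps a subject name to (rating, number_of_students).
    """
    total_students = sum(n for _, n in subjects.values())
    weighted = sum(RATING_SCORE[r] * n for r, n in subjects.values())
    return SCORE_RATING[round(weighted / total_students)]

# A large Silver cohort outweighs a small Gold one.
print(initial_hypothesis({
    "Law": ("Gold", 300),
    "History and archaeology": ("Silver", 900),
    "Physics and astronomy": ("Bronze", 150),
}))  # Silver
```

Even this toy version shows why the student-number weighting matters: a provider’s largest subjects will dominate its initial hypothesis, before the panel ever looks at the provider-level metrics and submission.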
The most significant difference between the two models is the level at which subjects are grouped. Model A, ‘by exception’, analyses the 35 subject groups at level two of the Common Aggregation Hierarchy (CAH). Model B combines these 35 subjects into seven broader groups, which will be used for submissions. There is some flexibility to move subjects between the seven groups, but aggregating at this higher level makes it harder to move a curriculum area from one code to another to bundle it differently. Not following? You’ll need Annex C of the specification to see the seven groups and the 35 subject areas.
In model A, each ‘exception’ subject will have a five-page written submission alongside the 15-page provider submission. In model B, you enter only a submission for each subject group – and the more groups, the longer the submission. Simples.
Metrics, benchmarking and flags
One of the criticisms of TEF2 was the problem of non-reportable metrics and how this affected the decision-making process, with colleges and alternative providers appearing to suffer for having limited data available. Reassuringly, the pilot specification is very open about the limits of the metrics when broken into 35 subject areas, particularly as the plan is to use the same benchmarking and flagging system as in provider-level TEF. The model B pilot will have all data open for contextual information and will explore how panel members make decisions with it.
Notable by its absence is the use of salary data from the Longitudinal Education Outcomes (LEO) release. Following much speculation within the sector about its use in TEF, Jo Johnson confirmed that LEO will be included in provider-level TEF in its next iteration (TEF3). However, the pilot specification for subject-level TEF explains that it will be investigating how to control for high absolute values and clustered values – think of medical students, whose highly skilled employment levels barely vary, and nor do their salaries. Refinement here could prepare the ground for LEO’s inclusion in later iterations.
Keen to answer the call? You’ve got until 25th September to apply to join the pilot. And it’s not just those who take up the challenge who get all the fun: there will be a technical consultation on subject-level TEF “later in the year”, plus a “representative student poll” to test students’ views of subject-level TEF. Inevitably, the results of the lessons-learned exercise from TEF2 will also shape the design and further development of subject-level TEF. And there will be opportunities for 110 new TEF assessors, pooled into the seven main subject groupings.
Gold, Silver or Bronze?
It’s too soon to tell how well the subject pilots will work, so we only give the exercise a Provisional rating for now. A lot depends, as it did in TEF2, on the engagement of providers with their submissions and on the operation of the panel. We’ll surely hear it repeated that “it’s a pilot” and we should treat it as such. There’s plenty to comment on about the design so far, and we expect interesting responses to the consultation in the autumn. For eager TEF-watchers, it looks like the fun’s not going to stop anytime soon.
Additional reporting: Team Wonkhe