Where’s the 2022-23 student data?
David Kernohan is Deputy Editor of Wonkhe
HESA Student data is a little like coffee, in that its existence underpins pretty much all intelligent conversation about the size and shape of our higher education sector.
But, also like coffee, it is currently subject to rising costs, significant delays in delivery, and a noticeable reduction in quality. We learned yesterday that this year’s annual open data release, covering the 2022-23 academic year – usually a first release in early January followed by the complete dataset later that same month – has been postponed until August.
Eagle-eyed readers will no doubt have spotted that this is round about the end of the following academic year – indeed, preparations for the 2023-24 data collection exercise are already underway. The late launch is down to quality concerns (and allowing providers the chance to correct their initial submissions – the deadline for such corrections was yesterday) linked to our old friend, Data Futures.
Back at the end of the 2023 calendar year, student record officers struggled with a system that spat out unexpected and inapplicable error codes seemingly at random. This was the culmination of a multi-year procurement and delivery process that saw Jisc (the sector technology agency that now owns HESA) deliver the new HESA Data Platform (HDP).
HDP related woes will not end any time soon. In a letter released yesterday HESA notes that:
“The nature of the issues experienced last year means that we do not believe this year’s collection will be totally issue free. The list of additional features and requests we have received from the sector is also long and we will not be able to deliver them all.”
The long Data Futures process resulted in many providers upgrading or switching their student data systems, a stressful decision at the best of times which was made worse by multiple changes in the data specification (bluntly, what is to be returned and how), made often without warning and largely at the instigation of the Office for Students (which, along with the other UK funding bodies, is one of the four main customers of HESA data).
Any changes in data collection make for a decline in data quality – if providers are not used to returning data in a particular way the first attempts are more likely to include mistakes as processes and definitions become embedded. The particularly stressful gestation of the 2022-23 collection meant that the usual bulwark against this – a robust, automated, checking system – could not be trusted to report correctly.
OfS and the others use student data for funding and regulatory purposes. Though they get far more detailed data from the collection than mere mortals like ourselves, the availability of open data allows for a measure of transparency as regards decisions. The general principle is that data used in decision making should be easily accessible (or derivable) from the HESA Open Data release – this helps the sector understand regulatory and funding activity, allows for an extra round of checking, and – frankly – keeps policy making honest.
If I publish data derived from HESA with unlikely outliers on Wonkhe, providers are quick to contact the designated data body to correct it. You can believe that the calculations of the OfS are checked by the sector with far more care. And – let us be honest here – if the broad brush outputs won’t be ready for open publication until August, there will be nothing of regulatory quality available much before then. Bid farewell to this year’s updates for OfS’ fancy-pants dashboards.
With an election in the offing the higher education and skills sector is seemingly doomed once again to function as a political football. In most years, the availability of public data means we can speedily rebut lazy presumptions and dubious conclusions – the late delivery of 2022-23 data means that, from international students to access and participation to student mental health, we are in the dark. An “independent review” of the series of calamities that led us to this point was promised by OfS at the start of the year – we are yet to see even a remit or cast of characters.
There’s no quick fix here – we can’t magic the data into a releasable format, or wind back the clock to recollect the data using the older methods. We are doomed, most likely, to several years of delayed and low quality student data. The national debate, and regulation, will suffer as a result.
I would be interested to know exactly which changes the following refers to: “multiple changes in the data specification (bluntly, what is to be returned and how) made often without warning and largely at the instigation of the Office for Students”. I simply can’t recall any, but then I left OfS in January 2023. Ideally the OfS would pull its finger out and make progress on the review so that lessons can be learned.
Ideally OfS would have pulled its finger out some time ago and completed the scheduled review of the DDB (which never materialised while it waved through the shift to Jisc), which might have exposed some of the failings and risks that have led to this mess.
Richard makes a fair point. The issue is not changes to the data spec but a lack of shared understanding of how the new fields worked and what their purpose was. There were certainly a lot of conflicting messages and guidance given by HESA Liaison, which made life difficult, and the software suppliers were generally pretty poor. Where the OfS is to blame is in its lack of engagement with the sector. Richard was always very good at attending sector events and explaining the thinking behind what was being done; there was no presence at all from the OfS last year, so they were not hearing the queries that we all had, nor were they explaining what they were going to do with the data. As a consequence there will have been a lot of different interpretations, so good luck to anyone trying to make sense of sector data.
In terms of change, this year may not see “official” changes, but we will have the addition of some 65-70 quality rules that there was not time to finalise last year. Those will undoubtedly add to the workload. Over the next three years we are told we will see students at validated partners brought into coverage, those taught overseas transferred from the aggregate headcount return to the main return, and the introduction of an additional in-year submission. We are unlikely to see a return with the same requirements as the previous year until 2027/28. And that’s without any changes that may be made as a result of the independent review. I’m not sure anyone has a grasp of whether the sector has the capacity to manage this amount of change over what is, in relative terms, a short period.
I agree with David and Paul that explaining how the data will be used, including finalising things like validation rules, is a critical part of building a shared understanding of the data, which in turn is essential to allowing universities to return consistent and accurate data.
I wanted to respond on data quality: the evidence is actually positive. We have received some specific amendments from a small number of providers up to 30 April. These amendments will be included in the overall dataset, now due for publication in August. Giving some providers the opportunity to submit amendments up to 30 April is intended to address some very specific issues with the data. This process works to ensure that the data can be used for the full range of regulatory and funding purposes. We will provide information on data quality when we publish.
Frankly, anyone who thinks that the quality of data in the 22/23 HESA return is going to be close to the quality of previous returns is deluding themselves or other people. It is very clear that a lot of institutions did what they had to do to get their data signed off for 22/23. Two big risks arise from this: firstly, as the data is used for different purposes, what emerges from that use, and whether the compromises people made will now come back to haunt them; and secondly, what emerges when 23/24 returns are made and continuity checks are in force based on the 22/23 return. Plus, as Paul mentioned, all of the additional quality rules that had to be binned last year because of time.
It will be 2-3 years before this settles down into anything like normal. One hopes the promised OfS review will unpick what a shambles this has been.
David, I think you are doing everybody who worked on this return a huge disservice. People worked very hard to get their data right, and that is reflected in the quality of the data we have received. I am also not sure how you are in a position to comment; Jisc has all the data and we have run extensive quality checks on it. The data is not perfect – it never is – but to assert we are deluded is offensive.
Where I do agree with you is that the new data model and collection system will take time to bed in. If you want to discuss this matter further, can I suggest you email me.
Heidi – there is a world of difference, and always has been, between data passing HESA data quality checks and it actually being right. If that were not the case, the data amendment process would not exist, as all of these returns would have been signed off by HESA. What has happened in 2022/23 is that a lot of the checks that people would normally do have not been done, as people were focused on just trying to get the return in and signed off.
In most years people would have been using the OfS and other outputs to analyse and check the data, as well as running their own internal reviews of the impacts on KPIs, league tables and funding before final submission and sign off. That will not have happened this year. As my earlier post said, a lot of these issues will not become apparent until people start to see some of the outputs that come back to them and the continuity checks kick in.
As people have struggled to understand the purposes of some of the new fields, there was, and still is, a greater level of uncertainty about how to return them. The compromises people made to get a return in and signed off will also raise further issues.
Having spent the last 18 months in a lot of different forums with people who are at the sharp end of this, I have no doubt about the amount of work that people put in, or the impact that has had on a lot of people’s health and wellbeing (I know the impact it had on mine).