I was delighted and fascinated by the recent article on making sense of a diverse sector by Wonkhe’s David Kernohan.
Underneath that unassuming title is a rich exploration of the diversity of the UK HE sector and some potentially game-changing ideas about how we describe the differences and similarities between providers.
The creation of a standard approach to describing and grouping HE providers is a tantalising prospect. The sector has changed enormously since the 1990s and so much of the historical approach to grouping – largely driven by notions of history and prestige – feels increasingly irrelevant to the sector today.
At the heart of this exercise is the idea that we can find the answers by looking in the data. Such is the extent of data-based reporting, regulation and rankings that we might accept this as a given and dive straight into the data dimensions and definitions. But for this exercise – maybe some others as well – we should take a moment to step back and reflect on some of the more fundamental questions.
What is data?
At a time when everything seems to be data-driven, it’s worth taking a moment to reflect on what data is. We use data to describe the world – to help us understand and analyse reality. In this instance the reality is a really complex and diverse sector of HE providers who offer life-changing opportunities for millions of students every year. They work in different ways across the full spectrum of academic disciplines and they push the boundaries of knowledge through research and innovation.
The data structures and definitions that are used to describe these incredible organisations are, by comparison, very simple and they frequently fail to capture the nuances and complexities of the sector. In striving for consistency and comparability the sector-level datasets necessarily impose a simple, rigid data model onto a complex and dynamic reality.
What is analysis?
Having squeezed and shoe-horned this reality into these rigid data definitions, the endeavour then attempts to make sense of that diversity through the use of algorithms. Good analysis is built on a foundation of algorithms – the “Data Science” bit – but it must go beyond that to tell us something meaningful about the real world.
Good data analysis is an art, and in this case DK’s choice of data and algorithms, the description of the groups and, yes, those awkward manual adjustments are the work of a Data Artisan, not a mere Data Scientist.
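To make the algorithmic step concrete: one common way of grouping providers from data is a clustering algorithm such as k-means. The sketch below is purely illustrative – the provider names, metrics, and the choice of k-means are my assumptions, not DK’s actual method or data.

```python
# Toy illustration of grouping providers by clustering two metrics.
# All names and numbers are hypothetical; k-means is assumed, not DK's method.
import random

# Hypothetical providers: (name, students in thousands, research income share %)
providers = [
    ("Provider A", 35, 60), ("Provider B", 28, 55), ("Provider C", 30, 50),
    ("Provider D", 22, 15), ("Provider E", 18, 10), ("Provider F", 25, 12),
    ("Provider G", 3, 5),   ("Provider H", 5, 8),   ("Provider I", 4, 2),
]

def kmeans(points, k, iters=50, seed=0):
    """Plain-Python k-means: assign each point to its nearest centre,
    then move each centre to the mean of its cluster, and repeat."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # initialise centres from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])),
            )
            clusters[nearest].append(p)
        # Recompute each centre; keep the old one if a cluster went empty.
        centres = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centres[i]
            for i, cl in enumerate(clusters)
        ]
    return centres, clusters

pts = [(students, research) for _, students, research in providers]
centres, clusters = kmeans(pts, k=3)
for i, cl in enumerate(clusters):
    names = sorted(n for (n, s, r) in providers if (s, r) in cl)
    print(f"group {i}: {names}")
```

Even this toy version shows where the artisanship comes in: the outcome depends on which metrics are chosen, how they are scaled, how many groups are asked for, and the random starting point – none of which the algorithm decides for you.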
What is quality?
Data is deceptive because its rigid, numerical tendencies lure us into thinking about quality in rigid and numerical ways. Of course those Data Science foundations can be assessed in terms of right or wrong but the quality of this analysis – and the meaning that is derived from it – cannot be quantified any more than the quality of the Mona Lisa or A Love Supreme.
When considering this way of grouping HE providers we can admire the technical approach taken and the use of (value-free) colours to describe the groups. But to reach a view as to whether these groupings are good we need to bring our collective knowledge of this rich and complex sector to the fore. Are these groupings credible and fair? Will they add value in real-life situations? Can they avoid the trap of becoming some kind of ranking-by-proxy (as if the league tables themselves are anything other than a ranking-by-proxy)?
Off the fence
There are significant weaknesses in this proposal. By counting the things that are measured (and measurable) we are not necessarily measuring the things that count. Furthermore, the (admirable) approach of using only publicly available data (UCAS – I’m looking at you) further restricts the richness of the analysis that is possible. I like what’s in here, but have a nagging feeling about what could have been.
And then there are the algorithms. Not only is the data blunt and often clumsy but this analysis applies rigid algorithms to allocate the data subjects into different categories. I seem to recall that some exam grades were awarded that way in 2020 and that didn’t end well. Fatuous comments about mutant algorithms aside, anybody attempting to hard-wire outcomes on the basis of rigid algorithmic rules applied to data like this is playing a very dangerous game indeed.
So maybe the question is not as simple as good or bad but a more focused question around the idea of good enough.
The standard way of grouping institutions?
This is perhaps the most complex question of all, since the quality – or fitness for purpose – of this classification ultimately depends on the use to which it is put. If it is not used widely and consistently then it is not a standard: contrary to the views of many ‘data standards’ initiatives, a specification only becomes a standard when it achieves widespread and consistent adoption.
This perhaps is the real nub of the issue. DK’s work is thorough and technically admirable: the presentation of the groups and the detail set out in the data dictionary tell me that this has been done to a very high standard; the groups themselves feel right and relevant for the sector today. So I would suggest that this is good Data Art. The consultation process should further strengthen the model. There is a lot to like here.
But the question of whether this project will deliver value through creating a standard approach to grouping institutions lies not only in the hands of Wonkhe but with all those who use data-driven analysis to explore and understand the sector.
DK has set this ball rolling – it is up to all of us to run with it… or not.
No one said it would be easy.
What DK has done so far is to highlight the differences that already exist between institutions.
For me, the light bulb moment is the extent to which we are not comparing like with like, and the danger of thinking we can have “criteria for all”.
Grouping institutions together aids understanding. The ability to see how one group overlaps with another at the same institution is also useful, and illustrates the individuality of each institution.
I agree that how the data is used will ultimately prove its worth.