Marking the performance of Eurovision entrants is a tricky business – and it’s arguably become much trickier in recent years.
Back when national juries would sit and watch an orchestra attempt to make Ooh Aah Just a Little Bit sound like a pop song, they’d be listening closely to Gina G’s vocals to work out if her performance was up to the standard indicated on the EBU marking rubric.
These days phone voters have as much of a say as experts – causing national delegations to weigh up what they need to do to generate pissed-up popular appeal from televoters, while still taking steps to appeal to the “experts” across the continent.
But what’s really made it difficult in recent years has been the backing vocals thing.
As recently as 2019, countries entering the Eurovision had to perform all audible vocals live, themselves – with the standard Eurovision rule of six folks on stage kicking in to prevent any of those BGT-style choirs stealing the show.
But these days recorded backing vocals are allowed. It’s almost certainly a better show for it – but clever countries find ways to make it look like their artist is dancing and singing all at the same time when they’re not, really.
And both international juries and folks at home find it harder to work out what’s being done by live humans, and what’s being done and perfected beforehand – either by multiple attempts or artificial intelligence.
Vub si zdub
I was thinking a lot about this back in January when we visited Vrije Universiteit Brussel, a university that for more than 50 years has been “committed to providing solutions to the challenges of tomorrow.”
There we were, enjoying a panini in the cafe bar of the arts centre on campus, when the part-time student leader who had agreed to meet us warned us that he couldn’t stay – because he had an exam that afternoon.
An oral exam! An undergraduate!
The coach that had transported our delegates around the city had been abuzz with conversations about generative AI, and the way in which students were already using it both to become more efficient in their academic work and, in some cases, to shortcut some of the effort that would usually have been expected of a student without access to such tools.
As we’ve noted before here, much undergraduate assessment involves the production of digital assets to be marked in bulk, later, as a proxy for learning. But what if it’s harder than ever to determine what was human-produced and what was AI-assisted?
When we posed the puzzler to the student rep sharing a latte with us – how is VUB coping with tools that make it hard to determine what is a student’s work and what is assisted by AI – our host was clear:
We are talking about it. But I can’t cheat this afternoon. It’s me or nothing.
Pressure on
It being the end of the semester, Alexander was in the middle of the established assessment period – and had a number of exams to sit, almost all lasting around 30 minutes and involving answering questions from an academic.
It turns out that the Flanders region of Belgium in particular has a tradition of oral examinations – often with written preparation. In some universities those traditional exams might also be supplemented by continuous assessment of students’ participation in classes (sometimes a test at the end of an hour of teaching) or presentations. There’s still written work, for sure – but watching people doing things is peppered through programmes in a way that felt alien to the student leaders on our trip.
Oral exams used to be a thing here too, and as ever we have Oxbridge to credit/blame for their abandonment. This fascinating paper points out that in 16th century England, university examinations were conducted in public, orally and in Latin, but “because of the domination of its curriculum by Newtonian mathematics”, Cambridge led the way in dropping the practice, with Oxford adopting its usual rapid changes to respond about 400 years later.
Four factors are identified in the paper as crucial in the oral/written shift:
the move from group socio-moral to individual cognitive assessment in the later 18th century; the differential difficulty of oral testing in different subjects; the impact of increased student numbers; the internal politics of Oxford and Cambridge.
Was the move away all about expansion and efficiency? As Molly Worthen points out in this New York Times piece, upon their introduction written exams carried an aura of rigour, objectivity and modernity, and produced a permanent record of performance – all while allowing asynchronous mass marking:
France’s baccalaureate exam, for example, culminates in the Grand Oral, a 20-minute session in which a panel of teachers examines a student on a topic that he or she has researched. In Norway, all students take three to four oral exams by the end of secondary school. If it’s possible for other countries to administer oral exams for millions of high school students each year, we can safely say that the reason this form of assessment has fallen out of favor in America has little to do with economics or keeping pace with something called progress. The reasons are cultural.
Performance anxiety
As usual, no silver bullets are available – and our friend in Flanders wasn’t denying any of the established downsides of oral exams.
As a student rep he was, in fact, only too aware of the extent to which oral assessment might exacerbate anxiety in students, make some worry that “performance” would overshadow or obscure their learning and understanding, and was alive to the idea that “oracy” wasn’t necessarily an essential skill for all subject areas – or one to be treasured for many disabled students. There were also, he said, ongoing concerns about the way in which dress, or accent, or ethnicity might get in the way of the judgement on offer from a marker.
None of these issues feels insurmountable. This paper presents a formative oral assessment format developed to provide feedback to student groups and large cohorts. This paper argues that oral assessments provide academic staff with a “more complete picture” of students’ understanding, going on to identify mitigations for the downsides. And this paper argues that the way in which oral exams change study habits is crucial – with students “overwhelmingly” recommending the use of oral exams because “they recognise the value of communication and teamwork in their future careers”.
Our visit had backed this up too. Having carried out both survey and focus group work, Alexander was a passionate advocate for the method – both because his view was that cheating in such assessments was almost impossible, and because it was clear to him that presenting and oral reasoning were important to most of the employers his academic society had interacted with:
All assessment is performance, Jim. This assessment is the performance that is useful. We sit around in groups and practice. We enjoy that.
Again, the evidence bears Alexander out. Some argue that oral exams are better at offering students the opportunity to explain and clarify what they have submitted. Some suggest that for small class sizes, orals can offer benefits that outweigh the potential disadvantages. And multiple lit reviews find papers that champion oral exams and their educational impact, validity, reliability, acceptability and feasibility.
Some do find that students argue the format compromises their ability to perform – but it is arguably this group of students who might benefit most from practising verbal skills via the oral exam format. Some just find that they’re useful. Others like avoiding lengthy “production” proxies and prefer just being able to demonstrate understanding for 120 minutes.
But what really baffled Alexander was being challenged on the scalability of the model.
Scaling up
I could see workarounds for the danger of bias in assessment – with particular reference to EDI characteristics. I could see ways to reduce anxiety in students, and to maximise their chances of doing well when asked to speak or clarify something.
But entirely innocently, I couldn’t work out how academic departments might find enough capacity to undertake oral assessments for a large cohort of students. Was something going on in the funding model?
It was a tough discussion. Alexander struggled to understand the problem I was setting out, and I struggled to grasp the ease with which he thought a solution might be deployed. But in the end, we realised that we really, deep down, were having a chat about workload models:
You are saying that a student might complete the semester and officially the way they are marked is that they spend 15 hours on something and someone is given 15 minutes to look at their work and mark it and give feedback? I think students here would see that as an insult.
Multiple models are available – but if a summative assessment involves, say, 20 minutes of Q&A and ten minutes of write-up, few would argue that 30 minutes is too long to spend making a (usually partial) judgement about a student’s performance.
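To put some rough numbers on that (purely illustrative ones: the cohort size below is an assumption, and the per-student minutes are simply the figures quoted above), a back-of-envelope sketch of the staff hours involved might look like this:

```python
# Back-of-envelope comparison of staff time per cohort - illustrative only.
# The 200-student cohort size is an assumption; the per-student minutes come
# from the figures above (15 minutes to mark a script, versus 20 minutes of
# Q&A plus 10 minutes of write-up for an oral).

COHORT_SIZE = 200  # hypothetical cohort

written_marking_mins = 15   # minutes spent marking each script
oral_exam_mins = 20 + 10    # Q&A plus write-up, per student


def staff_hours(mins_per_student: float, cohort: int = COHORT_SIZE) -> float:
    """Total staff hours needed to assess the whole cohort."""
    return mins_per_student * cohort / 60


print(f"Written marking: {staff_hours(written_marking_mins):.0f} staff hours")
print(f"Oral exams:      {staff_hours(oral_exam_mins):.0f} staff hours")
# Written marking: 50 staff hours
# Oral exams:      100 staff hours
```

On those assumed numbers, the oral model costs roughly double the staff time rather than an order of magnitude more, which is why the argument ends up being about what the workload model admits to rather than about scalability itself.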
In other words, maybe it’s not that oral assessment fails because it can’t be scaled – it’s more that our broken workload model hides how little time we officially give to the assessment of each student’s performance. Bringing that out in the open would help everyone in the end.