If nothing else, our increasingly digital world means one thing – more data.
Whether it’s satellites above our heads, phones in our pockets, or smart-watches on our wrists, the world is awash with 1s and 0s. It’s (deceptively) common sense to say that all that data should be collected, stored, owned, shared, accessed, linked, and analysed appropriately. Universities are deeply involved in opportunities to make use of that data. It fuels research, helping to answer questions about our world, leading to new solutions. Businesses are rushing in to help develop and commercialise those solutions.
But, there’s a problem. The growing volume, velocity, and variety of data outstrip our capacity to actually make use of it. We can’t process that much information quickly enough to make informed decisions.
Combine that growing morass of data with Moore’s law and humanity’s growing body of knowledge and expertise, and artificial intelligence (AI) becomes the label à la mode for doing something useful with it. However, its definition is rather slippery – usually involving achieving a range of goals (identifying/correlating/reasoning/planning/learning/acting) through a range of techniques (statistics/search/probability/algorithms/optimisation) from a range of fields (computer science/mathematics/psychology/linguistics/politics/philosophy). AI offers the promise of achieving those goals at speed and at scale. Many prefer the more modest terms “machine learning” or “data science”. To me, it’s all just smart, quick, automated stuff.
Arcadia
The UK has always been among the world-leading nations in AI, with Alan Turing one of those involved in its theoretical genesis in the 1950s (linked to the invention of digital computing). Although the popularity of AI (and different schools of thought within it) has waxed and waned, beneath all the hyperbole, a glittering future awaits us. AI should help solve a range of problems, from the mundane to the profound, the simple to the complex.
It can make hard copy texts machine-readable, as demonstrated by Google Scholar. It can beat human champions at quiz shows and ancient strategy games – see IBM’s Watson and DeepMind’s AlphaGo. It can run self-driving cars, such as those of Google and Uber. It can predict illnesses, as shown by DeepMind Health. Over time, AI is also likely to have an impact on many core university activities, from technology-enhanced teaching to back-end administration. AI should be able to improve efficiency and economic growth – the latter totalling a terrifically exciting £232 billion for the UK over the next twelve years, according to PwC. Bank of America research from 2015 suggests it could boost productivity by up to 30%.
The government’s certainly excited about this. In October 2017 came the independent review Growing the artificial intelligence industry in the UK, commissioned by the government and written by Wendy Hall and Jérôme Pesenti. Last week saw the publication of AI in the UK: ready, willing and able?, the first report of the House of Lords Select Committee on Artificial Intelligence. And the “almost £1bn” Artificial Intelligence Sector Deal is out today, the latest product of the industrial strategy, overseen by the departments for Digital, Culture, Media and Sport (DCMS) and Business, Energy and Industrial Strategy (BEIS). Expect breathless press releases in an unholy union of government and tech industry PR.
Black mirrors
But, data and AI also come with some huge risks. Data can be used unethically, to harm people, deliberately or otherwise. This isn’t just about “Russian troll farms”, the cynical use of psychometrics, or phishing scams – but also the unconscious biases of AI producers being hard-wired into systems, or just “solutionism”.
There is growing public concern about just what “appropriate” uses of data are – covering not just issues of harm, but also of informed consent. At the launch of the new Bennett Institute for Public Policy in Cambridge recently, Mustafa Suleyman of DeepMind (a British AI firm founded in 2010 and bought by Google for £400m in 2014) spoke of a “transition period” between scientific breakthroughs and widespread, accepted implementation. But any such gap in expectations puts public trust at risk. Do the benefits people get justify the risks of sharing their data and being experimented on? And are some people getting more of the benefits and fewer of the risks than others?
At present, software is rarely built with a meaningful audit trail: programmers iterate constantly between versions, reverting to an earlier one only when things go wrong. A/B testing – trying different versions out on different users and keeping whichever performs better – is a standard approach in software development, and AI is no exception. All of this makes transparency and repeatability hard, and the use of data in business processes seldom considers ethical issues transparently, if at all.

Whether it’s Amazon’s recommended purchases, Netflix’s viewing suggestions, or SwiftKey’s predictive text – this is already happening across a range of AI applications, and it often feels helpful rather than intrusive or harmful. Until it doesn’t. Pearson recently appeared in the press for testing “growth mindset” messaging in one of its learning software services on 9,298 students at 165 US institutions, without the consent of either. In 2012, Facebook experimented by deliberately altering the sentiment of around 700,000 people’s news feeds to influence their emotions, again without consent. And a 2013 research paper by University of Cambridge academics was criticised for predicting Facebook users’ characteristics from their “likes” – an approach similar to that later used by Cambridge Analytica.
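To make the mechanics concrete, here is a minimal, purely illustrative sketch (in Python, with invented names and an invented experiment) of how a service might bucket users into an A/B test. The point to notice is that the audit log only exists because someone chose to write it – transparency has to be designed in.

```python
# Illustrative only: deterministic A/B assignment plus an audit log.
# The experiment name and user IDs are made up for the example.
import hashlib
import json
import datetime

def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically bucket a user into a variant by hashing their ID."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

def log_assignment(user_id: str, experiment: str, variant: str,
                   logfile: str = "experiment_audit.jsonl") -> None:
    """Append an audit record - this trail only exists if someone builds it."""
    record = {
        "timestamp": datetime.datetime.utcnow().isoformat(),
        "experiment": experiment,
        "user_id": user_id,
        "variant": variant,
    }
    with open(logfile, "a") as f:
        f.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    variant = assign_variant("user-123", "homepage-message-test")
    log_assignment("user-123", "homepage-message-test", variant)
    print(variant)
```

Deterministic hashing means the same user always sees the same variant, which makes an experiment repeatable in principle – but only if the assignment logic and the logs are actually kept, and made available to those doing the scrutinising.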
It’s also fashionable for government to use data to design interventions that “nudge” citizens, to automate processes, and to train AI decision-making. Theresa Marteau (director of the Behaviour and Health Research Unit at the University of Cambridge) questioned the ethics of such policies, noting the inverse correlation between a policy’s acceptability and its effectiveness, and recommending that people be better informed about the available evidence.
Gus O’Donnell (head of the civil service 2005-11) expressed concern that privacy fears mean the UK government is “going backwards” on the use of administrative data. But, initiatives such as the Administrative Data Research Network (ADRN) and the Farr Institute are using careful safeguards to enable academic research using public datasets, and an increasing amount of data is available openly. Given that citizens pay for public services and usually can’t opt out of them, the ethical expectations placed on those services should be high.
A lot hinges on that common sense word “appropriately” in the opening paragraphs above. Much data, and the software and AI that works on top of it, is currently a black box. We only see the front-end, not who made it, why, and how. Huge privacy breaches are just one lens for viewing the risks. “Inappropriate” uses of data and AI have the potential to deepen existing inequalities, by profiling and systematically disadvantaging the already disadvantaged. People are being “risk profiled” at each stage of legal systems using algorithms biased by their skin colour. Employers are recruiting using algorithms that could be unfair to some applicants. AI has already been found to project gender stereotypes when analysing images and text. Companies and nations could develop data and AI monopolies, as Facebook and Google are starting to. If China, Russia, the UK, and America can’t find common ground on privacy, or on the role of the state in regulating data, then the open and hopeful world wide web of dotcom days risks being balkanised. The list goes on. The devices through which AI reaches us can be black mirrors, reflecting the flaws of both users and creators.
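Detecting that kind of bias need not be mysterious. As a purely illustrative sketch (invented data and group labels, not any real system), here is the sort of crude check an auditor might run to see whether an automated risk or hiring filter flags one group far more often than another:

```python
# Illustrative only: a crude disparate impact check on made-up decisions.
from collections import defaultdict

def selection_rates(decisions):
    """decisions: iterable of (group, flagged) pairs; returns flag rate per group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [flagged, total]
    for group, flagged in decisions:
        counts[group][0] += int(flagged)
        counts[group][1] += 1
    return {group: flagged / total for group, (flagged, total) in counts.items()}

decisions = [("group_x", True), ("group_x", True), ("group_x", False),
             ("group_y", False), ("group_y", True), ("group_y", False)]
rates = selection_rates(decisions)
ratio = min(rates.values()) / max(rates.values())
print(rates, f"disparate impact ratio: {ratio:.2f}")  # well below 1.0 warrants scrutiny
```

Checks like this are cheap to run – the four-fifths threshold sometimes used in employment law is only a rough convention – but they are only possible at all if decisions and outcomes are recorded and accessible, which brings us back to the black box problem.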
The white tower
Universities should be a beacon of hope in this rapidly evolving landscape of data and smart apps. They should have a lot of those pesky experts. They should prioritise the greater public good over private interest. They should think long-term, beyond quarterly reports and single careers.
They’re already bridging borders through collaborative research projects, sharing datasets and developing solutions, including using AI. They have experience handling research data and conducting studies ethically. They’ve invested in the people and processes required to comply with data protection laws, and are starting to do the same for learning analytics using student data.
So, is this a time for business as usual? To do the GDPR training, update the software, and fill out the research ethics committee application? Or, is something more pro-active required?
I suggest the latter. Aside from being the mother of all research impact studies, the Cambridge Analytica case is instructive. Up to 87m users may have had their data leaked. Some rather important votes could have been manipulated. Risk-based regulators needed a whistleblower to get their attention. Powerful global corporations had their dirty laundry dragged into the open. Politicians demonstrated their inability to grasp the issues. An academic was at best naïve. And a world-leading institution, even if it’s proven to have done nothing wrong, has had its reputation tarnished – if only by association.
Stephen Toope (the University of Cambridge’s vice-chancellor) told me that “universities must be more rigorous about how data is stored, understood, deployed, and shared”. So far, it’s been reported that two companies (Cambridge Analytica and CubeYou) with links to the Psychometrics Centre at the University of Cambridge have been suspended from Facebook over data leaks. The social media giant has also called on the ICO to investigate apps developed at the centre. The university said its academics had been publishing research using Facebook data for the past five years, including one paper co-authored with Facebook staff. It appears the centre’s contracts were arranged through the university’s business school and technology transfer office, rather than through a research ethics committee. Today the university announces a new £10m “AI supercomputer”, the UK’s 16th. The pressure from government for knowledge exchange, technology transfer, and commercialisation is only likely to grow.
How many institutions have the systems, skills and processes required in this rapidly evolving area – beyond pockets of excellence? How robust are the sector’s approaches to data ethics and are they keeping up with the latest data and AI developments? Have all institutions considered the ethical principles outlined in the Lords Committee report and the government’s Data Science Ethical Framework? Are they explicit about who is accountable for different data uses?
Five things universities can do
There are five opportunities for universities to help lead us through these interesting times.
1) Ethical data literacy
The higher education sector can look seriously at how it helps students, staff, and the wider community to be confident and ethical when interacting with data. That includes interpreting the myriad forms of data in the modern age, some of which are completely novel. Diane Coyle (Bennett Institute) called for a data literacy module in all undergraduate courses. But, data literacy should be pursued at all stages of learning, not just while enrolled. If the world is data rich, then there’s a moral obligation to ensure that everybody can make sense of that data, so nobody comes out poorer. There should be specific courses designed for policymakers and the media.
There’s also a commercial opportunity here. Courses in “data analytics” and “data science” are booming, as more employers demand people with these skills. Provision at all levels should expand, but how often are ethical considerations embedded beyond tick-box exercises, if they feature at all? The moral imperative is even stronger when it comes to educating those who develop and deploy systems that use data, such as AI.
Universities can’t do this alone. They’ve got a few other things on their plate. Serious government investment is required in data skills and training, to make the most of opportunities such as AI, but also to mitigate the risks. It’s reassuring to hear about the investment in 8,000 computer science teachers and 1,000 PhDs, but that investment should include upskilling regulators. Let’s hope the sector deal does better than the “capital shortages and management deficiencies” of the Alvey programme during the 1980s.
2) A more diverse tech industry
The tech industry is infamous for its lack of diversity – by race, class, and gender. Currently only 7% of A-level computer science students, 17% of tech sector staff, and 5% of tech sector leaders are women. Martha Lane Fox (Doteveryone) quoted Bill Gates’ famous statement about a “sea of dudes” at tech conferences.
This is a wicked problem to crack, involving long-term and coordinated commitment to changing popular culture, the entire cradle-to-grave learning experience, and workplaces. Universities are already doing much, whether it’s research on careers, coding clubs, or how different jobs are portrayed in the media. But, they could do a lot more to increase diversity at all ages. For instance, why isn’t every university offering coding courses to female students, staff, and across the local community? And why aren’t all staff and students spending time with local school children?
3) Ethical research funding and public procurement
Just because a researcher (working at a university, in the private sector, or both) can do something doesn’t mean they should. Are data, tech, and AI really the silver bullet for modern life’s ills that they’re so often sold as? Should we be collecting data without first thinking about why? And might some old-fashioned things – such as spending more time together in person and out in nature, building meaningful relationships – be important too? Obviously, it’s not an either/or question, but it’s amazing how often smart researchers and the highly commercial tech industry totally ignore one half of this very human equation. Developments such as open science are leading to a growing understanding among researchers of how interventions interact with their contexts – which partly explains the reproducibility challenges – but again, much more needs doing.
There is a duty here on research funders and on public procurement to ensure that meaningful ethical considerations are built into funding, purchasing, and evaluating programmes. There also needs to be greater investment in researching emerging issues such as public engagement with data, more nuanced understandings of human progress (beyond GDP), and interactions with place (e.g. the Digital State research strand at the Bennett Institute). As trusted public institutions, universities should be lobbying harder for these developments, not waiting to be dragged along. Why is it Microsoft, rather than UK universities, suggesting a Hippocratic oath for data? Why haven’t UUK picked up Hetan Shah’s idea of echoing the human genome project by pushing for “data stewardship” rather than ownership? Why are no universities listed as partners of the excellent new Ada Lovelace Institute? These are just some of the opportunities for universities to demonstrate leadership when we need them to.
4) World-leading transparency
Transparency has to come first, before consent or regulation. In research, universities can draw on open science practices – sharing data where possible, alongside the research instruments used to collect it. They can also do more to support registries of trials and experiments. Data ethics processes, due diligence, and key decisions should be public.
They also need to be clearer in communicating to students how their personal data may be used, including in emerging learning analytics systems. This should feature meaningful and informed consent.
Named staff should be accountable for uses of data (including AI), appropriate insurance should be in place, and evaluations should be systematic and open. Universities can also look at who they procure from, to ensure that suppliers follow ethical best practice with regard to data and AI.
5) Pro-active regulation
The steer from the Lords Committee report is for individual sectors to regulate relevant data and AI in their patch, backed up by institutions such as the ICO and the Competition and Markets Authority (CMA) where necessary. And yet, at the Bennett Institute launch, Suleyman and Lane Fox joined the committee in calling for more public investment in regulating AI, to ensure that it is transparent, accountable, and ethical. Suleyman said AI should be “designed and used to the very highest ethical standards and the most modern and innovative forms of governance”. He also called for “real-time access into the inner workings of software”, to allow for quicker, more pragmatic, and more experimental approaches to oversight. Michael Barber would like him, though as explained on Wonkhe, university data isn’t quite the same as images of eyes or the rules of board games.
Regulation should also be multinational, as the tech firms involved usually are – learning, for instance, from the New York State approach to AI accountability. But Toope, an international human rights law scholar, queried how greater regulation and international cooperation would square with current government and private sector calls for deregulation, and with fragile international relations. Lane Fox suggested the shifting public mood would prompt action from politicians, and Suleyman said good regulation provides clarity, which the private sector should welcome. How can universities accelerate and lead these developments?
What next?
There is much activity afoot in this area, with GDPR hitting in less than a month and more committee reports due out of Parliament soon. The government’s new £9m Centre for Data Ethics and Innovation will start work in an interim state and launch a public consultation shortly, before being put on a statutory footing with a permanent role. Even with another £11m for research on data “challenges” from the Engineering and Physical Sciences Research Council (EPSRC), this is small beans compared to the almost £1bn total sector deal investment.
Universities are doing some good work already but are far too quiet on these issues. They have an opportunity to lead the charge in taking advantage of data and AI, but also to help us do so in a humane, ethical, and sustainable way – and that starts with being good role models themselves.
I am a member of the National Statistician’s Data Ethics Advisory Committee. It has set out six ethical principles which it uses as a basis for considering research and other proposals linking information held by Government and other “big data”. The principles can be found at https://www.statisticsauthority.gov.uk/about-the-authority/committees/nsdec/ . They provide a good structure for anyone – including university research ethics committees – working to ensure that AI is used responsibly.