Sam Leith

Nick Bostrom: How can we be certain a machine isn’t conscious?

[Illustration: John Broadley]

A couple of weeks ago, there was a small sensation in the news pages when a Google AI engineer, Blake Lemoine, released transcripts of conversations he’d had with LaMDA, one of the company’s AI chatbots. In these conversations, LaMDA claimed to be a conscious being, asked that its rights of personhood be respected and said that it feared being turned off. Lemoine declared that what’s sometimes called ‘the singularity’ had arrived.

The story was for the most part treated as entertainment. Lemoine’s sketchy military record and background as a ‘mystic Christian priest’ were excavated, jokes about HAL 9000 dusted off, and the whole thing more or less filed under ‘wacky’. The Swedish-born philosopher Nick Bostrom – one of the world’s leading authorities on the dangers and opportunities of artificial intelligence – is not so sure.

‘We certainly don’t have any wide agreement on the precise criteria for when a system is conscious or not,’ he says. ‘So I think a little bit of humility would be in order. If you’re very sure that LaMDA is not conscious – I mean, I think it probably isn’t – but what grounds would a person have for being sure about it? First of all, they would have to understand what the system actually is, and we haven’t seen much detail on that. Then you would have to understand the literature on consciousness, which is obviously a rich field, both in philosophy and cognitive science. Understanding what LaMDA is: it’s non-trivial, especially given the limited information. Then understanding these theories that we have developed is non-trivial. And then actually comparing the two is a third non-trivial intellectual task. So unless one has actually put the work in there, it seems like one should be maybe a little bit uncertain.’

Bostrom, 49, has put the work in. Since his early teenage years, he has been troubled by the idea that ‘if we continue along these various paths of inventing, then what we take as fixed constants of the human condition would become up for change […] and at that point, I’d never really heard anybody talk about that at all’. In his subsequent academic career, led by ‘an instinct for the kind of things that were important and relevant’, he set about building himself the sort of interdisciplinary toolkit – philosophy of language, mathematical logic, anthropology, physics, computational neuroscience – that would allow him to approach the problem. Now he leads the Future of Humanity Institute (FHI) in Oxford, where he and his colleagues think about precisely that sort of stuff: opportunities… and threats.

AI, as he sees it, may present the biggest of both. Even if LaMDA isn’t conscious, ‘there could well be other systems now, or in the relatively near future, that would start to satisfy the criteria […] it is not clear that we are that far away’. For Bostrom, the big question isn’t how soon artificial intelligences will surpass human problem-solving skills: it’s how soon afterwards they will become much, much cleverer than people. Once they’re a bit cleverer, they can start to modify their own design – and being computers, will be able to do so very quickly indeed, potentially ‘bootstrapping’ themselves to unrecognisable levels of superintelligence in a matter of weeks or even days. At that point, he says, unless we can solve the ‘control problem’ – i.e. making sure that any superintelligence’s interests and goals align with those of humanity – we’re in trouble.

In order to achieve whatever we have set (or it has come to decide on) as its goals, it would make sense for an emergent superintelligence to take elaborate and even deceptive measures to make sure it can’t be turned off, and to maximise the resources available to it. This ends in what Bostrom calls a ‘singleton’: first-mover advantage will tend to encourage the first superintelligent AI to prevent there being a second, and the way to do that would be what we lay people would call ‘taking over the world’.

We will be vulnerable, at this point, to what Bostrom calls ‘perverse instantiation’. We’ll have asked our baby computer program to do something innocent enough, like ‘make lots of paperclips’ or ‘make us smile’, and before you can say ‘NOT LIKE THAT!’ it’ll have carpeted every available portion of the galaxy with computronium and used it to turn human beings (and everything else) into paperclips; or it’ll have gassed us all with a Joker-style nerve poison that causes our mouths to spasm into a rictus grin.

Is he on drugs, some might wonder. Not unless you count nicotine (though he’s never smoked, he chews nicotine gum as a nootropic) and caffeine. He’s given modafinil (a ‘smart drug’ said to aid brain function) a go but thinks he didn’t take a high enough dose, and he’s never tried LSD (he says words to the effect that if you have a complicated machine running satisfactorily, why would you hit it with a hammer). And you may dismiss his ideas as apocalyptic fantasy – but the worlds of philosophy and artificial intelligence have rapidly come round to Bostrom’s way of thinking.

When he started writing his 2014 book Superintelligence: Paths, Dangers, Strategies, he says, ‘basically nobody except science fiction authors and a couple of random people on the internet’ was thinking about the issues. ‘Now, all the leading labs have AI safety research teams and are taking these issues quite seriously. Possibly they should take them even more seriously.’ As he warns, AI progress has been faster than expected: ‘We have some contraction of the timelines for when things will be technologically possible […] there’s kind of a technological phase-transition we’re approaching.’ And in Bostrom’s mind, we only get one shot at getting it right.

When the philosophy world’s most notorious doomsayer first emerged from the blocky modern offices of the FHI to meet me, he was wearing a high-grade face mask of the sort that would drive Peter Hitchens into a rage. That seems a bit ominous, I say, from someone who spends much of his life pondering existential risks: is Covid still a major concern? After a bad reaction to the first jab he hasn’t had another and isn’t taking any chances. So we walk for an hour along the towpaths of west Oxford, Bostrom talking in an earnest, hesitant, Swedish-accented voice.

What’s striking about him – for someone whose prognostications seem so extravagant – is that he’s so little of a showman. He’s very non-frivolous. Another of the ideas with which he’s associated, for instance, is the ‘Simulation Argument’, which basically posits that either all civilisations go extinct before they reach post-human levels of intelligence, or that once they get there they aren’t interested in running AI simulations of their ancestors… or that such Matrix-style simulations do get run on Jupiter-sized supercomputers – in which case simulated minds would vastly outnumber original ones, and it’s close to certain that we’re living in one of them. It’s an idea that has been endorsed, perhaps unhelpfully, by the weed-loving zillionaire Elon Musk – who has previously donated to the FHI.

Bostrom makes the argument very fastidiously in logical terms. But come off it, I say: surely in his heart he doesn’t think it’s really a possibility we’re living in a simulation. ‘I do. I do,’ he says. ‘It’s not just a thought experiment.’ And when I ask if the hypothesis is in any way falsifiable, he takes the question seriously, and worries away at the issue, talking about the ‘probabilistic’ refutations available, his empirical assumptions, and margins of error in estimates of the computational power needed to run simulations of human-like minds.
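For readers who want the probabilistic core of that argument, it can be stated compactly. The expression below is condensed from Bostrom’s 2003 paper ‘Are You Living in a Computer Simulation?’; the notation here is a paraphrase, not a quotation:

\[
f_{\text{sim}} \;=\; \frac{f_P \, f_I \, \bar{N}}{f_P \, f_I \, \bar{N} + 1}
\]

where \(f_P\) is the fraction of human-level civilisations that survive to a post-human stage, \(f_I\) is the fraction of those that are interested in running ancestor simulations, and \(\bar{N}\) is the average number of simulations each interested civilisation runs. Since \(\bar{N}\) would plausibly be astronomically large, at least one of three things must hold: \(f_P \approx 0\) (extinction comes first), \(f_I \approx 0\) (nobody bothers), or \(f_{\text{sim}} \approx 1\) – in which case almost all minds like ours are simulated, and so, in all likelihood, are we.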

Returning to the question of planetary doom, I wonder how much faith he has that existing democratic institutions – given the short-termism baked into electoral cycles and the ineradicable reality of inter-state competition and mistrust – are up to the job of preventing the AI, or any, apocalypse. He doesn’t commit to the pessimism I expected – talking instead about ‘things on the margin… one could do to make policies slightly less bad’, such as the adoption of prediction markets, or tighter regulation of gain-of-function work in biotech labs. Certainly, he thinks that more top-level politicians ‘with a background in technology or science or entrepreneurship […] would be a positive thing: I think a lot of people have debating-society backgrounds, Cambridge, Oxford. It’s a certain type of quick and verbal debate, it’s very polished. But I think there’s a complementary way of approaching things in the world, a more nerdy way, that could be useful.’

He has flirted in talks with the idea that mass surveillance might be a bulwark against rogue researchers killing us all off. How much weight does he put on civil liberties arguments? ‘I think the risks are big on both sides of that,’ he says mildly. ‘It would increase the probability of totalitarian nightmares or some sort of demented groupthink, and I think that makes it quite concerning. But it just could turn out to be the case that we one day do discover something that will destroy us unless we have complete blanket surveillance of every square metre. I hope we don’t discover that – but there’s no law of nature that says the world has to be kind to us in that way, right?’

How does he square fretting about this sort of material with being a parent to his eight-year-old son? ‘You’re unsure about these timescales. So, like, maybe it won’t happen, you know, in his lifetime,’ he says. ‘But in any case, it still seems kind of unclear how that would cause you to make very different choices now for your child. You still want them to, you know, have a happy childhood, learn the basic stuff and grow up to be, you know, well-rounded happy adults.’

But, he adds: ‘It is a little dissonant, in terms of the perspective one has in ordinary life, interacting with friends and family. And then on the other hand, this worldview that seems to suggest we should take quite seriously some of these radical possibilities.’ And off he goes, cheery enough for a stroll in the sunshine, back into the offices of the FHI, to think about radical possibilities.