Some technologists hope that one day we will develop a superintelligent AI system that people will be able to have conversations with. Ask it a question, and it will offer an answer that sounds like something composed by a human expert. You could use it to ask for medical advice, or to help plan a holiday. Well, that’s the idea, at least.
In reality, we’re still a long way away from that. Even the most sophisticated systems of today are pretty dumb. I once got Meta’s AI chatbot BlenderBot to tell me that a prominent Dutch politician was a terrorist. In experiments where AI-powered chatbots were used to offer medical advice, they told pretend patients to kill themselves. Doesn’t fill you with a lot of optimism, does it?
That’s why AI labs are working hard to make their conversational AIs safer and more helpful before turning them loose in the real world. I just published a story about Alphabet-owned AI lab DeepMind’s latest effort: a new chatbot called Sparrow.
DeepMind’s new trick for making a good AI-powered chatbot was to have humans tell it how to behave—and force it to back up its claims using Google search. Human participants were then asked to rate how plausible the AI system’s answers were. The idea is to keep training the AI through dialogue between humans and machines.
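To make that loop a little more concrete, here is a minimal Python sketch of the kind of human rating it describes—purely illustrative, not DeepMind’s actual Sparrow code, and every name in it is a hypothetical placeholder.

```python
# A minimal sketch of the human-feedback loop described above — not DeepMind's
# actual Sparrow code. All names here are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class RatedTurn:
    question: str
    answer: str
    evidence: str       # e.g. a snippet the model retrieved via web search
    plausible: bool     # human rater: does the answer sound plausible?
    supported: bool     # human rater: does the evidence actually back it up?

def reward(turn: RatedTurn) -> float:
    """Toy preference score: favor answers that are plausible AND evidence-backed."""
    return float(turn.plausible) + float(turn.supported)

# One imagined rating from a human participant in a dialogue session.
turn = RatedTurn(
    question="Is Mount Everest the tallest mountain on Earth?",
    answer="Yes, at about 8,849 m above sea level.",
    evidence="Search snippet: 'Mount Everest, elevation 8,848.86 m ...'",
    plausible=True,
    supported=True,
)

# Scores like this would feed back into further training of the model,
# which is the "keep training the AI through dialogue" idea.
print(reward(turn))  # 2.0
```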
In reporting the story, I spoke to Sara Hooker, who leads Cohere for AI, a nonprofit AI research lab.
She told me that one of the biggest hurdles in safely deploying conversational AI systems is their brittleness: they perform brilliantly until they’re taken into unfamiliar territory, where they behave unpredictably.
“It is also a difficult problem to solve because any two people might disagree on whether a conversation is inappropriate. And even if we agree that something is appropriate right now, this may change over time, or rely on shared context that can be subjective,” Hooker says.
Despite that, DeepMind’s findings underline that AI safety is not just a technical fix. You need humans in the loop.
Uber’s facial recognition is locking Indian drivers out of their accounts
Uber checks that a driver’s face matches what the company has on file through a program called “Real-Time ID Check.” It was rolled out in the US in 2016, in India in 2017, and then in other markets. “This prevents fraud and protects drivers’ accounts from being compromised. It also protects riders by building another layer of accountability into the app to ensure the right person is behind the wheel,” Joe Sullivan, Uber’s chief security officer, said in a statement in 2017.
But the company’s driver verification procedures are far from seamless. Adnan Taqi, an Uber driver in Mumbai, ran into trouble with it when the app prompted him to take a selfie around dusk. He was locked out for 48 hours, a big dent in his work schedule—he says he drives 18 hours straight, sometimes as much as 24 hours, to be able to make a living. Days later, he took a selfie that locked him out of his account again, this time for a whole week. That time, Taqi suspects, it came down to hair: “I hadn’t shaved for a few days and my hair had also grown out a bit,” he says.
More than a dozen drivers interviewed for this story detailed instances of having to find better lighting to avoid being locked out of their Uber accounts. “Whenever Uber asks for a selfie in the evenings or at night, I’ve had to pull over and go under a streetlight to click a clear picture—otherwise there are chances of getting rejected,” said Santosh Kumar, an Uber driver from Hyderabad.
Others have struggled with scratches on their cameras and low-budget smartphones. The problem isn’t unique to Uber. Drivers with Ola, which is backed by SoftBank, face similar issues.
Some of these struggles can be explained by natural limitations in face recognition technology. The software starts by converting your face into a set of points, explains Jernej Kavka, an independent technology consultant with access to Microsoft’s Face API, which is what Uber uses to power Real-Time ID Check.
“With excessive facial hair, the points change and it may not recognize where the chin is,” Kavka says. The same thing happens when there is low lighting or the phone’s camera doesn’t have good contrast. “This makes it difficult for the computer to detect edges,” he explains.
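For a rough sense of the “set of points” Kavka describes, here is a short sketch using the open-source face_recognition library as a stand-in—Uber’s actual integration runs on Microsoft’s Face API and isn’t public, and the filename below is hypothetical.

```python
# Illustrative only: facial-landmark detection with the open-source
# face_recognition library, a stand-in for the proprietary pipeline Uber uses.
import face_recognition

image = face_recognition.load_image_file("driver_selfie.jpg")  # hypothetical file

# Each detected face comes back as named groups of landmark points
# (chin, eyebrows, nose bridge, lips, ...).
faces = face_recognition.face_landmarks(image)

if not faces:
    # The failure mode drivers describe: in low light or with heavy facial
    # hair, the detector can't place the points, so no face is found at all.
    print("No face detected — verification would fail here.")
else:
    chin = faces[0]["chin"]
    print(f"Found a face; the chin is traced with {len(chin)} points.")
```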
But the software may be especially brittle in India. In December 2021, tech policy researchers Smriti Parsheera (a fellow with the CyberBRICS project) and Gaurav Jain (an economist with the International Finance Corporation) posted a preprint paper that audited four commercial facial processing tools—Amazon’s Rekognition, Microsoft Azure’s Face, Face++, and FaceX—for their performance on Indian faces. When the software was applied to a database of 32,184 election candidates, Microsoft’s Face failed to even detect the presence of a face in more than 1,000 images, producing an error rate of more than 3%—the worst among the four.
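As a quick back-of-envelope check of that figure (taking exactly 1,000 undetected images, the lower bound of “more than 1,000”):

```python
# Rough sanity check of the audit figure cited above.
undetected = 1_000   # lower bound of undetected faces reported in the preprint
total = 32_184       # election-candidate images in the database
print(f"{undetected / total:.1%}")  # -> 3.1%, consistent with "more than 3%"
```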
It could be that the Uber app is failing drivers because its software was not trained on a diverse range of Indian faces, Parsheera says. But she says there may be other issues at play as well. “There could be a number of other contributing factors like lighting, angle, effects of aging, etc.,” she explained in writing. “But the lack of transparency surrounding the use of such systems makes it hard to provide a more concrete explanation.”
The Download: Uber’s flawed facial recognition, and police drones
One evening in February last year, a 23-year-old Uber driver named Niradi Srikanth was getting ready to start another shift, ferrying passengers around the south Indian city of Hyderabad. He pointed the phone at his face to take a selfie to verify his identity. The process usually worked seamlessly. But this time he was unable to log in.
Srikanth suspected it was because he had recently shaved his head. After further attempts to log in were rejected, Uber informed him that his account had been blocked. He is not alone. In a survey conducted by MIT Technology Review of 150 Uber drivers in the country, almost half had been either temporarily or permanently locked out of their accounts because of problems with their selfie.
Hundreds of thousands of India’s gig economy workers are at the mercy of facial recognition technology, with few legal, policy or regulatory protections. For workers like Srikanth, getting blocked from or kicked off a platform can have devastating consequences. Read the full story.
I met a police drone in VR—and hated it
Police departments across the world are embracing drones, deploying them for everything from surveillance and intelligence gathering to chasing criminals. Yet none of them seem to be trying to find out how encounters with drones leave people feeling—or whether the technology will help or hinder policing work.
A team from University College London and the London School of Economics is filling in the gaps, studying how people react when meeting police drones in virtual reality, and whether they come away feeling more or less trusting of the police.
MIT Technology Review’s Melissa Heikkilä came away from her encounter with a VR police drone feeling unnerved. If others feel the same way, the big question is whether these drones are effective tools for policing in the first place. Read the full story.
Melissa’s story is from The Algorithm, her weekly newsletter covering AI and its effects on society. Sign up to receive it in your inbox every Monday.
I met a police drone in VR—and hated it
It’s important because police departments are racing way ahead and starting to use drones anyway, for everything from surveillance and intelligence gathering to chasing criminals.
Last week, San Francisco approved the use of robots, including drones that can kill people in certain emergencies, such as when dealing with a mass shooter. In the UK, most police drones have thermal cameras that can be used to detect how many people are inside houses, says Pósch. This has been used for all sorts of things: catching human traffickers or rogue landlords, and even targeting people suspected of holding parties during covid-19 lockdowns.
Virtual reality will let the researchers test the technology in a controlled, safe way among lots of test subjects, Pósch says.
Even though I knew I was in a VR environment, I found the encounter with the drone unnerving. My opinion of these drones did not improve, even though I’d met a supposedly polite, human-operated one (there are even more aggressive modes for the experiment, which I did not experience).
Ultimately, it may not make much difference whether drones are “polite” or “rude,” says Christian Enemark, a professor at the University of Southampton, who specializes in the ethics of war and drones and is not involved in the research. That’s because the use of drones itself is a “reminder that the police are not here, whether they’re not bothering to be here or they’re too afraid to be here,” he says.
“So maybe there’s something fundamentally disrespectful about any encounter.”
GPT-4 is coming, but OpenAI is still fixing GPT-3
The internet is abuzz with excitement about AI lab OpenAI’s latest iteration of its famous large language model, GPT-3. The latest demo, ChatGPT, answers people’s questions via back-and-forth dialogue. Since its launch last Wednesday, the demo has passed 1 million users. Read Will Douglas Heaven’s story here.
GPT-3 is a confident bullshitter and can easily be prompted to say toxic things. OpenAI says it has fixed a lot of these problems with ChatGPT, which answers follow-up questions, admits its mistakes, challenges incorrect premises, and rejects inappropriate requests. It even refuses to answer some questions, such as how to be evil, or how to break into someone’s house.