AI voice actors sound more human than ever—and they’re ready to hire
The company blog post drips with the enthusiasm of a ’90s US infomercial. WellSaid Labs describes what clients can expect from its “eight new digital voice actors!” Tobin is “energetic and insightful.” Paige is “poised and expressive.” Ava is “polished, self-assured, and professional.”
Each one is based on a real voice actor, whose likeness (with consent) has been preserved using AI. Companies can now license these voices to say whatever they need. They simply feed some text into the voice engine, and out will spool a crisp audio clip of a natural-sounding performance.
WellSaid Labs, a Seattle-based startup that spun out of the research nonprofit Allen Institute of Artificial Intelligence, is the latest firm offering AI voices to clients. For now, it specializes in voices for corporate e-learning videos. Other startups make voices for digital assistants, call center operators, and even video-game characters.
Not too long ago, such deepfake voices had something of a lousy reputation for their use in scam calls and internet trickery. But their improving quality has since piqued the interest of a growing number of companies. Recent breakthroughs in deep learning have made it possible to replicate many of the subtleties of human speech. These voices pause and breathe in all the right places. They can change their style or emotion. You can spot the trick if they speak for too long, but in short audio clips, some have become indistinguishable from humans.
AI voices are also cheap, scalable, and easy to work with. Unlike a recording of a human voice actor, synthetic voices can also update their script in real time, opening up new opportunities to personalize advertising.
But the rise of hyperrealistic fake voices isn’t consequence-free. Human voice actors, in particular, have been left to wonder what this means for their livelihoods.
How to fake a voice
Synthetic voices have been around for a while. But the old ones, including the voices of the original Siri and Alexa, simply glued together words and sounds to achieve a clunky, robotic effect. Getting them to sound any more natural was a laborious manual task.
Deep learning changed that. Voice developers no longer needed to dictate the exact pacing, pronunciation, or intonation of the generated speech. Instead, they could feed a few hours of audio into an algorithm and have the algorithm learn those patterns on its own.
“If I’m Pizza Hut, I certainly can’t sound like Domino’s, and I certainly can’t sound like Papa John’s.”
Rupal Patel, founder and CEO of VocaliD
Over the years, researchers have used this basic idea to build voice engines that are more and more sophisticated. The one WellSaid Labs constructed, for example, uses two primary deep-learning models. The first predicts, from a passage of text, the broad strokes of what a speaker will sound like—including accent, pitch, and timbre. The second fills in the details, including breaths and the way the voice resonates in its environment.
Making a convincing synthetic voice takes more than just pressing a button, however. Part of what makes a human voice so human is its inconsistency, expressiveness, and ability to deliver the same lines in completely different styles, depending on the context.
Capturing these nuances involves finding the right voice actors to supply the appropriate training data and fine-tune the deep-learning models. WellSaid says the process requires at least an hour or two of audio and a few weeks of labor to develop a realistic-sounding synthetic replica.
AI voices have grown particularly popular among brands looking to maintain a consistent sound in millions of interactions with customers. With the ubiquity of smart speakers today, and the rise of automated customer service agents as well as digital assistants embedded in cars and smart devices, brands may need to produce upwards of a hundred hours of audio a month. But they also no longer want to use the generic voices offered by traditional text-to-speech technology—a trend that accelerated during the pandemic as more and more customers skipped in-store interactions to engage with companies virtually.
“If I’m Pizza Hut, I certainly can’t sound like Domino’s, and I certainly can’t sound like Papa John’s,” says Rupal Patel, a professor at Northeastern University and the founder and CEO of VocaliD, which promises to build custom voices that match a company’s brand identity. “These brands have thought about their colors. They’ve thought about their fonts. Now they’ve got to start thinking about the way their voice sounds as well.”
Whereas companies used to have to hire different voice actors for different markets—the Northeast versus Southern US, or France versus Mexico—some voice AI firms can manipulate the accent or switch the language of a single voice in different ways. This opens up the possibility of adapting ads on streaming platforms depending on who is listening, changing not just the characteristics of the voice but also the words being spoken. A beer ad could tell a listener to stop by a different pub depending on whether it’s playing in New York or Toronto, for example. Resemble.ai, which designs voices for ads and smart assistants, says it’s already working with clients to launch such personalized audio ads on Spotify and Pandora.
The gaming and entertainment industries are also seeing the benefits. Sonantic, a firm that specializes in emotive voices that can laugh and cry or whisper and shout, works with video-game makers and animation studios to supply the voice-overs for their characters. Many of its clients use the synthesized voices only in pre-production and switch to real voice actors for the final production. But Sonantic says a few have started using them throughout the process, perhaps for characters with fewer lines. Resemble.ai and others have also worked with film and TV shows to patch up actors’ performances when words get garbled or mispronounced.
Newly revealed coronavirus data has reignited a debate over the virus’s origins
Data collected in 2020—and kept from public view since then—potentially adds weight to the animal theory. It highlights a potential suspect: the raccoon dog. But exactly how much weight it adds depends on who you ask. New analyses of the data have only reignited the debate, and stirred up some serious drama.
The current ruckus starts with a study shared by Chinese scientists back in February 2022. In a preprint (a scientific paper that has not yet been peer-reviewed or published in a journal), George Gao of the Chinese Center for Disease Control and Prevention (CCDC) and his colleagues described how they collected and analyzed 1,380 samples from the Huanan Seafood Market.
These samples were collected between January and March 2020, just after the market was closed. At the time, the team wrote that they only found coronavirus in samples alongside genetic material from people.
There were a lot of animals on sale at this market, which sold more than just seafood. The Gao paper features a long list, including chickens, ducks, geese, pheasants, doves, deer, badgers, rabbits, bamboo rats, porcupines, hedgehogs, crocodiles, snakes, and salamanders. And that list is not exhaustive—there are reports of other animals being traded there, including raccoon dogs. We’ll come back to them later.
But Gao and his colleagues reported that they didn’t find the coronavirus in any of the 18 species of animal they looked at. They suggested that it was humans who most likely brought the virus to the market, which ended up being the first known epicenter of the outbreak.
Fast-forward to March 2023. On March 4, Florence Débarre, an evolutionary biologist at Sorbonne University in Paris, spotted some data that had been uploaded to GISAID, a website that allows researchers to share genetic data to help them study and track viruses that cause infectious diseases. The data appeared to have been uploaded in June 2022. It seemed to have been collected by Gao and his colleagues for their February 2022 study, although it had not been included in the actual paper.
Fostering innovation through a culture of curiosity
And so I think a big part of it as a company, by setting these ambitious goals, it forces us to say if we want to be number one, if we want to be top tier in these areas, if we want to continue to generate results, how do we get there using technology? And so that really forces us to throw away our assumptions because you can’t follow somebody, if you want to be number one you can’t follow someone to become number one. And so we understand that the path to get there, it’s through, of course, technology and the software and the enablement and the investment, but it really is by becoming goal-oriented. And if we look at these examples of how do we create the infrastructure on the technology side to support these ambitious goals, we ourselves have to be ambitious in turn because if we bring a solution that’s also a me too, that’s a copycat, that doesn’t have differentiation, that’s not going to propel us, for example, to be a top 10 supply chain. It just doesn’t pass muster.
So I think at the top level, it starts with the business ambition. And then from there we can organize ourselves at the intersection of the business ambition and the technology trends to have those very rich discussions and being the glue of how do we put together so many moving pieces because we’re constantly scanning the technology landscape for new advancing and emerging technologies that can come in and be a part of achieving that mission. And so that’s how we set it up on the process side. As an example, I think one of the things, and it’s also innovation, but it doesn’t get talked about as much, but for the community out there, I think it’s going to be very relevant is, how do we stay on top of the data sovereignty questions and data localization? There’s a lot of work that needs to go into rethinking what your cloud, private, public, edge, on-premise look like going forward so that we can remain cutting edge and competitive in each of our markets while meeting the increasing guidance that we’re getting from countries and regulatory agencies about data localization and data sovereignty.
And so in our case, as a global company that’s listed in Hong Kong and we operate all around the world, we’ve had to really think deeply about the architecture of our solutions and apply innovation in how we can architect for a longer term growth, but in a world that’s increasingly uncertain. So I think there’s a lot of drivers in some sense, which is our corporate aspirations, our operating environment, which has continued to have a lot of uncertainty, and that really forces us to take a very sharp lens on what cutting edge looks like. And it’s not always the bright and shiny technology. Cutting edge could mean going to the executive committee and saying, Hey, we’re going to face a challenge about compliance. Here’s the innovation we’re bringing about architecture so that we can handle not just the next country or regulatory regime that we have to comply with, but the next 10, the next 50.
Laurel: Well, and to follow up with a bit more of a specific example, how does R&D help improve manufacturing in the software supply chain as well as emerging technologies like artificial intelligence and the industrial metaverse?
Art: Oh, I love this one because this is the perfect example of there’s a lot happening in the technology industry and there’s so much back to the earlier point of applied curiosity and how we can try this. So specifically around artificial intelligence and industrial metaverse, I think those go really well together with what are Lenovo’s natural strengths. Our heritage is as a leading global manufacturer, and now we’re looking to also transition to services-led, but applying AI and technologies like the metaverse to our factories. I think it’s almost easier to talk about the inverse, Laurel, which is if we… Because, and I remember very clearly we’ve mapped this out, there’s no area within the supply chain and manufacturing that is not touched by these areas. If I think about an example, actually, it’s very timely that we’re having this discussion. Lenovo was recognized just a few weeks ago at the World Economic Forum as part of the global lighthouse network on leading manufacturing.
And that’s based very much on applying around AI and metaverse technologies and embedding them into every aspect of what we do about our own supply chain and manufacturing network. And so if I pick a couple of examples on the quality side within the factory, we’ve implemented a combination of digital twin technology around how we can design to cost, design to quality in ways that are much faster than before, where we can prototype in the digital world where it’s faster and lower cost and correcting errors is more upfront and timely. So we are able to much more quickly iterate on our products. We’re able to have better quality. We’ve taken advanced computer vision so that we’re able to identify quality defects earlier on. We’re able to implement technologies around the industrial metaverse so that we can train our factory workers more effectively and better using aspects of AR and VR.
And we’re also able to, one of the really important parts of running an effective manufacturing operation is actually production planning, because there’s so many thousands of parts that are coming in, and I think everyone who’s listening knows how much uncertainty and volatility there have been in supply chains. So how do you take such a multi-thousand dimensional planning problem and optimize that? Those are things where we apply smart production planning models to keep our factories fully running so that we can meet our customer delivery dates. So I don’t want to drone on, but I think literally the answer was: there is no place, if you think about logistics, planning, production, scheduling, shipping, where we didn’t find AI and metaverse use cases that were able to significantly enhance the way we run our operations. And again, we’re doing this internally and that’s why we’re very proud that the World Economic Forum recognized us as a global lighthouse network manufacturing member.
Laurel: It’s certainly important, especially when we’re bringing together computing and IT environments in this increasing complexity. So as businesses continue to transform and accelerate their transformations, how do you build resiliency throughout Lenovo? Because that is certainly another foundational characteristic that is so necessary.
The Download: covid’s origin drama, and TikTok’s uncertain future
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.
Newly-revealed coronavirus data has reignited a debate over the virus’s origins
This week, we’ve seen the resurgence of a debate that has been swirling since the start of the pandemic—where did the virus that causes covid-19 come from?
For the most part, scientists have maintained that the virus probably jumped from an animal to a human at the Huanan Seafood Market in Wuhan at some point in late 2019. But some claim that the virus leaped from humans to animals, rather than the other way around. And many continue to claim that the virus somehow leaked from a nearby laboratory that was studying coronaviruses in bats.
Data collected in 2020—and kept from public view since then—potentially adds weight to the animal theory. It highlights a potential suspect: the raccoon dog. But exactly how much weight it adds depends on who you ask. Read the full story.
This story is from The Checkup, Jessica’s weekly biotech newsletter. Sign up to receive it in your inbox every Thursday.
Read more of MIT Technology Review’s covid reporting:
+ Our senior biotech editor Antonio Regalado investigated the origins of the coronavirus behind covid-19 in his five-part podcast series Curious Coincidence.
+ Meet the scientist at the center of the covid lab leak controversy. Shi Zhengli has spent years at the Wuhan Institute of Virology researching coronaviruses that live in bats. Her work has come under fire as the world tries to understand where covid-19 came from. Read the full story.
+ This scientist now believes covid started in Wuhan’s wet market. Here’s why. Michael Worobey of the University of Arizona, believes that a spillover of the virus from animals at the Huanan Seafood market was almost certainly behind the origin of the pandemic. Read the full story.
I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.
1 TikTok’s future in the US is hanging in the balance
Banning it is a colossal challenge, and officials still lack the legal authority to do so. (WP $)
+ TikTok CEO Shou Zi Chew was grilled by a congressional committee. (FT $)
+ He told lawmakers the company would earn their trust. (WSJ $)
+ Meanwhile, TikTok paid for influencers to travel to DC to lobby its cause. (Wired $)
2 A crypto fugitive has been arrested in Montenegro
Do Kwon has been on the run since TerraUSD stablecoin collapsed last year. (WSJ $)
+ Want to mine Bitcoin? Get yourself to Texas. (Reuters)
+ What’s next for crypto. (MIT Technology Review)
3 Twitter’s getting rid of its legacy blue checks
On the entirely serious date of April 1. (The Verge)+ The platform’s still an unattractive prospect for advertisers. (Vox)
4 Chatbots are having tough conversations for us
ChatGPT is adept at writing scripts for sensitive talks with kids and colleagues. (NYT $)
+ OpenAI has given ChatGPT access to the web’s live data. (The Verge)
+ How Character.AI became a billion-dollar unicorn. (WSJ $)
+ The inside story of how ChatGPT was built from the people who made it. (MIT Technology Review)
5 Jack Dorsey’s Block has been accused of fraudulent transactions
The payments company denied it, and claims it inflated its users numbers, too.(FT $)
+ Dorsey doesn’t have a track record of caring about this kind of thing. (The Information $)
6 Homeowners associations are secretly installing surveillance systems
The system tracks license plates and follows residents’ movements. (The Intercept)
7 Inside the tricky ethics of using DNA to solve crimes
A new database could help to protect users’ privacy. (Wired $)|
+ The citizen scientist who finds killers from her couch. (MIT Technology Review)
8 There’s plenty of reasons to be optimistic about the climate
Healthier, more sustainable diets are a good place to start. (Scientific American)
+ Taking stock of our climate past, present, and future. (MIT Technology Review)
9 TikTok keeps hectoring us
It seems we just can’t get enough of being aggressively told what to do. (Vox)
10 Don’t get scammed by a deepfake
CallerID can’t be trusted to protect you from rogue AI calls. (Gizmodo)
Quote of the day
“Wait, I need content.”
—TikTok fashion creator Kristine Thompson refuses to miss a content opportunity during a trip to the US Capitol to lobby against a potential TikTok ban, she tells the New York Times.
The big story
This sci-fi blockchain game could help create a metaverse that no one owns
Dark Forest is a vast universe, and most of it is shrouded in darkness. Your mission, should you choose to accept it, is to venture into the unknown, avoid being destroyed by opposing players who may be lurking in the dark, and build an empire of the planets you discover and can make your own.
But while the video game seemingly looks and plays much like other online strategy games, it doesn’t rely on the servers running other popular online strategy games. And it may point to something even more profound: the possibility of a metaverse that isn’t owned by a big tech company. Read the full story.
We can still have nice things
A place for comfort, fun and distraction in these weird times. (Got any ideas? Drop me a line or tweet ’em at me.)
+ If underwater terrors are your thing, Joe Romiero takes some seriously impressive shark pictures and videos.
+ Try as it might, Ted Lasso’s British dialog falls wide of the mark.
+ Let’s have a good old snoop around some celebrities’ bedrooms.
+ Why we can’t get enough of those fancy candles.
+ Interviewing animals with a tiny microphone, it doesn’t get much better than that.