Liri can juggle so many jobs, in multiple countries, because she has hired out her face to Hour One, a startup that uses people’s likenesses to create AI-voiced characters that then appear in marketing and educational videos for organizations around the world. It is part of a wave of companies overhauling the way digital content is produced. And it has big implications for the human workforce.
Liri does her waitressing and bar work in person, but she has little idea what her digital clones are up to. “It is definitely a bit strange to think that my face can appear in videos or ads for different companies,” she says.
Hour One is not the only company taking deepfake tech mainstream, using it to produce mash-ups of real footage and AI-generated video. Some have used professional actors to add life to deepfaked personas. But Hour One doesn’t ask for any particular skills. You just need to be willing to hand over the rights to your face.
Hour One is building up a pool of what it calls “characters.” It says it has around 100 on its books so far, with more being added each week. “We’ve got a queue of people that are dying to become these characters,” says Natalie Monbiot, the company’s head of strategy.
Anyone can apply to become a character. Like a modeling agency, Hour One filters through applicants, selecting those it wants on its books. The company is aiming for a broad sample of characters that reflect the ages, genders, and racial backgrounds of people in the real world, says Monbiot. (Currently, around 80% of its characters are under 50 years old, 70% are female, and 25% are white.)
To create a character, Hour One uses a high-resolution 4K camera to film a person talking and making different facial expressions in front of a green screen. And that’s it for the human part of the performance. Plugging the resulting data into AI software that works in a similar way to deepfake tech, Hour One can generate an endless amount of footage of that person saying whatever it wants, in any language.
Hour One’s clients pay the company to use its characters in promotional or commercial video. They select a face, upload the text they want it to say, and get back a video of what looks like a real person delivering that script to a camera. The quickest service uses text-to-speech software to generate synthetic voices, which are synced with the characters’ mouth movements and facial expressions. Hour One also offers a premium service where the audio is recorded by professional voice actors. These voices are again fitted to the movements of the character in the video. Hour One says it has more than 40 clients, including real estate, e-commerce, digital health, and entertainment firms. One major client is Berlitz, an international language school that provides teacher-led video courses for dozens of languages.
According to Monbiot, Berlitz wanted to increase the number of videos it offered but struggled to do so using real human actors. They had to have production crews creating the same setup with the same actor over and over again, she says: “They found it really unsustainable. We’re talking about thousands of videos.”
Berlitz now works with Hour One to generate hundreds of videos in minutes. “We’re replacing the studio,” says Monbiot. “A human being doesn’t need to waste their time filming.”
Meta’s new AI can turn text prompts into videos
Although the effect is rather crude, the system offers an early glimpse of what’s coming next for generative artificial intelligence, and it is the next obvious step from the text-to-image AI systems that have caused huge excitement this year.
Meta’s announcement of Make-A-Video, which is not yet being made available to the public, will likely prompt other AI labs to release their own versions. It also raises some big ethical questions.
In the last month alone, AI lab OpenAI has made its latest text-to-image AI system DALL-E available to everyone, and AI startup Stability.AI launched Stable Diffusion, an open-source text-to-image system.
But text-to-video AI comes with some even greater challenges. For one, these models need a vast amount of computing power. They are an even bigger computational lift than large text-to-image AI models, which use millions of images to train, because putting together just one short video requires hundreds of images. That means it’s really only large tech companies that can afford to build these systems for the foreseeable future. They’re also trickier to train, because there aren’t large-scale data sets of high-quality videos paired with text.
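To make the "hundreds of images" point concrete, here is a minimal sketch of the arithmetic. The clip length and frame rate are illustrative assumptions, not figures from Meta or the article, and `frames_in_clip` is a hypothetical helper name.

```python
def frames_in_clip(duration_seconds: float, frames_per_second: int) -> int:
    """Return how many still images (frames) make up one video clip."""
    return round(duration_seconds * frames_per_second)


# Assumed values for illustration only: a 5-second clip at 24 frames per second.
print(frames_in_clip(5, 24))  # 120
```

Even a clip this short is made of more than a hundred frames, each of which must also stay consistent with its neighbors.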
To work around this, Meta combined data from three open-source image and video data sets to train its model. Standard text-image data sets of labeled still images helped the AI learn what objects are called and what they look like. And a database of videos helped it learn how those objects are supposed to move in the world. The combination of the two approaches helped Make-A-Video, which is described in a non-peer-reviewed paper published today, generate videos from text at scale.
Tanmay Gupta, a computer vision research scientist at the Allen Institute for Artificial Intelligence, says Meta’s results are promising. The videos it’s shared show that the model can capture 3D shapes as the camera rotates. The model also has some notion of depth and understanding of lighting. Gupta says some details and movements are decently done and convincing.
However, “there’s plenty of room for the research community to improve on, especially if these systems are to be used for video editing and professional content creation,” he adds. In particular, it’s still tough to model complex interactions between objects.
In the video generated by the prompt “An artist’s brush painting on a canvas,” the brush moves over the canvas, but strokes on the canvas aren’t realistic. “I would love to see these models succeed at generating a sequence of interactions, such as ‘The man picks up a book from the shelf, puts on his glasses, and sits down to read it while drinking a cup of coffee,’” Gupta says.
How AI is helping birth digital humans that look and sound just like us
Jennifer: And the team has also been exploring how these digital twins can be useful beyond the 2D world of a video conference.
Greg Cross: I guess the... the big, you know, shift that’s coming right at the moment is the move from the 2D world of the internet into the 3D world of the metaverse. So, I mean, and that, and that’s something we’ve always thought about and we’ve always been preparing for. I mean, Jack exists in full 3D, um, you know, Jack exists as a full body. So I mean, Jack can, you know, today we have, you know, we’re building augmented reality prototypes of Jack walking around on a golf course. And, you know, we can go and ask Jack, how, how should we play this hole? Um, so these are some of the things that we are starting to imagine in terms of the way in which digital people, the way in which digital celebrities, interact with us as we move into the 3D world.
Jennifer: And he thinks this technology can go a lot further.
Greg Cross: Healthcare and education are two amazing applications of this type of technology. And it’s amazing because we don’t have enough real people to deliver healthcare and education in the real world. So, I mean, so you can, you know, you can imagine how you can use a digital workforce to augment, and, and extend the skills and capability, not replace, but extend the skills and, and capabilities of real people.
Jennifer: This episode was produced by Anthony Green with help from Emma Cillekens. It was edited by me and Mat Honan, mixed by Garret Lang… with original music from Jacob Gorski.
If you have an idea for a story or something you’d like to hear, please drop a note to podcasts at technology review dot com.
Thanks for listening… I’m Jennifer Strong.
A bionic pancreas could solve one of the biggest challenges of diabetes
The bionic pancreas, a credit card-sized device called an iLet, monitors a person’s blood glucose levels around the clock and automatically delivers insulin when needed through a tiny cannula, a thin tube inserted into the body. It is worn constantly, generally on the abdomen. The device determines all insulin doses based on the user’s weight, and the user can’t adjust the doses.
A Harvard Medical School team has submitted its findings from the study, described in the New England Journal of Medicine, to the FDA in the hopes of eventually bringing the product to market in the US. While a team from Boston University and Massachusetts General Hospital first tested the bionic pancreas in 2010, this is the most extensive trial undertaken so far.
The Harvard team, working with other universities, gave a bionic pancreas device to 219 people with type 1 diabetes, all of whom had used insulin for at least a year, and had them use it for 13 weeks. The team compared their blood sugar levels with those of 107 people with diabetes who used other insulin delivery methods, including injection and insulin pumps, over the same period.
The blood sugar levels of the bionic pancreas group, measured as glycated hemoglobin (a marker of average blood sugar over recent months), fell from 7.9% to 7.3%, while the standard care group’s levels remained steady at 7.7%. The American Diabetes Association recommends a goal of less than 7.0%, but that’s only met by approximately 20% of people with type 1 diabetes, according to a 2019 study.
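The reported figures reduce to simple arithmetic. The sketch below only restates the numbers given above; the variable and function names are illustrative, and nothing here comes from the study’s own analysis.

```python
# Levels as reported in the article, in percent.
bionic_start, bionic_end = 7.9, 7.3
standard_start, standard_end = 7.7, 7.7
ada_goal = 7.0  # the recommended goal is "less than" this value


def change(start: float, end: float) -> float:
    """Change in percentage points, rounded to one decimal place."""
    return round(end - start, 1)


bionic_change = change(bionic_start, bionic_end)        # -0.6
standard_change = change(standard_start, standard_end)  # 0.0
gap_to_goal = round(bionic_end - ada_goal, 1)           # 0.3

print(bionic_change, standard_change, gap_to_goal)
```

On these numbers, the bionic pancreas group improved by 0.6 percentage points but still ended 0.3 points above the recommended threshold.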
Other types of artificial pancreas exist, but they typically require the user to input information before they will deliver insulin, including the amount of carbohydrates they ate in their last meal. The iLet instead needs only the user’s weight and the type of meal they’re eating (breakfast, lunch, or dinner), which the user enters via the iLet interface. It then uses an adaptive learning algorithm to deliver insulin automatically.