
Google’s new AI can hear a snippet of song—and then keep on playing

A new AI system can create natural-sounding speech and music after being prompted with a few seconds of audio.

AudioLM, developed by Google researchers, generates audio that fits the style of the prompt, including complex sounds like piano music or people speaking, in a way that is almost indistinguishable from the original recording. The technique shows promise for speeding up the process of training AI to generate audio, and it could eventually be used to auto-generate music to accompany videos.

(You can listen to all of the examples here.)

AI-generated audio is already commonplace: the voices of home assistants like Alexa are built with speech synthesis and natural language processing. AI music systems like OpenAI’s Jukebox have already generated impressive results, but most existing techniques need people to prepare transcriptions and label text-based training data, which takes a lot of time and human labor. Jukebox, for example, uses text-based data to generate song lyrics.

AudioLM, described in a non-peer-reviewed paper last month, is different: it doesn’t require transcription or labeling. Instead, sound databases are fed into the program, and machine learning is used to compress the audio files into sound snippets, called “tokens,” without losing too much information. This tokenized training data is then fed into a machine-learning model that uses natural language processing to learn the sound’s patterns. 
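The tokenization step can be pictured as mapping short frames of audio to the nearest entry in a codebook. This is only a toy sketch of that idea: the codebook here is random for illustration, whereas AudioLM learns its codes with a neural audio codec.

```python
import numpy as np

rng = np.random.default_rng(0)

def frame_audio(samples, frame_len=4):
    """Split a 1-D signal into fixed-length frames (truncating the tail)."""
    n = len(samples) // frame_len
    return samples[: n * frame_len].reshape(n, frame_len)

def tokenize(frames, codebook):
    """Assign each frame to its nearest codebook vector, one token per frame."""
    # Squared distances between every frame and every code: (num_frames, num_codes)
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

codebook = rng.normal(size=(16, 4))   # 16 codes of dimension 4 (illustrative)
signal = rng.normal(size=64)          # stand-in for raw audio samples
tokens = tokenize(frame_audio(signal), codebook)
print(tokens.shape)                   # 16 tokens for 64 samples at 4 per frame
```

The compressed token sequence, not the raw waveform, is what the language-model stage then learns from.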

To generate the audio, a few seconds of sound are fed into AudioLM, which then predicts what comes next. The process is similar to the way language models like GPT-3 predict what sentences and words typically follow one another. 
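The predict-what-comes-next loop can be shown with a deliberately tiny stand-in: a bigram model that counts which token follows which, then greedily extends a prompt. This is not AudioLM's model, just the same autoregressive principle in miniature.

```python
import numpy as np

def bigram_counts(tokens, vocab):
    """Count how often each token is followed by each other token."""
    counts = np.zeros((vocab, vocab))
    for a, b in zip(tokens, tokens[1:]):
        counts[a, b] += 1
    return counts

def continue_tokens(prompt, counts, steps):
    """Extend the prompt by repeatedly picking the most likely next token."""
    out = list(prompt)
    for _ in range(steps):
        out.append(int(counts[out[-1]].argmax()))  # greedy next-token choice
    return out

training = [0, 1, 2, 0, 1, 2, 0, 1, 2]   # a simple repeating "melody"
counts = bigram_counts(training, vocab=3)
print(continue_tokens([0, 1], counts, steps=4))  # → [0, 1, 2, 0, 1, 2]
```

Scaled up, with a transformer in place of the count table and audio tokens in place of these integers, this continuation loop is what lets a few seconds of prompt audio roll onward.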

The audio clips released by the team sound pretty natural. In particular, piano music generated using AudioLM sounds more fluid than piano music generated using existing AI techniques, which tends to sound chaotic.

Roger Dannenberg, who researches computer-generated music at Carnegie Mellon University, says AudioLM already has much better sound quality than previous music generation programs. In particular, he says, AudioLM is surprisingly good at re-creating some of the repeating patterns inherent in human-made music. To generate realistic piano music, AudioLM has to capture a lot of the subtle vibrations contained in each note when piano keys are struck. The music also has to sustain its rhythms and harmonies over a period of time.

“That’s really impressive, partly because it indicates that they are learning some kinds of structure at multiple levels,” Dannenberg says.

AudioLM isn’t confined to music. Because it was trained on a library of recordings of humans speaking sentences, the system can also generate speech that continues in the accent and cadence of the original speaker, although at this point those sentences can still seem like non sequiturs that don’t make any sense. AudioLM is trained to learn which types of sound snippets frequently occur together, and it runs that process in reverse to produce sentences. It also has the advantage of being able to learn the pauses and exclamations that are inherent in spoken language but not easily translated into text.

Rupal Patel, who researches information and speech science at Northeastern University, says that previous work using AI to generate audio could capture those nuances only if they were explicitly annotated in training data. In contrast, AudioLM learns those characteristics from the input data automatically, which adds to the realistic effect.

“There is a lot of what we could call linguistic information that is not in the words that you pronounce, but it’s another way of communicating based on the way you say things to express a specific intention or specific emotion,” says Neil Zeghidour, a co-creator of AudioLM. For example, someone may laugh after saying something to indicate that it was a joke. “All that makes speech natural,” he says.

Eventually, AI-generated music could be used to provide more natural-sounding background soundtracks for videos and slideshows. Speech generation technology that sounds more natural could help improve internet accessibility tools and bots that work in health care settings, says Patel. The team also hopes to create more sophisticated sounds, like a band with different instruments or sounds that mimic a recording of a tropical rainforest.

However, the technology’s ethical implications need to be considered, Patel says. In particular, it’s important to determine whether the musicians who produce the clips used as training data will get attribution or royalties from the end product—an issue that has cropped up with text-to-image AIs. AI-generated speech that’s indistinguishable from the real thing could also become so convincing that it enables the spread of misinformation more easily.

In the paper, the researchers write that they are already considering and working to mitigate these issues—for example, by developing techniques to distinguish natural sounds from sounds produced using AudioLM. Patel also suggested including audio watermarks in AI-generated products to make them easier to distinguish from natural audio.
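The watermarking idea Patel suggests can be illustrated with a simple spread-spectrum scheme (an assumption for illustration, not the researchers' actual technique): mix a faint pseudorandom pattern into the generated audio, then check for it later by correlating against the same keyed pattern.

```python
import numpy as np

rng = np.random.default_rng(42)

def embed(signal, key, strength=0.01):
    """Add a faint ±1 pseudorandom pattern, derived from a secret key."""
    mark = np.random.default_rng(key).choice([-1.0, 1.0], size=len(signal))
    return signal + strength * mark

def detect(signal, key, threshold=0.005):
    """Correlate against the keyed pattern; marked audio correlates strongly."""
    mark = np.random.default_rng(key).choice([-1.0, 1.0], size=len(signal))
    return (signal * mark).mean() > threshold

audio = rng.normal(scale=0.1, size=20_000)   # stand-in for generated audio
marked = embed(audio, key=7)
print(detect(marked, key=7), detect(audio, key=7))
```

Real audio watermarks must also survive compression, resampling, and editing, which this sketch ignores; it only shows why a keyed pattern is detectable by its owner yet inaudible in the mix.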

The Download: AI films, and the threat of microplastics

Welcome to the new surreal. How AI-generated video is changing film.


The Frost nails its uncanny, disconcerting vibe in its first few shots. Vast icy mountains, a makeshift camp of military-style tents, a group of people huddled around a fire, barking dogs. It’s familiar stuff, yet weird enough to plant a growing seed of dread. There’s something wrong here.

Welcome to the unsettling world of AI moviemaking. The Frost is a 12-minute movie from Detroit-based video creation company Waymark in which every shot is generated by an image-making AI. It’s one of the most impressive—and bizarre—examples yet of this strange new genre. Read the full story, and take an exclusive look at the movie.

—Will Douglas Heaven

Microplastics are everywhere. What does that mean for our immune systems?

Microplastics are pretty much everywhere you look. These tiny pieces of plastic pollution, less than five millimeters across, have been found in human blood, breast milk, and placentas. They’re even in our drinking water and the air we breathe.

Given their ubiquity, it’s worth considering what we know about microplastics. What are they doing to us? 

The short answer is: we don’t really know. But scientists have begun to build a picture of their potential effects from early studies in animals and clumps of cells, and new research suggests that they could affect not only the health of our body tissues, but our immune systems more generally. Read the full story.

—Jessica Hamzelou

Microplastics are everywhere. What does that mean for our immune systems?

Here, bits of plastic can end up collecting various types of bacteria, which cling to their surfaces. Seabirds that ingest them not only end up with a stomach full of plastic—which can end up starving them—but also get introduced to types of bacteria that they wouldn’t encounter otherwise. It seems to disturb their gut microbiomes.

There are similar concerns for humans. These tiny bits of plastic, floating and flying all over the world, could act as a “Trojan horse,” introducing harmful drug-resistant bacteria and their genes, as some researchers put it.

It’s a deeply unsettling thought. As research plows on, hopefully we’ll learn not only what microplastics are doing to us, but how we might tackle the problem.

Read more from Tech Review’s archive

It is too simplistic to say we should ban all plastic. But we could do with revolutionizing the way we recycle it, as my colleague Casey Crownhart pointed out in an article published last year. 

We can use sewage to track the rise of antimicrobial-resistant bacteria, as I wrote in a previous edition of the Checkup. At this point, we need all the help we can get …

… which is partly why scientists are also exploring the possibility of using tiny viruses to treat drug-resistant bacterial infections. Phages were discovered around 100 years ago and are due a comeback!

Our immune systems are incredibly complicated. And sex matters: there are important differences between the immune systems of men and women, as Sandeep Ravindran wrote in this feature, which ran in our magazine issue on gender.

Welcome to the new surreal. How AI-generated video is changing film.

Fast and cheap

Artists are often the first to experiment with new technology. But the immediate future of generative video is being shaped by the advertising industry. Waymark made The Frost to explore how generative AI could be built into its products. The company makes video creation tools for businesses looking for a fast and cheap way to make commercials. Waymark is one of several startups, alongside firms such as Softcube and Vedia AI, that offer bespoke video ads for clients with just a few clicks.

Waymark’s current tech, launched at the start of the year, pulls together several different AI techniques, including large language models, image recognition, and speech synthesis, to generate a video ad on the fly. Waymark also drew on its large data set of non-AI-generated commercials created for previous customers. “We have hundreds of thousands of videos,” says CEO Alex Persky-Stern. “We’ve pulled the best of those and trained it on what a good video looks like.”

To use Waymark’s tool, which it offers as part of a tiered subscription service starting at $25 a month, users supply the web address or social media accounts for their business, and it goes off and gathers all the text and images it can find. It then uses that data to generate a commercial, using OpenAI’s GPT-3 to write a script that is read aloud by a synthesized voice over selected images that highlight the business. A slick minute-long commercial can be generated in seconds. Users can edit the result if they wish, tweaking the script, editing images, choosing a different voice, and so on. Waymark says that more than 100,000 people have used its tool so far.
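The pipeline the article describes (gather assets from the business's web presence, have a language model write a script, voice it over the images) can be sketched in outline. Every function name below is a hypothetical stand-in, not Waymark's actual API; the GPT-3 and text-to-speech stages are reduced to placeholder strings.

```python
def gather_assets(url):
    """Stand-in for scraping text and images from a business's site."""
    return {"text": f"Content found at {url}", "images": ["logo.png"]}

def write_script(assets):
    """Stand-in for the language-model call (the article names GPT-3)."""
    return f"Ad script based on: {assets['text']}"

def synthesize_voiceover(script):
    """Stand-in for reading the script aloud with a synthesized voice."""
    return f"[audio] {script}"

def make_commercial(url):
    """Chain the three stages into one scrape → script → voice pipeline."""
    assets = gather_assets(url)
    script = write_script(assets)
    return {"voiceover": synthesize_voiceover(script), "images": assets["images"]}

ad = make_commercial("example.com")
print(ad["voiceover"])
```

The point of the sketch is the shape of the system: each stage's output feeds the next, which is why a whole commercial can be assembled in seconds once the individual models are fast.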

The trouble is that not every business has a website or images to draw from, says Parker. “An accountant or a therapist might have no assets at all,” he says. 

Waymark’s next idea is to use generative AI to create images and video for businesses that don’t yet have any—or don’t want to use the ones they have. “That’s the thrust behind making The Frost,” says Parker. “Create a world, a vibe.”

The Frost has a vibe, for sure. But it is also janky. “It’s not a perfect medium yet by any means,” says Rubin. “It was a bit of a struggle to get certain things from DALL-E, like emotional responses in faces. But at other times, it delighted us. We’d be like, ‘Oh my God, this is magic happening before our eyes.’”

This hit-and-miss process will improve as the technology gets better. DALL-E 2, which Waymark used to make The Frost, was released just a year ago. AI tools that generate short video clips have been around for only a few months.

The most revolutionary aspect of the technology is being able to generate new shots whenever you want them, says Rubin: “With 15 minutes of trial and error, you get that shot you wanted that fits perfectly into a sequence.” He remembers cutting the film together and needing particular shots, like a close-up of a boot on a mountainside. With DALL-E, he could just call it up. “It’s mind-blowing,” he says. “That’s when it started to be a real eye-opening experience as a filmmaker.”

Copyright © 2021 Seminole Press.