These creepy fake humans herald a new age in AI
Once viewed as less desirable than real data, synthetic data is now seen by some as a panacea. Real data is messy and riddled with bias. New data privacy regulations make it hard to collect. By contrast, synthetic data is pristine and can be used to build more diverse data sets. You can produce perfectly labeled faces, say, of different ages, shapes, and ethnicities to build a face-detection system that works across populations.
But synthetic data has its limitations. If it fails to reflect reality, it could end up producing even worse AI than messy, biased real-world data—or it could simply inherit the same problems. “What I don’t want to do is give the thumbs up to this paradigm and say, ‘Oh, this will solve so many problems,’” says Cathy O’Neil, a data scientist and founder of the algorithmic auditing firm ORCAA. “Because it will also ignore a lot of things.”
Realistic, not real
Deep learning has always been about data. But in the last few years, the AI community has learned that good data is more important than big data. Even small amounts of the right, cleanly labeled data can do more to improve an AI system’s performance than 10 times the amount of uncurated data, or even a more advanced algorithm.
That changes the way companies should approach developing their AI models, says Datagen’s CEO and cofounder, Ofir Chakon. Today, they start by acquiring as much data as possible and then tweak and tune their algorithms for better performance. Instead, they should be doing the opposite: use the same algorithm while improving on the composition of their data.
But collecting real-world data to perform this kind of iterative experimentation is too costly and time intensive. This is where Datagen comes in. With a synthetic data generator, teams can create and test dozens of new data sets a day to identify which one maximizes a model’s performance.
To ensure the realism of its data, Datagen gives its vendors detailed instructions on how many individuals to scan in each age bracket, BMI range, and ethnicity, as well as a set list of actions for them to perform, like walking around a room or drinking a soda. The vendors send back both high-fidelity static images and motion-capture data of those actions. Datagen’s algorithms then expand this data into hundreds of thousands of combinations. The synthesized data sometimes goes through a further round of checks: fake faces are plotted against real ones, for example, to see whether they look realistic.
Datagen is now generating facial expressions to monitor driver alertness in smart cars, body motions to track customers in cashier-free stores, and irises and hand motions to improve the eye- and hand-tracking capabilities of VR headsets. The company says its data has already been used to develop computer-vision systems serving tens of millions of users.
It’s not just synthetic humans that are being mass-manufactured. Click-Ins is a startup that uses synthetic data to perform automated vehicle inspections. Using design software, it re-creates all the car makes and models that its AI needs to recognize and then renders them with different colors, damages, and deformations under different lighting conditions, against different backgrounds. This lets the company update its AI when automakers put out new models, and helps it avoid data privacy violations in countries where license plates are considered private information and thus cannot be present in photos used to train AI.
Mostly.ai works with financial, telecommunications, and insurance companies to provide spreadsheets of fake client data that let companies share their customer database with outside vendors in a legally compliant way. Anonymization can reduce a data set’s richness yet still fail to adequately protect people’s privacy. But synthetic data can be used to generate detailed fake data sets that share the same statistical properties as a company’s real data. It can also be used to simulate data that the company doesn’t yet have, including a more diverse client population or scenarios like fraudulent activity.
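The core idea can be sketched in a few lines. This is a toy illustration, not Mostly.ai’s actual method, and the client fields are made up: fit a simple distribution to the real records, then sample entirely new rows that preserve its statistics without copying any individual.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "real" client table: age, income, account balance.
real = rng.multivariate_normal(
    mean=[45.0, 60_000.0, 12_000.0],
    cov=[[100.0, 5e4, 1e4],
         [5e4, 4e8, 5e6],
         [1e4, 5e6, 9e6]],
    size=5_000,
)

# Fit a simple parametric model to the real data...
mu, cov = real.mean(axis=0), np.cov(real, rowvar=False)

# ...and sample brand-new "clients" from it. No synthetic row
# corresponds to any real individual, but the statistical
# structure (means, covariances) is preserved.
synthetic = rng.multivariate_normal(mu, cov, size=5_000)

print(np.allclose(real.mean(axis=0), synthetic.mean(axis=0), rtol=0.05))
```

Real products use far richer generative models than a single Gaussian, but the contract is the same: the synthetic table should answer aggregate questions the way the real one would.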
Proponents of synthetic data say that it can help evaluate AI as well. In a recent paper published at an AI conference, Suchi Saria, an associate professor of machine learning and health care at Johns Hopkins University, and her coauthors demonstrated how data-generation techniques could be used to extrapolate different patient populations from a single set of data. This could be useful if, for example, a company only had data from New York City’s more youthful population but wanted to understand how its AI performs on an aging population with higher prevalence of diabetes. She’s now starting her own company, Bayesian Health, which will use this technique to help test medical AI systems.
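One way to build intuition for this kind of extrapolation is importance reweighting: an illustrative sketch, not the technique from Saria’s paper, with every number below invented. You reweight the sample you have so that it statistically resembles the population you care about.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: patient ages from a younger city population,
# plus each patient's observed model error (the model degrades
# with age in this made-up example).
ages = rng.normal(40, 12, size=10_000).clip(18, 95)
errors = 0.05 + 0.002 * np.maximum(ages - 50, 0)

def gauss(x, mu, sd):
    """Normal density, used to form importance weights."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Target: an older population (mean age 60). Weights shift the
# existing sample toward the target without collecting new data.
w = gauss(ages, 60, 12) / gauss(ages, 40, 12)
w /= w.sum()

naive_error = errors.mean()          # error on the data we have
shifted_error = (w * errors).sum()   # estimated error on the older population

print(round(float(naive_error), 3), round(float(shifted_error), 3))
```

The reweighted estimate comes out higher than the naive one, surfacing a performance gap that the original, younger sample would have hidden.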
The limits of faking it
But is synthetic data overhyped?
When it comes to privacy, “just because the data is ‘synthetic’ and does not directly correspond to real user data does not mean that it does not encode sensitive information about real people,” says Aaron Roth, a professor of computer and information science at the University of Pennsylvania. Some data generation techniques have been shown to closely reproduce images or text found in the training data, for example, while others are vulnerable to attacks that make them fully regurgitate that data.
This might be fine for a firm like Datagen, whose synthetic data isn’t meant to conceal the identity of the individuals who consented to be scanned. But it would be bad news for companies that pitch synthetic data as a way to protect sensitive financial or patient information.
Research suggests that the combination of two synthetic-data techniques in particular—differential privacy and generative adversarial networks—can produce the strongest privacy protections, says Bernease Herman, a data scientist at the University of Washington eScience Institute. But skeptics worry that this nuance can be lost in the marketing lingo of synthetic-data vendors, which won’t always be forthcoming about what techniques they are using.
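The privacy guarantee is easier to see in a stripped-down setting. Below is a sketch of the Laplace mechanism, a basic building block of differential privacy, applied to a simple count query over made-up data; DP training of generative models layers this kind of calibrated noise into the learning process.

```python
import numpy as np

rng = np.random.default_rng(42)

def dp_count(data, threshold, epsilon):
    """Release a count with epsilon-differential privacy.

    Adding or removing any one record changes the true count by at
    most 1 (sensitivity = 1), so Laplace noise with scale 1/epsilon
    masks each individual's contribution.
    """
    true_count = int(np.sum(data > threshold))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical incomes; the released count barely moves, but no one
# can tell from the output whether any single person was included.
incomes = rng.normal(60_000, 20_000, size=10_000)
noisy = dp_count(incomes, threshold=100_000, epsilon=1.0)
```

Smaller `epsilon` means more noise and stronger privacy; the skeptics’ point above is that whether a vendor actually provides a guarantee like this, and at what `epsilon`, is often invisible in the marketing.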
IBM wants to build a 100,000-qubit quantum computer
Quantum computing holds and processes information in a way that exploits the unique properties of fundamental particles: electrons, atoms, and small molecules can exist in multiple energy states at once, a phenomenon known as superposition, and the states of particles can become linked, or entangled, with one another. This means that information can be encoded and manipulated in novel ways, opening the door to a swath of classically impossible computing tasks.
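Superposition and entanglement can be illustrated with a few lines of linear algebra, simulating two qubits as state vectors. This is the standard textbook construction, not anything specific to IBM’s hardware.

```python
import numpy as np

# Single-qubit states and gates as plain vectors and matrices.
zero = np.array([1, 0], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # Hadamard
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

# Superposition: H puts one qubit into an equal mix of |0> and |1>.
plus = H @ zero

# Entanglement: H then CNOT yields the Bell state (|00> + |11>)/sqrt(2).
# Measuring one qubit now fixes the outcome of the other.
state = CNOT @ np.kron(plus, zero)

probs = np.abs(state) ** 2  # outcome probabilities for |00>,|01>,|10>,|11>
print(probs.round(3))       # → [0.5 0.  0.  0.5]
```

The two qubits land on 00 or 11 with equal probability and never on 01 or 10: neither qubit has a definite value on its own, which is exactly the kind of correlation classical bits cannot encode.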
As yet, quantum computers have not achieved anything useful that standard supercomputers cannot do. That is largely because they haven’t had enough qubits and because the systems are easily disrupted by tiny perturbations in their environment that physicists call noise.
Researchers have been exploring ways to make do with noisy systems, but many expect that quantum systems will have to scale up significantly to be truly useful, so that they can devote a large fraction of their qubits to correcting the errors induced by noise.
IBM is not the first to aim big. Google has said it is targeting a million qubits by the end of the decade, though error correction means only 10,000 will be available for computations. Maryland-based IonQ is aiming to have 1,024 “logical qubits,” each of which will be formed from an error-correcting circuit of 13 physical qubits, performing computations by 2028. Palo Alto–based PsiQuantum, like Google, is also aiming to build a million-qubit quantum computer, but it has not revealed its time scale or its error-correction requirements.
Because of those requirements, citing the number of physical qubits is something of a red herring—the particulars of how they are built, which affect factors such as their resilience to noise and their ease of operation, are crucially important. The companies involved usually offer additional measures of performance, such as “quantum volume” and the number of “algorithmic qubits.” In the next decade advances in error correction, qubit performance, and software-led error “mitigation,” as well as the major distinctions between different types of qubits, will make this race especially tricky to follow.
Refining the hardware
IBM’s qubits are currently made from rings of superconducting metal, which follow the same rules as atoms when operated at millikelvin temperatures, just a tiny fraction of a degree above absolute zero. In theory, these qubits can be operated in a large ensemble. But according to IBM’s own road map, quantum computers of the sort it’s building can only scale up to 5,000 qubits with current technology. Most experts say that’s not big enough to yield much in the way of useful computation. To create powerful quantum computers, engineers will have to go bigger. And that will require new technology.
How it feels to have a life-changing brain implant removed
Burkhart’s device was implanted in his brain around nine years ago, a few years after he was left unable to move his limbs following a diving accident. He volunteered to trial the device, which enabled him to move his hand and fingers. But it had to be removed seven and a half years later.
His particular implant was a small set of 100 electrodes, carefully inserted into a part of the brain that helps control movement. It worked by recording brain activity and sending these recordings to a computer, where they were processed using an algorithm. This was connected to a sleeve of electrodes worn on the arm. The idea was to translate thoughts of movement into electrical signals that would trigger movement.
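That translation step, brain activity in, movement command out, is at heart a decoding problem. The sketch below fits a plain ridge-regression decoder on simulated firing rates; the study’s actual algorithm was more sophisticated, and every variable here is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical training session: firing-rate features from 100
# electrodes, recorded while the intended hand command (5 outputs,
# e.g. per-finger flexion) is known from the cued task.
n_trials, n_electrodes, n_outputs = 400, 100, 5
W_true = rng.normal(size=(n_electrodes, n_outputs))   # unknown "true" mapping
X = rng.normal(size=(n_trials, n_electrodes))          # neural features
Y = X @ W_true + 0.1 * rng.normal(size=(n_trials, n_outputs))

# Fit a ridge-regression decoder: map brain activity -> command.
lam = 1.0
W = np.linalg.solve(X.T @ X + lam * np.eye(n_electrodes), X.T @ Y)

# At run time, each new window of brain activity is decoded and the
# resulting command drives the electrode sleeve on the arm.
x_new = rng.normal(size=(1, n_electrodes))
command = x_new @ W
```

The year and a half of training Burkhart describes below is partly about collecting enough cued examples, the `(X, Y)` pairs here, for the decoder to generalize from gross hand movements to individual fingers.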
Burkhart was the first to receive the implant, in 2014; he was 24 years old. Once he had recovered from the surgery, he began a training program to learn how to use it. Three times a week for around a year and a half, he visited a lab where the implant could be connected to a computer via a cable leading out of his head.
“It worked really well,” says Burkhart. “We started off just being able to open and close my hand, but after some time we were able to do individual finger movements.” He was eventually able to combine movements and control his grip strength. He was even able to play Guitar Hero.
“There was a lot that I was able to do, which was exciting,” he says. “But it was also still limited.” Not only was he restricted to using the device in the lab, but the tasks he could perform there were limited too. “Any of the activities we would do would be simplified,” he says.
For example, he could pour a bottle out, but it was only a bottle of beads, because the researchers didn’t want liquids around the electrical equipment. “It was kind of a bummer it wasn’t changing everything in my life, because I had seen how beneficial it could be,” he says.
At any rate, the device worked so well that the team extended the trial. Burkhart was initially meant to have the implant in place for 12 to 18 months, he says. “But everything was really successful … so we were able to continue on for quite a while after that.” The trial was extended on an annual basis, and Burkhart continued to visit the lab twice a week.
The Download: brain implant removal, and Nvidia’s AI payoff
Leggett told researchers that she “became one” with her device. It helped her to control the unpredictable, violent seizures she routinely experienced, and allowed her to take charge of her own life. So she was devastated when, two years later, she was told she had to have the implant removed because the company that made it had gone bust.
The removal of this implant, and others like it, might represent a breach of human rights, ethicists say in a paper published earlier this month. And the issue will only become more pressing as the brain implant market grows in the coming years and more people receive devices like Leggett’s. Read the full story.
You can read more about what happens to patients when their life-changing brain implants are removed against their wishes in the latest issue of The Checkup, Jessica’s weekly newsletter giving you the inside track on all things biotech. Sign up to receive it in your inbox every Thursday.
If you’d like to read more about brain implants, why not check out:
+ Brain waves can tell us how much pain someone is in. The research could open doors for personalized brain therapies to target and treat the worst kinds of chronic pain. Read the full story.
+ An ALS patient set a record for communicating via a brain implant. Brain interfaces could let paralyzed people speak at almost normal speeds. Read the full story.
+ Here’s how personalized brain stimulation could treat depression. Implants that track and optimize our brain activity are on the way. Read the full story.