In late 2012, AI scientists first figured out how to get neural networks to “see.” They proved that software designed to loosely mimic the human brain could dramatically improve existing computer-vision systems. The field has since learned how to get neural networks to imitate the way we reason, hear, speak, and write.
But while AI has grown remarkably human-like—even superhuman—at achieving a specific task, it still doesn’t capture the flexibility of the human brain. We can learn skills in one context and apply them to another. By contrast, though DeepMind’s game-playing algorithm AlphaGo can beat the world’s best Go masters, it can’t extend that strategy beyond the board. Deep-learning algorithms, in other words, are masters at picking up patterns, but they cannot understand and adapt to a changing world.
Researchers have many hypotheses about how this problem might be overcome, but one in particular has gained traction. Children learn about the world by sensing and talking about it. The combination seems key. As kids begin to associate words with sights, sounds, and other sensory information, they are able to describe more and more complicated phenomena and dynamics, tease apart what is causal from what reflects only correlation, and construct a sophisticated model of the world. That model then helps them navigate unfamiliar environments and put new knowledge and experiences in context.
AI systems, on the other hand, are built to do only one of these things at a time. Computer-vision and audio-recognition algorithms can sense things but cannot use language to describe them. A natural-language model can manipulate words, but the words are detached from any sensory reality. If senses and language were combined to give an AI a more human-like way to gather and process new information, could it finally develop something like an understanding of the world?
The hope is that these “multimodal” systems, with access to both the sensory and linguistic “modes” of human intelligence, should give rise to a more robust kind of AI that can adapt more easily to new situations or problems. Such algorithms could then help us tackle more complex problems, or be ported into robots that can communicate and collaborate with us in our daily life.
New advances in language-processing algorithms like OpenAI’s GPT-3 have helped. Researchers now understand how to replicate language manipulation well enough that combining it with sensing capabilities is potentially more fruitful. To start with, they are using the very first sensing capability the field achieved: computer vision. The results are simple bimodal models, or visual-language AI.
In the past year, there have been several exciting results in this area. In September, researchers at the Allen Institute for Artificial Intelligence, AI2, created a model that can generate an image from a text caption, demonstrating the algorithm’s ability to associate words with visual information. In November, researchers at the University of North Carolina, Chapel Hill, developed a method that incorporates images into existing language models, which boosted the models’ reading comprehension.
OpenAI then used these ideas to extend GPT-3. At the start of 2021, the lab released two visual-language models. One links the objects in an image to the words that describe them in a caption. The other generates images based on a combination of the concepts it has learned. You can prompt it, for example, to produce “a painting of a capybara sitting in a field at sunrise.” Though it may have never seen this before, it can mix and match what it knows of paintings, capybaras, fields, and sunrises to dream up dozens of examples.
More sophisticated multimodal systems will also make possible more advanced robotic assistants (think robot butlers, not just Alexa). The current generation of AI-powered robots primarily uses visual data to navigate and interact with its surroundings. That’s good for completing simple tasks in constrained environments, like fulfilling orders in a warehouse. But labs like AI2 are working to add language and incorporate more sensory inputs, like audio and tactile data, so the machines can understand commands and perform more complex operations, like opening a door when someone is knocking.
In the long run, multimodal breakthroughs could help overcome some of AI’s biggest limitations. Experts argue, for example, that its inability to understand the world is also why it can easily fail or be tricked. (An image can be altered in a way that’s imperceptible to humans but makes an AI identify it as something completely different.) Achieving more flexible intelligence wouldn’t just unlock new AI applications: it would make them safer, too. Algorithms that screen résumés wouldn’t treat irrelevant characteristics like gender and race as signs of ability. Self-driving cars wouldn’t lose their bearings in unfamiliar surroundings and crash in the dark or in snowy weather. Multimodal systems might become the first AIs we can really trust with our lives.
ChatGPT is about to revolutionize the economy. We need to decide what that looks like.
When Anton Korinek, an economist at the University of Virginia and a fellow at the Brookings Institution, got access to the new generation of large language models such as ChatGPT, he did what a lot of us did: he began playing around with them to see how they might help his work. He carefully documented their performance in a paper in February, noting how well they handled 25 “use cases,” from brainstorming and editing text (very useful) to coding (pretty good with some help) to doing math (not great).
ChatGPT did explain one of the most fundamental principles in economics incorrectly, says Korinek: “It screwed up really badly.” But the mistake, easily spotted, was quickly forgiven in light of the benefits. “I can tell you that it makes me, as a cognitive worker, more productive,” he says. “Hands down, no question for me that I’m more productive when I use a language model.”
When GPT-4 came out, he tested its performance on the same 25 questions that he documented in February, and it performed far better. There were fewer instances of making stuff up; it also did much better on the math assignments, says Korinek.
Since ChatGPT and other AI bots automate cognitive work, as opposed to physical tasks that require investments in equipment and infrastructure, a boost to economic productivity could happen far more quickly than in past technological revolutions, says Korinek. “I think we may see a greater boost to productivity by the end of the year—certainly by 2024,” he says.
What’s more, he says, in the longer term, the way the AI models can make researchers like himself more productive has the potential to drive technological progress.
That potential of large language models is already turning up in research in the physical sciences. Berend Smit, who runs a chemical engineering lab at EPFL in Lausanne, Switzerland, is an expert on using machine learning to discover new materials. Last year, after one of his graduate students, Kevin Maik Jablonka, showed some interesting results using GPT-3, Smit asked him to demonstrate that GPT-3 is, in fact, useless for the kinds of sophisticated machine-learning studies his group does to predict the properties of compounds.
“He failed completely,” jokes Smit.
It turns out that after being fine-tuned for a few minutes with a few relevant examples, the model performs as well as advanced machine-learning tools specially developed for chemistry in answering basic questions about things like the solubility of a compound or its reactivity. Simply give it the name of a compound, and it can predict various properties based on the structure.
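To make the idea of fine-tuning on "a few relevant examples" concrete, here is a minimal sketch of how such examples could be written down as prompt and completion pairs in JSON Lines form. It is an illustration only, not the format or data Smit's group used: the question wording, the coarse "high"/"low" labels, and the helper names `to_record` and `to_jsonl` are all assumptions made for this example.

```python
import json

# Hypothetical toy examples. The labels are coarse, textbook-level facts
# about solubility in water, not data from the research described above.
EXAMPLES = [
    ("sodium chloride", "high"),
    ("silver chloride", "low"),
    ("sucrose", "high"),
    ("calcium carbonate", "low"),
]


def to_record(name, label):
    """Turn one (compound name, label) pair into a prompt/completion record."""
    return {
        "prompt": f"What is the water solubility of {name}?",
        "completion": label,
    }


def to_jsonl(examples):
    """Serialize the examples as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(to_record(name, label)) for name, label in examples)


if __name__ == "__main__":
    print(to_jsonl(EXAMPLES))
```

Each line pairs a plain-language question about a named compound with the answer the model should learn to give, which is the sense in which the model is simply given "the name of a compound."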
Newly revealed coronavirus data has reignited a debate over the virus’s origins
Data collected in 2020—and kept from public view since then—potentially adds weight to the animal theory. It highlights a potential suspect: the raccoon dog. But exactly how much weight it adds depends on who you ask. New analyses of the data have only reignited the debate, and stirred up some serious drama.
The current ruckus starts with a study shared by Chinese scientists back in February 2022. In a preprint (a scientific paper that has not yet been peer-reviewed or published in a journal), George Gao of the Chinese Center for Disease Control and Prevention (CCDC) and his colleagues described how they collected and analyzed 1,380 samples from the Huanan Seafood Market.
These samples were collected between January and March 2020, just after the market was closed. At the time, the team wrote that they only found coronavirus in samples alongside genetic material from people.
There were a lot of animals on sale at this market, which sold more than just seafood. The Gao paper features a long list, including chickens, ducks, geese, pheasants, doves, deer, badgers, rabbits, bamboo rats, porcupines, hedgehogs, crocodiles, snakes, and salamanders. And that list is not exhaustive—there are reports of other animals being traded there, including raccoon dogs. We’ll come back to them later.
But Gao and his colleagues reported that they didn’t find the coronavirus in any of the 18 species of animal they looked at. They suggested that it was humans who most likely brought the virus to the market, which ended up being the first known epicenter of the outbreak.
Fast-forward to March 2023. On March 4, Florence Débarre, an evolutionary biologist at Sorbonne University in Paris, spotted some data that had been uploaded to GISAID, a website that allows researchers to share genetic data to help them study and track viruses that cause infectious diseases. The data appeared to have been uploaded in June 2022. It seemed to have been collected by Gao and his colleagues for their February 2022 study, although it had not been included in the actual paper.
Fostering innovation through a culture of curiosity
And so I think a big part of it as a company, by setting these ambitious goals, it forces us to say if we want to be number one, if we want to be top tier in these areas, if we want to continue to generate results, how do we get there using technology? And so that really forces us to throw away our assumptions because you can’t follow somebody, if you want to be number one you can’t follow someone to become number one. And so we understand that the path to get there, it’s through, of course, technology and the software and the enablement and the investment, but it really is by becoming goal-oriented. And if we look at these examples of how do we create the infrastructure on the technology side to support these ambitious goals, we ourselves have to be ambitious in turn because if we bring a solution that’s also a me too, that’s a copycat, that doesn’t have differentiation, that’s not going to propel us, for example, to be a top 10 supply chain. It just doesn’t pass muster.
So I think at the top level, it starts with the business ambition. And then from there we can organize ourselves at the intersection of the business ambition and the technology trends to have those very rich discussions and being the glue of how do we put together so many moving pieces because we’re constantly scanning the technology landscape for new advancing and emerging technologies that can come in and be a part of achieving that mission. And so that’s how we set it up on the process side. As an example, I think one of the things, and it’s also innovation, but it doesn’t get talked about as much, but for the community out there, I think it’s going to be very relevant is, how do we stay on top of the data sovereignty questions and data localization? There’s a lot of work that needs to go into rethinking what your cloud, private, public, edge, on-premise look like going forward so that we can remain cutting edge and competitive in each of our markets while meeting the increasing guidance that we’re getting from countries and regulatory agencies about data localization and data sovereignty.
And so in our case, as a global company that’s listed in Hong Kong and we operate all around the world, we’ve had to really think deeply about the architecture of our solutions and apply innovation in how we can architect for a longer term growth, but in a world that’s increasingly uncertain. So I think there’s a lot of drivers in some sense, which is our corporate aspirations, our operating environment, which has continued to have a lot of uncertainty, and that really forces us to take a very sharp lens on what cutting edge looks like. And it’s not always the bright and shiny technology. Cutting edge could mean going to the executive committee and saying, Hey, we’re going to face a challenge about compliance. Here’s the innovation we’re bringing about architecture so that we can handle not just the next country or regulatory regime that we have to comply with, but the next 10, the next 50.
Laurel: Well, and to follow up with a bit more of a specific example, how does R&D help improve manufacturing in the software supply chain as well as emerging technologies like artificial intelligence and the industrial metaverse?
Art: Oh, I love this one because this is the perfect example of there’s a lot happening in the technology industry and there’s so much back to the earlier point of applied curiosity and how we can try this. So specifically around artificial intelligence and industrial metaverse, I think those go really well together with what are Lenovo’s natural strengths. Our heritage is as a leading global manufacturer, and now we’re looking to also transition to services-led, but applying AI and technologies like the metaverse to our factories. I think it’s almost easier to talk about the inverse, Laurel, which is if we… Because, and I remember very clearly we’ve mapped this out, there’s no area within the supply chain and manufacturing that is not touched by these areas. If I think about an example, actually, it’s very timely that we’re having this discussion. Lenovo was recognized just a few weeks ago at the World Economic Forum as part of the Global Lighthouse Network on leading manufacturing.
And that’s based very much on applying around AI and metaverse technologies and embedding them into every aspect of what we do about our own supply chain and manufacturing network. And so if I pick a couple of examples on the quality side within the factory, we’ve implemented a combination of digital twin technology around how we can design to cost, design to quality in ways that are much faster than before, where we can prototype in the digital world where it’s faster and lower cost and correcting errors is more upfront and timely. So we are able to much more quickly iterate on our products. We’re able to have better quality. We’ve taken advanced computer vision so that we’re able to identify quality defects earlier on. We’re able to implement technologies around the industrial metaverse so that we can train our factory workers more effectively and better using aspects of AR and VR.
And we’re also able to, one of the really important parts of running an effective manufacturing operation is actually production planning, because there’s so many thousands of parts that are coming in, and I think everyone who’s listening knows how much uncertainty and volatility there have been in supply chains. So how do you take such a multi-thousand dimensional planning problem and optimize that? Those are things where we apply smart production planning models to keep our factories fully running so that we can meet our customer delivery dates. So I don’t want to drone on, but I think literally the answer was: there is no place, if you think about logistics, planning, production, scheduling, shipping, where we didn’t find AI and metaverse use cases that were able to significantly enhance the way we run our operations. And again, we’re doing this internally and that’s why we’re very proud that the World Economic Forum recognized us as a Global Lighthouse Network manufacturing member.
Laurel: It’s certainly important, especially when we’re bringing together computing and IT environments in this increasing complexity. So as businesses continue to transform and accelerate their transformations, how do you build resiliency throughout Lenovo? Because that is certainly another foundational characteristic that is so necessary.