Every week, each team submits not only a point forecast predicting a single numerical outcome (say, that in one week there will be 500 deaths), but also probabilistic predictions that quantify the uncertainty by estimating the likelihood of the number of cases or deaths at intervals, or ranges, that get narrower and narrower, zeroing in on a central forecast. For instance, a model might predict that there’s a 90 percent probability of seeing 100 to 500 deaths, a 50 percent probability of seeing 300 to 400, and a 10 percent probability of seeing 350 to 360.
“It’s like a bull’s eye, getting more and more focused,” says Reich.
Funk adds: “The sharper you define the target, the less likely you are to hit it.” It’s a fine balance, since an arbitrarily wide forecast will be correct but also useless. “It should be as precise as possible,” says Funk, “while also giving the correct answer.”
In collating and evaluating all the individual models, the ensemble tries to optimize their information and mitigate their shortcomings. The result is a probabilistic prediction, a statistical average or “median forecast.” It’s a consensus, essentially, with a more finely calibrated, and hence more realistic, expression of the uncertainty. All the various elements of uncertainty average out in the wash.
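The mechanics of a median ensemble over quantile forecasts can be sketched briefly. The numbers below are made up, and taking the per-quantile median across models is just one common way to build such an ensemble; the hub’s exact method may differ.

```python
# Minimal sketch of a quantile-median ensemble (hypothetical numbers).
# Each model reports the same set of quantiles for next week's deaths;
# the ensemble takes the median across models at each quantile.
import statistics

quantiles = [0.05, 0.25, 0.5, 0.75, 0.95]

# Hypothetical forecasts from three models: quantile -> predicted deaths
model_forecasts = [
    {0.05: 100, 0.25: 250, 0.5: 340, 0.75: 420, 0.95: 520},
    {0.05: 150, 0.25: 300, 0.5: 360, 0.75: 400, 0.95: 480},
    {0.05: 120, 0.25: 280, 0.5: 350, 0.75: 410, 0.95: 500},
]

ensemble = {
    q: statistics.median(m[q] for m in model_forecasts) for q in quantiles
}

# Central prediction intervals fall out of paired quantiles:
# the 90% interval is (q0.05, q0.95); the 50% interval is (q0.25, q0.75).
print(ensemble[0.5])                    # ensemble point forecast: 350
print(ensemble[0.05], ensemble[0.95])   # 90% interval: 120 500
```

Because the median is taken quantile by quantile, a single model’s outlier tail doesn’t drag the whole ensemble with it.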
The study by Reich’s lab, which focused on projected deaths and evaluated about 200,000 forecasts from mid-May to late December 2020 (an updated analysis with predictions for four more months will soon be added), found that the performance of individual models was highly variable. One week a model might be accurate, the next week it might be way off. But, as the authors wrote, “In combining the forecasts from all teams, the ensemble showed the best overall probabilistic accuracy.”
And these ensemble exercises serve not only to improve predictions, but also people’s trust in the models, says Ashleigh Tuite, an epidemiologist at the Dalla Lana School of Public Health at the University of Toronto. “One of the lessons of ensemble modeling is that none of the models is perfect,” Tuite says. “And even the ensemble sometimes will miss something important. Models in general have a hard time forecasting inflection points—peaks, or if things suddenly start accelerating or decelerating.”
The use of ensemble modeling is not unique to the pandemic. In fact, we consume probabilistic ensemble forecasts every day when Googling the weather and taking note that there’s 90 percent chance of precipitation. It’s the gold standard for both weather and climate predictions.
“It’s been a real success story and the way to go for about three decades,” says Tilmann Gneiting, a computational statistician at the Heidelberg Institute for Theoretical Studies and the Karlsruhe Institute of Technology in Germany. Prior to ensembles, weather forecasting used a single numerical model, which produced, in raw form, a deterministic weather forecast that was “ridiculously overconfident and wildly unreliable,” says Gneiting (weather forecasters, aware of this problem, subjected the raw results to subsequent statistical analysis that produced reasonably reliable probability of precipitation forecasts by the 1960s).
Gneiting notes, however, that the analogy between infectious disease and weather forecasting has its limitations. For one thing, the probability of precipitation doesn’t change in response to human behavior—it’ll rain, umbrella or no umbrella—whereas the trajectory of the pandemic responds to our preventative measures.
Forecasting during a pandemic is a system subject to a feedback loop. “Models are not oracles,” says Alessandro Vespignani, a computational epidemiologist at Northeastern University and ensemble hub contributor, who studies complex networks and infectious disease spread with a focus on the “techno-social” systems that drive feedback mechanisms. “Any model is providing an answer that is conditional on certain assumptions.”
When people process a model’s prediction, their subsequent behavioral changes upend the assumptions, change the disease dynamics and render the forecast inaccurate. In this way, modeling can be a “self-destroying prophecy.”
And there are other factors that could compound the uncertainty: seasonality, variants, vaccine availability or uptake; and policy changes like the swift decision from the CDC about unmasking. “These all amount to huge unknowns that, if you actually wanted to capture the uncertainty of the future, would really limit what you could say,” says Justin Lessler, an epidemiologist at the Johns Hopkins Bloomberg School of Public Health, and a contributor to the COVID-19 Forecast Hub.
The ensemble study of death forecasts observed that accuracy decays, and uncertainty grows, as models make predictions farther into the future—there was about two times the error looking four weeks ahead versus one week (four weeks is considered the limit for meaningful short-term forecasts; at the 20-week time horizon there was about five times the error).
But assessing the quality of the models—warts and all—is an important secondary goal of forecasting hubs. And it’s easy enough to do, since short-term predictions are quickly confronted with the reality of the numbers tallied day-to-day, as a measure of their success.
Most researchers are careful to differentiate between this type of “forecast model,” aiming to make explicit and verifiable predictions about the future, which is only possible in the short term; versus a “scenario model,” exploring “what if” hypotheticals, possible plotlines that might develop in the medium- or long-term future (since scenario models are not meant to be predictions, they shouldn’t be evaluated retrospectively against reality).
During the pandemic, a critical spotlight has often been directed at models with predictions that were spectacularly wrong. “While longer-term what-if projections are difficult to evaluate, we shouldn’t shy away from comparing short-term predictions with reality,” says Johannes Bracher, a biostatistician at the Heidelberg Institute for Theoretical Studies and the Karlsruhe Institute of Technology, who coordinates a German and Polish hub, and advises the European hub. “It’s fair to debate when things worked and when things didn’t,” he says. But an informed debate requires recognizing and considering the limits and intentions of models (sometimes the fiercest critics were those who mistook scenario models for forecast models).
Similarly, when predictions in any given situation prove particularly intractable, modelers should say so. “If we have learned one thing, it’s that cases are extremely difficult to model even in the short run,” says Bracher. “Deaths are a more lagged indicator and are easier to predict.”
In April, some of the European models were overly pessimistic and missed a sudden decrease in cases. A public debate ensued about the accuracy and reliability of pandemic models. Weighing in on Twitter, Bracher asked: “Is it surprising that the models are (not infrequently) wrong? After a 1-year pandemic, I would say: no.” This makes it all the more important, he says, that models indicate their level of certainty or uncertainty, that they take a realistic stance about how unpredictable cases are, and about the future course. “Modelers need to communicate the uncertainty, but it shouldn’t be seen as a failure,” Bracher says.
Trusting some models more than others
As an oft-quoted statistical aphorism goes, “All models are wrong, but some are useful.” But as Bracher notes, “If you do the ensemble model approach, in a sense you are saying that all models are useful, that each model has something to contribute”—though some models may be more informative or reliable than others.
Observing this fluctuation prompted Reich and others to try “training” the ensemble model—that is, as Reich explains, “building algorithms that teach the ensemble to ‘trust’ some models more than others and learn which precise combination of models works in harmony together.” Bracher’s team now contributes a mini-ensemble, built from only the models that have performed consistently well in the past, amplifying the clearest signal.
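One simple way to “train” an ensemble along these lines is to weight each model by its recent track record. The sketch below, with hypothetical error figures, weights models by the inverse of their recent mean absolute error; the actual algorithms the hub teams experiment with are more sophisticated.

```python
# Hypothetical sketch: weight each model by the inverse of its recent
# mean absolute error, so consistently accurate models count for more.
def inverse_error_weights(past_errors):
    """past_errors: {model_name: [absolute errors over recent weeks]}"""
    raw = {m: 1.0 / (sum(e) / len(e)) for m, e in past_errors.items()}
    total = sum(raw.values())
    return {m: w / total for m, w in raw.items()}

def weighted_forecast(forecasts, weights):
    """forecasts: {model_name: point forecast}; returns weighted average."""
    return sum(weights[m] * forecasts[m] for m in forecasts)

# Hypothetical history: model A has been twice as accurate as model B,
# so it receives twice the weight (2/3 vs 1/3).
weights = inverse_error_weights({"A": [10, 10], "B": [20, 20]})
print(weighted_forecast({"A": 300, "B": 330}, weights))  # 310.0
```

The catch, as the researchers note, is that past accuracy is a noisy guide: a model that shone last month may miss the next inflection point, which is one reason the simple unweighted median is hard to beat.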
“The big question is, can we improve?” Reich says. “The original method is so simple. It seems like there has to be a way of improving on just taking a simple average of all these models.” So far, however, it is proving harder than expected—small improvements seem feasible, but dramatic improvements may be close to impossible.
A complementary tool for improving our overall perspective on the pandemic beyond week-to-week glimpses is to look further out on the time horizon, four to six months, with those “scenario models.” Last December, motivated by the surge in cases and the imminent availability of the vaccine, Lessler and collaborators launched the COVID-19 Scenario Modeling Hub, in consultation with the CDC.
Climate tech is back—and this time, it can’t afford to fail
Boston Metal’s strategy is to try to make the transition as digestible as possible for steelmakers. “We won’t own and operate steel plants,” says Adam Rauwerdink, who heads business development at the company. Instead, it plans to license the technology for electrochemical units that are designed to be a simple drop-in replacement for blast furnaces; the liquid iron that flows out of the electrochemical cells can be handled just as if it were coming out of a blast furnace, with the same equipment.
Working with industrial investors including ArcelorMittal, says Rauwerdink, allows the startup to learn “how to integrate our technology into their plants—how to handle the raw materials coming in, the metal products coming out of our systems, and how to integrate downstream into their established processes.”
The startup’s headquarters in a business park about 15 miles outside Boston is far from any steel manufacturing, but these days it’s drawing frequent visitors from the industry. There, the startup’s pilot-scale electrochemical unit, the size of a large furnace, is intentionally designed to be familiar to those potential customers. If you ignore the tangle of electrical cables running in and out of it, and the boxes of electric equipment surrounding it, it’s easy to forget that the unit is not just another part of the standard steelmaking process. And that’s exactly what Boston Metal is hoping for.
The company expects to have an industrial-scale unit ready for use by 2025 or 2026. The deadline is key, because Boston Metal is counting on commitments that many large steelmakers have made to reach zero carbon emissions by 2050. Given that the life of an average blast furnace is around 20 years, that means having the technology ready to license before 2030, as steelmakers plan their long-term capital expenditures. But even now, says Rauwerdink, demand is growing for green steel, especially in Europe, where it’s selling for a few hundred dollars a metric ton more than the conventional product.
It’s that kind of blossoming market for clean technologies that many of today’s startups are depending on. The recent corporate commitments to decarbonize, and the IRA and other federal spending initiatives, are creating significant demand in markets “that previously didn’t exist,” says Michael Kearney, a partner at Engine Ventures.
One wild card, however, will be just how aggressively and faithfully corporations pursue ways to transform their core businesses and to meet their publicly stated goals. Funding a small pilot-scale project, says Kearney, “looks more like greenwashing if you have no intention of scaling those projects.” Watching which companies move from pilot plants to full-scale commercial facilities will tell you “who’s really serious,” he says. Putting aside the fears of greenwashing, Kearney says it’s essential to engage these large corporations in the transition to cleaner technologies.
Susan Schofer, a partner at the venture firm SOSV, has some advice for those VCs and startups reluctant to work with existing companies in traditionally heavily polluting industries: Get over it. “We need to partner with them. These incumbents have important knowledge that we all need to get in order to effect change. So there needs to be healthy respect on both sides,” she says. Too often, she says, there is “an attitude that we don’t want to do that because it’s helping an incumbent industry.” But the reality, she says, is that finding ways for such industries to save energy or use cleaner technologies “can make the biggest difference in the near term.”
It’s tempting to dismiss the history of cleantech 1.0. It was more than a decade ago, and there’s a new generation of startups and investors. Far more money is around today, along with a broader range of financing options. Surely we’re savvier these days.
Making an image with generative AI uses as much energy as charging your phone
“If you’re doing a specific application, like searching through email … do you really need these big models that are capable of anything? I would say no,” Luccioni says.
The energy consumption associated with using AI tools has been a missing piece in understanding their true carbon footprint, says Jesse Dodge, a research scientist at the Allen Institute for AI, who was not part of the study.
Comparing the carbon emissions from newer, larger generative models and older AI models is also important, Dodge adds. “It highlights this idea that the new wave of AI systems are much more carbon intensive than what we had even two or five years ago,” he says.
Google once estimated that an average online search used 0.3 watt-hours of electricity, equivalent to driving 0.0003 miles in a car. Today, that number is likely much higher, because Google has integrated generative AI models into its search, says Vijay Gadepally, a research scientist at MIT Lincoln Laboratory, who did not participate in the research.
Not only did the researchers find emissions for each task to be much higher than they expected, but they discovered that the day-to-day emissions associated with using AI far exceeded the emissions from training large models. Luccioni tested different versions of Hugging Face’s multilingual AI model BLOOM to see how many uses would be needed to overtake training costs. It took over 590 million uses to reach the carbon cost of training its biggest model. For very popular models, such as ChatGPT, it could take just a couple of weeks for such a model’s usage emissions to exceed its training emissions, Luccioni says.
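The break-even logic behind that comparison is simple arithmetic, sketched below. The figures in the example are hypothetical placeholders, not the study’s measurements.

```python
# Back-of-the-envelope sketch of the training-vs-usage trade-off the
# study describes. The figures below are hypothetical placeholders,
# not the study's measurements.
def break_even_uses(training_emissions_g, per_use_emissions_g):
    """Number of uses at which cumulative usage emissions
    match the one-time training emissions."""
    return training_emissions_g / per_use_emissions_g

# E.g., if training emitted 50 tonnes of CO2e (50 million grams) and
# each query emits 2 grams, usage overtakes training after 25 million uses.
print(break_even_uses(50_000_000, 2))  # 25000000.0
```

With hundreds of millions of queries a day, even a tiny per-query footprint crosses such a threshold quickly, which is why usage, not training, dominates for popular models.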
This is because large AI models get trained just once, but then they can be used billions of times. According to some estimates, popular models such as ChatGPT have up to 10 million users a day, many of whom prompt the model more than once.
Studies like these make the energy consumption and emissions related to AI more tangible and help raise awareness that there is a carbon footprint associated with using AI, says Gadepally, adding, “I would love it if this became something that consumers started to ask about.”
Dodge says he hopes studies like this will help us to hold companies more accountable about their energy usage and emissions.
“The responsibility here lies with a company that is creating the models and is earning a profit off of them,” he says.
The first CRISPR cure might kickstart the next big patent battle
And really, what’s the point of such a hard-won triumph unless it’s to enforce your rights? “Honestly, this train has been coming down the track since at least 2014, if not earlier. We’re at the collision point. I struggle to imagine there’s going to be a diversion,” says Sherkow. “Brace for impact.”
The Broad Institute didn’t answer any of my questions, and a spokesperson for MIT didn’t even reply to my email. That’s not a surprise. Private universities can be exceedingly obtuse when it comes to acknowledging their commercial activities. They are supposed to be centers of free inquiry and humanitarian intentions, so if employees get rich from biotechnology—and they do—they try to do it discreetly.
There are also strong reasons not to sue. Suing could make a nonprofit like the Broad Institute look bad. Really bad. That’s because it could get in the way of cures.
“It seems unlikely and undesirable, [as] legal challenges at this late date would delay saving patients,” says George Church, a Harvard professor and one of the original scientific founders of Editas, though he’s no longer closely involved with the company.
If a patent infringement lawsuit does get filed, it will happen sometime after Vertex notifies regulators it’s starting to sell the treatment. “That’s the starting gun,” says Sherkow. “There are no hypothetical lawsuits in the patent system, so one must wait until it’s sufficiently clear that an act of infringement is about to occur.”
How much money is at stake? It remains unclear what the demand for the Vertex treatment will be, but it could eventually prove a blockbuster. There are about 20,000 people with severe sickle-cell in the US who might benefit. And assuming a price of $3 million (my educated guess), that’s a total potential market of around $60 billion. A patent holder could potentially demand 10% of the take, or more.
Vertex can certainly defend itself. It’s a big, rich company, and through its partnership with the Swiss firm CRISPR Therapeutics, a biotech co-founded by Charpentier, Vertex has access to the competing set of intellectual-property claims—including those of UC Berkeley, which (though bested by Broad in the US) hold force in Europe and could be used to throw up a thicket of counterarguments.
Vertex could also choose to pay royalties. To do that, it would have to approach Editas, the biotech cofounded by Zhang and Church in Cambridge, Massachusetts, which previously bought exclusive rights to the Broad patents on CRISPR in the arena of human treatments, including sickle-cell therapies.