In March 2020, when the WHO declared a pandemic, the public sequence database GISAID held 524 covid sequences. Over the next month scientists uploaded 6,000 more. By the end of May, the total was over 35,000. (In contrast, global scientists added 40,000 flu sequences to GISAID in all of 2019.)
“Without a name, forget about it—we cannot understand what other people are saying,” says Anderson Brito, a postdoc in genomic epidemiology at the Yale School of Public Health, who contributes to the Pango effort.
As the number of covid sequences spiraled, researchers trying to study them were forced to create entirely new infrastructure and standards on the fly. A universal naming system has been one of the most important elements of this effort: without it, scientists would struggle to talk to each other about how the virus’s descendants are traveling and changing—either to flag up a question or, even more critically, to sound the alarm.
Where Pango came from
In April 2020, a handful of prominent virologists in the UK and Australia proposed a system of letters and numbers for naming lineages, or new branches, of the covid family. It had a logic, and a hierarchy, even though the names it generated—like B.1.1.7—were a bit of a mouthful.
One of the authors on the paper was Áine O’Toole, a PhD candidate at the University of Edinburgh. Soon she’d become the primary person actually doing that sorting and classifying, eventually combing through hundreds of thousands of sequences by hand.
She says: “Very early on, it was just who was available to curate the sequences. That ended up being my job for a good bit. I guess I never understood quite the scale we were going to get to.”
She quickly set about building software to assign new genomes to the right lineages. Not long after that, another researcher, postdoc Emily Scher, built a machine-learning algorithm to speed things up even more.
They named the software Pangolin, a tongue-in-cheek reference to a debate about the animal origin of covid. (The whole system is now simply known as Pango.)
The naming system, along with the software to implement it, quickly became a global essential. Although the WHO has recently started using Greek letters for variants that seem especially concerning, like delta, those nicknames are for the public and the media. Delta actually refers to a growing family of variants, which scientists call by their more precise Pango names: B.1.617.2, AY.1, AY.2, and AY.3.
“When alpha emerged in the UK, Pango made it very easy for us to look for those mutations in our genomes to see if we had that lineage in our country too,” says Jolly. “Ever since then, Pango has been used as the baseline for reporting and surveillance of variants in India.”
Because Pango offers a rational, orderly approach to what would otherwise be chaos, it may forever change the way scientists name viral strains—allowing experts from all over the world to work together with a shared vocabulary. Brito says: “Most likely, this will be a format we’ll use for tracking any other new virus.”
Many of the foundational tools for tracking covid genomes have been developed and maintained by early-career scientists like O’Toole and Scher over the last year and a half. As the need for worldwide covid collaboration exploded, scientists rushed to support it with ad hoc infrastructure like Pango. Much of that work fell to tech-savvy young researchers in their 20s and 30s. They used informal networks and tools that were open source—meaning they were free to use, and anyone could volunteer to add tweaks and improvements.
“The people on the cutting edge of new technologies tend to be grad students and postdocs,” says Angie Hinrichs, a bioinformatician at UC Santa Cruz who joined the Pangolin project earlier this year. For example, O’Toole and Scher work in the lab of Andrew Rambaut, a genomic epidemiologist who posted the first public covid sequences online after receiving them from Chinese scientists. “They just happened to be perfectly placed to provide these tools that became absolutely critical,” Hinrichs says.
It hasn’t been easy. For most of 2020, O’Toole took on the bulk of the responsibility for identifying and naming new lineages by herself. The university was shuttered, but she and another of Rambaut’s PhD students, Verity Hill, got permission to come into the office. Her commute, walking 40 minutes to school from the apartment where she lived alone, gave her some sense of normalcy.
Every few weeks, O’Toole would download the entire covid repository from the GISAID database, which had grown exponentially each time. Then she would hunt around for groups of genomes with mutations that looked similar, or things that looked odd and might have been mislabeled.
When she got particularly stuck, Hill, Rambaut, and other members of the lab would pitch in to discuss the designations. But the grunt work fell on her.
Deciding when descendants of the virus deserve a new family name can be as much art as science. It was a painstaking process, sifting through an unheard-of number of genomes and asking time and again: Is this a new variant of covid or not?
“It was pretty tedious,” she says. “But it was always really humbling. Imagine going through 20,000 sequences from 100 different places in the world. I saw sequences from places I’d never even heard of.”
As time went on, O’Toole struggled to keep up with the volume of new genomes to sort and name.
In June 2020, there were over 57,000 sequences stored in the GISAID database, and O’Toole had sorted them into 39 variants. By November 2020, a month after she was supposed to turn in her thesis, O’Toole took her last solo run through the data. It took her 10 days to go through all the sequences, which by then numbered 200,000. (Although covid has overshadowed her research on other viruses, she’s putting a chapter on Pango in her thesis.)
Fortunately, the Pango software is built to be collaborative, and others have stepped up. An online community—the one that Jolly turned to when she noticed the variant sweeping across India—sprouted and grew. This year, O’Toole’s work has been much more hands-off. New lineages are now designated mostly when epidemiologists around the world contact O’Toole and the rest of the team through Twitter, email, or GitHub— her preferred method.
“Now it’s more reactionary,” says O’Toole. “If a group of researchers somewhere in the world is working on some data and they believe they’ve identified a new lineage, they can put in a request.”
The deluge of data has continued. This past spring, the team held a “pangothon,” a sort of hackathon in which they sorted 800,000 sequences into around 1,200 lineages.
“We gave ourselves three solid days,” says O’Toole. “It took two weeks.”
Since then, the Pango team has recruited a few more volunteers, like UCSC researcher Hindriks and Yale researcher Brito, who both got involved initially by adding their two cents on Twitter and the GitHub page. A postdoc at the University of Cambridge, Chris Ruis, has turned his attention to helping O’Toole clear out the backlog of GitHub requests.
O’Toole recently asked them to formally join the organization as part of the newly created Pango Network Lineage Designation Committee, which discusses and makes decisions about variant names. Another committee, which includes lab leader Rambaut, makes higher-level decisions.
“We’ve got a website, and an email that’s not just my email,” O’Toole says. “It’s become a lot more formalized, and I think that will really help it scale.”
A few cracks around the edges have started to show as the data has grown. As of today, there are nearly 2.5 million covid sequences in GISAID, which the Pango team has split into 1,300 branches. Each branch corresponds to a variant. Of those, eight are ones to watch, according to the WHO.
With so much to process, the software is starting to buckle. Things are getting mislabeled. Many strains look similar, because the virus evolves the most advantageous mutations over and over again.
As a stopgap measure, the team has built new software that uses a different sorting method and can catch things that Pango may miss.
The hunter-gatherer groups at the heart of a microbiome gold rush
The first step to finding out is to catalogue what microbes we might have lost. To get as close to ancient microbiomes as possible, microbiologists have begun studying multiple Indigenous groups. Two have received the most attention: the Yanomami of the Amazon rainforest and the Hadza, in northern Tanzania.
Researchers have made some startling discoveries already. A study by Sonnenburg and his colleagues, published in July, found that the gut microbiomes of the Hadza appear to include bugs that aren’t seen elsewhere—around 20% of the microbe genomes identified had not been recorded in a global catalogue of over 200,000 such genomes. The researchers found 8.4 million protein families in the guts of the 167 Hadza people they studied. Over half of them had not previously been identified in the human gut.
Plenty of other studies published in the last decade or so have helped build a picture of how the diets and lifestyles of hunter-gatherer societies influence the microbiome, and scientists have speculated on what this means for those living in more industrialized societies. But these revelations have come at a price.
A changing way of life
The Hadza people hunt wild animals and forage for fruit and honey. “We still live the ancient way of life, with arrows and old knives,” says Mangola, who works with the Olanakwe Community Fund to support education and economic projects for the Hadza. Hunters seek out food in the bush, which might include baboons, vervet monkeys, guinea fowl, kudu, porcupines, or dik-dik. Gatherers collect fruits, vegetables, and honey.
Mangola, who has met with multiple scientists over the years and participated in many research projects, has witnessed firsthand the impact of such research on his community. Much of it has been positive. But not all researchers act thoughtfully and ethically, he says, and some have exploited or harmed the community.
One enduring problem, says Mangola, is that scientists have tended to come and study the Hadza without properly explaining their research or their results. They arrive from Europe or the US, accompanied by guides, and collect feces, blood, hair, and other biological samples. Often, the people giving up these samples don’t know what they will be used for, says Mangola. Scientists get their results and publish them without returning to share them. “You tell the world [what you’ve discovered]—why can’t you come back to Tanzania to tell the Hadza?” asks Mangola. “It would bring meaning and excitement to the community,” he says.
Some scientists have talked about the Hadza as if they were living fossils, says Alyssa Crittenden, a nutritional anthropologist and biologist at the University of Nevada in Las Vegas, who has been studying and working with the Hadza for the last two decades.
The Hadza have been described as being “locked in time,” she adds, but characterizations like that don’t reflect reality. She has made many trips to Tanzania and seen for herself how life has changed. Tourists flock to the region. Roads have been built. Charities have helped the Hadza secure land rights. Mangola went abroad for his education: he has a law degree and a master’s from the Indigenous Peoples Law and Policy program at the University of Arizona.
The Download: a microbiome gold rush, and Eric Schmidt’s election misinformation plan
Over the last couple of decades, scientists have come to realize just how important the microbes that crawl all over us are to our health. But some believe our microbiomes are in crisis—casualties of an increasingly sanitized way of life. Disturbances in the collections of microbes we host have been associated with a whole host of diseases, ranging from arthritis to Alzheimer’s.
Some might not be completely gone, though. Scientists believe many might still be hiding inside the intestines of people who don’t live in the polluted, processed environment that most of the rest of us share. They’ve been studying the feces of people like the Yanomami, an Indigenous group in the Amazon, who appear to still have some of the microbes that other people have lost.
But there is a major catch: we don’t know whether those in hunter-gatherer societies really do have “healthier” microbiomes—and if they do, whether the benefits could be shared with others. At the same time, members of the communities being studied are concerned about the risk of what’s called biopiracy—taking natural resources from poorer countries for the benefit of wealthier ones. Read the full story.
Eric Schmidt has a 6-point plan for fighting election misinformation
—by Eric Schmidt, formerly the CEO of Google, and current cofounder of philanthropic initiative Schmidt Futures
The coming year will be one of seismic political shifts. Over 4 billion people will head to the polls in countries including the United States, Taiwan, India, and Indonesia, making 2024 the biggest election year in history.
Navigating a shifting customer-engagement landscape with generative AI
A strategic imperative
Generative AI’s ability to harness customer data in a highly sophisticated manner means enterprises are accelerating plans to invest in and leverage the technology’s capabilities. In a study titled “The Future of Enterprise Data & AI,” Corinium Intelligence and WNS Triange surveyed 100 global C-suite leaders and decision-makers specializing in AI, analytics, and data. Seventy-six percent of the respondents said that their organizations are already using or planning to use generative AI.
According to McKinsey, while generative AI will affect most business functions, “four of them will likely account for 75% of the total annual value it can deliver.” Among these are marketing and sales and customer operations. Yet, despite the technology’s benefits, many leaders are unsure about the right approach to take and mindful of the risks associated with large investments.
Mapping out a generative AI pathway
One of the first challenges organizations need to overcome is senior leadership alignment. “You need the necessary strategy; you need the ability to have the necessary buy-in of people,” says Ayer. “You need to make sure that you’ve got the right use case and business case for each one of them.” In other words, a clearly defined roadmap and precise business objectives are as crucial as understanding whether a process is amenable to the use of generative AI.
The implementation of a generative AI strategy can take time. According to Ayer, business leaders should maintain a realistic perspective on the duration required for formulating a strategy, conduct necessary training across various teams and functions, and identify the areas of value addition. And for any generative AI deployment to work seamlessly, the right data ecosystems must be in place.
Ayer cites WNS Triange’s collaboration with an insurer to create a claims process by leveraging generative AI. Thanks to the new technology, the insurer can immediately assess the severity of a vehicle’s damage from an accident and make a claims recommendation based on the unstructured data provided by the client. “Because this can be immediately assessed by a surveyor and they can reach a recommendation quickly, this instantly improves the insurer’s ability to satisfy their policyholders and reduce the claims processing time,” Ayer explains.
All that, however, would not be possible without data on past claims history, repair costs, transaction data, and other necessary data sets to extract clear value from generative AI analysis. “Be very clear about data sufficiency. Don’t jump into a program where eventually you realize you don’t have the necessary data,” Ayer says.
The benefits of third-party experience
Enterprises are increasingly aware that they must embrace generative AI, but knowing where to begin is another thing. “You start off wanting to make sure you don’t repeat mistakes other people have made,” says Ayer. An external provider can help organizations avoid those mistakes and leverage best practices and frameworks for testing and defining explainability and benchmarks for return on investment (ROI).
Using pre-built solutions by external partners can expedite time to market and increase a generative AI program’s value. These solutions can harness pre-built industry-specific generative AI platforms to accelerate deployment. “Generative AI programs can be extremely complicated,” Ayer points out. “There are a lot of infrastructure requirements, touch points with customers, and internal regulations. Organizations will also have to consider using pre-built solutions to accelerate speed to value. Third-party service providers bring the expertise of having an integrated approach to all these elements.”