In March 2020, when the WHO declared a pandemic, the public sequence database GISAID held 524 covid sequences. Over the next month scientists uploaded 6,000 more. By the end of May, the total was over 35,000. (In contrast, scientists worldwide added 40,000 flu sequences to GISAID in all of 2019.)
“Without a name, forget about it—we cannot understand what other people are saying,” says Anderson Brito, a postdoc in genomic epidemiology at the Yale School of Public Health, who contributes to the Pango effort.
As the number of covid sequences spiraled, researchers trying to study them were forced to create entirely new infrastructure and standards on the fly. A universal naming system has been one of the most important elements of this effort: without it, scientists would struggle to talk to each other about how the virus’s descendants are traveling and changing—either to flag up a question or, even more critically, to sound the alarm.
Where Pango came from
In April 2020, a handful of prominent virologists in the UK and Australia proposed a system of letters and numbers for naming lineages, or new branches, of the covid family. It had a logic, and a hierarchy, even though the names it generated—like B.1.1.7—were a bit of a mouthful.
One of the authors on the paper was Áine O’Toole, a PhD candidate at the University of Edinburgh. Soon she’d become the primary person actually doing that sorting and classifying, eventually combing through hundreds of thousands of sequences by hand.
She says: “Very early on, it was just who was available to curate the sequences. That ended up being my job for a good bit. I guess I never understood quite the scale we were going to get to.”
She quickly set about building software to assign new genomes to the right lineages. Not long after that, another researcher, postdoc Emily Scher, built a machine-learning algorithm to speed things up even more.
They named the software Pangolin, a tongue-in-cheek reference to a debate about the animal origin of covid. (The whole system is now simply known as Pango.)
The naming system, along with the software to implement it, quickly became a global essential. Although the WHO has recently started using Greek letters for variants that seem especially concerning, like delta, those nicknames are for the public and the media. Delta actually refers to a growing family of variants, which scientists call by their more precise Pango names: B.1.617.2, AY.1, AY.2, and AY.3.
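The hierarchy behind those names can be illustrated with a short sketch. It assumes that each dot-separated number marks one step down the family tree, and that a fresh letter prefix is shorthand for a longer name. The alias table below holds a single entry, AY standing for B.1.617.2, which matches how the AY names relate to delta but is not spelled out in the text above. The function names are hypothetical and are not taken from the Pangolin software.

```python
from typing import List

# Illustrative alias table (an assumption for this sketch, not the full Pango list).
ALIASES = {"AY": "B.1.617.2"}


def expand(name: str) -> str:
    """Return the full dotted form of a lineage name, expanding a known alias prefix."""
    prefix, _, rest = name.partition(".")
    if prefix in ALIASES:
        return ALIASES[prefix] + ("." + rest if rest else "")
    return name


def ancestors(name: str) -> List[str]:
    """List every ancestor implied by the dotted name, nearest parent first."""
    parts = expand(name).split(".")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def is_descendant(child: str, parent: str) -> bool:
    """True if child is the same lineage as parent or sits below it in the tree."""
    full_child, full_parent = expand(child), expand(parent)
    return full_child == full_parent or full_child.startswith(full_parent + ".")


if __name__ == "__main__":
    print(ancestors("B.1.1.7"))
    print(expand("AY.1"))
    print(is_descendant("AY.3", "B.1.617.2"))
```

Read this way, a name like B.1.1.7 carries its own ancestry, and AY.1, AY.2, and AY.3 all resolve to branches of B.1.617.2, which is why they are grouped under delta.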
“When alpha emerged in the UK, Pango made it very easy for us to look for those mutations in our genomes to see if we had that lineage in our country too,” says Jolly. “Ever since then, Pango has been used as the baseline for reporting and surveillance of variants in India.”
Because Pango offers a rational, orderly approach to what would otherwise be chaos, it may forever change the way scientists name viral strains—allowing experts from all over the world to work together with a shared vocabulary. Brito says: “Most likely, this will be a format we’ll use for tracking any other new virus.”
Many of the foundational tools for tracking covid genomes have been developed and maintained by early-career scientists like O’Toole and Scher over the last year and a half. As the need for worldwide covid collaboration exploded, scientists rushed to support it with ad hoc infrastructure like Pango. Much of that work fell to tech-savvy young researchers in their 20s and 30s. They used informal networks and tools that were open source—meaning they were free to use, and anyone could volunteer to add tweaks and improvements.
“The people on the cutting edge of new technologies tend to be grad students and postdocs,” says Angie Hinrichs, a bioinformatician at UC Santa Cruz who joined the Pangolin project earlier this year. For example, O’Toole and Scher work in the lab of Andrew Rambaut, a genomic epidemiologist who posted the first public covid sequences online after receiving them from Chinese scientists. “They just happened to be perfectly placed to provide these tools that became absolutely critical,” Hinrichs says.
It hasn’t been easy. For most of 2020, O’Toole took on the bulk of the responsibility for identifying and naming new lineages by herself. The university was shuttered, but she and another of Rambaut’s PhD students, Verity Hill, got permission to come into the office. Her commute, walking 40 minutes to school from the apartment where she lived alone, gave her some sense of normalcy.
Every few weeks, O’Toole would download the entire covid repository from the GISAID database, which had grown exponentially each time. Then she would hunt around for groups of genomes with mutations that looked similar, or things that looked odd and might have been mislabeled.
When she got particularly stuck, Hill, Rambaut, and other members of the lab would pitch in to discuss the designations. But the grunt work fell on her.
Deciding when descendants of the virus deserve a new family name can be as much art as science. It was a painstaking process, sifting through an unheard-of number of genomes and asking time and again: Is this a new variant of covid or not?
“It was pretty tedious,” she says. “But it was always really humbling. Imagine going through 20,000 sequences from 100 different places in the world. I saw sequences from places I’d never even heard of.”
As time went on, O’Toole struggled to keep up with the volume of new genomes to sort and name.
In June 2020, there were over 57,000 sequences stored in the GISAID database, and O’Toole had sorted them into 39 variants. By November 2020, a month after she was supposed to turn in her thesis, O’Toole took her last solo run through the data. It took her 10 days to go through all the sequences, which by then numbered 200,000. (Although covid has overshadowed her research on other viruses, she’s putting a chapter on Pango in her thesis.)
Fortunately, the Pango software is built to be collaborative, and others have stepped up. An online community, the one that Jolly turned to when she noticed the variant sweeping across India, sprouted and grew. This year, O’Toole’s work has been much more hands-off. New lineages are now designated mostly when epidemiologists around the world contact O’Toole and the rest of the team through Twitter, email, or GitHub, her preferred method.
“Now it’s more reactionary,” says O’Toole. “If a group of researchers somewhere in the world is working on some data and they believe they’ve identified a new lineage, they can put in a request.”
The deluge of data has continued. This past spring, the team held a “pangothon,” a sort of hackathon in which they sorted 800,000 sequences into around 1,200 lineages.
“We gave ourselves three solid days,” says O’Toole. “It took two weeks.”
Since then, the Pango team has recruited a few more volunteers, like UCSC researcher Hinrichs and Yale researcher Brito, who both got involved initially by adding their two cents on Twitter and the GitHub page. A postdoc at the University of Cambridge, Chris Ruis, has turned his attention to helping O’Toole clear out the backlog of GitHub requests.
O’Toole recently asked them to formally join the organization as part of the newly created Pango Network Lineage Designation Committee, which discusses and makes decisions about variant names. Another committee, which includes lab leader Rambaut, makes higher-level decisions.
“We’ve got a website, and an email that’s not just my email,” O’Toole says. “It’s become a lot more formalized, and I think that will really help it scale.”
A few cracks around the edges have started to show as the data has grown. As of today, there are nearly 2.5 million covid sequences in GISAID, which the Pango team has split into 1,300 branches. Each branch corresponds to a variant. Of those, eight are ones to watch, according to the WHO.
With so much to process, the software is starting to buckle. Things are getting mislabeled. Many strains look similar, because the virus evolves the most advantageous mutations over and over again.
As a stopgap measure, the team has built new software that uses a different sorting method and can catch things that Pango may miss.
Investing in women pays off
“Starting a business is a privilege,” says Burton O’Toole, who worked at various startups before launching and later selling AdMass, her own marketing technology company. The company gave her access to the HearstLab program in 2016, but she soon discovered that she preferred the investment aspect and became a vice president at HearstLab a year later. “To empower some of the smartest women to do what they love is great,” she says. But in addition to rooting for women, Burton O’Toole loves the work because it’s a great market opportunity.
“Research shows female-led teams see two and a half times higher returns compared to male-led teams,” she says, adding that women and people of color tend to build more diverse teams and therefore benefit from varied viewpoints and perspectives. She also explains that companies with women on their founding teams are likely to get acquired or go public sooner. “Despite results like this, just 2.3% of venture capital funding goes to teams founded by women. It’s still amazing to me that more investors aren’t taking this data more seriously,” she says.
Burton O’Toole—who earned a BS from Duke in 2007 before getting an MS and PhD from MIT, all in mechanical engineering—has been a “data nerd” since she can remember. In high school she wanted to become an actuary. “Ten years ago, I never could have imagined this work; I like the idea of doing something in 10 more years I couldn’t imagine now,” she says.
When starting a business, Burton O’Toole says, “women tend to want all their ducks in a row before they act. They say, ‘I’ll do it when I get this promotion, have enough money, finish this project.’ But there’s only one good way. Make the jump.”
Preparing for disasters, before it’s too late
All too often, the work of developing global disaster and climate resiliency happens when disaster—such as a hurricane, earthquake, or tsunami—has already ravaged entire cities and torn communities apart. But Elizabeth Petheo, MBA ’14, says that recently her work has been focused on preparedness.
It’s hard to get attention for preparedness efforts, explains Petheo, a principal at Miyamoto International, an engineering and disaster risk reduction consulting firm. “You can always get a lot of attention when there’s a disaster event, but at that point it’s too late,” she adds.
Petheo leads the firm’s projects and partnerships in the Asia-Pacific region and advises globally on international development and humanitarian assistance. She also works on preparedness in the Asia-Pacific region with the United States Agency for International Development.
“We’re doing programming on the engagement of the private sector in disaster risk management in Indonesia, which is a very disaster-prone country,” she says. “Smaller and medium-sized businesses are important contributors to job creation and economic development. When they go down, the impact on lives, livelihoods, and the community’s ability to respond and recover effectively is extreme. We work to strengthen their own understanding of their risk and that of their surrounding community, lead them through an action-planning process to build resilience, and link that with larger policy initiatives at the national level.”
Petheo came to MIT with international leadership experience, having managed high-profile global development and risk mitigation initiatives at the World Bank in Washington, DC, as well as with US government agencies and international organizations leading major global humanitarian responses and teams in Sri Lanka and Haiti. But she says her time at Sloan helped her become prepared for this next phase in her career. “Sloan was the experience that put all the pieces together,” she says.
Petheo has maintained strong connections with MIT. In 2018, she received the Margaret L.A. MacVicar ’65, ScD ’67, Award in recognition of her role starting and leading the MIT Sloan Club in Washington, DC, and her work as an inaugural member of the Graduate Alumni Council (GAC). She is also a member of the Friends of the MIT Priscilla King Gray Public Service Center.
“I believe deeply in the power and impact of the Institute’s work and people,” she says. “The moment I graduated, my thought process was, ‘How can I give back, and how can I continue to strengthen the experience of those who will come after me?’”
The Download: a curb on climate action, and post-Roe period tracking
Why’s it so controversial?: Geoengineering was long a taboo topic among scientists, and some argue it should remain one. There are questions about its potential environmental side effects, and concerns that the impacts will be felt unevenly across the globe. Some feel it’s too dangerous to ever try or even to investigate, arguing that just talking about the possibility could weaken the resolve to address the underlying causes of climate change.
But it’s going ahead?: Despite the concerns, as the threat of climate change grows and major nations fail to make rapid progress on emissions, growing numbers of experts are seriously exploring the potential effects of these approaches. Read the full story.
I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.
1 The belief that AI is alive refuses to die
People want to believe the models are sentient, even when their creators deny it. (Reuters)
+ It’s unsurprising wild religious beliefs find a home in Silicon Valley. (Vox)
+ AI systems are being trained twice as quickly as they were just last year. (Spectrum IEEE)
2 The FBI added the missing cryptoqueen to its most-wanted list
It’s offering a $100,000 reward for information leading to Ruja Ignatova, whose crypto scheme defrauded victims out of more than $4 billion. (BBC)
+ A new documentary on the crypto Ponzi scheme is in the works. (Variety)
3 Social media platforms turn a blind eye to dodgy telehealth ads
Which has played a part in the prescription drug abuse boom. (Protocol)
+ The doctor will Zoom you now. (MIT Technology Review)
4 We’re addicted to China’s lithium batteries
Which isn’t great news for other countries building electric cars. (Wired $)
+ This battery uses a new anode that lasts 20 times longer than lithium. (Spectrum IEEE)
+ Quantum batteries could, in theory, allow us to drive a million miles between charges. (The Next Web)
5 Far-right extremists are communicating over radio to avoid detection
Making it harder to monitor them and their violent activities. (Slate $)
+ Many of the rioters who stormed the Capitol were carrying radio equipment. (The Guardian)
6 Bro culture has no place in space 🚀
So says NASA’s former deputy administrator, who’s sick and tired of misogyny in the sector. (CNN)
7 A US crypto exchange is gaining traction in Venezuela
It’s helping its growing community battle hyperinflation, but isn’t as decentralized as they believe it to be. (Rest of World)
+ The vast majority of NFT players won’t be around in a decade. (Vox)
+ Exchange Coinbase is working with ICE to track and identify crypto users. (The Intercept)
+ If RadioShack’s edgy tweets shock you, don’t forget it’s a crypto firm now. (NY Mag)
8 It’s time we learned to love our swamps
Draining them prevents them from absorbing CO2 and filtering out our waste. (New Yorker $)
+ The architect making friends with flooding. (MIT Technology Review)
9 Robots love drawing too 🖍️
Though I’ll bet they don’t get as frustrated as we do when they mess up. (Input)
10 The risky world of teenage brains
Making potentially dangerous decisions is an important part of adolescence, and our brains reflect that. (Knowable Magazine)
Quote of the day
“They shamelessly celebrate an all-inclusive pool party while we can’t even pay our rent!”