The big data era has created valuable resources for public interest outcomes, like health care. In the last 18 months, the speed with which scientists responded to the covid-19 pandemic—faster than to any disease in history—demonstrated the benefits of gathering, sharing, and extracting value from data for a wider good.
Access to data from 56 million National Health Service (NHS) patients’ medical records enabled public health researchers in the UK to provide some of the strongest data on risk factors for covid mortality and features of long covid, and access to health records sped up the development of lifesaving medical treatments like the messenger-RNA vaccines produced by Moderna and Pfizer.
But balancing the benefits of data sharing with the protection of individual and organizational privacy is a delicate process—and rightly so. Governments and businesses are increasingly collecting vast amounts of data, prompting investigations, concerns around privacy, and calls for stricter regulation.
“Data increasingly powers innovation, and it needs to be used for the public good, while individual privacy is protected. This is new and unfamiliar terrain for policymaking, and it requires a careful approach,” wrote David Deming, professor and director of the Malcolm Wiener Center for Social Policy at the Harvard Kennedy School, in a recent New York Times article.
A growing number of startups—some 230 and counting, according to Data Collaboratives—are forming to help empower citizens, nonprofit groups, and governments to gain more control over their data.
These startups are adopting legal and institutional structures like data trusts, cooperatives, and stewards to help provide people and organizations with a means of effectively and securely gathering and using relevant data—and in the process, taking on Big Tech’s control of the data economy.
“The relationship between data and society is fundamentally broken,” says Matt Gee, CEO of Brighthive, which helps networks and organizations set up alternative governance models including data trusts, data commons, and data cooperatives.
“We think it should be more collaborative instead of competitive, it should be more open and transparent, it should be more distributed and democratic instead of monopolistic. This is how we make the gains more equitable and reduce harmful biases in data.”
Access and control
As demonstrated by the pandemic, medical research and public health planning can be enriched by access to electronic health records, prescription and medicines data, and epidemiological data. But health data are also highly sensitive, with understandable public scrutiny over efforts to share them. So-called “secondary use,” which applies personal health information for purposes outside health-care delivery, requires a new governance framework.
Findata is an independent authority in the Finnish Institute of Health and Welfare, established by a government act in May 2019. The agency facilitates researchers’ access to Finnish health data, issuing permits for use or responding to specific statistical requests. In so doing, it aims to protect the interests of citizens while also appreciating the value that their data could offer to medical research, teaching, and health planning.
Prior to the formation of Findata, it was costly and complex for researchers to access this vital research resource. “The purpose of this agency is to streamline and secure the use of health data,” explains Johanna Seppänen, director of Findata.
“Before, if you wanted to have data from different registers or hospitals, you had to make four applications, and there were no standard ways of handling them, no ways to determine prices. It was very time-consuming, difficult, and confusing.”
Findata is the only agency of its kind so far, but it might inspire other countries that want to realize more value from health data in a safe and secure way.
The UK’s NHS recently faced pushback from privacy campaigners over reforms to improve data sharing for public health planning, showing the challenges that can come from attempts to change data collection and sharing protocols.
Empowerment and autonomy
Helping disenfranchised individuals and groups has been another focus area for new data governance organizations.
Data stewards—which range from community-based collectives to public or private organizations—serve as “both intermediaries and guardians during the exchange of data, thereby supporting individuals and communities to better navigate the data economy and better negotiate on their data rights,” says Suha Mohamed, strategy and partnerships associate at Aapti, an organization working on the intersection of technology and society with a focus on data rights.
One example of where data stewards can prove useful is for individuals in the gig economy, a fast-growing labor market that has been characterized by the prevalence of short-term contracts or freelance work, as opposed to permanent jobs, and has been rife with power inequalities.
“Asymmetric control of data is one of the primary levers of power that gig platforms use to manage their workforce and shape the narrative and public policy in the arena that they operate in,” says Hays Witt, co-founder and CEO of Driver’s Seat, a driver-owned data cooperative specializing in ride-hailing.
“Very few stakeholders have access to the data they need to engage in productive and constructive ways, starting with gig workers themselves. Our premise [at Driver’s Seat] is: let’s use tech and a data cooperative to empower gig workers to collect, aggregate, and share their data,” says Witt.
Driver’s Seat has developed a proprietary app through which workers can submit their location, work, and earnings information, which is then aggregated and analyzed. Drivers then receive insights that help them understand their real earnings and performance, informing their choices about where, when, on what platforms, and on what terms to work.
Driver’s Seat is developing tools that can tell drivers their average real pay across platforms in their city, compare their pay with averages, and tell them whether their pay is going up or down. All of this could help drivers move to platforms that offer them a better deal, empowering what is an otherwise atomized labor force.
“Our drivers are really excited to be engaged, because their day-to-day experience is seeing metrics, fed back to them by the platforms, that they don’t trust,” says Witt. “They know that the metrics are influential, their day-to-day experience is totally mediated by data. It impacts their earnings and their life, and they know it.”
Witt believes that in the future, workers will increasingly be able to contribute to crowdsourced information to develop “collective analyses of their problems, which means they can put forward collective policy solutions or agreements to negotiate with the employment platform.”
Balancing social mission and business models
All data startups, whether they are government-sanctioned institutions like Findata or entrepreneurial businesses like Driver’s Seat, face the challenge of balancing their mission with operational sustainability.
Securing a sustainable financial footing is a major challenge for nonprofit groups and social impact businesses. For data equity institutions, the funding mix commonly includes community- and membership-driven approaches, and philanthropic aid.
But some organizations, like Brighthive, have found win-win models where private sector companies are looking to improve data governance and are willing to pay for it.
Brighthive’s Gee describes commercial clients who have “seen what’s happening in the European Union around AI regulation and they want to get ahead of it in the US. They are taking a proactive stance on issues like algorithmic transparency, equity audits, and an alternative governance model for how they use customer data.”
Other data equity platforms have found revenue models in which beneficiary data can be harnessed by third parties in socially positive ways. Hays Witt at Driver’s Seat cites the example of municipal authorities and planning agencies.
Both the authorities and ride-hailing drivers have an incentive to reduce “dead time” in which a driver is circulating without earning money, causing emissions and congestion. If appropriate data can be collected, aggregated, and analyzed in a useful way, it can lead to better traffic and mobility decisions and infrastructure interventions. So, all participants benefit.
Witt points out other “neutral” cases where beneficiary data could be valuable to unrelated private sector entities in ways that do not work against the interests of the drivers. He gives the example of commercial real estate developers who are often forced to make decisions about investments and services based on out-of-date traffic and mobility data.
Driver’s Seat is exploring opportunities to offer aggregated analytics products to such companies with revenues returned as dividends to gig workers and to help finance the cooperative.
Many data startups seeking out sustainable revenue opportunities need to decide where to draw the line in terms of the kind of work they are willing to take on or the kind of businesses they’re willing to work with.
Brighthive’s Matt Gee points to growing investor interest in startups that can help companies navigate the end of “cookies,” which have been critical to third-party advertising but are now being phased out. “Investors are concerned about the death of third-party data and are hungry for companies addressing that,” he says.
But as socially minded startups gain more business from corporate clients, they need to balance their mission for social good with the financial gain of lucrative contracts.
“Is being a public benefit corporation more about what you do and how you do it, or who you work with? If we work on a data collaborative that provides transparency and accountability for marketing organizations pooling customer lists, are we actually reducing societal harm? These are questions that our team is constantly grappling with,” says Gee.
Data startups will inevitably face challenges, including balancing social mission, ethics, and business models, but as the data economy continues to grow, they are in a unique position to carve out new ways of responsibly leveraging the insight that data can provide for citizens, organizations, and governments—wresting some of the power over data away from Big Tech.
“Our data economy needs to anchor on creating value for everyone in society, and that requires user control, trusted intermediation, and collective governance to be embedded in innovative data stewardship models,” says Sushant Kumar, principal of responsible technology at social change venture Omidyar Network.
“Onboarding a critical mass of users, receiving regulatory support, and achieving financial sustainability will also ensure these designs succeed in disrupting the status quo and injecting fairness into the current paradigm.”
This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.
How do I know if egg freezing is for me?
The tool is currently being trialed in a group of research volunteers and is not yet widely available. But I’m hoping it represents a move toward more transparency and openness about the real costs and benefits of egg freezing. Yes, it is a remarkable technology that can help people become parents. But it might not be the best option for everyone.
Read more from Tech Review’s archive
Anna Louie Sussman had her eggs frozen in Italy and Spain because services in New York were too expensive. Luckily, there are specialized couriers ready to take frozen sex cells on international journeys, she wrote.
Michele Harrison was 41 when she froze 21 of her eggs. By the time she wanted to use them, two years later, only one was viable. Although she did have a baby, her case demonstrates that egg freezing is no guarantee of parenthood, wrote Bonnie Rochman.
What happens if someone dies with eggs in storage? Frozen eggs and sperm can still be used to create new life, but it’s tricky to work out who can make the decision, as I wrote in a previous edition of The Checkup.
Meanwhile, the race is on to create lab-made eggs and sperm. These cells, which might be made from a person’s blood or skin cells, could potentially solve a lot of fertility problems—should they ever prove safe, as I wrote in a feature for last year’s magazine issue on gender.
Researchers are also working on ways to mature eggs from transgender men in the lab, which could allow them to store and use their eggs without having to pause gender-affirming medical care or go through other potentially distressing procedures, as I wrote last year.
From around the web
The World Health Organization is set to decide whether covid still represents a “public health emergency of international concern.” It will probably decide to keep this status, because of the current outbreak in China. (STAT)
Researchers want to study the brains, genes, and other biological features of incarcerated people to find ways to stop them from reoffending. Others warn that this approach is based on shoddy science and racist ideas. (Undark)
A watermark for chatbots can expose text written by an AI
For example, since OpenAI’s chatbot ChatGPT was launched in November, students have already started cheating by using it to write essays for them. News website CNET has used ChatGPT to write articles, only to have to issue corrections amid accusations of plagiarism. Building the watermarking approach into such systems before they’re released could help address such problems.
In studies, these watermarks have already been used to identify AI-generated text with near certainty. Researchers at the University of Maryland, for example, were able to spot text created by Meta’s open-source language model, OPT-6.7B, using a detection algorithm they built. The work is described in a paper that’s yet to be peer-reviewed, and the code will be available for free around February 15.
AI language models work by predicting and generating one word at a time. After each word, the watermarking algorithm pseudorandomly divides the language model’s vocabulary into words on a “greenlist” and a “redlist”—using the preceding word to seed the split, so a detector can later recompute it—and then nudges the model to choose words from the greenlist.
The more greenlisted words in a passage, the more likely it is that the text was generated by a machine. Text written by a person tends to contain a more random mix of words. For example, for the word “beautiful,” the watermarking algorithm could classify the word “flower” as green and “orchid” as red. The AI model with the watermarking algorithm would be more likely to use the word “flower” than “orchid,” explains Tom Goldstein, an assistant professor at the University of Maryland, who was involved in the research.
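The greenlist/redlist idea described above can be illustrated with a toy sketch. This is not the researchers’ actual implementation (which operates on model logits over a real tokenizer vocabulary); the vocabulary, function names, and the simple green-fraction score below are illustrative assumptions.

```python
import hashlib
import random

# Toy vocabulary; a real language model's vocabulary has tens of thousands of tokens.
VOCAB = ["flower", "orchid", "garden", "sunset", "river", "mountain", "cloud", "stone"]

def green_list(prev_word, vocab=VOCAB, green_fraction=0.5):
    """Deterministically split the vocabulary into a greenlist, seeded by the
    previous word, so a detector can recompute the exact same split later."""
    seed = int(hashlib.sha256(prev_word.encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    shuffled = list(vocab)  # copy so the shared list is not mutated
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * green_fraction)
    return set(shuffled[:cut])

def green_score(text):
    """Fraction of words (after the first) that land on their greenlist.
    Watermarked text scores well above the ~0.5 expected by chance;
    human-written text hovers near chance."""
    words = text.split()
    if len(words) < 2:
        return 0.0
    hits = sum(1 for prev, cur in zip(words, words[1:])
               if cur in green_list(prev))
    return hits / (len(words) - 1)
```

A watermarking generator would bias its sampling toward `green_list(prev_word)` at every step; a detector simply runs `green_score` over a suspect passage and flags scores far above 0.5. Because the split is recomputable from the text alone, detection needs no access to the model itself.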
The Download: watermarking AI text, and freezing eggs
That’s why the team behind a new decision-making tool hopes it will help clear up some of the misconceptions around the procedure—and give would-be parents a much-needed insight into its real costs, benefits, and potential pitfalls. Read the full story.
This story is from The Checkup, MIT Technology Review’s weekly newsletter giving you the inside track on all things health and biotech. Sign up to receive it in your inbox every Thursday.
I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.
1 Elon Musk held a surprise meeting with US political leaders
Allegedly in the interest of ensuring Twitter is “fair to both parties.” (Insider $)
+ Kanye West’s presidential campaign advisors have been booted off Twitter. (Rolling Stone $)
+ Twitter’s trust and safety head is Musk’s biggest champion. (Bloomberg $)
2 We’re treating covid like flu now
Annual covid shots are the next logical step. (The Atlantic $)
3 The worst thing about Sam Bankman-Fried’s spell in jail?
Being cut off from the internet. (Forbes $)
+ Most crypto criminals use just five exchanges. (Wired $)
+ Collapsed crypto firm FTX has objected to a new investigation request. (Reuters)
4 Israel’s tech sector is rising up against its government
Tech workers fear its hardline policies will harm startups. (FT $)
5 It’s possible to power the world solely using renewable energy
At least, according to Stanford academic Mark Jacobson. (The Guardian)
+ Tech bros love the environment these days. (Slate $)
+ How new versions of solar, wind, and batteries could help the grid. (MIT Technology Review)
6 Generative AI is wildly expensive to run
And that’s why promising startups like OpenAI need to hitch their wagons to the likes of Microsoft. (Bloomberg $)
+ How Microsoft benefits from the ChatGPT hype. (Vox)
+ BuzzFeed is planning to make quizzes supercharged by OpenAI. (WSJ $)
+ Generative AI is changing everything. But what’s left when the hype is gone? (MIT Technology Review)
7 It’s hard not to blame self-driving cars for accidents
Even when it’s not technically their fault. (WSJ $)
8 What it’s like to swap Google for TikTok
It’s great for food suggestions and hacks, but hopeless for anything work-related. (Wired $)
+ The platform really wants to stay operational in the US. (Vox)
+ TikTok is mired in an eyelash controversy. (Rolling Stone $)
9 CRISPR gene editing kits are available to buy online
But there’s no guarantee these experiments will actually work. (Motherboard)
+ Next up for CRISPR: Gene editing for the masses? (MIT Technology Review)
10 Tech workers are livestreaming their layoffs
It’s a candid window into how these notoriously secretive companies treat their staff. (The Information $)