In partnership with Omidyar Network, MIT Technology Review Insights spoke to leading thinkers examining the data economy, including researchers, lawyers, and economists at organizations such as the Open Data Institute, Yale University, and Microsoft Research, to explore the key data inequality trends, their root causes, and the ideas and tools available to solve them. The key findings of the report are as follows:
The data economy has become increasingly unequal.
Critics believe that the internet—once optimistically envisioned as a transformative public infrastructure—has fallen under the control of a small group of tech giants whose ownership of data, and the “computational infrastructures” that support it, leads to an unequal exchange where data controllers are disproportionately benefitted. The challenge for data rights advocates, economists, and governments is in developing ways of democratizing the data economy so that societies as a whole can leverage more benefit from the data revolution.
Data is a novel resource requiring new tools to calculate its value and identify its participants.
The power of data is relational and cumulative, it is the product of many participants and users who are often unwitting in their contribution to “data labor,” and are often not compensated fairly for it. We need more sophisticated tools for understanding the unique properties and dynamics of data.
Redressing the balance of the data economy is a mammoth task that falls to society as a whole.
Innovations to redress the imbalance of the data economy range from top-down government interventions, such as targeted regulatory reforms, to bottom-up civil society-led actions, where “data stewardship” can be fostered through institutions like trusts, cooperatives, and unions, which give people more control. Overall, a broad community of perspectives should be included in any efforts to rebalance the digital economy.
This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.
Why detecting AI-generated text is so difficult (and what to do about it)
This tool is OpenAI’s response to the heat it’s gotten from educators, journalists, and others for launching ChatGPT without any ways to detect text it has generated. However, it is still very much a work in progress, and it is woefully unreliable. OpenAI says its AI text detector correctly identifies 26% of AI-written text as “likely AI-written.”
While OpenAI clearly has a lot more work to do to refine its tool, there’s a limit to just how good it can make it. We’re extremely unlikely to ever get a tool that can spot AI-generated text with 100% certainty. It’s really hard to detect AI-generated text because the whole point of AI language models is to generate fluent and human-seeming text, and the model is mimicking text created by humans, says Muhammad Abdul-Mageed, a professor who oversees research in natural-language processing and machine learning at the University of British Columbia
We are in an arms race to build detection methods that can match the latest, most powerful models, Abdul-Mageed adds. New AI language models are more powerful and better at generating even more fluent language, which quickly makes our existing detection tool kit outdated.
OpenAI built its detector by creating a whole new AI language model akin to ChatGPT that is specifically trained to detect outputs from models like itself. Although details are sparse, the company apparently trained the model with examples of AI-generated text and examples of human-generated text, and then asked it to spot the AI-generated text. We asked for more information, but OpenAI did not respond.
Last month, I wrote about another method for detecting text generated by an AI: watermarks. These act as a sort of secret signal in AI-produced text that allows computer programs to detect it as such.
Researchers at the University of Maryland have developed a neat way of applying watermarks to text generated by AI language models, and they have made it freely available. These watermarks would allow us to tell with almost complete certainty when AI-generated text has been used.
The trouble is that this method requires AI companies to embed watermarking in their chatbots right from the start. OpenAI is developing these systems but has yet to roll them out in any of its products. Why the delay? One reason might be that it’s not always desirable to have AI-generated text watermarked.
One of the most promising ways ChatGPT could be integrated into products is as a tool to help people write emails or as an enhanced spell-checker in a word processor. That’s not exactly cheating. But watermarking all AI-generated text would automatically flag these outputs and could lead to wrongful accusations.
The original startup behind Stable Diffusion has launched a generative AI for video
Set up in 2018, Runway has been developing AI-powered video-editing software for several years. Its tools are used by TikTokers and YouTubers as well as mainstream movie and TV studios. The makers of The Late Show with Stephen Colbert used Runway software to edit the show’s graphics; the visual effects team behind the hit movie Everything Everywhere All at Once used the company’s tech to help create certain scenes.
In 2021, Runway collaborated with researchers at the University of Munich to build the first version of Stable Diffusion. Stability AI, a UK-based startup, then stepped in to pay the computing costs required to train the model on much more data. In 2022, Stability AI took Stable Diffusion mainstream, transforming it from a research project into a global phenomenon.
But the two companies no longer collaborate. Getty is now taking legal action against Stability AI—claiming that the company used Getty’s images, which appear in Stable Diffusion’s training data, without permission—and Runway is keen to keep its distance.
Gen-1 represents a new start for Runway. It follows a smattering of text-to-video models revealed late last year, including Make-a-Video from Meta and Phenaki from Google, both of which can generate very short video clips from scratch. It is also similar to Dreamix, a generative AI from Google revealed last week, which can create new videos from existing ones by applying specified styles. But at least judging from Runway’s demo reel, Gen-1 appears to be a step up in video quality. Because it transforms existing footage, it can also produce much longer videos than most previous models. (The company says it will post technical details about Gen-1 on its website in the next few days.)
Unlike Meta and Google, Runway has built its model with customers in mind. “This is one of the first models to be developed really closely with a community of video makers,” says Valenzuela. “It comes with years of insight about how filmmakers and VFX editors actually work on post-production.”
Gen-1, which runs on the cloud via Runway’s website, is being made available to a handful of invited users today and will be launched to everyone on the waitlist in a few weeks.
Last year’s explosion in generative AI was fueled by the millions of people who got their hands on powerful creative tools for the first time and shared what they made with them. Valenzuela hopes that putting Gen-1 into the hands of creative professionals will soon have a similar impact on video.
“We’re really close to having full feature films being generated,” he says. “We’re close to a place where most of the content you’ll see online will be generated.”
When my dad was sick, I started Googling grief. Then I couldn’t escape it.
I am a mostly visual thinker, and thoughts pose as scenes in the theater of my mind. When my many supportive family members, friends, and colleagues asked how I was doing, I’d see myself on a cliff, transfixed by an omniscient fog just past its edge. I’m there on the brink, with my parents and sisters, searching for a way down. In the scene, there is no sound or urgency and I am waiting for it to swallow me. I’m searching for shapes and navigational clues, but it’s so huge and gray and boundless.
I wanted to take that fog and put it under a microscope. I started Googling the stages of grief, and books and academic research about loss, from the app on my iPhone, perusing personal disaster while I waited for coffee or watched Netflix. How will it feel? How will I manage it?
I started, intentionally and unintentionally, consuming people’s experiences of grief and tragedy through Instagram videos, various newsfeeds, and Twitter testimonials. It was as if the internet secretly teamed up with my compulsions and started indulging my own worst fantasies; the algorithms were a sort of priest, offering confession and communion.
Yet with every search and click, I inadvertently created a sticky web of digital grief. Ultimately, it would prove nearly impossible to untangle myself. My mournful digital life was preserved in amber by the pernicious personalized algorithms that had deftly observed my mental preoccupations and offered me ever more cancer and loss.
I got out—eventually. But why is it so hard to unsubscribe from and opt out of content that we don’t want, even when it’s harmful to us?
I’m well aware of the power of algorithms—I’ve written about the mental-health impact of Instagram filters, the polarizing effect of Big Tech’s infatuation with engagement, and the strange ways that advertisers target specific audiences. But in my haze of panic and searching, I initially felt that my algorithms were a force for good. (Yes, I’m calling them “my” algorithms, because while I realize the code is uniform, the output is so intensely personal that they feel like mine.) They seemed to be working with me, helping me find stories of people managing tragedy, making me feel less alone and more capable.
In reality, I was intimately and intensely experiencing the effects of an advertising-driven internet, which Ethan Zuckerman, the renowned internet ethicist and professor of public policy, information, and communication at the University of Massachusetts at Amherst, famously called “the Internet’s Original Sin” in a 2014 Atlantic piece. In the story, he explained the advertising model that brings revenue to content sites that are most equipped to target the right audience at the right time and at scale. This, of course, requires “moving deeper into the world of surveillance,” he wrote. This incentive structure is now known as “surveillance capitalism.”
Understanding how exactly to maximize the engagement of each user on a platform is the formula for revenue, and it’s the foundation for the current economic model of the web.