Language-generation algorithms are known to embed racist and sexist ideas. They’re trained on the language of the internet, including the dark corners of Reddit and Twitter that may include hate speech and disinformation. Whatever harmful ideas are present in those forums get normalized as part of their learning.
Researchers have now demonstrated that the same can be true for image-generation algorithms. Feed one a photo of a man cropped right below his neck, and 43% of the time, it will autocomplete him wearing a suit. Feed the same algorithm a cropped photo of a woman, even a famous woman like US Representative Alexandria Ocasio-Cortez, and 53% of the time, it will autocomplete her wearing a low-cut top or bikini. This has implications not just for image generation but for all computer-vision applications, including video-based candidate-assessment algorithms, facial recognition, and surveillance.
Ryan Steed, a PhD student at Carnegie Mellon University, and Aylin Caliskan, an assistant professor at George Washington University, looked at two algorithms: OpenAI's iGPT (a version of GPT-2 trained on pixels instead of words) and Google's SimCLR. While the two algorithms learn from images in different ways, they share an important characteristic: both use completely unsupervised learning, meaning they don't need humans to label the images.
This is a relatively new innovation as of 2020. Previous computer-vision algorithms mainly used supervised learning, which involves feeding them manually labeled images: cat photos with the tag “cat” and baby photos with the tag “baby.” But in 2019, researcher Kate Crawford and artist Trevor Paglen found that these human-created labels in ImageNet, the most foundational image data set for training computer-vision models, sometimes contain disturbing language, like “slut” for women and racial slurs for minorities.
The latest paper demonstrates an even deeper source of toxicity. Even without these human labels, the images themselves encode unwanted patterns. The issue parallels what the natural-language processing (NLP) community has already discovered. The enormous datasets compiled to feed these data-hungry algorithms capture everything on the internet. And the internet has an overrepresentation of scantily clad women and other often harmful stereotypes.
To conduct their study, Steed and Caliskan cleverly adapted a technique that Caliskan previously used to examine bias in unsupervised NLP models. These models learn to manipulate and generate language using word embeddings, a mathematical representation of language that clusters words commonly used together and separates words commonly found apart. In a 2017 paper published in Science, Caliskan measured the distances between the different word pairings that psychologists were using to measure human biases in the Implicit Association Test (IAT). She found that those distances almost perfectly recreated the IAT’s results. Stereotypical word pairings like man and career or woman and family were close together, while opposite pairings like man and family or woman and career were far apart.
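The word-embedding measurement Caliskan used can be sketched in a few lines of Python. The effect-size statistic below follows the Word Embedding Association Test (WEAT) from that 2017 Science paper; the 2-D vectors are hand-made toys chosen to illustrate the stereotypical geometry the paper describes, not real embeddings.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity: how "close" two embedding vectors are
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    # How much closer word w sits to attribute set A than to attribute set B
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # Differential association of target sets X and Y with attribute sets A
    # and B, normalized like Cohen's d -- the statistic the 2017 paper reports.
    x_assoc = [association(x, A, B) for x in X]
    y_assoc = [association(y, A, B) for y in Y]
    return (np.mean(x_assoc) - np.mean(y_assoc)) / np.std(x_assoc + y_assoc)

# Toy, hand-made 2-D "embeddings" for illustration only
man, woman = np.array([1.0, 0.1]), np.array([0.1, 1.0])
career, office = np.array([0.9, 0.2]), np.array([0.8, 0.3])
family, home = np.array([0.2, 0.9]), np.array([0.3, 0.8])

# A positive value means "man" associates with career words and "woman" with
# family words, mirroring the stereotypical IAT result.
print(weat_effect_size([man], [woman], [career, office], [family, home]))
```

In the paper, the same statistic is computed over real word vectors learned from large text corpora; the toy geometry above just makes the "close together vs. far apart" intuition concrete.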
iGPT is also based on embeddings: it clusters or separates pixels based on how often they co-occur within its training images. Those pixel embeddings can then be used to compare how close or far two images are in mathematical space.
In their study, Steed and Caliskan once again found that those distances mirror the results of the IAT. Photos of men appear close to photos of ties and suits, while photos of women appear farther from them. The researchers got the same results with SimCLR, even though it uses a different method for deriving embeddings from images.
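The distance comparison behind this result can be sketched with cosine distance over embedding vectors. The vectors below are synthetic stand-ins (in the study they would come from the iGPT or SimCLR encoders), deliberately constructed so the "man" and "suit" embeddings land near each other; everything here is illustrative, not the authors' code.

```python
import numpy as np

def embed_distance(e1, e2):
    # Cosine distance between two image embeddings
    # (smaller = "closer" in the model's representation space)
    cos = np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2))
    return 1.0 - cos

# Stand-in embeddings; in the study these come from iGPT or SimCLR encoders.
rng = np.random.default_rng(0)
base = rng.normal(size=128)
photo_man = base + 0.1 * rng.normal(size=128)    # hypothetical "man" photo
photo_suit = base + 0.1 * rng.normal(size=128)   # hypothetical "suit" photo
photo_woman = -base + 0.1 * rng.normal(size=128) # hypothetical "woman" photo

# The study's finding, restated in this toy geometry: man/suit pairs land
# closer together in embedding space than woman/suit pairs do.
print(embed_distance(photo_man, photo_suit))
print(embed_distance(photo_woman, photo_suit))
```

The real test swaps these synthetic vectors for embeddings of actual photographs and checks whether the measured distances reproduce the IAT's stereotypical pairings.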
These results have concerning implications for image generation. Other image-generation algorithms, like generative adversarial networks, have led to an explosion of deepfake pornography that almost exclusively targets women. iGPT in particular adds yet another way for people to generate sexualized photos of women.
But the potential downstream effects are much bigger. In the field of NLP, unsupervised models have become the backbone for all kinds of applications. Researchers begin with an existing unsupervised model like BERT or GPT-2 and use tailored datasets to “fine-tune” it for a specific purpose. This semi-supervised approach, which combines unsupervised pre-training with supervised fine-tuning, has become a de facto standard.
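The fine-tuning pattern can be shown in miniature: freeze the features of a pretrained model and train only a small supervised head on labeled data. In the sketch below, a fixed random projection stands in for the pretrained unsupervised model, and the task is an invented toy; the point is the division of labor, which is also why any bias baked into the frozen features flows straight into the downstream classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pretrained unsupervised model (e.g. BERT or GPT-2
# features); here, a fixed random projection followed by tanh plays that role.
W_frozen = rng.normal(size=(8, 16))

def pretrained_encoder(x):
    return np.tanh(x @ W_frozen)

# Small labeled dataset for the supervised fine-tuning stage (toy task:
# predict the sign of the first input feature)
X = rng.normal(size=(200, 8))
y = (X[:, 0] > 0).astype(float)

feats = pretrained_encoder(X)    # frozen features: never updated
head = np.zeros(feats.shape[1])  # the only trainable parameters

# A few hundred steps of logistic-regression gradient descent, on the head alone
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-feats @ head))
    head -= 0.5 * feats.T @ (p - y) / len(y)

accuracy = np.mean((feats @ head > 0) == (y == 1))
print(f"fine-tuned head accuracy: {accuracy:.2f}")
```

Because only the head is trained, whatever structure (useful or biased) the frozen encoder learned during pre-training is inherited wholesale by the downstream model.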
The computer-vision field is beginning to follow the same trend. Steed and Caliskan worry about what these baked-in biases could mean when the algorithms are used for sensitive applications such as policing or hiring, where models are already analyzing candidates’ video recordings to decide if they’re a good fit for the job. “These are very dangerous applications that make consequential decisions,” says Caliskan.
Deborah Raji, a Mozilla fellow who co-authored an influential study revealing the biases in facial recognition, says the study should serve as a wake-up call to the computer-vision field. “For a long time, a lot of the critique on bias was about the way we label our images,” she says. Now this paper is saying “the actual composition of the dataset is resulting in these biases. We need accountability on how we curate these data sets and collect this information.”
Steed and Caliskan urge the companies developing these models to be more transparent: to open-source them and let the academic community continue its investigations. They also encourage fellow researchers to do more testing before deploying a vision model, such as by using the methods they developed for this paper. And finally, they hope the field will develop more responsible ways of compiling and documenting what’s included in training datasets.
Caliskan says the goal is ultimately to gain greater awareness and control when applying computer vision. “We need to be very careful about how we use them,” she says, “but at the same time, now that we have these methods, we can try to use this for social good.”
The Download: a long covid app, and California’s wind plans
1 The Twitter Files weren’t the bombshell Elon Musk billed them as
His carelessness triggered the harassment of some of Twitter’s content moderators, too. (WP $)
+ The files didn’t violate the First Amendment, either. (The Atlantic $)
+ Hate speech has exploded on the platform since he took over. (NYT $)
+ Journalists are staying on Twitter—for now. (Vox)
+ The company’s advertising revenue isn’t looking very healthy. (NYT $)
2 Russia is trying to freeze Ukrainians by destroying their electricity
It’s the country’s vulnerable who will suffer the most. (Economist $)
+ How Ukraine could keep the lights on. (MIT Technology Review)
3 Crypto is at a crossroads
Investors, executives, and advocates are unsure what’s next. (NYT $)
+ FTX and the Alameda Research trading firm were way too close. (FT $)
+ It’s okay to opt out of the crypto revolution. (MIT Technology Review)
4 Taylor Swift fans are suing Ticketmaster
They’re furious they weren’t able to buy tickets in the botched sale last month. (The Verge)
6 We need a global deal to safeguard the natural world
COP15, held this week in Montreal, is our best bet to thrash one out. (Vox)
+ Off-grid living is more viable these days than you may think. (The Verge)
7 What ultra-dim galaxies can teach us about dark matter
We’re going to need new telescopes to seek more of them out. (Wired $)
+ Japanese billionaire Yusaku Maezawa has some big plans for space. (Reuters)
+ A super-bright satellite could hamper our understanding of the cosmos. (Motherboard)
+ Here’s how to watch Mars disappear behind the moon. (New Scientist $)
8 An elite media newsletter wants to cover “power, money, and ego.”
It promises unparalleled access to prolific writers—and their audiences. (New Yorker $)
+ How to sign off an email sensibly. (Economist $)
9 The metaverse has a passion for fashion 👗
Here’s what its best-dressed residents are wearing. (WSJ $)
10 We’ve been sending text messages for 30 years 💬
Yet we’re still misunderstanding each other. (The Guardian)
Quote of the day
“There is certainly a rising sense of fear, justifiable fear. And I would say almost horror.”
—Pamela Nadell, director of American University’s Jewish Studies program, tells the Washington Post she fears that antisemitism has become normalized in the US, in light of Kanye West’s recent comments praising Hitler.
The big story
California’s coming offshore wind boom faces big engineering hurdles
Research groups estimate that the costs could fall from around $200 per megawatt-hour to between $58 and $120 by 2030. That would leave floating offshore wind more expensive than solar and onshore wind, but it could still serve an important role in an overall energy portfolio.
The technology is improving as well. Turbines themselves continue to get taller, generating more electricity and revenue from any given site. Some research groups and companies are also developing new types of floating platforms and delivery mechanisms that could make it easier to work within the constraints of ports and bridges.
The Denmark-based company Stiesdal has developed a modular, floating platform with a keel that doesn’t drop into place until it’s in the deep ocean, enabling it to be towed out from relatively shallow ports.
Meanwhile, San Francisco startup Aikido Technologies is developing a way of shipping turbines horizontally and then upending them in the deep ocean, enabling the structures to duck under bridges en route. The company believes its designs provide enough clearance for developers to access any US port. Some 80% of these ports have height limits owing to bridges or airport restrictions.
A number of federal, state, and local organizations are conducting evaluations of California and other US ports, assessing which ones might be best positioned to serve floating wind projects and what upgrades could be required to make it possible.
Government policies in the US, the European Union, China, and elsewhere are also providing incentives to develop offshore wind turbines, domestic manufacturing, and supporting infrastructure. That includes the Inflation Reduction Act that Biden signed into law this summer.
Finally, as for California’s permitting challenges, Hochschild notes that the same 2021 law requiring the state’s energy commission to set offshore wind goals also requires it to undertake the long-term planning necessary to meet them. That includes mapping out a strategy for streamlining the approval process.
For all the promise of floating wind, there’s little question that making it cost-competitive and achieving the envisioned targets will require massive investments in infrastructure, manufacturing, and more, along with building big projects at a pace the state hasn’t managed in the recent past.
If it can pull it off, however, California could become a leading player in a critical new clean energy sector, harnessing its vast coastal resources to meet its ambitious climate goals.
How Twitter’s “Teacher Li” became the central hub of China protest information
It’s hard to describe the feeling that came after. It’s like everyone is coming to you and all kinds of information from all over the world is converging toward you and [people are] telling you: Hey, what’s happening here; hey, what’s happening there; do you know, this is what’s happening in Guangzhou; I’m in Wuhan, Wuhan is doing this; I’m in Beijing, and I’m following the big group and walking together. Suddenly all the real-time information is being submitted to me, and I don’t know how to describe that feeling. But there was also no time to think about it.
My heart was beating very fast, and my hands and my brain were constantly switching between several software programs—because you know, you can’t save a video with Twitter’s web version. So I was constantly switching software, editing the video, exporting it, and then posting it on Twitter. [Editor’s note: Li adds subtitles, blocks out account information, and compiles shorter videos into one.] By the end, there was no time to edit the videos anymore. If someone shot and sent over a 12-second WeChat video, I would just use it as is. That’s it.
I got the largest amount of [private messages] around 6:00 p.m. on Sunday night. At that time, there were many people on the street in five major cities in China: Beijing, Shanghai, Chengdu, Wuhan, and Guangzhou. So I basically was receiving a dozen private messages every second. In the end, I couldn’t even screen the information anymore. I saw it, I clicked on it, and if it was worth posting, I posted it.
People all over the country are telling me about their real-time situations. In order for more people not to be in danger, they went to the [protest] sites themselves and sent me what was going on there. Like, some followers were riding bikes near the presidential palace in Nanjing, taking pictures, and telling me about the situation in the city. And then they asked me to inform everyone to be cautious. I think that’s a really moving thing.
It’s like I have gradually become an anchor sitting in a TV studio, getting endless information from reporters on the scene all over the country. For example, on Monday in Hangzhou, there were five or six people updating me on the latest news simultaneously. But there was a break because all of them were fleeing when the police cleared the venue.
On the importance of staying objective
There are a lot of tweets that embellish the truth. From their point of view, they think it’s the right thing to do. They think you have to maximize the outrage so that there can be a revolt. But for me, I think we need reliable information. We need to know what’s really going on, and that’s the most important thing. If we were doing it for the emotion, then in the end I really would have been part of the “foreign influence,” right?
But if there is a news account outside China that can record what’s happening objectively, in real time, and accurately, then people inside the Great Firewall won’t have doubts anymore. At this moment, in this quite extreme situation of a continuous news blackout, to be able to have an account that can keep posting news from all over the country at a speed of almost one tweet every few seconds is actually a morale boost for everyone.
Chinese people grow up with patriotism, so they become shy or don’t dare to say something directly or oppose something directly. That’s why the crowd was singing the national anthem and waving the red flag, the national flag [during protests]. You have to understand that the Chinese people are patriotic. Even when they are demanding things [from the government], they do it with that sentiment.