Chen says that while content moderation policies from Facebook, Twitter, and others succeeded in filtering out some of the most obvious English-language disinformation, the system often misses such content when it’s in other languages. That work instead had to be done by volunteers like her team, who looked for disinformation and were trained to defuse it and minimize its spread. “Those mechanisms meant to catch certain words and stuff don’t necessarily catch that dis- and misinformation when it’s in a different language,” she says.
Google’s translation services and technologies such as Translatotron and real-time translation headphones use artificial intelligence to convert between languages. But Xiong finds these tools inadequate for Hmong, a deeply complex language where context is incredibly important. “I think we’ve become really complacent and dependent on advanced systems like Google,” she says. “They claim to be ‘language accessible,’ and then I read it and it says something totally different.”
(A Google spokesperson admitted that smaller languages “pose a more difficult translation task” but said that the company has “invested in research that particularly benefits low-resource language translations,” using machine learning and community feedback.)
All the way down
The challenges of language online go beyond the US—and down, quite literally, to the underlying code. Yudhanjaya Wijeratne is a researcher and data scientist at the Sri Lankan think tank LIRNEasia. In 2018, he started tracking bot networks whose activity on social media encouraged violence against Muslims: in February and March of that year, a string of riots by Sinhalese Buddhists targeted Muslims and mosques in the cities of Ampara and Kandy. His team documented “the hunting logic” of the bots, catalogued hundreds of thousands of Sinhalese social media posts, and took the findings to Twitter and Facebook. “They’d say all sorts of nice and well-meaning things–basically canned statements,” he says. (In a statement, Twitter says it uses human review and automated systems to “apply our rules impartially for all people in the service, regardless of background, ideology, or placement on the political spectrum.”)
When contacted by MIT Technology Review, a Facebook spokesperson said the company commissioned an independent human rights assessment of the platform’s role in the violence in Sri Lanka, which was published in May 2020, and made changes in the wake of the attacks, including hiring dozens of Sinhala and Tamil-speaking content moderators. “We deployed proactive hate speech detection technology in Sinhala to help us more quickly and effectively identify potentially violating content,” they said.
When the bot behavior continued, Wijeratne grew skeptical of the platitudes. He decided to look at the code libraries and software tools the companies were using, and found that the mechanisms to monitor hate speech in most non-English languages had not yet been built.
“Much of the research, in fact, for a lot of languages like ours has simply not been done yet,” Wijeratne says. “What I can do with three lines of code in Python in English literally took me two years of looking at 28 million words of Sinhala to build the core corpuses, to build the core tools, and then get things up to that level where I could potentially do that level of text analysis.”
After suicide bombers targeted churches in Colombo, the Sri Lankan capital, in April 2019, Wijeratne built a tool to analyze hate speech and misinformation in Sinhala and Tamil. The system, called Watchdog, is a free mobile application that aggregates news and attaches warnings to false stories. The warnings come from volunteers who are trained in fact-checking.
Wijeratne stresses that this work goes far beyond translation.
“Many of the algorithms that we take for granted that are often cited in research, in particular in natural-language processing, show excellent results for English,” he says. “And yet many identical algorithms, even used on languages that are only a few degrees of difference apart—whether they’re West German or from the Romance tree of languages—may return completely different results.”
Natural-language processing is the basis of automated content moderation systems. Wijeratne published a paper in 2019 that examined the discrepancies between their accuracy in different languages. He argues that the more computational resources that exist for a language, like data sets and web pages, the better the algorithms can work. Languages from poorer countries or communities are disadvantaged.
“If you’re building, say, the Empire State Building for English, you have the blueprints. You have the materials,” he says. “You have everything on hand and all you have to do is put this stuff together. For every other language, you don’t have the blueprints.
“You have no idea where the concrete is going to come from. You don’t have steel and you don’t have the workers, either. So you’re going to be sitting there tapping away one brick at a time and hoping that maybe your grandson or your granddaughter might complete the project.”
The movement to provide those blueprints is known as language justice, and it is not new. The American Bar Association describes language justice as a “framework” that preserves people’s rights “to communicate, understand, and be understood in the language in which they prefer and feel most articulate and powerful.”
Human creators stand to benefit as AI rewrites the rules of content creation
A game-changer for content creation
Among the AI-related technologies to have emerged in the past several years is generative AI—deep-learning algorithms that allow computers to generate original content, such as text, images, video, audio, and code. And demand for such content will likely jump in the coming years—Gartner predicts that by 2025, generative AI will account for 10% of all data created, compared with 1% in 2022.
“Théâtre D’opéra Spatial” is an example of AI-generated content (AIGC), created with the Midjourney text-to-art generator program. Several other AI-driven art-generating programs have also emerged in 2022, capable of creating paintings from single-line text prompts. The diversity of technologies reflects a wide range of artistic styles and different user demands. DALL-E 2 and Stable Diffusion, for instance, are focused mainly on western-style artwork, while Baidu’s ERNIE-ViLG and Wenxin Yige produce images influenced by Chinese aesthetics. At Baidu’s deep learning developer conference Wave Summit+ 2022, the company announced that Wenxin Yige has been updated with new features, including turning photos into AI-generated art, image editing, and one-click video production.
Meanwhile, AIGC can also include articles, videos, and various other media offerings such as voice synthesis. A technology that generates audible speech indistinguishable from the voice of the original speaker, voice synthesis can be applied in many scenarios, including voice navigation for digital maps. Baidu Maps, for example, allows users to customize its voice navigation to their own voice just by recording nine sentences.
Recent advances in AI technologies have also created generative language models that can fluently compose texts with just one click. They can be used for generating marketing copy, processing documents, extracting summaries, and other text tasks, unlocking creativity that other technologies such as voice synthesis have failed to tap. One of the leading generative language models is Baidu’s ERNIE 3.0, which has been widely applied in various industries such as health care, education, technology, and entertainment.
“In the past year, artificial intelligence has made a great leap and changed its technological direction,” says Robin Li, CEO of Baidu. “Artificial intelligence has gone from understanding pictures and text to generating content.” Going one step further, Baidu App, a popular search and newsfeed app with over 600 million monthly users, including five million content creators, recently released a video editing feature that can produce a short video accompanied by a voiceover created from data provided in an article.
Improving efficiency and growth
As AIGC becomes increasingly common, it could make content creation more efficient by getting rid of repetitive, time-intensive tasks for creators such as sorting out source assets and voice recordings and rendering images. Aspiring filmmakers, for instance, have long had to pay their dues by spending countless hours mastering the complex and tedious process of video editing. AIGC may soon make that unnecessary.
Besides boosting efficiency, AIGC could also increase business growth in content creation amid rising demand for personalized digital content that users can interact with dynamically. InsightSLICE forecasts that the global digital creation market will on average grow 12% annually between 2020 and 2030 and hit $38.2 billion. With content consumption fast outpacing production, traditional development methods will likely struggle to meet such increasing demand, creating a gap that could be filled by AIGC. “AI has the potential to meet this massive demand for content at a tenth of the cost and a hundred times or thousands of times faster in the next decade,” Li says.
AI with humanity as its foundation
AIGC can also serve as an educational tool by helping children develop their creativity. StoryDrawer, for instance, is an AI-driven program designed to boost children’s creative thinking, which often declines as the focus in their education shifts to rote learning.
The Download: the West’s AI myth, and Musk v Apple
While the US and the EU may differ on how to regulate tech, their lawmakers seem to agree on one thing: the West needs to ban AI-powered social scoring.
As they understand it, social scoring is a practice in which authoritarian governments—specifically China—rank people’s trustworthiness and punish them for undesirable behaviors, such as stealing or not paying back loans. Essentially, it’s seen as a dystopian superscore assigned to each citizen.
The reality? While there have been some contentious local experiments with social credit scores in China, there is no countrywide, all-seeing social credit system with algorithms that rank people.
The irony is that while US and European politicians try to ban systems that don’t really exist, systems that do rank and penalize people are already in place in the West—and are denying people housing and jobs in the process. Read the full story.
Melissa’s story is from The Algorithm, her weekly AI newsletter covering all of the industry’s most interesting developments. Sign up to receive it in your inbox every Monday.
I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.
1 Apple has reportedly threatened to pull Twitter from the App Store
According to Elon Musk. (NYT $)
+ Musk has threatened to “go to war” with the company after it decided to stop advertising on Twitter. (WP $)
+ Apple’s reluctance to advertise on Twitter right now isn’t exactly unique. (Motherboard)
+ Twitter’s child protection team in Asia has been gutted. (Wired $)
2 Another crypto firm has collapsed
Lender BlockFi has filed for bankruptcy, and is (partly) blaming FTX. (WSJ $)
+ The company is suing FTX founder Sam Bankman-Fried. (FT $)
+ It looks like the much-feared “crypto contagion” is spreading. (NYT $)
3 AI is rapidly becoming more powerful—and dangerous
That’s particularly worrying when its growth is too much for safety teams to handle. (Vox)
+ Do AI systems need to come with safety warnings? (MIT Technology Review)
+ This AI chat-room game is gaining a legion of fans. (The Guardian)
4 A Pegasus spyware investigation is in danger of being compromised
It’s the target of a disinformation campaign, security experts have warned. (The Guardian)
+ Cyber insurance won’t protect you from theft of your data. (The Guardian)
5 Google gave the FBI geofence data for its January 6 investigation
Google identified more than 5,000 devices near the US Capitol during the riot. (Wired $)
6 Monkeypox isn’t going anywhere
But it’s not on the rise, either. (The Atlantic $)
+ The World Health Organization says it will now be known as mpox. (BBC)
+ Everything you need to know about the monkeypox vaccines. (MIT Technology Review)
7 What it’s like to be the unwitting face of a romance scam
James Scott Geras’ pictures have been used to catfish countless women. (Motherboard)
What’s next in cybersecurity
One of the reasons cyber hasn’t played a bigger role in the war, according to Carhart, is because “in the whole conflict, we saw Russia being underprepared for things and not having a good game plan. So it’s not really surprising that we see that as well in the cyber domain.”
Moreover, Ukraine, under the leadership of Zhora and his cybersecurity agency, has been working on its cyber defenses for years, and it has received support from the international community since the war started, according to experts. Finally, an interesting twist in the conflict on the internet between Russia and Ukraine was the rise of the decentralized, international cyber coalition known as the IT Army, which scored some significant hacks, showing that war in the future can also be fought by hacktivists.
Ransomware runs rampant again
This year, other than the usual corporations, hospitals, and schools, government agencies in Costa Rica, Montenegro, and Albania all suffered damaging ransomware attacks too. In Costa Rica, the government declared a national emergency, a first after a ransomware attack. And in Albania, the government expelled Iranian diplomats from the country—a first in the history of cybersecurity—following a destructive cyberattack.
These types of attacks were at an all-time high in 2022, a trend that will likely continue next year, according to Allan Liska, a researcher who focuses on ransomware at cybersecurity firm Recorded Future.
“[Ransomware is] not just a technical problem like an information stealer or other commodity malware. There are real-world, geopolitical implications,” he says. In the past, for example, a North Korean ransomware called WannaCry caused severe disruption to the UK’s National Health System and hit an estimated 230,000 computers worldwide.
Luckily, it’s not all bad news on the ransomware front. According to Liska, there are some early signs that point to “the death of the ransomware-as-a-service model,” in which ransomware gangs lease out hacking tools. The main reason, he said, is that whenever a gang gets too big, “something bad happens to them.”
For example, the ransomware groups REvil and DarkSide/BlackMatter were hit by governments; Conti, a Russian ransomware gang, unraveled internally when a Ukrainian researcher appalled by Conti’s public support of the war leaked internal chats; and the LockBit crew also suffered the leak of its code.
“We are seeing a lot of the affiliates deciding that maybe I don’t want to be part of a big ransomware group, because they all have targets on their back, which means that I might have a target on my back, and I just want to carry out my cybercrime,” Liska says.