The potential impact of the ongoing worldwide data explosion continues to excite the imagination. A 2018 report estimated that every second of every day, every person produces 1.7 MB of data on average—and annual data creation has more than doubled since then and is projected to more than double again by 2025. A report from McKinsey Global Institute estimates that skillful uses of big data could generate an additional $3 trillion in economic activity, enabling applications as diverse as self-driving cars, personalized health care, and traceable food supply chains.
But adding all this data to the system is also creating confusion about how to find it, use it, manage it, and legally, securely, and efficiently share it. Where did a certain dataset come from? Who owns what? Who’s allowed to see certain things? Where does it reside? Can it be shared? Can it be sold? Can people see how it was used?
As data’s applications grow and become more ubiquitous, producers, consumers, and owners and stewards of data are finding that they don’t have a playbook to follow. Consumers want to connect to data they trust so they can make the best possible decisions. Producers need tools to share their data safely with those who need it. But technology platforms fall short, and there are no real common sources of truth to connect both sides.
How do we find data? When should we move it?
In a perfect world, data would flow freely like a utility accessible to all. It could be packaged up and sold like raw materials. It could be viewed easily, without complications, by anyone authorized to see it. Its origins and movements could be tracked, removing any concerns about nefarious uses somewhere along the line.
Today’s world, of course, does not operate this way. The massive data explosion has created a long list of issues and opportunities that make it tricky to share chunks of information.
With data being created nearly everywhere within and outside of an organization, the first challenge is identifying what is being gathered and how to organize it so it can be found.
A lack of transparency and sovereignty over stored and processed data and infrastructure opens up trust issues. Today, moving data to centralized locations from multiple technology stacks is expensive and inefficient. The absence of open metadata standards and widely accessible application programming interfaces can make it hard to access and consume data. The presence of sector-specific data ontologies can make it hard for people outside the sector to benefit from new sources of data. Multiple stakeholders and difficulty accessing existing data services can make it hard to share without a governance model.
Europe is taking the lead
Despite the issues, data-sharing projects are being undertaken on a grand scale. One that’s backed by the European Union and a nonprofit group is creating an interoperable data exchange called Gaia-X, where businesses can share data under the protection of strict European data privacy laws. The exchange is envisioned as a vessel to share data across industries and a repository for information about data services around artificial intelligence (AI), analytics, and the internet of things.
Hewlett Packard Enterprise recently announced a solution framework to support companies, service providers, and public organizations’ participation in Gaia-X. The dataspaces platform, which is currently in development and based on open standards and cloud native, democratizes access to data, data analytics, and AI by making them more accessible to domain experts and common users. It provides a place where experts from domain areas can more easily identify trustworthy datasets and securely perform analytics on operational data—without always requiring the costly movement of data to centralized locations.
By using this framework to integrate complex data sources across IT landscapes, enterprises will be able to provide data transparency at scale, so everyone—whether a data scientist or not—knows what data they have, how to access it, and how to use it in real time.
Data-sharing initiatives are also on the top of enterprises’ agendas. One important priority enterprises face is the vetting of data that’s being used to train internal AI and machine learning models. AI and machine learning are already being used widely in enterprises and industry to drive ongoing improvements in everything from product development to recruiting to manufacturing. And we’re just getting started. IDC projects the global AI market will grow from $328 billion in 2021 to $554 billion in 2025.
To unlock AI’s true potential, governments and enterprises need to better understand the collective legacy of all the data that is driving these models. How do AI models make their decisions? Do they have bias? Are they trustworthy? Have untrustworthy individuals been able to access or change the data that an enterprise has trained its model against? Connecting data producers to data consumers more transparently and with greater efficiency can help answer some of these questions.
Building data maturity
Enterprises aren’t going to solve how to unlock all of their data overnight. But they can prepare themselves to take advantage of technologies and management concepts that help to create a data-sharing mentality. They can ensure that they’re developing the maturity to consume or share data strategically and effectively rather than doing it on an ad hoc basis.
Data producers can prepare for wider distribution of data by taking a series of steps. They need to understand where their data is and understand how they’re collecting it. Then, they need to make sure the people who consume the data have the ability to access the right sets of data at the right times. That’s the starting point.
Then comes the harder part. If a data producer has consumers—which can be inside or outside the organization—they have to connect to the data. That’s both an organizational and a technology challenge. Many organizations want governance over data sharing with other organizations. The democratization of data—at least being able to find it across organizations—is an organizational maturity issue. How do they handle that?
Companies that contribute to the auto industry actively share data with vendors, partners, and subcontractors. It takes a lot of parts—and a lot of coordination—to assemble a car. Partners readily share information on everything from engines to tires to web-enabled repair channels. Automotive dataspaces can serve upwards of 10,000 vendors. But in other industries, it might be more insular. Some large companies might not want to share sensitive information even within their own network of business units.
Creating a data mentality
Companies on either side of the consumer-producer continuum can advance their data-sharing mentality by asking themselves these strategic questions:
- If enterprises are building AI and machine learning solutions, where are the teams getting their data? How are they connecting to that data? And how do they track that history to ensure trustworthiness and provenance of data?
- If data has value to others, what is the monetization path the team is taking today to expand on that value, and how will it be governed?
- If a company is already exchanging or monetizing data, can it authorize a broader set of services on multiple platforms—on premises and in the cloud?
- For organizations that need to share data with vendors, how is the coordination of those vendors to the same datasets and updates getting done today?
- Do producers want to replicate their data or force people to bring models to them? Datasets might be so large that they can’t be replicated. Should a company host software developers on its platform where its data is and move the models in and out?
- How can workers in a department that consumes data influence the practices of the upstream data producers within their organization?
The data revolution is creating business opportunities—along with plenty of confusion about how to search for, collect, manage, and gain insights from that data in a strategic way. Data producers and data consumers are becoming more disconnected with each other. HPE is building a platform supporting both on-premises and public cloud, using open source as the foundation and solutions like HPE Ezmeral Software Platform to provide the common ground both sides need to make the data revolution work for them.
Read the original article on Enterprise.nxt.
This content was produced by Hewlett Packard Enterprise. It was not written by MIT Technology Review’s editorial staff.
Human creators stand to benefit as AI rewrites the rules of content creation
A game-changer for content creation
Among the AI-related technologies to have emerged in the past several years is generative AI—deep-learning algorithms that allow computers to generate original content, such as text, images, video, audio, and code. And demand for such content will likely jump in the coming years—Gartner predicts that by 2025, generative AI will account for 10% of all data created, compared with 1% in 2022.
“Théâtre D’opéra Spatial” is an example of AI-generated content (AIGC), created with the Midjourney text-to-art generator program. Several other AI-driven art-generating programs have also emerged in 2022, capable of creating paintings from single-line text prompts. The diversity of technologies reflects a wide range of artistic styles and different user demands. DALL-E 2 and Stable Diffusion, for instance, are focused mainly on western-style artwork, while Baidu’s ERNIE-ViLG and Wenxin Yige produce images influenced by Chinese aesthetics. At Baidu’s deep learning developer conference Wave Summit+ 2022, the company announced that Wenxin Yige has been updated with new features, including turning photos into AI-generated art, image editing, and one-click video production.
Meanwhile, AIGC can also include articles, videos, and various other media offerings such as voice synthesis. A technology that generates audible speech indistinguishable from the voice of the original speaker, voice synthesis can be applied in many scenarios, including voice navigation for digital maps. Baidu Maps, for example, allows users to customize its voice navigation to their own voice just by recording nine sentences.
Recent advances in AI technologies have also created generative language models that can fluently compose texts with just one click. They can be used for generating marketing copy, processing documents, extracting summaries, and other text tasks, unlocking creativity that other technologies such as voice synthesis have failed to tap. One of the leading generative language models is Baidu’s ERNIE 3.0, which has been widely applied in various industries such as health care, education, technology, and entertainment.
“In the past year, artificial intelligence has made a great leap and changed its technological direction,” says Robin Li, CEO of Baidu. “Artificial intelligence has gone from understanding pictures and text to generating content.” Going one step further, Baidu App, a popular search and newsfeed app with over 600 million monthly users, including five million content creators, recently released a video editing feature that can produce a short video accompanied by a voiceover created from data provided in an article.
Improving efficiency and growth
As AIGC becomes increasingly common, it could make content creation more efficient by getting rid of repetitive, time-intensive tasks for creators such as sorting out source assets and voice recordings and rendering images. Aspiring filmmakers, for instance, have long had to pay their dues by spending countless hours mastering the complex and tedious process of video editing. AIGC may soon make that unnecessary.
Besides boosting efficiency, AIGC could also increase business growth in content creation amid rising demand for personalized digital content that users can interact with dynamically. InsightSLICE forecasts that the global digital creation market will on average grow 12% annually between 2020 and 2030 and hit $38.2 billion. With content consumption fast outpacing production, traditional development methods will likely struggle to meet such increasing demand, creating a gap that could be filled by AIGC. “AI has the potential to meet this massive demand for content at a tenth of the cost and a hundred times or thousands of times faster in the next decade,” Li says.
AI with humanity as its foundation
AIGC can also serve as an educational tool by helping children develop their creativity. StoryDrawer, for instance, is an AI-driven program designed to boost children’s creative thinking, which often declines as the focus in their education shifts to rote learning.
The Download: the West’s AI myth, and Musk v Apple
While the US and the EU may differ on how to regulate tech, their lawmakers seem to agree on one thing: the West needs to ban AI-powered social scoring.
As they understand it, social scoring is a practice in which authoritarian governments—specifically China—rank people’s trustworthiness and punish them for undesirable behaviors, such as stealing or not paying back loans. Essentially, it’s seen as a dystopian superscore assigned to each citizen.
The reality? While there have been some contentious local experiments with social credit scores in China, there is no countrywide, all-seeing social credit system with algorithms that rank people.
The irony is that while US and European politicians try to ban systems that don’t really exist, systems that do rank and penalize people are already in place in the West—and are denying people housing and jobs in the process. Read the full story.
Melissa’s story is from The Algorithm, her weekly AI newsletter covering all of the industry’s most interesting developments. Sign up to receive it in your inbox every Monday.
I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.
1 Apple has reportedly threatened to pull Twitter from the App Store
According to Elon Musk. (NYT $)
+ Musk has threatened to “go to war” with the company after it decided to stop advertising on Twitter. (WP $)
+ Apple’s reluctance to advertise on Twitter right now isn’t exactly unique. (Motherboard)
+ Twitter’s child protection team in Asia has been gutted. (Wired $)
2 Another crypto firm has collapsed
Lender BlockFi has filed for bankruptcy, and is (partly) blaming FTX. (WSJ $)
+ The company is suing FTX founder Sam Bankman-Fried. (FT $)
+ It looks like the much-feared “crypto contagion” is spreading. (NYT $)
3 AI is rapidly becoming more powerful—and dangerous
That’s particularly worrying when its growth is too much for safety teams to handle. (Vox)
+ Do AI systems need to come with safety warnings? (MIT Technology Review)
+ This AI chat-room game is gaining a legion of fans. (The Guardian)
4 A Pegasus spyware investigation is in danger of being compromised
It’s the target of a disinformation campaign, security experts have warned. (The Guardian)
+ Cyber insurance won’t protect you from theft of your data. (The Guardian)
5 Google gave the FBI geofence data for its January 6 investigation
Google identified more than 5,000 devices near the US Capitol during the riot. (Wired $)
6 Monkeypox isn’t going anywhere
But it’s not on the rise, either. (The Atlantic $)
+ The World Health Organization says it will now be known as mpox. (BBC)
+ Everything you need to know about the monkeypox vaccines. (MIT Technology Review)
7 What it’s like to be the unwitting face of a romance scam
James Scott Geras’ pictures have been used to catfish countless women. (Motherboard)
What’s next in cybersecurity
One of the reasons cyber hasn’t played a bigger role in the war, according to Carhart, is because “in the whole conflict, we saw Russia being underprepared for things and not having a good game plan. So it’s not really surprising that we see that as well in the cyber domain.”
Moreover, Ukraine, under the leadership of Zhora and his cybersecurity agency, has been working on its cyber defenses for years, and it has received support from the international community since the war started, according to experts. Finally, an interesting twist in the conflict on the internet between Russia and Ukraine was the rise of the decentralized, international cyber coalition known as the IT Army, which scored some significant hacks, showing that war in the future can also be fought by hacktivists.
Ransomware runs rampant again
This year, other than the usual corporations, hospitals, and schools, government agencies in Costa Rica, Montenegro, and Albania all suffered damaging ransomware attacks too. In Costa Rica, the government declared a national emergency, a first after a ransomware attack. And in Albania, the government expelled Iranian diplomats from the country—a first in the history of cybersecurity—following a destructive cyberattack.
These types of attacks were at an all-time high in 2022, a trend that will likely continue next year, according to Allan Liska, a researcher who focuses on ransomware at cybersecurity firm Recorded Future.
“[Ransomware is] not just a technical problem like an information stealer or other commodity malware. There are real-world, geopolitical implications,” he says. In the past, for example, a North Korean ransomware called WannaCry caused severe disruption to the UK’s National Health System and hit an estimated 230,000 computers worldwide.
Luckily, it’s not all bad news on the ransomware front. According to Liska, there are some early signs that point to “the death of the ransomware-as-a-service model,” in which ransomware gangs lease out hacking tools. The main reason, he said, is that whenever a gang gets too big, “something bad happens to them.”
For example, the ransomware groups REvil and DarkSide/BlackMatter were hit by governments; Conti, a Russian ransomware gang, unraveled internally when a Ukrainian researcher appalled by Conti’s public support of the war leaked internal chats; and the LockBit crew also suffered the leak of its code.
“We are seeing a lot of the affiliates deciding that maybe I don’t want to be part of a big ransomware group, because they all have targets on their back, which means that I might have a target on my back, and I just want to carry out my cybercrime,” Liska says.