In 2016, Dell Technologies commissioned our first Digital Transformation Index (DT Index) study to assess the digital maturity of businesses around the globe. We have since commissioned the study every two years to track businesses’ digital maturity.
Our third installment of the DT Index, launched in 2020 (the year of the pandemic), revealed that “data overload/unable to extract insights from data” was the third-highest-ranking barrier to transformation, up from 11th place in 2016. That is a huge jump toward the top of the ranking of barriers to digital transformation.
These findings point to a curious paradox—data has the potential to become businesses’ number one barrier to transformation while also being their greatest asset. To learn more about why this paradox exists and where businesses need the most help, we commissioned a study with Forrester Consulting to dig deeper.
The resulting study, titled Unveiling Data Challenges Afflicting Businesses Around the World, is based on a survey of 4,036 senior decision-makers with responsibility for their companies’ data strategy, and is available to read now.
Candidly, the study confirms our concerns: in this data decade, data has become both a burden and an advantage for many businesses, and which one it is depends on how data-ready the business is.
While Forrester identifies several data paradoxes hindering businesses today, three major contradictions stood out for me.
1. The perception paradox
Two-thirds of respondents say their business is data-driven and that “data is the lifeblood of their organization.” But only 21% say they treat data as capital and prioritize its use across the business today.
Clearly, there’s a disconnect here. To provide some clarity, Forrester created an objective measure of businesses’ data readiness (see figure).
The results showed that 88% of businesses have yet to advance their data technology and processes, their data culture and skills, or both. In fact, only 12% of businesses are defined as Data Champions: companies that are actively engaged in both areas (technology/process and culture/skills).
2. The “want more than they can handle” paradox
The research also shows that businesses need more data, but they have too much data to handle right now: 70% say they are gathering data faster than they can analyze and use, yet 67% say they constantly need more data than their current capabilities provide.
While this is a paradox, it’s not all that surprising when you consider the research holistically: for example, the proportion of companies that have yet to secure data advocacy at board level and that fall back on an IT strategy that can’t scale (i.e., bolting on more data lakes).
The implications of this paradox are profound and far-reaching. Six in 10 businesses are battling with data silos; 64% of respondents complain they have such a glut of data they can’t meet security and compliance requirements, and 61% say their teams are already overwhelmed by the data they have.
3. The “seeing without doing” paradox
While economies have suffered during the pandemic, the on-demand sector has expanded rapidly, igniting a new wave of data-first, data-anywhere businesses that pay for what they use and only use what they need—determined by the data that they generate and analyze.
Although these businesses are emerging, and doing very well, they’re still relatively small in number. Only 20% of businesses have moved the majority of their applications and infrastructure to an as-a-service model—even though more than 6 in 10 believe an as-a-service model would enable firms to be more agile, scale, and provision applications without complexity.
Achieving breakthrough together
The research is sobering, but there is hope on the horizon. Businesses are looking to revise their data strategies by adopting multi-cloud environments, moving to a data-as-a-service model, and automating data processes with machine learning.
Granted, they have a lot to do to prime the pumps for a proliferation of data. Still, there is a path forward. First, businesses can modernize their IT infrastructure so they can meet data where it lives, at the edge. This means bringing infrastructure and applications closer to where data needs to be captured, analyzed and acted on, while avoiding data sprawl by maintaining a consistent multi-cloud operating model.
Second, they can optimize data pipelines so data can flow freely and securely while being augmented by AI/ML. Third, they can develop software to deliver the personalized, integrated experiences customers crave.
The staggering volume, variety and velocity of data may seem overwhelming, but with the right technology, processes and culture, businesses can tame the data beast, innovate with it, and create new value.
To learn more about the study, visit www.delltechnologies.com/dataparadox.
This content was produced by Dell Technologies. It was not written by MIT Technology Review’s editorial staff.
The Download: watermarking AI text, and freezing eggs
That’s why the team behind a new decision-making tool hopes it will help clear up some of the misconceptions around the procedure, and give would-be parents much-needed insight into its real costs, benefits, and potential pitfalls. Read the full story.
This story is from The Checkup, MIT Technology Review’s weekly newsletter giving you the inside track on all things health and biotech. Sign up to receive it in your inbox every Thursday.
I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.
1 Elon Musk held a surprise meeting with US political leaders
Allegedly in the interest of ensuring Twitter is “fair to both parties.” (Insider $)
+ Kanye West’s presidential campaign advisors have been booted off Twitter. (Rolling Stone $)
+ Twitter’s trust and safety head is Musk’s biggest champion. (Bloomberg $)
2 We’re treating covid like flu now
Annual covid shots are the next logical step. (The Atlantic $)
3 The worst thing about Sam Bankman-Fried’s spell in jail?
Being cut off from the internet. (Forbes $)
+ Most crypto criminals use just five exchanges. (Wired $)
+ Collapsed crypto firm FTX has objected to a new investigation request. (Reuters)
4 Israel’s tech sector is rising up against its government
Tech workers fear its hardline policies will harm startups. (FT $)
5 It’s possible to power the world solely using renewable energy
At least, according to Stanford academic Mark Jacobson. (The Guardian)
+ Tech bros love the environment these days. (Slate $)
+ How new versions of solar, wind, and batteries could help the grid. (MIT Technology Review)
6 Generative AI is wildly expensive to run
And that’s why promising startups like OpenAI need to hitch their wagons to the likes of Microsoft. (Bloomberg $)
+ How Microsoft benefits from the ChatGPT hype. (Vox)
+ BuzzFeed is planning to make quizzes supercharged by OpenAI. (WSJ $)
+ Generative AI is changing everything. But what’s left when the hype is gone? (MIT Technology Review)
7 It’s hard not to blame self-driving cars for accidents
Even when it’s not technically their fault. (WSJ $)
8 What it’s like to swap Google for TikTok
It’s great for food suggestions and hacks, but hopeless for anything work-related. (Wired $)
+ The platform really wants to stay operational in the US. (Vox)
+ TikTok is mired in an eyelash controversy. (Rolling Stone $)
9 CRISPR gene editing kits are available to buy online
But there’s no guarantee these experiments will actually work. (Motherboard)
+ Next up for CRISPR: Gene editing for the masses? (MIT Technology Review)
10 Tech workers are livestreaming their layoffs
It’s a candid window into how these notoriously secretive companies treat their staff. (The Information $)
People are already using ChatGPT to create workout plans
Hitting the gym
Despite the variable quality of ChatGPT’s fitness tips, some people have actually been following its advice in the gym.
John Yu, a TikTok content creator based in the US, filmed himself following a six-day full-body training program courtesy of ChatGPT. He asked it to give him a sample workout plan each day, tailored to the part of his body he wanted to work (his arms, legs, etc.), and then did the workout it gave him.
The exercises it came up with were perfectly fine, and easy enough to follow. However, Yu found that the moves lacked variety. “Strictly following what ChatGPT gives me is something I’m not really interested in,” he says.
Lee Lem, a bodybuilding content creator based in Australia, had a similar experience. He asked ChatGPT to create an “optimal leg day” program. It suggested the right sorts of exercises—squats, lunges, deadlifts, and so on—but the rest times between them were far too brief. “It’s hard!” Lem says, laughing. “It’s very unrealistic to only rest 30 seconds between squat sets.”
Lem hit on the core problem with ChatGPT’s suggestions: they fail to consider human bodies. As both he and Yu found out, repetitive movements quickly leave us bored or tired. Human coaches know to mix their suggestions up. ChatGPT has to be explicitly told.
For some, though, the appeal of an AI-produced workout is still irresistible—and something they’re even willing to pay for. Ahmed Mire, a software engineer based in London, is selling ChatGPT-produced plans for $15 each. People give him their workout goals and specifications, and he runs them through ChatGPT. He says he’s already signed up customers since launching the service last month and is considering adding the option to create diet plans too. ChatGPT is free, but he says people pay for the convenience.
What united everyone I spoke to was their decision to treat ChatGPT’s training suggestions as entertaining experiments rather than serious athletic guidance. They all had a good enough understanding of fitness, and what does and doesn’t work for their bodies, to be able to spot the model’s weaknesses. They all knew they needed to treat its answers skeptically. People who are newer to working out might be more inclined to take them at face value.
The future of fitness?
This doesn’t mean AI models can’t or shouldn’t play a role in developing fitness plans. But it does underline that they can’t necessarily be trusted. ChatGPT will improve and could learn to ask its own questions. For example, it might ask users if there are any exercises they hate, or inquire about any niggling injuries. But essentially, it can’t come up with original suggestions, and it has no fundamental understanding of the concepts it is regurgitating.
How a Roomba tester’s private images ended up on Facebook
A Roomba recorded a woman on the toilet. How did screenshots end up on social media?
This episode, we go behind the scenes of an MIT Technology Review investigation that uncovered how sensitive photos taken by an AI-powered vacuum were leaked and landed on the internet.
- A Roomba recorded a woman on the toilet. How did screenshots end up on Facebook?
- Roomba testers feel misled after intimate images ended up on Facebook
- Eileen Guo, MIT Technology Review
- Albert Fox Cahn, Surveillance Technology Oversight Project
This episode was reported by Eileen Guo and produced by Emma Cillekens and Anthony Green. It was hosted by Jennifer Strong and edited by Amanda Silverman and Mat Honan. This show is mixed by Garret Lang with original music from Garret Lang and Jacob Gorski. Artwork by Stephanie Arnett.
Jennifer: As more and more companies put artificial intelligence into their products, they need data to train their systems.
And we don’t typically know where that data comes from.
But sometimes just by using a product, a company takes that as consent to use our data to improve its products and services.
Consider a device in a home, where setting it up involves just one person consenting on behalf of everyone who enters… and anyone living there, or just visiting, might be unknowingly recorded.
I’m Jennifer Strong and this episode we bring you a Tech Review investigation of training data… that was leaked from inside homes around the world.
Jennifer: Last year someone reached out to a reporter I work with… and flagged some pretty concerning photos that were floating around the internet.
Eileen Guo: They were essentially pictures from inside people’s homes that were captured from low angles. Sometimes they had people and animals in them that, in most cases, didn’t appear to know that they were being recorded.
Jennifer: This is investigative reporter Eileen Guo.
And based on what she saw… she thought the photos might have been taken by an AI-powered vacuum.
Eileen Guo: They looked like, you know, they were taken from ground level and pointing up so that you could see whole rooms, the ceilings, whoever happened to be in them…
Jennifer: So she set to work investigating. It took months.
Eileen Guo: So first we had to confirm whether or not they came from robot vacuums, as we suspected. And from there, we had to whittle down which robot vacuum they came from. And what we found was that they came from the largest robot vacuum manufacturer by number of sales, iRobot, which produces the Roomba.
Jennifer: It raised questions about whether or not these photos had been taken with consent… and how they wound up on the internet.
In one of them, a woman is sitting on a toilet.
So our colleague looked into it, and she found the images weren’t of customers… they were of iRobot employees… and people the company calls ‘paid data collectors’.
In other words, the people in the photos were beta testers… and they’d agreed to participate in this process… although it wasn’t totally clear what that meant.
Eileen Guo: They’re really not as clear as you would think about what the data is ultimately being used for, who it’s being shared with and what other protocols or procedures are going to be keeping them safe—other than a broad statement that this data will be safe.
Jennifer: She doesn’t believe the people who gave permission to be recorded, really knew what they agreed to.
Eileen Guo: They understood that the robot vacuums would be taking videos from inside their houses, but they didn’t understand that, you know, they would then be labeled and viewed by humans or they didn’t understand that they would be shared with third parties outside of the country. And no one understood that there was a possibility at all that these images could end up on Facebook and Discord, which is how they ultimately got to us.
Jennifer: The investigation found these images were leaked by some data labelers in the gig economy.
At the time, they were working for Scale AI, a data labeling company hired by iRobot.
Eileen Guo: It’s essentially very low-paid workers that are being asked to label images to teach artificial intelligence how to recognize what it is that they’re seeing. And so the fact that these images were shared on the internet was just incredibly surprising, given how sensitive they were.
Jennifer: Labeling these images with relevant tags is called data annotation.
The process makes it easier for computers to understand and interpret the data in the form of images, text, audio, or video.
And it’s used in everything from flagging inappropriate content on social media to helping robot vacuums recognize what’s around them.
Eileen Guo: The most useful datasets for training algorithms are the most realistic, meaning that they’re sourced from real environments. But to make all of that data useful for machine learning, you actually need a person to go through and look at whatever it is, or listen to whatever it is, and categorize and label and otherwise just add context to each bit of data. You know, for self-driving cars, it’s an image of a street and saying, this is a stoplight that is turning yellow, this is a stoplight that is green. This is a stop sign.
Jennifer: But there’s more than one way to label data.
Eileen Guo: If iRobot chose to, they could have gone with other models in which the data would have been safer. They could have gone with outsourcing companies where the work is still outsourced, but people are working out of an office instead of on their own computers, so their work process would be a little bit more controlled. Or they could have actually done the data annotation in-house. But for whatever reason, iRobot chose not to go either of those routes.
Jennifer: When Tech Review contacted iRobot, which makes the Roomba, the company confirmed that the 15 images we’ve been talking about did come from its devices, but from pre-production devices, meaning these machines weren’t released to consumers.
Eileen Guo: They said that they started an investigation into how these images leaked. They terminated their contract with Scale AI, and also said that they were going to take measures to prevent anything like this from happening in the future. But they really wouldn’t tell us what that meant.
Jennifer: These days, the most advanced robot vacuums can efficiently move around the room while also making maps of areas being cleaned.
Plus, they recognize certain objects on the floor and avoid them.
It’s why these machines no longer drive through certain kinds of messes… like dog poop for example.
But what’s different about these leaked training images is the camera isn’t pointed at the floor…
Eileen Guo: Why do these cameras point diagonally upwards? Why do they need to know what’s on the walls or the ceilings? How does that help them navigate around the pet waste, or the phone cords, or the stray sock, or whatever it is? And that has to do with some of the broader goals that iRobot and other robot vacuum companies have for the future, which is to be able to recognize what room the robot is in, based on what you have in the home. And all of that is ultimately going to serve the broader goals of these companies, which is to create more robots for the home, and all of this data is going to help them reach those goals.
Jennifer: In other words… This data collection might be about building new products altogether.
Eileen Guo: These images are not just about iRobot. They’re not just about test users. It’s this whole data supply chain, and this whole new point where personal information can leak out that consumers aren’t really thinking of or aware of. And the thing that’s also scary about this is that as more companies adopt artificial intelligence, they need more data to train that artificial intelligence. And where that data is coming from is a really big question.
Jennifer: Because in the US, companies aren’t required to disclose that… and privacy policies usually have some version of a line that allows consumer data to be used to improve products and services… which includes training AI. Often, we opt in simply by using the product.
Eileen Guo: So it’s a matter of not even knowing that this is another place where we need to be worried about privacy, whether it’s robot vacuums, or Zoom or anything else that might be gathering data from us.
Jennifer: One option we expect to see more of in the future… is the use of synthetic data… or data that doesn’t come directly from real people.
And she says companies like Dyson are starting to use it.
Eileen Guo: There’s a lot of hope that synthetic data is the future. It is more privacy-protecting because you don’t need real-world data. There has been early research suggesting that it is just as accurate, if not more so. But most of the experts that I’ve spoken to say that that is anywhere from 10 years to multiple decades out.
Jennifer: You can find links to our reporting in the show notes… and you can support our journalism by going to tech review dot com slash subscribe.
We’ll be back… right after this.
Albert Fox Cahn: I think this is yet another wake up call that regulators and legislators are way behind in actually enacting the sort of privacy protections we need.
Albert Fox Cahn: My name’s Albert Fox Cahn. I’m the Executive Director of the Surveillance Technology Oversight Project.
Albert Fox Cahn: Right now it’s the Wild West, and companies are kind of making up their own policies as they go along for what counts as an ethical policy for this type of research and development. And, you know, quite frankly, they should not be trusted to set their own ground rules, and we see exactly why with this sort of debacle: here you have a company getting its own employees to sign these ludicrous consent agreements that are just completely lopsided and are, in my view, so bad that they could almost be unenforceable, all while the government is basically taking a hands-off approach to what sort of privacy protections should be in place.
Jennifer: He’s an anti-surveillance lawyer… a fellow at Yale and with Harvard’s Kennedy School.
And he describes his work as constantly fighting back against the new ways people’s data gets taken or used against them.
Albert Fox Cahn: What we see in here are terms that are designed to protect the privacy of the product, that are designed to protect the intellectual property of iRobot, but actually have no protections at all for the people who have these devices in their home. One of the things that’s really just infuriating for me about this is you have people who are using these devices in homes where it’s almost certain that a third party is going to be videotaped and there’s no provision for consent from that third party. One person is signing off for every single person who lives in that home, who visits that home, whose images might be recorded from within the home. And additionally, you have all these legal fictions in here like, oh, I guarantee that no minor will be recorded as part of this. Even though as far as we know, there’s no actual provision to make sure that people aren’t using these in houses where there are children.
Jennifer: And in the US, it’s anyone’s guess how this data will be handled.
Albert Fox Cahn: Compare this to the situation we have in Europe, where you actually have, you know, comprehensive privacy legislation, where you have active enforcement agencies and regulators that are constantly pushing back at the way companies are behaving, and where you have active trade unions that would most likely prevent this sort of testing regime with an employee. You know, it’s night and day.
Jennifer: He says having employees work as beta testers is problematic… because they might not feel like they have a choice.
Albert Fox Cahn: The reality is that when you’re an employee, oftentimes you don’t have the ability to meaningfully consent. You oftentimes can’t say no. And so instead of volunteering, you’re being voluntold to bring this product into your home, to collect your data. And so you have this coercive dynamic where I just don’t think, from a philosophical perspective, from an ethics perspective, that you can have meaningful consent for this sort of invasive testing program from someone who is in an employment arrangement with the company making the product.
Jennifer: Our devices already monitor our data… from smartphones to washing machines.
And that’s only going to get more common as AI gets integrated into more and more products and services.
Albert Fox Cahn: We see ever more money being spent on ever more invasive tools that are capturing data from parts of our lives that we once thought were sacrosanct. I do think that there is just a growing political backlash against this sort of technological power, this surveillance capitalism, this sort of, you know, corporate consolidation.
Jennifer: And he thinks that pressure is going to lead to new data privacy laws in the US. Partly because this problem is going to get worse.
Albert Fox Cahn: And think about the sort of data labeling that goes on, the, you know, armies of human beings that have to pore over these recordings in order to transform them into the sort of material that we need to train machine learning systems. There is then an army of people who can potentially take that information, record it, screenshot it, and turn it into something that goes public. And so, you know, I just don’t ever believe companies when they claim that they have this magic way of keeping safe all of the data we hand them. There’s this constant potential for harm, especially when we’re dealing with any product that’s in its early training and design phase.
Jennifer: This episode was reported by Eileen Guo, produced by Emma Cillekens and Anthony Green, edited by Amanda Silverman and Mat Honan. And it’s mixed by Garret Lang, with original music from Garret Lang and Jacob Gorski.
Thanks for listening, I’m Jennifer Strong.