
How Roomba tester’s private images ended up on Facebook



A Roomba recorded a woman on the toilet. How did screenshots end up on social media?

This episode, we go behind the scenes of an MIT Technology Review investigation that uncovered how sensitive photos taken by an AI-powered vacuum were leaked and landed on the internet.

Reporting:

  • A Roomba recorded a woman on the toilet. How did screenshots end up on Facebook?
  • Roomba testers feel misled after intimate images ended up on Facebook

We meet:

  • Eileen Guo, MIT Technology Review
  • Albert Fox Cahn, Surveillance Technology Oversight Project

Credits:

This episode was reported by Eileen Guo and produced by Emma Cillekens and Anthony Green. It was hosted by Jennifer Strong and edited by Amanda Silverman and Mat Honan. This show is mixed by Garret Lang with original music from Garret Lang and Jacob Gorski. Artwork by Stephanie Arnett.

Full transcript:

[TR ID]

Jennifer: As more and more companies put artificial intelligence into their products, they need data to train their systems.

And we don’t typically know where that data comes from. 

But sometimes a company takes the mere use of a product as consent to use our data to improve its products and services. 

Consider a device in a home, where setting it up involves just one person consenting on behalf of every person who enters… and anyone living there, or just visiting, might be unknowingly recorded.

I’m Jennifer Strong and this episode we bring you a Tech Review investigation of training data… that was leaked from inside homes around the world. 

[SHOW ID] 

Jennifer: Last year someone reached out to a reporter I work with… and flagged some pretty concerning photos that were floating around the internet. 

Eileen Guo: They were essentially pictures from inside people’s homes that were captured from low angles, and in most cases the people and animals in them didn’t appear to know that they were being recorded.

Jennifer: This is investigative reporter Eileen Guo.

And based on what she saw… she thought the photos might have been taken by an AI-powered vacuum. 

Eileen Guo: They looked like, you know, they were taken from ground level and pointing up so that you could see whole rooms, the ceilings, whoever happened to be in them…

Jennifer: So she set to work investigating. It took months.  

Eileen Guo: So first we had to confirm whether or not they came from robot vacuums, as we suspected. And from there, we also had to whittle down which robot vacuum they came from. And what we found was that they came from the largest manufacturer, by number of sales, of any robot vacuum, which is iRobot, which produces the Roomba.

Jennifer: It raised questions about whether or not these photos had been taken with consent… and how they wound up on the internet. 

In one of them, a woman is sitting on a toilet.

So our colleague looked into it, and she found the images weren’t of customers… they were of iRobot employees… and people the company calls ‘paid data collectors’.

In other words, the people in the photos were beta testers… and they’d agreed to participate in this process… although it wasn’t totally clear what that meant. 

Eileen Guo: They’re really not as clear as you would think about what the data is ultimately being used for, who it’s being shared with and what other protocols or procedures are going to be keeping them safe—other than a broad statement that this data will be safe.

Jennifer: She doesn’t believe the people who gave permission to be recorded, really knew what they agreed to. 

Eileen Guo: They understood that the robot vacuums would be taking videos from inside their houses, but they didn’t understand that, you know, those videos would then be labeled and viewed by humans, or that they would be shared with third parties outside of the country. And no one understood that there was a possibility at all that these images could end up on Facebook and Discord, which is how they ultimately got to us.

Jennifer: The investigation found these images were leaked by some data labelers in the gig economy.

At the time they were working for a data labeling company (hired by iRobot) called Scale AI.

Eileen Guo: It’s essentially very low-paid workers that are being asked to label images to teach artificial intelligence how to recognize what it is that they’re seeing. And so the fact that these images were shared on the internet was just incredibly surprising, given how sensitive they were.

Jennifer: Labeling these images with relevant tags is called data annotation. 

The process makes it easier for computers to understand and interpret the data in the form of images, text, audio, or video.

And it’s used in everything from flagging inappropriate content on social media to helping robot vacuums recognize what’s around them. 

Eileen Guo: The most useful datasets to train algorithms are the most realistic, meaning that they’re sourced from real environments. But to make all of that data useful for machine learning, you actually need a person to go through and look at whatever it is, or listen to whatever it is, and categorize and label and otherwise just add context to each bit of data. You know, for self-driving cars, it’s an image of a street and saying, this is a stoplight that is turning yellow, this is a stoplight that is green. This is a stop sign. 
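For readers curious what an annotation like the stoplight example actually looks like in practice, here is a minimal sketch of a labeled image record. The schema, field names, and labels below are invented for illustration; they are not the actual format used by Scale AI or iRobot.

```python
# A toy version of a data-annotation record: one image, plus the
# human-applied labels (each with a bounding box) that a model trains on.
from dataclasses import dataclass, field

@dataclass
class Annotation:
    label: str      # what the labeler saw, e.g. "stop_sign" or "power_cord"
    bbox: tuple     # (x, y, width, height) of the object, in pixels

@dataclass
class LabeledImage:
    image_id: str
    annotations: list = field(default_factory=list)

    def add_label(self, label, bbox):
        """Record one human judgment about what appears in this frame."""
        self.annotations.append(Annotation(label, bbox))

# A labeler tags everything the model should learn to recognize in one frame.
frame = LabeledImage("frame_0001.jpg")
frame.add_label("stoplight_yellow", (120, 40, 30, 60))
frame.add_label("stop_sign", (300, 80, 50, 50))

print([a.label for a in frame.annotations])
```

The key point Guo makes stands out even in this toy form: every record passes through a human being who sees the raw image in order to attach the labels.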

Jennifer: But there’s more than one way to label data. 

Eileen Guo: If iRobot chose to, they could have gone with other models in which the data would have been safer. They could have gone with outsourcing companies where the work is still outsourced, but people are working out of an office instead of on their own computers, and so the work process would be a little bit more controlled. Or they could have actually done the data annotation in house. But for whatever reason, iRobot chose not to go either of those routes.

Jennifer: When Tech Review got in contact with the company—which makes the Roomba—they confirmed the 15 images we’ve been talking about did come from their devices, but from pre-production devices. Meaning these machines weren’t released to consumers.

Eileen Guo: They said that they started an investigation into how these images leaked. They terminated their contract with Scale AI, and also said that they were going to take measures to prevent anything like this from happening in the future. But they really wouldn’t tell us what that meant.  

Jennifer: These days, the most advanced robot vacuums can efficiently move around the room while also making maps of areas being cleaned. 

Plus, they recognize certain objects on the floor and avoid them. 

It’s why these machines no longer drive through certain kinds of messes… like dog poop for example.

But what’s different about these leaked training images is the camera isn’t pointed at the floor…  

Eileen Guo: Why do these cameras point diagonally upwards? Why do they know what’s on the walls or the ceilings? How does that help them navigate around the pet waste, or the phone cords, or the stray sock, or whatever it is? And that has to do with some of the broader goals that iRobot and other robot vacuum companies have for the future, which is to be able to recognize what room it’s in, based on what you have in the home. And all of that is ultimately going to serve the broader goals of these companies, which is to create more robots for the home, and all of this data is going to ultimately help them reach those goals.

Jennifer: In other words… This data collection might be about building new products altogether.

Eileen Guo: These images are not just about iRobot. They’re not just about test users. It’s this whole data supply chain, and this whole new point where personal information can leak out that consumers aren’t really thinking of or aware of. And the thing that’s also scary about this is that as more companies adopt artificial intelligence, they need more data to train that artificial intelligence. And where is that data coming from? That is a really big question.

Jennifer: Because in the US, companies aren’t required to disclose that…and privacy policies usually have some version of a line that allows consumer data to be used to improve products and services… Which includes training AI. Often, we opt in simply by using the product.

Eileen Guo: So it’s a matter of not even knowing that this is another place where we need to be worried about privacy, whether it’s robot vacuums, or Zoom or anything else that might be gathering data from us.

Jennifer: One option we expect to see more of in the future… is the use of synthetic data… or data that doesn’t come directly from real people. 

And she says companies like Dyson are starting to use it.

Eileen Guo: There’s a lot of hope that synthetic data is the future. It is more privacy protecting because you don’t need real-world data. There has been early research that suggests it is just as accurate, if not more so. But most of the experts that I’ve spoken to say that that is anywhere from 10 years to multiple decades out.

Jennifer: You can find links to our reporting in the show notes… and you can support our journalism by going to tech review dot com slash subscribe.

We’ll be back… right after this.

[MIDROLL]

Albert Fox Cahn: I think this is yet another wake up call that regulators and legislators are way behind in actually enacting the sort of privacy protections we need.

Albert Fox Cahn: My name’s Albert Fox Cahn. I’m the Executive Director of the Surveillance Technology Oversight Project.  

Albert Fox Cahn: Right now it’s the Wild West and companies are kind of making up their own policies as they go along for what counts as an ethical policy for this type of research and development, and, you know, quite frankly, they should not be trusted to set their own ground rules, and we see exactly why with this sort of debacle, because here you have a company getting its own employees to sign these ludicrous consent agreements that are just completely lopsided. They are, to my view, almost so bad that they could be unenforceable, all while the government is basically taking a hands-off approach on what sort of privacy protections should be in place. 

Jennifer: He’s an anti-surveillance lawyer… a fellow at Yale and with Harvard’s Kennedy School.

And he describes his work as constantly fighting back against the new ways people’s data gets taken or used against them.

Albert Fox Cahn: What we see in here are terms that are designed to protect the privacy of the product, that are designed to protect the intellectual property of iRobot, but actually have no protections at all for the people who have these devices in their home. One of the things that’s really just infuriating for me about this is you have people who are using these devices in homes where it’s almost certain that a third party is going to be videotaped and there’s no provision for consent from that third party. One person is signing off for every single person who lives in that home, who visits that home, whose images might be recorded from within the home. And additionally, you have all these legal fictions in here like, oh, I guarantee that no minor will be recorded as part of this. Even though as far as we know, there’s no actual provision to make sure that people aren’t using these in houses where there are children.

Jennifer: And in the US, it’s anyone’s guess how this data will be handled.

Albert Fox Cahn: When you compare this to the situation we have in Europe, where you actually have, you know, comprehensive privacy legislation, where you have active enforcement agencies and regulators that are constantly pushing back at the way companies are behaving, and you have active trade unions that would most likely prevent this sort of a testing regime with an employee. You know, it’s night and day. 

Jennifer: He says having employees work as beta testers is problematic… because they might not feel like they have a choice.

Albert Fox Cahn: The reality is that when you’re an employee, oftentimes you don’t have the ability to meaningfully consent. You oftentimes can’t say no. And so instead of volunteering, you’re being voluntold to bring this product into your home, to collect your data. And so you have this coercive dynamic where I just don’t think, you know, from a philosophical perspective, from an ethics perspective, that you can have meaningful consent for this sort of an invasive testing program by someone who is in an employment arrangement with the person who’s, you know, making the product.

Jennifer: Our devices already monitor our data… from smartphones to washing machines. 

And that’s only going to get more common as AI gets integrated into more and more products and services.

Albert Fox Cahn: We see evermore money being spent on evermore invasive tools that are capturing data from parts of our lives that we once thought were sacrosanct. I do think that there is just a growing political backlash against this sort of technological power, this surveillance capitalism, this sort of, you know, corporate consolidation.  

Jennifer: And he thinks that pressure is going to lead to new data privacy laws in the US. Partly because this problem is going to get worse.

Albert Fox Cahn: And when we think about the sort of data labeling that goes on, the sorts of, you know, armies of human beings that have to pore over these recordings in order to transform them into the sorts of material that we need to train machine learning systems, there then is an army of people who can potentially take that information, record it, screenshot it, and turn it into something that goes public. And so, you know, I just don’t ever believe companies when they claim that they have this magic way of keeping safe all of the data we hand them. There’s this constant potential harm, especially when we’re dealing with any product that’s in its early training and design phase.

[CREDITS]

Jennifer: This episode was reported by Eileen Guo, produced by Emma Cillekens and Anthony Green, edited by Amanda Silverman and Mat Honan. And it’s mixed by Garret Lang, with original music from Garret Lang and Jacob Gorski.

Thanks for listening, I’m Jennifer Strong.

Fostering innovation through a culture of curiosity


And so I think a big part of it as a company is that, by setting these ambitious goals, it forces us to say: if we want to be number one, if we want to be top tier in these areas, if we want to continue to generate results, how do we get there using technology? And so that really forces us to throw away our assumptions, because you can’t follow somebody to become number one. And so we understand that the path to get there is, of course, through technology and the software and the enablement and the investment, but it really is by becoming goal-oriented. And if we look at these examples of how we create the infrastructure on the technology side to support these ambitious goals, we ourselves have to be ambitious in turn, because if we bring a solution that’s a me-too, that’s a copycat, that doesn’t have differentiation, that’s not going to propel us, for example, to be a top-10 supply chain. It just doesn’t pass muster.

So I think at the top level, it starts with the business ambition. And then from there we can organize ourselves at the intersection of the business ambition and the technology trends to have those very rich discussions and being the glue of how do we put together so many moving pieces because we’re constantly scanning the technology landscape for new advancing and emerging technologies that can come in and be a part of achieving that mission. And so that’s how we set it up on the process side. As an example, I think one of the things, and it’s also innovation, but it doesn’t get talked about as much, but for the community out there, I think it’s going to be very relevant is, how do we stay on top of the data sovereignty questions and data localization? There’s a lot of work that needs to go into rethinking what your cloud, private, public, edge, on-premise look like going forward so that we can remain cutting edge and competitive in each of our markets while meeting the increasing guidance that we’re getting from countries and regulatory agencies about data localization and data sovereignty.

And so in our case, as a global company that’s listed in Hong Kong and we operate all around the world, we’ve had to really think deeply about the architecture of our solutions and apply innovation in how we can architect for a longer term growth, but in a world that’s increasingly uncertain. So I think there’s a lot of drivers in some sense, which is our corporate aspirations, our operating environment, which has continued to have a lot of uncertainty, and that really forces us to take a very sharp lens on what cutting edge looks like. And it’s not always the bright and shiny technology. Cutting edge could mean going to the executive committee and saying, Hey, we’re going to face a challenge about compliance. Here’s the innovation we’re bringing about architecture so that we can handle not just the next country or regulatory regime that we have to comply with, but the next 10, the next 50.

Laurel: Well, and to follow up with a bit more of a specific example, how does R&D help improve manufacturing in the software supply chain as well as emerging technologies like artificial intelligence and the industrial metaverse?

Art: Oh, I love this one because this is the perfect example of there’s a lot happening in the technology industry and there’s so much back to the earlier point of applied curiosity and how we can try this. So specifically around artificial intelligence and industrial metaverse, I think those go really well together with what are Lenovo’s natural strengths. Our heritage is as a leading global manufacturer, and now we’re looking to also transition to services-led, but applying AI and technologies like the metaverse to our factories. I think it’s almost easier to talk about the inverse, Laurel, which is if we… Because, and I remember very clearly we’ve mapped this out, there’s no area within the supply chain and manufacturing that is not touched by these areas. If I think about an example, actually, it’s very timely that we’re having this discussion. Lenovo was recognized just a few weeks ago at the World Economic Forum as part of the global lighthouse network on leading manufacturing.

And that’s based very much on applying AI and metaverse technologies and embedding them into every aspect of what we do in our own supply chain and manufacturing network. And so if I pick a couple of examples on the quality side within the factory, we’ve implemented a combination of digital twin technology around how we can design to cost, design to quality in ways that are much faster than before, where we can prototype in the digital world, where it’s faster and lower cost and correcting errors is more upfront and timely. So we are able to much more quickly iterate on our products. We’re able to have better quality. We’ve taken advanced computer vision so that we’re able to identify quality defects earlier on. We’re able to implement technologies around the industrial metaverse so that we can train our factory workers more effectively, using aspects of AR and VR.

And one of the really important parts of running an effective manufacturing operation is actually production planning, because there are so many thousands of parts coming in, and I think everyone who’s listening knows how much uncertainty and volatility there has been in supply chains. So how do you take such a multi-thousand-dimensional planning problem and optimize it? Those are things where we apply smart production planning models to keep our factories fully running so that we can meet our customer delivery dates. So I don’t want to drone on, but I think literally the answer was: there is no place, if you think about logistics, planning, production, scheduling, shipping, where we didn’t find AI and metaverse use cases that were able to significantly enhance the way we run our operations. And again, we’re doing this internally, and that’s why we’re very proud that the World Economic Forum recognized us as a global lighthouse network manufacturing member.
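To give readers a feel for the "multi-thousand dimensional planning problem" mentioned above, here is a deliberately tiny caricature of the constraint reasoning involved. The products, parts, and quantities are invented for illustration; a real planner like the ones Lenovo describes would optimize across thousands of parts, lead times, and delivery dates.

```python
# Toy production planning: given a parts inventory and each product's
# bill of materials, how many units of a product could be scheduled
# from inventory alone? Real planners optimize far more constraints.

inventory = {"cpu": 500, "screen": 300, "battery": 450}

bill_of_materials = {
    "laptop": {"cpu": 1, "screen": 1, "battery": 1},
    "tablet": {"cpu": 1, "screen": 1, "battery": 2},
}

def max_buildable(product, inventory, bom):
    """Units of `product` buildable before some part runs out:
    the binding constraint is the scarcest part relative to need."""
    needs = bom[product]
    return min(inventory[part] // qty for part, qty in needs.items())

print(max_buildable("laptop", inventory, bill_of_materials))  # limited by screens
```

Even at this scale, the shape of the problem is visible: one scarce part (here, screens) caps the whole schedule, which is why forecasting and optimizing part flows matters so much when supply is volatile.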

Laurel: It’s certainly important, especially when we’re bringing together computing and IT environments in this increasing complexity. So as businesses continue to transform and accelerate their transformations, how do you build resiliency throughout Lenovo? Because that is certainly another foundational characteristic that is so necessary.

The Download: covid’s origin drama, and TikTok’s uncertain future


This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

Newly revealed coronavirus data has reignited a debate over the virus’s origins

This week, we’ve seen the resurgence of a debate that has been swirling since the start of the pandemic—where did the virus that causes covid-19 come from?

For the most part, scientists have maintained that the virus probably jumped from an animal to a human at the Huanan Seafood Market in Wuhan at some point in late 2019. But some claim that the virus leaped from humans to animals, rather than the other way around. And many continue to claim that the virus somehow leaked from a nearby laboratory that was studying coronaviruses in bats.  

Data collected in 2020—and kept from public view since then—potentially adds weight to the animal theory. It highlights a potential suspect: the raccoon dog. But exactly how much weight it adds depends on who you ask. Read the full story.

—Jessica Hamzelou

This story is from The Checkup, Jessica’s weekly biotech newsletter. Sign up to receive it in your inbox every Thursday.

Read more of MIT Technology Review’s covid reporting:

+ Our senior biotech editor Antonio Regalado investigated the origins of the coronavirus behind covid-19 in his five-part podcast series Curious Coincidence.

+ Meet the scientist at the center of the covid lab leak controversy. Shi Zhengli has spent years at the Wuhan Institute of Virology researching coronaviruses that live in bats. Her work has come under fire as the world tries to understand where covid-19 came from. Read the full story.

+ This scientist now believes covid started in Wuhan’s wet market. Here’s why. Michael Worobey of the University of Arizona believes that a spillover of the virus from animals at the Huanan Seafood Market was almost certainly behind the origin of the pandemic. Read the full story.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 TikTok’s future in the US is hanging in the balance
Banning it is a colossal challenge, and officials still lack the legal authority to do so. (WP $)
+ TikTok CEO Shou Zi Chew was grilled by a congressional committee. (FT $)
+ He told lawmakers the company would earn their trust. (WSJ $)
+ Meanwhile, TikTok paid for influencers to travel to DC to lobby its cause. (Wired $)

2 A crypto fugitive has been arrested in Montenegro
Do Kwon has been on the run since TerraUSD stablecoin collapsed last year. (WSJ $)
+ Want to mine Bitcoin? Get yourself to Texas. (Reuters)
+ What’s next for crypto. (MIT Technology Review)

3 Twitter’s getting rid of its legacy blue checks
On the entirely serious date of April 1. (The Verge)
+ The platform’s still an unattractive prospect for advertisers. (Vox)

4 Chatbots are having tough conversations for us
ChatGPT is adept at writing scripts for sensitive talks with kids and colleagues. (NYT $)
+ OpenAI has given ChatGPT access to the web’s live data. (The Verge)
+ How Character.AI became a billion-dollar unicorn. (WSJ $)
+ The inside story of how ChatGPT was built from the people who made it. (MIT Technology Review)

5 Jack Dorsey’s Block has been accused of fraudulent transactions
The payments company denies the accusations, which also include claims it inflated its user numbers. (FT $)
+ Dorsey doesn’t have a track record of caring about this kind of thing. (The Information $)

6 Homeowners associations are secretly installing surveillance systems
The system tracks license plates and follows residents’ movements. (The Intercept)

7 Inside the tricky ethics of using DNA to solve crimes
A new database could help to protect users’ privacy. (Wired $)
+ The citizen scientist who finds killers from her couch. (MIT Technology Review)

8 There are plenty of reasons to be optimistic about the climate
Healthier, more sustainable diets are a good place to start. (Scientific American)
+ Taking stock of our climate past, present, and future. (MIT Technology Review)

9 TikTok keeps hectoring us
It seems we just can’t get enough of being aggressively told what to do. (Vox)

10 Don’t get scammed by a deepfake
CallerID can’t be trusted to protect you from rogue AI calls. (Gizmodo)

Quote of the day

“Wait, I need content.”

—TikTok fashion creator Kristine Thompson, refusing to miss a content opportunity during a trip to the US Capitol to lobby against a potential TikTok ban, as she tells the New York Times.

The big story

This sci-fi blockchain game could help create a metaverse that no one owns

November 2022

Dark Forest is a vast universe, and most of it is shrouded in darkness. Your mission, should you choose to accept it, is to venture into the unknown, avoid being destroyed by opposing players who may be lurking in the dark, and build an empire of the planets you discover and can make your own.

But while the video game looks and plays much like other online strategy games, it doesn’t rely on the servers running other popular online strategy games. And it may point to something even more profound: the possibility of a metaverse that isn’t owned by a big tech company. Read the full story.

—Mike Orcutt

We can still have nice things

A place for comfort, fun and distraction in these weird times. (Got any ideas? Drop me a line or tweet ’em at me.)

+ If underwater terrors are your thing, Joe Romiero takes some seriously impressive shark pictures and videos.
+ Try as it might, Ted Lasso’s British dialogue falls wide of the mark.
+ Let’s have a good old snoop around some celebrities’ bedrooms.
+ Why we can’t get enough of those fancy candles.
+ Interviewing animals with a tiny microphone, it doesn’t get much better than that.



Taking stock of our climate past, present, and future


Before you say anything, I do know that it is, in fact, nearly April. But this week has the distinct feeling of a sort of climate change New Year’s to me. Not only is it the spring equinox this week, which is celebrated as the new year in some cultures (Happy Nowruz!), but we also saw a big UN climate report drop on Monday, which has me in a very contemplative mood.

The report comes from the UN Intergovernmental Panel on Climate Change (IPCC), a group of scientists that releases reports about the state of climate change research. 

The IPCC works in seven-year cycles, give or take. Each cycle, the group looks at all the published literature on climate change and puts together a handful of reports on different topics, leading up to a synthesis report that sums it all up. This week’s release was one of those synthesis reports. It follows one from 2014, and we should see another one around 2030. 

Because these reports are a sort of summary of existing research, I’ve been thinking about this moment as a time to reflect. So for the newsletter this week, I thought we could get in the new year’s spirit and take a look at where we’ve come from, where we are, and where we’re going on climate change. 

Climate past: 2014

Let’s start in 2014. The concentration of carbon dioxide in the atmosphere was just under 400 parts per million. The song “Happy” by Pharrell Williams was driving me slowly insane. And in November, the IPCC released its fifth synthesis report. 

Some bits of the 2014 IPCC synthesis report feel familiar. Its authors clearly laid out the case that human activity was causing climate change, adaptation wasn’t going to cut it, and the world would need to take action to limit greenhouse-gas emissions. I saw all those same lines in this year’s report. 

But there are also striking differences.  

First, we were in a different place politically. World leaders hadn’t yet signed the Paris agreement, the landmark treaty that set a goal to limit global warming to 2 °C (3.6 °F) above preindustrial levels, with a target of 1.5 °C (2.7 °F). The 2014 assessment report laid the groundwork for that agreement. 


Copyright © 2021 Seminole Press.