Developing the capacity to annotate massive volumes of data while maintaining quality is a function of the model development lifecycle that enterprises often underestimate. It’s resource intensive and requires specialized expertise.
At the heart of any successful machine learning/artificial intelligence (ML/AI) initiative is a commitment to high-quality training data and a pathway to quality data that is proven and well-defined. Without this quality data pipeline, the initiative is doomed to fail.
Computer vision or data science teams often turn to external partners to develop their data training pipeline, and these partnerships drive model performance.
There is no one definition of quality: “quality data” is completely contingent on the specific computer vision or machine learning project. However, there is a general process all teams can follow when working with an external partner, and this path to quality data can be broken down into four prioritized phases.
Annotation criteria and quality requirements
Training data quality is an evaluation of a data set’s fitness to serve its purpose in a given ML/AI use case.
The computer vision team needs to establish an unambiguous set of rules that describe what quality means in the context of their project. Annotation criteria are the collection of rules that define which objects to annotate, how to annotate them correctly, and what the quality targets are.
Accuracy or quality targets define the lowest acceptable result for evaluation metrics like accuracy, recall, precision, F1 score, et cetera. Typically, a computer vision team will have quality targets for how accurately objects of interest were classified, how accurately objects were localized, and how accurately relationships between objects were identified.
Workforce training and platform configuration
Platform configuration. Task design and workflow setup require time and expertise, and accurate annotation requires task-specific tools. At this stage, data science teams need a partner with expertise to help them determine how best to configure labeling tools, classification taxonomies, and annotation interfaces for accuracy and throughput.
Worker testing and scoring. To accurately label data, annotators need a well-designed training curriculum so they fully understand the annotation criteria and domain context. The annotation platform or external partner should ensure accuracy by actively tracking annotator proficiency against gold data tasks or when a judgement is modified by a higher-skilled worker or admin.
Ground truth or gold data. Ground truth data is crucial at this stage of the process as the baseline to score workers and measure output quality. Many computer vision teams are already working with a ground truth data set.
Sources of authority and quality assurance
There is no one-size-fits-all quality assurance (QA) approach that will meet the quality standards of all ML use cases. Specific business objectives, as well as the risk associated with an under-performing model, will drive quality requirements. Some projects reach target quality using multiple annotators. Others require complex reviews against ground truth data or escalation workflows with verification from a subject matter expert.
There are two primary sources of authority that can be used to measure the quality of annotations and that are used to score workers: gold data and expert review.
- Gold data: The gold data or ground truth set of records can be used both as a qualification tool for testing and scoring workers at the outset of the process and also as the measure for output quality. When you use gold data to measure quality, you compare worker annotations to your expert annotations for the same data set, and the difference between these two independent, blind answers can be used to produce quantitative measurements like accuracy, recall, precision, and F1 scores.
- Expert review: This method of quality assurance relies on expert review from a highly skilled worker, an admin, or from an expert on the customer side, sometimes all three. It can be used in conjunction with gold data QA. The expert reviewer looks at the answer given by the qualified worker and either approves it or makes corrections as needed, producing a new correct answer. Initially, an expert review may take place for every single instance of labeled data, but over time, as worker quality improves, expert review can utilize random sampling for ongoing quality control.
Iterating on data success
Once a computer vision team has successfully launched a high quality training data pipeline, it can accelerate progress to a production ready model. Through ongoing support, optimization, and quality control, an external partner can help them:
- Track velocity: In order to scale effectively, it’s good to measure annotation throughput. How long is it taking data to move through the process? Is the process getting faster?
- Tune worker training: As the project scales, labeling and quality requirements may evolve. This necessitates ongoing workforce training and scoring.
- Train on edge cases: Over time, training data should include more and more edge cases in order to make your model as accurate and robust as possible.
Without high-quality training data, even the best funded, most ambitious ML/AI projects cannot succeed. Computer vision teams need partners and platforms they can trust to deliver the data quality they need and to power life-changing ML/AI models for the world.
Alegion is the proven partner to build the training data pipeline that will fuel your model throughout its lifecycle. Contact Alegion at email@example.com.
This content was produced by Alegion. It was not written by MIT Technology Review’s editorial staff.
Why can’t tech fix its gender problem?
Not competing in this Olympics, but still contributing to the industry’s success, were the thousands of women who worked in the Valley’s microchip fabrication plants and other manufacturing facilities from the 1960s to the early 1980s. Some were working-class Asian- and Mexican-Americans whose mothers and grandmothers had worked in the orchards and fruit canneries of the prewar Valley. Others were recent migrants from the East and Midwest, white and often college educated, needing income and interested in technical work.
With few other technical jobs available to them in the Valley, women would work for less. The preponderance of women on the lines helped keep the region’s factory wages among the lowest in the country. Women continue to dominate high-tech assembly lines, though now most of the factories are located thousands of miles away. In 1970, one early American-owned Mexican production line employed 600 workers, nearly 90% of whom were female. Half a century later the pattern continued: in 2019, women made up 90% of the workforce in one enormous iPhone assembly plant in India. Female production workers make up 80% of the entire tech workforce of Vietnam.
Venture: “The Boys Club”
Chipmaking’s fiercely competitive and unusually demanding managerial culture proved to be highly influential, filtering down through the millionaires of the first semiconductor generation as they deployed their wealth and managerial experience in other companies. But venture capital was where semiconductor culture cast its longest shadow.
The Valley’s original venture capitalists were a tight-knit bunch, mostly young men managing older, much richer men’s money. At first there were so few of them that they’d book a table at a San Francisco restaurant, summoning founders to pitch everyone at once. So many opportunities were flowing it didn’t much matter if a deal went to someone else. Charter members like Silicon Valley venture capitalist Reid Dennis called it “The Group.” Other observers, like journalist John W. Wilson, called it “The Boys Club.”
The venture business was expanding by the early 1970s, even though down markets made it a terrible time to raise money. But the firms founded and led by semiconductor veterans during this period became industry-defining ones. Gene Kleiner left Fairchild Semiconductor to cofound Kleiner Perkins, whose long list of hits included Genentech, Sun Microsystems, AOL, Google, and Amazon. Master intimidator Don Valentine founded Sequoia Capital, making early-stage investments in Atari and Apple, and later in Cisco, Google, Instagram, Airbnb, and many others.
Generations: “Pattern recognition”
Silicon Valley venture capitalists left their mark not only by choosing whom to invest in, but by advising and shaping the business sensibility of those they funded. They were more than bankers. They were mentors, professors, and father figures to young, inexperienced men who often knew a lot about technology and nothing about how to start and grow a business.
“This model of one generation succeeding and then turning around to offer the next generation of entrepreneurs financial support and managerial expertise,” Silicon Valley historian Leslie Berlin writes, “is one of the most important and under-recognized secrets to Silicon Valley’s ongoing success.” Tech leaders agree with Berlin’s assessment. Apple cofounder Steve Jobs—who learned most of what he knew about business from the men of the semiconductor industry—likened it to passing a baton in a relay race.
Predicting the climate bill’s effects is harder than you might think
Human decision-making can also cause models and reality to misalign. “People don’t necessarily always do what is, on paper, the most economic,” says Robbie Orvis, who leads the energy policy solutions program at Energy Innovation.
This is a common issue for consumer tax credits, like those for electric vehicles or home energy efficiency upgrades. Often people don’t have the information or funds needed to take advantage of tax credits.
Likewise, there are no assurances that credits in the power sectors will have the impact that modelers expect. Finding sites for new power projects and getting permits for them can be challenging, potentially derailing progress. Some of this friction is factored into the models, Orvis says. But there’s still potential for more challenges than modelers expect.
Putting too much stock in results from models can be problematic, says James Bushnell, an economist at the University of California, Davis. For one thing, models could overestimate how much behavior change is because of tax credits. Some of the projects that are claiming tax credits would probably have been built anyway, Bushnell says, especially solar and wind installations, which are already becoming more widespread and cheaper to build.
Still, whether or not the bill meets the expectations of the modelers, it’s a step forward in providing climate-friendly incentives, since it replaces solar- and wind-specific credits with broader clean-energy credits that will be more flexible for developers in choosing which technologies to deploy.
Another positive of the legislation is all its long-term investments, whose potential impacts aren’t fully captured in the economic models. The bill includes money for research and development of new technologies like direct air capture and clean hydrogen, which are still unproven but could have major impacts on emissions in the coming decades if they prove to be efficient and practical.
Whatever the effectiveness of the Inflation Reduction Act, however, it’s clear that more climate action is still needed to meet emissions goals in 2030 and beyond. Indeed, even if the predictions of the modelers are correct, the bill is still not sufficient for the US to meet its stated goals under the Paris agreement of cutting emissions to half of 2005 levels by 2030.
The path ahead for US climate action isn’t as certain as some might wish it were. But with the Inflation Reduction Act, the country has taken a big step. Exactly how big is still an open question.
China has censored a top health information platform
The suspension has met with a gleeful social reaction among nationalist bloggers, who accuse DXY of receiving foreign funding, bashing traditional Chinese medicine, and criticizing China’s health-care system.
DXY is one of the front-runners in China’s digital health startup scene. It hosts the largest online community Chinese doctors use to discuss professional topics and socialize. It also provides a medical news service for a general audience, and it is widely seen as the most influential popular science publication in health care.
“I think no one, as long as they are somewhat related to the medical profession, doesn’t follow these accounts [of DXY],” says Zhao Yingxi, a global health researcher and PhD candidate at Oxford University, who says he followed DXY’s accounts on WeChat too.
But in the increasingly polarized social media environment in China, health care is becoming a target for controversy. The swift conclusion that DXY’s demise was triggered by its foreign ties and critical work illustrates how politicized health topics have become.
Since its launch in 2000, DXY has raised five rounds of funding from prominent companies like Tencent and venture capital firms. But even that commercial success has caused it trouble this week. One of its major investors, Trustbridge Partners, raises funds from sources like Columbia University’s endowments and Singapore’s state holding company Temasek. After DXY’s accounts were suspended, bloggers used that fact to try to back up their claim that DXY has been under foreign influence all along.
Part of the reason the suspension is so shocking is that DXY is widely seen as one of the most trusted online sources for health education in China. During the early days of the covid-19 pandemic, it compiled case numbers and published a case map that was updated every day, becoming the go-to source for Chinese people seeking to follow covid trends in the country. DXY also made its name by taking down several high-profile fraudulent health products in China.
It also hasn’t shied away from sensitive issues. For example, on the International Day Against Homophobia, Transphobia, and Biphobia in 2019, it published the accounts of several victims of conversion therapy and argued that the practice is not backed by medical consensus.
“The article put survivors’ voices front and center and didn’t tiptoe around the disturbing reality that conversion therapy is still prevalent and even pushed by highly ranked public hospitals and academics,” says Darius Longarino, a senior fellow at Yale Law School’s Paul Tsai China Center.