Thursday, June 20, 2019

AI Study: A managed team labels data with 25% higher quality than crowdsourcing


Presented by CloudFactory


A study released at the 2019 Open Data Science Conference (ODSC) in Boston demonstrated that managed teams outperformed crowdsourced workers on accuracy and overall cost on a series of the same data labeling tasks. Data science platform developer Hivemind hired a managed workforce and a leading crowdsourcing platform workforce to determine which team delivered higher quality, and at what relative cost.

Data labeling and the 'race to useable data'

If you're building AI anywhere in your organization, you're in a "race to useable data," according to a 2019 report released by Cognilytica. In the report, the analyst firm, which specializes in AI, evaluated requirements for data preparation, engineering, and labeling solutions. It found that 80 percent of AI project time is spent on aggregating, cleaning, labeling, and augmenting the data used in machine learning (ML) models. That leaves just 20 percent for the activities that drive strategic value: algorithm development, model training and tuning, and ML operationalization.

Data labeling is a repetitive but critical activity that requires varying attention to detail, depending on the use case. For example, consider the importance of high accuracy in data labeling that will train a system to navigate city streets safely. Data labeling can tie up the time and attention of some of your highest-paid resources, such as data scientists and ML engineers.

You can deploy people more strategically with a virtual data production line, but like any well-executed strategy, there will be important tradeoffs to consider. Depending on the question you want your data to answer, you could use crowdsourcing or a managed service. Each workforce option comes with advantages and disadvantages. Hivemind designed its study to understand these dynamics in more detail. This article outlines key takeaways from the study, and you can download the full report here.

Same series of tasks, two data labeling workforces

Hivemind hired a managed workforce and anonymous workers from a leading crowdsourcing platform to complete a series of the same data labeling tasks, ranging from basic to more complicated, to determine which team delivered the highest-quality structured datasets and at what relative cost.

Task A: Easy transcription

Workers were asked to open a PDF, locate three trade numbers, and transcribe them. The crowdsourced workforce transcribed at least one of the numbers incorrectly in 7 percent of cases. When compensation was doubled, that error rate fell to just under 5 percent. The managed workers made a mistake in only 0.4 percent of cases, a difference that is significant both statistically and practically, given its implications for data quality. Overall, on this task, the crowdsourced workers' error rate was more than 10 times that of the managed workforce.

Comparison of accuracy on data labeling between a managed workforce and a leading crowdsourcing platform workforce. CloudFactory.
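The "more than 10x" figure follows directly from the reported error rates. A quick illustrative calculation in Python (the rates come from the study; the script itself is just a sketch for readers who want to check the arithmetic):

```python
# Error rates reported in the study (Task A: easy transcription).
crowd_error        = 0.07    # crowdsourced workers, standard compensation
crowd_error_double = 0.05    # crowdsourced workers, doubled compensation (just under 5%)
managed_error      = 0.004   # managed workforce

# Ratio of crowdsourced error to managed error; both comfortably exceed 10x.
print(f"Standard pay: {crowd_error / managed_error:.1f}x")        # 17.5x
print(f"Doubled pay:  {crowd_error_double / managed_error:.1f}x") # 12.5x
```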

Task B: Sentiment analysis

Workers were presented with the text of a company review from a review website and asked to rate the sentiment of the review from one to five. The actual ratings, or what we call ground truth, were removed. Managed workers were consistently accurate, getting the rating right in about 50 percent of cases. The crowdsourced workers struggled, particularly with poor reviews: for 1- and 2-star reviews, their accuracy was almost 20 percent, essentially the same as guessing among five options. For 4- and 5-star reviews, there was little difference between the workforce types.
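To run this kind of per-rating comparison on your own labeled reviews, a minimal Python sketch follows; the function name, data layout, and toy numbers are my assumptions for illustration, not the study's actual pipeline:

```python
from collections import defaultdict

def accuracy_by_rating(worker_labels, true_ratings):
    """Accuracy of worker labels, bucketed by the true star rating (1-5)."""
    correct, total = defaultdict(int), defaultdict(int)
    for predicted, actual in zip(worker_labels, true_ratings):
        total[actual] += 1
        if predicted == actual:
            correct[actual] += 1
    return {star: correct[star] / total[star] for star in sorted(total)}

# Toy example: a labeler who tends to over-rate negative reviews.
truth  = [1, 1, 2, 3, 4, 5, 5]
worker = [3, 2, 2, 3, 4, 5, 4]
print(accuracy_by_rating(worker, truth))
# -> {1: 0.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 0.5}
```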

Task C: Extracting information from unstructured text

Workers were given the title and description of a product recall and asked to classify the recall by hazard type using a drop-down menu of 11 options, including "other" and "not enough information provided." The crowdsourced workers achieved accuracy of 50 percent to 60 percent, regardless of the recall's word count. The managed workers achieved 75 percent to 85 percent accuracy, roughly 25 percentage points higher than the crowdsourced team.

Comparison of product recall hazard classification on data labeling between a managed workforce and a leading crowdsourcing platform workforce. CloudFactory.
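To make that gap concrete, here is the arithmetic using the midpoints of the accuracy ranges quoted above (choosing midpoints is my simplification for illustration; the ranges themselves are from the study):

```python
# Midpoints of the reported accuracy ranges for Task C.
managed_acc = (0.75 + 0.85) / 2   # ~0.80
crowd_acc   = (0.50 + 0.60) / 2   # ~0.55

# Difference expressed in percentage points.
gap = (managed_acc - crowd_acc) * 100
print(f"Managed lead: ~{gap:.0f} percentage points")  # ~25
```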

Your odds in the race to useable data might just come down to your workforce choice. As you evaluate data labeling providers, be sure to ask about their track record for data quality, speed, and ability to scale as your datasets grow. By strategically deploying people and technology, you can accelerate innovation and your speed to market.

Download the full report, Crowd vs. Managed Team: A Study on Quality Data Processing at Scale, for details about methodology, workforce cost, and study results. Contact us with questions or feedback, or for help with your data labeling.

Damian Rochman is VP of Products and Platform Strategy, CloudFactory.


Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they're always clearly marked. Content produced by our editorial team is never influenced by advertisers or sponsors in any way. For more information, contact sales@venturebeat.com.

VentureBeat

