Side Hustle or Scam? What to Know About Data Annotation Work


On TikTok, Reddit, and elsewhere, posts are popping up from users claiming they’re making $20 per hour—or more—completing small tasks in their spare time on sites such as DataAnnotation.tech, Taskup.ai, Remotasks, and Amazon Mechanical Turk.

As companies have rushed to build AI models, demand for “data annotation” and “data labeling” work has increased. Workers complete tasks such as writing and coding, and tech companies use the results to develop artificial intelligence systems, which are trained on large numbers of example data points. Some models require all of their input data to be labeled by humans, a technique referred to as “supervised learning.” And while “unsupervised learning,” in which AI models are fed unlabeled data, is becoming increasingly popular, systems trained this way still often require a final training step that uses human-labeled data.

There are no precise estimates of how many people do data annotation work. A 2022 Google Research paper puts the number in the millions and suggests it could grow to billions in the future. A 2021 study estimated that 163 million people have made profiles on online labor platforms, that 14 million of them have obtained work through a platform at least once, and that 3.3 million have completed at least 10 projects or earned at least $1,000. (These figures likely overstate the number of data annotators, because not all work carried out on online labor platforms is data annotation.)

Data annotation sites, often subsidiaries of larger companies, can offer legitimate ways to earn money, and as the AI industry has grown, demand for human labelers has grown with it. But prospective workers should be aware that the data labeling industry is poorly regulated and opaque, which can make it difficult to navigate. Here’s what to know.

How does someone get started in data annotation?

To qualify for work on these sites, applicants must first complete an assessment. Its length varies, but users commonly report that it takes between one and three hours. Applicants who pass should start to receive invitations for paid work through the site; those who aren’t accepted typically hear nothing back after completing the assessment.

Tasks on the assessment vary in nature, and there is a trend toward more highly skilled data annotation work, says Sonam Jindal, who leads the AI, Labor and the Economy program at the Partnership on AI, a nonprofit. “We’re going to start seeing that as you have a need to have higher quality AI models, you also need higher quality data,” she says. “We can figure out if something is a cat or a dog, that’s great. Moving on to more advanced tasks—to have more advanced AI that is useful in more specialized real world scenarios—you will need more specialized skill sets for that.”

How much money does the work pay? 

In the U.S., sites often offer around $20 per hour for tasks such as labeling photos and completing writing exercises. More specialized work can pay more: DataAnnotation.tech, for example, offers $40 per hour for coding tasks, and Outlier.ai offers $60 per hour for chemistry tasks.

Outside the U.S., data labelers are typically paid much less, says Jindal. But despite the higher cost, companies may prefer U.S.-based workers for some projects, such as tasks that require specific cultural knowledge or skills that are common in the U.S.

What have people’s experiences been like? 

On online discussion boards, users report a wide range of experiences with data annotation work. Many describe positive experiences—straightforward onboarding processes, an ample supply of tasks, and good pay.

“I have been working at [DataAnnotation.tech] for almost 2 years,” one user writes. “You make money by the task or by the hour, depending on the project. They pay via PayPal. I have only worked very part-time in the past couple of years and am nearing the $3k mark. In all honesty, I quit for quite a while during my full-time job, but am back at it. I am currently working on two projects, one for $20 per hour and one for $25 per hour. I am making about $400-$500 a week. This is not permanent, as tasks come and go, but it is a great side income to work on if you needed extra work from a laptop or computer.”

But others report less positive experiences, such as being told they had passed the assessment but then never being offered any tasks. More worryingly, some users report that their accounts were deactivated with large amounts of earnings yet to be paid out. One user writes that their account was deactivated with $2,869 worth of work unpaid, and that they emailed the company’s support contacts but did not hear back.

Data annotation sites often use algorithmic management to keep their costs low, which can lead to the poor treatment that many workers experience, says Milagros Miceli, who leads the Data, Algorithmic Systems, and Ethics research group at the Weizenbaum-Institut in Berlin. And because the industry is poorly regulated, companies rarely face consequences for treating workers badly, she says.

What is the data used for?

Some platforms, such as Amazon Mechanical Turk and Upwork, operate relatively transparently, using the same brand for both the buyers of data labeling labor and the workers who supply it. But others don’t. Remotasks is the worker-facing subsidiary of data labeling provider Scale AI, a multibillion-dollar San Francisco-based business whose clients include OpenAI, Meta, and the U.S. military. Similarly, Taskup.ai, DataAnnotation.tech, and Gethybrid.io are reportedly subsidiaries of Surge AI, another data labeling provider, whose clients include Anthropic and Microsoft.

Companies say this secrecy is needed to keep sensitive commercial information, such as new product development plans, from leaking, says Miceli. But they also prefer secrecy, she says, because it reduces the chances that they will be linked to potentially exploitative conditions, such as low wages and exposure to traumatic content.

A Scale AI spokesperson directed TIME to a blog post that says Remotasks was established as separate from Scale AI to protect customer confidentiality, and that cites examples of steps Scale AI has taken to ensure workers are treated fairly. The spokesperson also said that “Remotasks does not engage in projects that require exposure to sensitive images / videos, and in the event such content appears in a dataset it can be reported and removed from the workstream.”

Surge AI, Taskup.ai, DataAnnotation.tech, and Gethybrid.io did not respond to a request for comment in time for publication.

Data work is fundamentally undervalued, argues Jindal, suggesting that data workers could be paid royalties on the products that they help create. 

“Their knowledge and information is being captured in data and used to train these AI models that are called artificial intelligence,” she says. “It’s actually their human intelligence—our collective human intelligence—that’s being embedded in these models.”