Getting Started with MTurk for Data Cleaning and Preparation

2 min read · Jun 19, 2017

Amazon Mechanical Turk (MTurk) is well suited to the data wrangling needed to prepare data sets for analysis. MTurk Workers can normalize, categorize, de-duplicate, and enrich your data sets so you get from data to insights faster. The tutorials and resources below describe how to use MTurk to clean and prepare data for analysis:

Tutorial: Categorizing Names With The Requester Website

This tutorial is a good starting point if you want to categorize your data set or enrich it with metadata that helps you organize or analyze it. It walks through an example that classifies passenger names from the Titanic by their port of departure, and covers creating the task, gathering results, and paying Workers.
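
Once results are gathered, a common follow-up step is to combine the answers from several Workers into a single category per item. The sketch below shows one way to do that with a simple majority vote. It assumes each name was shown to more than one Worker, and the column names `Input.name` and `Answer.category` are illustrative assumptions rather than names taken from the tutorial; adjust them to match your own results file.

```python
from collections import Counter, defaultdict


def majority_labels(rows, item_key="Input.name", answer_key="Answer.category"):
    """Return {item: (winning_label, agreement_fraction)}.

    rows: iterable of dicts, one per Worker answer, for example the rows
    produced by csv.DictReader. The key names are assumptions.
    """
    votes = defaultdict(Counter)
    for row in rows:
        # Count one vote per answer, ignoring stray whitespace.
        votes[row[item_key]][row[answer_key].strip()] += 1

    result = {}
    for item, counter in votes.items():
        # On a tie, most_common keeps the label that was seen first.
        label, count = counter.most_common(1)[0]
        result[item] = (label, count / sum(counter.values()))
    return result


# Hypothetical rows for illustration only.
sample_rows = [
    {"Input.name": "Passenger A", "Answer.category": "Southampton"},
    {"Input.name": "Passenger A", "Answer.category": "Southampton"},
    {"Input.name": "Passenger A", "Answer.category": "Cherbourg"},
    {"Input.name": "Passenger B", "Answer.category": "Queenstown"},
]
labels = majority_labels(sample_rows)
```

The agreement fraction gives a rough signal of which items may deserve a second look: items where Workers disagree can be re-submitted or reviewed by hand.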

Tutorial: How to verify crowdsourced training data using a Known Answer Review Policy

This tutorial describes how to use MTurk to categorize images. It also explains how to use a crowdsourcing quality-control mechanism known as “Known Answers” or “Golden Answers” to get high-quality results. It’s written in the context of gathering data to train an algorithm, but it applies equally well to preparing and cleaning data in general. Check out the complete tutorial here.
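
The idea behind known answers can be illustrated offline with a short sketch: mix a few items whose correct answer you already know into the task, then score each Worker only on those items. This is not the Review Policy feature the tutorial covers, just a minimal illustration of the concept. The function name, the tuple layout, and the 0.8 threshold are all assumptions made for the example.

```python
from collections import defaultdict


def score_workers(assignments, known_answers, min_accuracy=0.8):
    """Score each Worker on known-answer items only.

    assignments: iterable of (worker_id, item_id, answer) tuples.
    known_answers: {item_id: correct_answer} for the seeded items.
    min_accuracy: hypothetical cutoff for trusting a Worker's other answers.
    Returns {worker_id: (accuracy, trusted)}. Workers who saw no
    known-answer item are omitted because they cannot be scored.
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for worker_id, item_id, answer in assignments:
        if item_id in known_answers:
            seen[worker_id] += 1
            if answer == known_answers[item_id]:
                correct[worker_id] += 1

    scores = {}
    for worker_id, total in seen.items():
        accuracy = correct[worker_id] / total
        scores[worker_id] = (accuracy, accuracy >= min_accuracy)
    return scores


# Hypothetical data for illustration only.
known = {"img1": "cat", "img2": "dog"}
answers = [
    ("W1", "img1", "cat"),
    ("W1", "img2", "dog"),
    ("W1", "img3", "bird"),  # not a known-answer item, so not scored
    ("W2", "img1", "dog"),
    ("W2", "img2", "dog"),
    ("W3", "img3", "cat"),   # W3 saw no known-answer item
]
scores = score_workers(answers, known)
```

Answers from Workers who fall below the cutoff can then be excluded or re-collected before the cleaned data is used.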

Case Study: How Redfin used MTurk for Data Cleanup

In a blog post from Seattle-based real estate startup Redfin, the company describes how they used MTurk to clean up data where automated means were insufficient. They step you through the process, offer tips and tricks, and explain how it helped them build up over 3,000 cleaned database records. Read the full blog post here on the Redfin Engineering blog.

Written by Amazon Mechanical Turk