Getting Started with MTurk for Data Cleaning and Preparation

Amazon Mechanical Turk
Jun 19, 2017

Amazon Mechanical Turk (MTurk) is well suited to data wrangling, the work of preparing data sets for analysis. MTurk Workers can normalize, categorize, de-duplicate, and enrich your data sets so you get from data to insights faster. The tutorials and resources below describe how to use MTurk to clean and prepare data for analysis:

Tutorial: Categorizing Names With The Requester Website

This tutorial is a great starting point if you’re looking to categorize your data set or enrich it with metadata that helps you organize or analyze it. It works through an example in which we classify passenger names from the Titanic based on their port of departure, covering each stage: creating the task, gathering results, and paying Workers.

Tutorial: How to verify crowdsourced training data using a Known Answer Review Policy

This tutorial describes how to use MTurk to categorize images. It also explains how to use a crowdsourcing quality-control mechanism known as “Known Answers” or “Golden Answers” to get high-quality results. It’s written in the context of gathering data to train an algorithm, but it applies equally well to preparing and cleaning data in general. See the complete tutorial for details.
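To make the “Known Answers” idea concrete, here is a minimal sketch of the underlying check, done locally on results you have already collected. It does not call the MTurk API and is not the tutorial’s own code: the function names, the (worker_id, item_id, label) tuple layout, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict


def worker_accuracy(responses, known_answers):
    """Return each Worker's fraction of correct labels on known-answer items.

    responses: iterable of (worker_id, item_id, label) tuples (assumed layout).
    known_answers: dict mapping item_id -> correct label for the seeded items.
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for worker_id, item_id, label in responses:
        if item_id in known_answers:
            seen[worker_id] += 1
            if label == known_answers[item_id]:
                correct[worker_id] += 1
    # Only Workers who answered at least one known-answer item get a score.
    return {worker_id: correct[worker_id] / seen[worker_id] for worker_id in seen}


def trusted_responses(responses, known_answers, min_accuracy=0.8):
    """Keep labels for real (non-seeded) items from Workers who score well.

    min_accuracy is a hypothetical threshold; tune it for your own task.
    Workers with no known-answer items are treated as unscored and dropped.
    """
    responses = list(responses)
    accuracy = worker_accuracy(responses, known_answers)
    return [
        (worker_id, item_id, label)
        for worker_id, item_id, label in responses
        if item_id not in known_answers
        and accuracy.get(worker_id, 0.0) >= min_accuracy
    ]
```

In practice you would mix a small number of items with known labels into each batch, score every Worker on those items, and then decide how much weight to give the rest of that Worker’s answers.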

Case Study: How Redfin used MTurk for Data Cleanup

In a blog post, Seattle-based real estate startup Redfin describes how it used MTurk to clean up data where automated means were insufficient. The post steps through the process, offers tips and tricks, and explains how MTurk helped the company build up over 3,000 cleaned database records. The full post is on the Redfin Engineering blog.


