CommonCrawl is a free, publicly accessible "crawl" of the web: an archive of web content that has been downloaded and saved for future analysis. As of late 2011, the archive is 40TB in size and encompasses over 5 billion pages and documents. In this project, you will work in groups of 1-2 people to process a subset of the CommonCrawl dataset using MapReduce. Which portion of the data you process, and how you process it, is up to you!
- 10% - Project Idea Document - Grading Rubric
  - Due: Monday, Jan 30th by 11:55pm
- 30% - Project Proposal - Grading Rubric
  - Due: Monday, Feb 13th by 11:55pm
- 60% - Final Report, Source code, and In-class Presentation - Grading Rubric
  - Due: Friday, Mar 2nd by 11:55pm
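For the CommonCrawl project, it may help to see the shape of a MapReduce job before choosing an idea. The sketch below is only an illustration run in memory on made-up records: it counts pages per host. The names `map_page`, `reduce_counts`, `run_job`, and `SAMPLE_RECORDS` are hypothetical, and the `(url, content)` record layout is an assumption, not the actual CommonCrawl format or the course's required framework.

```python
from collections import defaultdict
from urllib.parse import urlparse


def map_page(url, content):
    """Map step: emit (host, 1) for one crawled page.

    `content` is unused here; a real job might parse it instead.
    """
    host = urlparse(url).netloc.lower()
    if host:  # skip records whose URL has no host part
        yield host, 1


def reduce_counts(host, counts):
    """Reduce step: sum all the counts emitted for one host."""
    return host, sum(counts)


def run_job(records):
    """Run map, shuffle (group by key), and reduce in memory."""
    groups = defaultdict(list)
    for url, content in records:
        for key, value in map_page(url, content):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())


# Hypothetical sample input: (url, content) pairs standing in for crawl records.
SAMPLE_RECORDS = [
    ("http://example.com/a", "<html>a</html>"),
    ("http://example.com/b", "<html>b</html>"),
    ("http://example.org/", "<html>c</html>"),
]

if __name__ == "__main__":
    print(run_job(SAMPLE_RECORDS))
```

In a real MapReduce framework the grouping in `run_job` is done by the framework across many machines, so only the map and reduce functions would carry over, and only in spirit.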
Writing applications that scale is a central challenge in cloud computing. In this project, you will work in groups of 1-2 people to write an application for Amazon's cloud platform that takes advantage of its monitoring and management features.
- 10% - Project Idea Document - Grading Rubric
  - Due: Monday, Mar 26th by 11:55pm
- 15% - Initial Project Demonstration and Project Meeting - Grading Rubric
  - Due: Monday, Apr 9th (in class)
- 60% - Final Project Demonstration - Grading Rubric
  - Due: Monday, Apr 23rd (in class)
- 15% - Final Report and Source code - Grading Rubric
  - Due: Wednesday, Apr 25th by 11:55pm