Project 2 - Cloud Applications
Cloud computing vendors provide extensive storage and data management APIs to application authors. In this project, you will work in groups of 1-2 people to write a client application for Amazon's cloud platform that takes advantage of their distributed and redundant computing infrastructure.
First, pick an application that you would like to write. All of these applications are similar in one respect: they are client programs that run on your local computer (Windows/Mac/Linux), but they require "cloud services" in order to function. You should be able to create any of these programs by writing only the client component, and using the cloud storage/database services provided by Amazon.
"Dropbox" clone
- Recursively synchronize a directory of files between networked computers. A file modified on one computer should be transferred to all other computers owned by the same user.
- Maintain a backup of all files in "the cloud". Allow the user to manually restore the complete directory (if missing, or on a new computer), or to restore an earlier version of a specific file.
- Gracefully handle offline machines. If a computer has been offline and reconnects, it should obtain the latest files from "the cloud".
- Once a new/modified file has been uploaded by the original computer into "the cloud", the original computer does not need to remain online for other connected machines to synchronize.
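One possible way to decide what to transfer is sketched below. It is only an illustration, not a required design: each machine compares three manifests (its current files, the cloud's files, and a snapshot saved after its last successful sync) and classifies every path. The function names, the manifest layout (relative path mapped to a SHA-256 digest), and the conflict policy are all assumptions made for this sketch; the actual Amazon S3 transfer calls are left out.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def local_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its content digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def plan_sync(local: dict, remote: dict, base: dict) -> dict:
    """Three-way comparison of manifests (hypothetical policy).

    local  - what is on this machine now
    remote - what the cloud currently holds
    base   - snapshot taken after this machine's last successful sync
    """
    actions = {"upload": [], "download": [], "delete_remote": [],
               "delete_local": [], "conflict": []}
    for name in sorted(set(local) | set(remote) | set(base)):
        l, r, b = local.get(name), remote.get(name), base.get(name)
        if l == r:
            continue  # both sides already agree
        if r == b:
            # Only this machine changed the file since the last sync.
            actions["upload" if l is not None else "delete_remote"].append(name)
        elif l == b:
            # Only the cloud copy changed (e.g. while this machine was offline).
            actions["download" if r is not None else "delete_local"].append(name)
        else:
            # Both sides changed: needs a policy the sketch does not choose.
            actions["conflict"].append(name)
    return actions
```

A machine that reconnects after being offline would rebuild `local` with `local_manifest`, fetch `remote` from the cloud, and run `plan_sync`; a brand-new computer simply starts with empty `local` and `base` manifests, so everything lands in `download`.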
"iMessage" / "Skype chat" clone
- Send chat messages between networked computers. Chat messages can include pictures.
- Maintain a "friend list" of users available to chat. Monitor their status (online/offline is sufficient).
- Maintain a list of messages in "the cloud", such that if client A sends a message to client B, and client B is offline, then client B will get the message upon connecting later in time.
- Chat history: If client A logs onto a new device, client A should see the past n messages in chat history, even if those were not sent from this device. 30 is a reasonable number for n.
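The offline-delivery and chat-history requirements above can be sketched as follows. This is a minimal in-memory illustration, assuming a single ordered message log with a per-user "last delivered" marker; `MessageStore` and its methods are hypothetical names, and in the real project the list would be replaced by a table in DynamoDB or SimpleDB.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    seq: int        # global, monotonically increasing sequence number
    sender: str
    recipient: str
    text: str


class MessageStore:
    """In-memory stand-in for a cloud table of chat messages (hypothetical)."""

    def __init__(self):
        self._messages = []
        self._next_seq = itertools.count(1)
        self._last_seen = {}  # user -> highest seq already delivered to them

    def send(self, sender: str, recipient: str, text: str) -> Message:
        """Store a message; the recipient does not need to be online."""
        msg = Message(next(self._next_seq), sender, recipient, text)
        self._messages.append(msg)
        return msg

    def fetch_undelivered(self, user: str) -> list:
        """Return messages addressed to `user` that arrived since the last fetch."""
        seen = self._last_seen.get(user, 0)
        new = [m for m in self._messages if m.recipient == user and m.seq > seen]
        if new:
            self._last_seen[user] = new[-1].seq
        return new

    def history(self, user: str, n: int = 30) -> list:
        """Return the last n messages sent or received by `user`, oldest first."""
        if n <= 0:
            return []
        mine = [m for m in self._messages if user in (m.sender, m.recipient)]
        return mine[-n:]
```

Because `history` reads from the shared store rather than from anything kept on the device, a client logging in from a new device sees the same recent messages as any other device.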
"Spotify" clone
- Listen to MP3-format music from an online service with a single global catalog.
- Allow users to search the global catalog by artist name, album name, or track name.
- Maintain per-user playlists of "starred" (favorite) songs and synchronize them between all clients using the same user account.
- Share playlists with other users and keep each shared playlist current when it is updated by the original creator.
- Allow a single user to access a shared collection of "documents" (text, images) from multiple devices.
- Content added or updated on one device is added/updated on all synchronized devices.
- Client should act as a "cache" with a finite storage capacity. Frequently-accessed documents should be stored locally, and other documents should be retrieved from the cloud service only when accessed by the user. (In contrast to the "Dropbox" clone above, where all files must be synchronized between devices.)
- I'm open to other ideas along the lines of the four listed above. I'm also open to web-based applications instead of desktop applications, if you have prior experience in that area. Talk to me first...
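For the document-collection application, the client-side "cache" with finite storage capacity could look something like the sketch below. It assumes a least-recently-used (LRU) eviction policy, which is one reasonable choice among several, and a hypothetical `fetch` callback standing in for the download from the cloud service.

```python
from collections import OrderedDict
from typing import Callable


class DocumentCache:
    """Byte-limited LRU cache in front of a (hypothetical) cloud fetch function."""

    def __init__(self, capacity_bytes: int, fetch: Callable[[str], bytes]):
        self.capacity_bytes = capacity_bytes
        self._fetch = fetch          # called only on a cache miss
        self._docs = OrderedDict()   # doc_id -> bytes, least recently used first
        self._used = 0

    def get(self, doc_id: str) -> bytes:
        """Return a document, downloading it only if it is not cached."""
        if doc_id in self._docs:
            self._docs.move_to_end(doc_id)  # mark as most recently used
            return self._docs[doc_id]
        data = self._fetch(doc_id)
        self._store(doc_id, data)
        return data

    def _store(self, doc_id: str, data: bytes) -> None:
        if len(data) > self.capacity_bytes:
            return  # larger than the whole cache: serve it without caching
        # Evict least recently used documents until the new one fits.
        while self._used + len(data) > self.capacity_bytes:
            _, evicted = self._docs.popitem(last=False)
            self._used -= len(evicted)
        self._docs[doc_id] = data
        self._used += len(data)

    def cached_ids(self) -> list:
        """Document ids currently held locally, least recently used first."""
        return list(self._docs)
```

Documents that are accessed often keep moving to the "recent" end and stay local, while rarely used ones are evicted and fetched again only when the user opens them.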
All applications must also meet the following requirements:
- Use usernames/passwords to authenticate users. New accounts should be created through the client application.
- Use Amazon S3 for file storage
- Use Amazon DynamoDB or SimpleDB for database storage
- Use both text and binary data types
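For the username/password requirement, a minimal sketch of account creation and login checking is shown below. It assumes you store a random salt and a PBKDF2 hash (both binary values) instead of the password itself; the `table` dictionary is a stand-in for a DynamoDB or SimpleDB table, and the function names and iteration count are illustrative choices, not requirements.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your hardware


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte hash from the password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def create_account(table: dict, username: str, password: str) -> None:
    """Add a new user record; `table` stands in for the cloud database."""
    if username in table:
        raise ValueError("username already taken")
    salt = os.urandom(16)
    table[username] = {"salt": salt, "hash": derive_key(password, salt)}


def check_login(table: dict, username: str, password: str) -> bool:
    """Return True only if the user exists and the password matches."""
    record = table.get(username)
    if record is None:
        return False
    candidate = derive_key(password, record["salt"])
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(record["hash"], candidate)
```

Note that the record mixes a text key (the username) with binary attributes (salt and hash), which also exercises the "text and binary data types" requirement.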
Note about project ideas: the focus of this project is not creating polished native clients for Mac/Windows/Linux, etc... You only need a functional user interface that allows you to demonstrate application operation. Any additional UI glitz is completely optional. Some applications, like the Dropbox clone, don't even need a GUI to be functional.
Second, pick a programming language that you feel comfortable with. This list of approved languages is based on the official Amazon software development kits (SDKs). Other languages are theoretically possible, but you'll have to write support for Amazon cloud services from scratch or find an unofficial SDK elsewhere online. Note that Java and .NET both provide plugins for your Eclipse or Visual Studio IDE that may simplify development. Other languages will need to use the AWS website to access similar functionality.
- Java (use Eclipse IDE + fancy AWS toolkit)
- C# / .NET (use Visual Studio IDE + fancy AWS toolkit)
Third, build your development environment!
Java language: Follow these instructions
C# / .NET language: Follow these instructions
Other languages (Python, PHP, Ruby, Node.js):
- Beats me, you're the expert! How do you usually code in these systems? I'd start by installing whatever is normally needed to execute code in this environment on your computer. And then I would check with the Amazon SDK documentation to see what is needed to install their toolkit...
The fundamental design of these projects (doing all the work in a desktop client) is **insecure**. Think about it this way: Your client app must contain your AWS access ID and secret key in order to function. An attacker could easily extract that information from the program binary and rig up a new hostile client program, and destroy/steal your data while posing as you. No sane person should write a client program that contains your privileged Amazon credentials, and then distribute that program to untrusted users. Instead, the client program should communicate with a custom server program (which can run on Amazon EC2) using a limited protocol. The server program should verify that the client's request is legitimate, and then use the AWS access ID / key to manipulate the data on the client's behalf. Unfortunately, we don't have enough time in this class to write both a client component and server component. Thus, as you write the client, just keep this thought in mind: "In a real system, all of this code that uses the Amazon SDK to manipulate data would be placed into a separate server application."
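To make the "limited protocol" idea concrete, here is a rough sketch of the kind of check such a server program might run before touching any data on the client's behalf. You are not required to build this. The operation names, the request layout, and the `users/<name>/` key prefix are all assumptions invented for the illustration.

```python
ALLOWED_OPS = {"list", "get", "put"}  # hypothetical, deliberately small set


def authorize(session_user: str, request: dict) -> bool:
    """Decide whether an already-authenticated user may perform a request.

    The server, not the client, holds the AWS credentials; it performs the
    storage operation only if this function returns True.
    """
    op = request.get("op")
    key = request.get("key", "")
    if op not in ALLOWED_OPS or not isinstance(key, str):
        return False
    # Each user may touch only objects under their own prefix.
    prefix = f"users/{session_user}/"
    if not key.startswith(prefix):
        return False
    # Reject path tricks that try to climb out of the prefix.
    return ".." not in key.split("/")
```

With a check like this in front of the storage calls, a hostile client can at worst damage its own user's data, because it never sees the privileged credentials and cannot name another user's objects.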
Part 1 - Project Idea
Create an idea document that provides an overview of the application that you will write from both a user interface and programming perspective. Each page of this document should focus on a separate user action, and contain the following items:
- Description of the user action - What is the user doing right now in your application? (Double-click on a new track on "Spotify", modify a file on disk and have "Dropbox" upload it, receive a new "iMessage" text from a friend, etc...)
- A design mockup *picture* - What does the user see on the screen during this event? Provide fake data to make the mockup look realistic. You don't have to actually write functional code to create this. You could draw it in whatever design program you prefer.
- Description of the data operations - What is being uploaded to / downloaded from the cloud during this action? What data do you expect to store in Amazon? Are you doing some sort of search / database query in order to perform this action? What do you expect the Amazon system to do? Be detailed! This will be converted to code next...
This idea document should cover at least 6 user actions, starting with the actions that are most significant and unique to your application. (To put it another way, the action of creating the username and password is pretty dull and obvious, and is not unique to your application)
Upload the document in PDF format to Sakai.
Part 2 - Project Implementation
In this stage of the project, you will make the mockup functional. :-)
There are two deliverables for this part:
- A checkpoint in-class demonstration halfway through the time allotted for part 2 (see timeline below). At this checkpoint, you should demonstrate your work to date, as well as discuss any problems you have overcome and problems that are currently unresolved.
- Time for demonstration: ~6 minutes, depending on the number of groups.
- A final in-class demonstration at the end of the time allotted for part 2. Here, you should show off your finished application, and explain how it works behind the scenes.
- Time for demonstration: ~15 minutes, depending on the number of groups.
Part 3 - Project Reporting
There are three deliverables for this part:
- Full source code to all components
- Installation and execution instructions - what steps would a classmate need to take to compile and run your application? Include any Amazon environment setup that you needed to do in advance of the first client application being run.
- A report documenting your completed project. This 2-3 page report should contain the following sections:
- Introduction - what does your application do?
- Algorithm details - how does your application transfer, store, and process data? What does the client do locally?
- Infrastructure used - what AWS services does your application rely on, and what do these services do?
- Fault tolerance - what failures will your application tolerate, and how does it do so? What failures will your application not currently tolerate? What failures (according to the marketing literature) will the AWS services your application relies on tolerate? If you had an additional 6 weeks to focus purely on fault tolerance, how would you modify your application to tolerate these additional failure modes?
- Final thoughts and feedback - What was the easiest and hardest part of the project? What suggestions would you have if this project is repeated for future students?
- References - What sources did you use in building your project? Provide links to public source code, tutorials, discussion forums, mailing lists, etc.
Submission instructions: Upload the source code (in a compressed tarball or zip file, please), installation instructions (in PDF format), and the final report (in PDF format) to Sakai.
|Due Date||Working On||Deliverables at End|
|Tue, Mar 25th by 11:55pm||Part 1: Brainstorming for idea and developing mockup||6 page document with mockups and data descriptions|
|Tue, Apr 8th (in class)||Part 2: Checkpoint and initial demo||In-class demo (of work to date)|
|Tue, Apr 29th (in class)||Part 2: Implementation complete!||In-class demo (of final system)|
|Wed, Apr 30th by 11:55pm||Part 3: Report writeup and any final polishing||Source code and final report|
- 10% - Project Idea Document and Mockup [Grading Rubric]
- 15% - Checkpoint / Initial Demonstration [Grading Rubric]
- 60% - Final Demonstration [Grading Rubric]
- 15% - Final Report and Source code [Grading Rubric]
Q: What is the difference between a NoSQL database (Amazon DynamoDB, ...) and a relational SQL database (Oracle, SQL Server, MySQL, PostgreSQL, ...) ?
Today's web-based applications generate and consume massive amounts of data. For example, an online game might start out with only a few thousand users and a light database workload consisting of 10 writes per second and 50 reads per second. However, if the game becomes successful, it may rapidly grow to millions of users and generate tens (or even hundreds) of thousands of writes and reads per second. It may also create terabytes or more of data per day. Developing your applications against Amazon DynamoDB enables you to start small and simply dial-up your request capacity for a table as your requirements scale, without incurring downtime. You pay highly cost-efficient rates for the request capacity you provision, and let Amazon DynamoDB do the work of partitioning your data and traffic over sufficient server capacity to meet your needs. Amazon DynamoDB does the database management and administration, and you simply store and request your data. Automatic replication and failover provide built-in fault tolerance, high availability, and data durability. Amazon DynamoDB gives you the peace of mind that your database is fully managed and can grow with your application requirements.
While Amazon DynamoDB tackles the core problems of database scalability, management, performance, and reliability, it does not have all the functionality of a relational database. It does not support complex relational queries (e.g. joins) or complex transactions. If your workload requires this functionality, or you are looking for compatibility with an existing relational engine, you may wish to run a relational engine on Amazon RDS or Amazon EC2. While relational database engines provide robust features and functionality, scaling a workload beyond a single relational database instance is highly complex and requires significant time and expertise. As such, if you anticipate scaling requirements for your new application and do not need relational features, Amazon DynamoDB may be the best choice for you. (Answer courtesy of Amazon DynamoDB FAQ)
- NoSQL databases explained (from the perspective of MongoDB, a NoSQL database vendor)
- What are the differences between NoSQL and a traditional RDBMS? (discussion thread on StackExchange)
Q: How does Amazon DynamoDB differ from Amazon SimpleDB? Which should I use?
Both services are non-relational databases that remove the work of database administration. Amazon DynamoDB focuses on providing seamless scalability and fast, predictable performance. It runs on solid state disks (SSDs) for low-latency response times, and there are no limits on the request capacity or storage size for a given table. This is because Amazon DynamoDB automatically partitions your data and workload over a sufficient number of servers to meet the scale requirements you provide. In contrast, a table in Amazon SimpleDB has a strict storage limitation of 10 GB and is limited in the request capacity it can achieve (typically under 25 writes/second); it is up to you to manage the partitioning and re-partitioning of your data over additional SimpleDB tables if you need additional scale. While SimpleDB has scaling limitations, it may be a good fit for smaller workloads that require query flexibility. Amazon SimpleDB automatically indexes all item attributes and thus supports greater query functionality at the cost of performance and scale. (Answer courtesy of Amazon SimpleDB FAQ)
Q: When should I use Amazon DynamoDB vs Amazon S3?
Amazon DynamoDB stores structured data, indexed by primary key, and allows low latency read and write access to items ranging from 1 byte up to 64KB. Amazon S3 stores unstructured blobs and is suited for storing large objects up to 5 TB. In order to optimize your costs across AWS services, large objects or infrequently accessed data sets should be stored in Amazon S3, while smaller data elements or file pointers (possibly to Amazon S3 objects) are best saved in Amazon DynamoDB. (Answer courtesy of Amazon DynamoDB FAQ)
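The "file pointer" pattern from this answer can be sketched roughly as follows. The two dictionaries are in-memory stand-ins for a DynamoDB table and an S3 bucket, and the function names, the `blobs/` key scheme, and the 32 KB inline threshold (chosen to stay comfortably under the 64KB item limit quoted above) are assumptions for illustration only.

```python
import hashlib

INLINE_LIMIT = 32 * 1024  # hypothetical threshold, below the quoted item limit


def put_document(db: dict, blob_store: dict, doc_id: str, data: bytes) -> None:
    """Store small payloads in the database item, large ones as a pointer.

    `db` stands in for a DynamoDB table, `blob_store` for an S3 bucket.
    """
    if len(data) <= INLINE_LIMIT:
        db[doc_id] = {"inline": data}
    else:
        key = "blobs/" + hashlib.sha256(data).hexdigest()
        blob_store[key] = data                           # large object goes to "S3"
        db[doc_id] = {"blob_key": key, "size": len(data)}  # pointer stays in "DynamoDB"


def get_document(db: dict, blob_store: dict, doc_id: str) -> bytes:
    """Return the payload, following the pointer when there is one."""
    item = db[doc_id]
    if "inline" in item:
        return item["inline"]
    return blob_store[item["blob_key"]]
```

Either way the caller only ever asks for a document id; whether the bytes live in the database item or in object storage is an internal detail.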
- Amazon Services
- Java / Eclipse
- AWS SDK for Java Documentation
- AWS SDK for Java API Reference
- AWS SDK for Java Tips & Tricks
- AWS Java Development Forum
- .NET / Visual Studio