Lab 12: Network Socket Programming with Python
Overview
In this lab, you will use the Python programming language to implement a simple HTTP client that can download files from a webserver.
References
There are a number of official and unofficial references for Python that you may find useful. Be warned, however, that Python syntax and libraries changed between versions 2.x and 3.x. For this lab, prefer newer Python 3.x references where possible.
- Official - Python documentation (for Python 3.x)
- Official - Python Standard Library documentation (for Python 3.x)
- Official - Python sockets documentation (for Python 3.x)
- TutorialsPoint Python Examples
- Differences between Python 2.x and 3.x
- Google University Python Tutorial (written for the older Python 2.7 syntax, but still useful)
- Python Sockets Examples (written for the older Python 2.7 syntax, but still useful)
Lab - Getting Started
To begin this lab, start by obtaining the necessary boilerplate code. Enter the class repository:
unix> cd ~/bitbucket/2013_spring_ecpe170_boilerplate/
Pull the latest version of the repository, and update your local copy of it:
unix> hg pull
unix> hg update
Copy the files you want from the class repository to your private repository:
(In this case, you want only one file.)
unix> cp ~/bitbucket/2013_spring_ecpe170_boilerplate/lab12/download.py ~/bitbucket/2013_spring_ecpe170/lab12/
Enter your private repository now, specifically the lab12 folder:
unix> cd ~/bitbucket/2013_spring_ecpe170/lab12
Add the new files to version control in your private repository:
unix> hg add download.py
Commit the new files in your private repository, so you can easily go back to the original starter code if necessary:
unix> hg commit -m "Starting Lab 12 with boilerplate code"
Push the new commit to the bitbucket.org website:
unix> hg push
Lab Submission:
(1) All source code and lab report PDF must be submitted via Mercurial. Place the source files inside the lab12 folder that was previously created.
Lab Part 1 - Python Setup
Install the latest (3.2+) version of Python. Typically, Ubuntu distributions only come with the older (but still widely used) 2.x Python. We don't care about backwards compatibility, so full speed ahead!
Also, install the image viewer eog, which is used by the image download program later in the lab. (This viewer is present on standard Ubuntu distributions, but not on all of the variants.)
unix> sudo apt-get install python3 eog
Verify that Python is working by asking it to print its version number, which will be useful when debugging errors and quirks later. Note: The argument is a CAPITAL letter V!
unix> python3 -V
The output should be similar to: Python 3.2.3
Lab Part 2 - Hello World
Using your favorite text editor, create a file (hello.py) with the usual "Hello World" starter code:
#!/usr/bin/python3
print("Hello World")
Mark the file as "executable" so you can run it as a program:
unix> chmod +x hello.py
Now execute the Python program:
unix> ./hello.py
Check hello.py into version control when finished.
Lab Report:
(1) What is the line that starts with #! doing? Where in ECPE 170 have you seen this before?
Lab Part 3 - Python Basic Skills
Write a Python3 program called demo.py. This program should be invoked via:
unix> ./demo.py <word1> <word2>
Demonstrate your knowledge of fundamental Python skills by performing the following operations in the program:
- Determine how many arguments have been provided to the script on the command line. If there are two arguments (*not* including the program name itself), print them out one at a time. Otherwise, exit immediately.
- Concatenate the two string arguments together, save them to a new variable called onestring, and then print onestring.
- Using a for-loop, write a sequence of numbers from 1 to 10 in increments of 1 to a file on disk.
Note: This demo program can be very short! No need to get fancy - save that for the HTTP download client.
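If you have not used these Python features before, the following is a minimal sketch of the building blocks only, not a finished demo.py. The function names join_words and write_numbers and the filename in the usage note are hypothetical; choose your own.

```python
def join_words(argv):
    """Return the two words in argv concatenated, or None if the count is wrong.

    argv is expected to look like sys.argv: the program name comes first,
    followed by the command-line arguments.
    """
    if len(argv) != 3:  # program name + exactly two arguments
        return None
    return argv[1] + argv[2]


def write_numbers(path, first=1, last=10):
    """Write the integers first..last (inclusive) to path, one per line."""
    with open(path, "w") as f:
        for n in range(first, last + 1):  # range() excludes its upper bound
            f.write(str(n) + "\n")
```

In a real script you would import sys, pass sys.argv to join_words, and call something like write_numbers("numbers.txt").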
Check demo.py into version control when finished.
Lab Part 4 - HTTP Basic Skills
Before writing a program that communicates with an HTTP server, you are going to manually test your knowledge of HTTP. The netcat client program allows you to open a TCP socket to a port and send ASCII characters. It will print both the characters that you send and the characters that the server sends.
To invoke the netcat client to connect to www.google.com on port 80:
unix> netcat -C www.google.com 80
The -C argument specifies that when you hit the enter key on the keyboard, netcat will send the \r\n (carriage return, line feed) sequence of two characters, which is required for the HTTP protocol.
Once the connection to the web server is open, you can send an HTTP request. Here is an example HTTP request to download the file at http://www.google.com/about/ :
GET /about/ HTTP/1.1
Host: www.google.com
Connection: close
<<SERVER RESPONSE STARTS HERE>>
Note that the HTTP client (in this case, you!) must send an extra blank line after the last request line. This trigger tells the web server to begin processing the request. (Technically, the web server is looking for a \r\n\r\n sequence of characters). After the request is sent, the reply should immediately follow on the same connection.
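To make the line endings concrete, here is a small sketch of how the same request could be assembled as bytes in Python. The function name build_get_request is an assumption made for illustration; it is not part of any library or of the boilerplate code.

```python
CRLF = "\r\n"  # carriage return + line feed, the HTTP line terminator


def build_get_request(host, path):
    """Return a minimal HTTP/1.1 GET request as bytes, ready to send."""
    request = (
        "GET " + path + " HTTP/1.1" + CRLF
        + "Host: " + host + CRLF
        + "Connection: close" + CRLF
        + CRLF  # the extra blank line that ends the request
    )
    # Sockets send bytes, not str, so encode before sending.
    return request.encode("ascii")
```

Calling build_get_request("www.google.com", "/about/") produces the three request lines shown above, and the result ends with the \r\n\r\n sequence that the server is waiting for.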
Lab Report:
(2) Document the HTTP request and the server response when you manually download the HTML file at http://ecs-network.serv.pacific.edu/ecpe-170/lab/ via Netcat.
(By "document", I mean that you should provide the full client request and a partial server response. The top 40-50 lines of the response are sufficient for me to tell if you downloaded the right file. The script utility can make this capture easy for you - see below.)
(3) Document the HTTP request and the server response when you manually download the HTML file at http://www.yahoo.com/ via Netcat.
(4) Document the HTTP request and the server response when you manually download the PNG image file at http://www.google.com/images/logos/google_logo_41.png via Netcat.
Note: Is there a good reason why it doesn't make sense to include the server response (at least, the data portion) in your lab report? On a related note, if your Terminal window hangs during this step, at least you'll know why!
Requirements for your HTTP request:
- Use the HTTP 1.1 protocol
- Specify the Host field, which is the domain name of the server that should answer your request. (In HTTP/1.1, multiple websites -- for example, gmail.google.com and www.google.com -- can be served from the same IP address, and the Host field tells the server which one you want.)
- Specify that the web server close the socket connection immediately after sending the requested file. (This allows for a simpler client implementation.)
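The last requirement pays off on the client side: if the server closes the connection after the file, the client can simply keep reading until the socket reports end-of-stream. The sketch below illustrates that idea under stated assumptions. The names recv_until_closed and demo_with_local_socket_pair are hypothetical, and the demo uses a local socket pair in place of a real web server so that it runs without a network.

```python
import socket


def recv_until_closed(sock, chunk_size=4096):
    """Read from sock until the peer closes the connection; return all bytes."""
    chunks = []
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:  # recv() returns b"" once the peer has closed
            break
        chunks.append(chunk)
    return b"".join(chunks)


def demo_with_local_socket_pair():
    """Simulate a server that sends a reply and then closes the connection."""
    server_end, client_end = socket.socketpair()
    server_end.sendall(b"HTTP/1.1 200 OK\r\n\r\nhello")
    server_end.close()  # plays the role of "Connection: close"
    reply = recv_until_closed(client_end)
    client_end.close()
    return reply
```

Without "Connection: close", the server may keep the connection open, and a loop like this would wait for more data instead of finishing.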
Tip 1: Tired of important text scrolling off the top of your terminal window? Adjust the "scrollback" option. Go to Edit->Profile Preferences->Scrolling and set the scrollback to "Unlimited" (via the check box) or at least set it to a large fixed number. (2048 lines? 4096 lines?)
Tip 2: Want to use the script utility to make documentation easy? The following command tells script to run the command "netcat -C www.google.com 80" interactively, save all keyboard input and program output to the file connection_log_google.txt, and stop saving when the netcat program exits.
unix> script -c "netcat -C www.google.com 80" connection_log_google.txt
Lab Part 5 - HTTP Download Client
Write a Python3 program called download.py to retrieve files from a web server via HTTP. Although this program could retrieve files of any type, we will use it solely to retrieve image files, and then display them after they have been downloaded. Your program will take one argument, the full URL of the image to display, e.g.:
unix> ./download.py http://www.google.com/images/logos/google_logo_41.png
Python has a built-in HTTP client module. Be warned, however: You *cannot* use it for this lab, and zero points will be awarded if you do! The reason is that this module hides how the HTTP protocol works, and the purpose of this lab is to learn how the protocol actually operates. Instead, you must use the lower-level socket module.
Tip: There is substantial boilerplate code provided for this exercise, with further instructions and hints contained within.
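Working at the socket level means doing some text handling yourself. As a rough sketch, and not necessarily how the boilerplate code approaches it, the two helpers below split a URL into the host and path needed for the request, and split a raw reply into its headers and data at the first blank line. The names split_url and split_response are hypothetical, and the sketch assumes plain http:// URLs with no port number.

```python
def split_url(url):
    """Split 'http://host/path' into (host, path). Sketch only: no ports."""
    prefix = "http://"
    if not url.startswith(prefix):
        raise ValueError("only http:// URLs are handled in this sketch")
    host, _, rest = url[len(prefix):].partition("/")
    return host, "/" + rest  # a URL with no path becomes "/"


def split_response(raw):
    """Split raw reply bytes into (header text, body bytes)."""
    header_bytes, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("no blank line found; the response is incomplete")
    # Headers are text; the body may be binary image data, so keep it as bytes.
    return header_bytes.decode("iso-8859-1"), body
```

For example, split_url("http://www.google.com/images/logos/google_logo_41.png") gives the host "www.google.com" and the path "/images/logos/google_logo_41.png". Keeping the body as bytes matters because image data must be written to disk unchanged.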
Your program will be tested with the following image URLs:
- http://www.google.com/images/srpr/logo3w.png
- http://imgsrc.hubblesite.org/hu/db/images/hs-2006-01-a-800_wallpaper.jpg
- http://imgsrc.hubblesite.org/hu/db/images/hs-2010-13-a-2560x1024_wallpaper.jpg
- http://ut-images.s3.amazonaws.com/wp-content/uploads/2009/09/SED_wall_1920x1200.jpg
- http://history.nasa.gov/ap11ann/kippsphotos/69-H-1096.jpg
- http://www.nasa.gov/centers/dryden/images/content/690557main_SCA_Endeavour_over_Ventura.jpg
Lab Report - Wrapup:
(1) What was the best aspect of this lab?
(2) What was the worst aspect of this lab?
(3) How would you suggest improving this lab in future semesters?