Lab 8: Network Socket Programming (Basic)
In this lab, you will use the Python programming language to implement a simple HTTP client that can download files from a webserver.
There are a number of official and unofficial references for Python that you may find useful. Be warned, however, that Python syntax and libraries changed between versions 2.x and 3.x. Thus, for this lab, be sure any reference you use is for Python 3.x.
- Official - Python documentation (for Python 3.x)
- Official - Python Standard Library documentation (for Python 3.x)
- Official - Python sockets documentation (for Python 3.x)
Lab - Getting Started
To begin this lab, start by obtaining the necessary boilerplate code. Enter the class repository:
unix> cd ~/bitbucket/2016_spring_ecpe170_boilerplate/
Pull the latest version of the repository, and update your local copy of it:
unix> hg pull
unix> hg update
Copy the files you want from the class repository to your private repository:
unix> cp ~/bitbucket/2016_spring_ecpe170_boilerplate/lab08/* ~/bitbucket/2016_spring_ecpe170/lab08
Enter your private repository now, specifically the lab08 folder:
unix> cd ~/bitbucket/2016_spring_ecpe170/lab08
Add these files to version control in your private repository:
unix> hg add display.py
Commit the new files in your personal repository, so you can easily go back to the original starter code if necessary:
unix> hg commit -m "Starting Lab 8 with boilerplate code"
Push the new commit to the bitbucket.org website:
unix> hg push
Install the latest (3.4+) version of Python. Typically, Ubuntu distributions only come with the older (but still widely used) 2.x Python. We don't care about backwards compatibility, so full speed ahead!
unix> sudo apt-get install python3
Verify that Python is working by asking it to print its version number, which will be useful when debugging errors or quirks later. Note: The flag is a CAPITAL letter V!
unix> python3 -V
The output should be similar to: Python 3.4.x
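The same check can be made from inside a script, which helps if a program is accidentally launched with the older 2.x interpreter. This is a minimal sketch: the helper name python_is_new_enough is hypothetical, and the 3.4 threshold simply mirrors the version named above.

```python
import sys

def python_is_new_enough(minimum=(3, 4)):
    """Return True if the running interpreter is at least `minimum`."""
    # sys.version_info[:2] is a (major, minor) tuple, e.g. (3, 4).
    return sys.version_info[:2] >= minimum

if __name__ == "__main__":
    # Prints the same kind of version string that `python3 -V` reports.
    print("Python %d.%d.%d" % sys.version_info[:3])
    if not python_is_new_enough():
        sys.exit("This lab requires Python 3.4 or newer")
```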
Python IDE: PyCharm
If you want a friendly IDE to write Python code in, complete with handy features such as code completion and auto-indenting, install the free Community Edition of PyCharm.
Installation Procedure: (alas, PyCharm is not included in the standard package library)
- Ensure that the Java 7 JRE is installed on your system: sudo apt-get install openjdk-7-jre
- Download the Community Edition for Linux (select the corresponding tab) from http://www.jetbrains.com/pycharm/download/
- Extract the .tar.gz archive file that was downloaded
- On the command line, navigate to the extracted archive folder and enter the /bin subdirectory inside
- Run the installer script at the command line: ./pycharm.sh
- After the installer completes, you will be able to launch PyCharm from the Ubuntu "start" menu like programs that were installed via the package manager
- By default, PyCharm uses Python 2.7. Switch the default interpreter to use Python 3.4
- Settings -> Default Project -> Project Interpreter, then change the Project Interpreter dropdown to Python 3.4
Note: I will still run your Python programs at the command line, not through the IDE. So, don't embed any custom settings in the IDE project properties.
Lab Part 1 - Demo Client and Server
Mark the files as "executable" so you can run them as programs. (The + symbol means 'add flag', and 'x' represents the executable flag.)
unix> chmod +x client.py
unix> chmod +x server.py
Now, in two separate terminal windows, execute the demonstration program:
Terminal Window 1:
(Run the server program and tell the server to listen on port 5678)
unix> ./server.py 5678
Terminal Window 2:
(Run the client program, and tell the client to connect to the server located at IP 127.0.0.1 and listening on port 5678. That IP is equivalent to 'localhost', i.e. your computer)
unix> ./client.py 127.0.0.1 5678
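The two terminals follow the usual TCP pattern: the server binds to a port and accepts a connection, and the client connects to that IP address and port. The sketch below illustrates the pattern inside a single process. It is not the boilerplate client.py or server.py; the function names, the echo behavior, and the use of port 0 (which lets the OS pick a free port) are assumptions made for illustration.

```python
import socket
import threading

def serve_once(listener):
    """Accept one connection, echo back the first chunk received, then close."""
    conn, _addr = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)

def demo_exchange(message=b"hello"):
    # Server side: bind to localhost. Port 0 asks the OS for any free port;
    # the lab demo uses a fixed port such as 5678 instead.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    worker = threading.Thread(target=serve_once, args=(listener,))
    worker.start()

    # Client side: connect to the server's IP and port, send, read the reply.
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))
    client.sendall(message)
    reply = client.recv(1024)
    client.close()

    worker.join()
    listener.close()
    return reply

if __name__ == "__main__":
    print(demo_exchange())
```

Running the server in a background thread is only a convenience for a one-file example; in the lab, the two roles run as separate programs in separate terminals.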
(1) What is the first line of the Python script, the one that starts with #!, doing? Where in ECPE 170 have you seen this before?
Lab Part 2 - HTTP Basic Skills
Before writing a program that communicates with an HTTP server, you are going to manually test your knowledge of HTTP. The netcat client program allows you to open a TCP socket to a port and send ASCII characters. It will print both the characters that you send and the characters that the server sends.
To invoke the netcat client to connect to www.google.com on port 80:
unix> netcat -C www.google.com 80
The -C argument specifies that when you hit the enter key on the keyboard, netcat will send the \r\n (carriage return, line feed) sequence of two characters, which is required for the HTTP protocol.
Once the connection to the web server is open, you can send an HTTP request. Here is an example HTTP request to download the file at http://www.google.com/about/ :
GET /about/ HTTP/1.1
<<SERVER RESPONSE STARTS HERE>>
Note that the HTTP client (in this case, you!) must send an extra blank line after the last request line. This trigger tells the web server to begin processing the request. (Technically, the web server is looking for a \r\n\r\n sequence of characters). After the request is sent, the reply should immediately follow on the same connection.
(2) Document the HTTP request and the server response when you manually download the HTML file at http://ecs-network.serv.pacific.edu/ecpe-170/lab/ via Netcat.
(By "document", you should provide the full client request and a partial server response (top 40-50 lines is sufficient for me to tell if you downloaded the right file). The script utility can make this capture easy for you - see below.)
(3) Document the HTTP request and the server response when you manually download the HTML file at http://www.yahoo.com/ via Netcat.
(4) Document the HTTP request and the server response when you manually download the PNG image file at http://www.google.com/images/logos/google_logo_41.png via Netcat.
Note: Is there a good reason why it doesn't make sense to include the server response (at least, the data portion) in your lab report? On a related note, if your Terminal window hangs during this step, at least you'll know why!
Requirements for your HTTP request:
- Use the HTTP 1.1 protocol
- Specify the Host field, which is the domain name of the server that should answer your request. (In HTTP/1.1, there could be multiple servers -- for example, gmail.google.com and www.google.com -- listening on the same IP address).
- Specify that the web server close the socket connection immediately after sending the requested file. (This allows for a simpler client implementation.)
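When you later move from typing requests by hand to generating them in a program, the same requirements can be expressed as a small helper. The sketch below is one possible way to do it, not the required implementation: the function name build_get_request is hypothetical, and it assumes the standard Connection: close header is how the client asks the server to close the socket.

```python
def build_get_request(host, path):
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        "GET {} HTTP/1.1".format(path),   # request line: method, path, version
        "Host: {}".format(host),          # which site on this IP should answer
        "Connection: close",              # ask the server to close when done
        "",                               # blank line ends the header block
        "",
    ]
    # Joining with \r\n makes every line end in \r\n, so the request
    # finishes with the \r\n\r\n sequence the server is waiting for.
    return "\r\n".join(lines).encode("ascii")

if __name__ == "__main__":
    print(build_get_request("www.google.com", "/about/").decode("ascii"))
```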
Tip 1: Tired of important text scrolling off the top of your terminal window? Adjust the "scrollback" option. Go to Edit->Profile Preferences->Scrolling and set the scrollback to "Unlimited" (via the check box) or at least set it to a large fixed number. (2048 lines? 4096 lines?)
Tip 2: Want to use the script utility to make documentation easy? The following command tells script to run the command "netcat -C www.google.com 80" interactively, save all keyboard input and program output to the file connection_log_google.txt, and stop saving when the netcat program exits.
unix> script -c "netcat -C www.google.com 80" connection_log_google.txt
Lab Part 3 - HTTP Download Client
Write a Python3 program called display.py to retrieve files from a web server via HTTP. Although this program could retrieve files of any type, we will use it solely to retrieve image files and then display them after they have been downloaded.
- Your program must download image files from a web server via HTTP using the Python3 programming language and the socket module.
- Warning: Python has a built-in HTTP client module and URL request module. You *cannot* use these (or anything similar) for this lab, and zero points will be awarded if you do! These modules hide how the HTTP protocol works, and the purpose of this lab is to learn how the protocol operates. Instead, you must use the lower-level socket module.
- The HTTP Request sent to the server must be printed out on the screen
- The HTTP Response received from the server must be printed out on the screen (just the header, not the data)
- The entire response from the server must be downloaded in increments of max_recv bytes per system call to recv(). (This is a variable defined in the boilerplate code, and is a reasonable value such as 64kB).
- Downloaded files must be saved to local disk with the same file name as on the server.
- Downloaded files must be displayed on screen by invoking the eog image viewer utility (see code at the end of the boilerplate file to do this)
- Your program will take 2 command-line arguments: the web server port number, and the URL of the image to display, e.g.:
unix> ./display.py --port=80 --url=http://www.google.com/images/logos/google_logo_41.png
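The two arguments above have to be turned into a port number, a host name, a request path, and a local file name. The sketch below shows one way that could look; the boilerplate may already handle part of this differently. The names split_url and parse_args are hypothetical, the URL is split with plain string operations to stay clear of the URL modules, and only http:// URLs are assumed.

```python
import argparse

def split_url(url):
    """Split 'http://host/dir/file.png' into (host, path, filename)."""
    prefix = "http://"
    if not url.startswith(prefix):
        raise ValueError("only http:// URLs are handled in this sketch")
    rest = url[len(prefix):]
    host, _slash, tail = rest.partition("/")
    path = "/" + tail                      # "/" when the URL has no path
    filename = path.rsplit("/", 1)[1]      # text after the last slash
    return host, path, filename

def parse_args(argv=None):
    """Parse --port and --url; argv=None means 'use the real command line'."""
    parser = argparse.ArgumentParser(description="HTTP image downloader sketch")
    parser.add_argument("--port", type=int, default=80)
    parser.add_argument("--url", required=True)
    return parser.parse_args(argv)
```

For the example command above, split_url yields the host www.google.com, the path /images/logos/google_logo_41.png, and the file name google_logo_41.png, which is the name the download would be saved under.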
Tip: There is boilerplate code provided for this exercise, and step-by-step instructions contained within.
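Two of the requirements above, downloading in increments of max_recv bytes and printing only the response header, can be sketched as follows. This is an illustration under stated assumptions rather than the boilerplate's own code: the helper names recv_all and split_response are hypothetical, the loop relies on the server closing the connection after the file is sent, and socket.socketpair() stands in for a real server so the example runs without network access.

```python
import socket

def recv_all(sock, max_recv=64 * 1024):
    """Read from sock in chunks of at most max_recv bytes until it closes."""
    chunks = []
    while True:
        chunk = sock.recv(max_recv)
        if not chunk:          # empty bytes: the peer closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks)

def split_response(raw):
    """Split a raw HTTP response into (header_text, body_bytes)."""
    # The header ends at the first blank line, i.e. the first \r\n\r\n.
    header, _sep, body = raw.partition(b"\r\n\r\n")
    return header.decode("iso-8859-1"), body

def demo():
    # A connected socket pair plays the roles of server and client here.
    fake_server, client = socket.socketpair()
    fake_server.sendall(
        b"HTTP/1.1 200 OK\r\nContent-Type: image/png\r\n\r\n\x89PNG"
    )
    fake_server.close()        # mimics the server closing after the file
    result = split_response(recv_all(client))
    client.close()
    return result

if __name__ == "__main__":
    header, body = demo()
    print(header)                       # header only, never the image data
    print(len(body), "bytes of body")
```

The body is kept as bytes because image data is binary; only the header is decoded to text for printing.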
Your program will be tested with the following image URLs:
(Optional) Lab Report:
(1) How would you suggest improving this lab in future semesters?