Lab 9: Network Socket Programming (Intermediate)

Lab

Overview

In this lab, you will use the Python programming language to implement a simplified DNS client that can send requests for basic IPv4 and IPv6 addresses.

DNS (the Domain Name System) is a hierarchical, distributed database that stores information associated with domain names, e.g. myhost.com. The most common piece of information is the IPv4 or IPv6 address(es) associated with that domain name. However, DNS can also store a wide variety of other domain-related information, such as the associated domain mail servers, name servers, domain name aliases, reverse DNS lookups (obtaining a DNS name from an IP address), anti-spam records such as Sender Policy Framework, and many other record types.

Lab - Getting Started

To begin this lab, start by obtaining the necessary boilerplate code. Enter the class repository:

unix>  cd ~/bitbucket/2018_fall_ecpe170_boilerplate/

Pull the latest version of the repository, and update your local copy of it:

unix>  hg pull
unix>  hg update

Copy the files you want from the class repository to your private repository:

unix>  cp ~/bitbucket/2018_fall_ecpe170_boilerplate/lab09/* ~/bitbucket/2018_fall_ecpe170/lab09/

Enter your private repository now, specifically the lab09 folder:

unix>  cd ~/bitbucket/2018_fall_ecpe170/lab09

Add these files to version control in your private repository:

unix>  hg add dns.py dns_tools.py

Commit the new files in your personal repository, so you can easily go back to the original starter code if necessary:

unix>  hg commit -m "Starting Lab 9 with boilerplate code"

Push the new commit to the bitbucket.org website:

unix>  hg push

Lab Part 1 - Wireshark Tracing

Wireshark is a powerful network packet capture and analysis tool. You are going to use Wireshark to capture network packets, filter to select only DNS packets, and then inspect them.

Install Wireshark:

unix> sudo apt-get install wireshark
# Select YES when prompted if non-superusers should be allowed to capture packets

Configure Wireshark so non-root users can capture packets (by adding your user to the wireshark group):

unix> sudo dpkg-reconfigure wireshark-common
unix> sudo usermod -a -G wireshark <YOUR-LINUX-USERNAME>

Log out (or restart Linux) for this change to take effect.

Run Wireshark:

unix> wireshark &

(If no interfaces are visible after running Wireshark, re-run it with root privileges via 'sudo')

In the Wireshark GUI, select whatever 'ensXX' interface is correct for your specific computer and click 'Start' to begin packet capture. (For a description of why network interfaces are named like this, see this detailed discussion on Predictable Network Interface Names)

Back at the command line, use the dig utility to send a DNS query to Google's public DNS service at 8.8.8.8 for the IPv4 address of www.pacific.edu, and the IPv6 address of www.google.com. 32-bit IPv4 addresses are referred to as "A records", and 128-bit IPv6 addresses are referred to as "AAAA records". Note that the +noedns option is used to get dig to operate in a legacy mode that produces identical output to our Python client in Part 2 of the lab.

unix>  dig www.pacific.edu A @8.8.8.8 +noedns
#   Should produce the following IPv4 result:
#   ;; ANSWER SECTION:
#   www.pacific.edu. 59 IN A 138.9.110.12

unix>  dig www.google.com AAAA @8.8.8.8 +noedns
#   Should produce the following IPv6 result:  
#   ;; ANSWER SECTION:
#   www.google.com. 299 IN AAAA 2607:f8b0:4005:802::1013

After running the dig commands, go back to Wireshark and press the RED square button to stop packet capture.

Did any packets appear in Wireshark? If not, you must be capturing from the wrong interface. Pick something other than 'ensXX' (the default choice) and try again.

Assuming you captured some packets, locate the filter panel in Wireshark. Enter 'dns' in the field and press <enter> to filter the packets so that only DNS messages (requests/replies) are shown.

Mark (by right-clicking on a packet and choosing the desired option) only the query (request) and query response (reply) packets that correspond to the DNS lookup for www.pacific.edu and www.google.com that were done above. Depending on how long you captured packets for, and how busy your computer system was, you may have also captured DNS requests/replies for other hosts, but we are not interested in those.

Submission:
(1) Using the File->Export Specified Packets option, save only the marked packets to a file called dns.pcapng. The file format should be PCAPNG, which is Wireshark's native format. Submit this file with your lab.

Note: Wireshark does a very good job of decoding the DNS packet and showing the contents of each field. By clicking on a decoded line item, you can see the corresponding raw byte values in the bottom panel. You should refer back to this Wireshark export file for comparison purposes during Part 2 of the lab, where you are writing your own DNS client instead of using the pre-written dig program.

Lab Part 2 - DNS Query Client

Write a Python3 program called dns.py to send DNS queries to a server and parse the replies. Note that there is substantial boilerplate code provided that already parses DNS replies, so you need to focus on simply sending a correctly-formatted DNS query.

NOTE: Carefully study the Wireshark query packet to see how the query is formed. Remember that you need to send the query packets as bytes. Also study carefully how the response packet is formed. Use the decode function from dns_tools.py to decode the received response bytes.

Program requirements:

Your program must send DNS queries over UDP to a DNS server using the Python3 programming language and the socket module.

Warning: Python has a built-in implementation of getaddrinfo() which will do a basic DNS lookup if all you need is an IP address. A variety of external libraries also exist for more sophisticated DNS lookups, such as DNSPython, DNSimple, and dnslib. Unfortunately, you *cannot* use these libraries (or anything similar) for this lab, and zero points will be awarded if you do! These libraries hide how the DNS protocol works, and the purpose of this lab is to learn how the protocol actually operates. Instead, you must use the lower-level socket module.

Your program must send requests for either IPv4 ("A") or IPv6 ("AAAA") addresses.
Your program must take 3 command-line arguments:

The type of address requested (denoted with the --type flag), which can have the value 'A' or 'AAAA'
The host name being queried (denoted with the --name flag)
The IP address of the DNS server to query (denoted with the --server flag)
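
If the starter code does not already handle it, the three flags above could be parsed with the standard argparse module. This is only a minimal sketch: the function name parse_args and the help strings are illustrative assumptions, not part of the provided boilerplate.

```python
import argparse

def parse_args(argv=None):
    """Parse --type, --name and --server; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(description="Simplified DNS client")
    parser.add_argument("--type", choices=["A", "AAAA"], required=True,
                        help="record type to request")
    parser.add_argument("--name", required=True,
                        help="host name being queried")
    parser.add_argument("--server", required=True,
                        help="IP address of the DNS server to query")
    return parser.parse_args(argv)

# Example: the same flags used in the sample invocations of this lab
args = parse_args(["--type=A", "--name=www.pacific.edu", "--server=8.8.8.8"])
print(args.type, args.name, args.server)
```

Restricting --type with choices means an unsupported record type is rejected before any packet is built.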

Your DNS queries should follow the proper endianness standard for a network protocol. (With incorrect endianness, you will not get a valid reply from the server)
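
Network byte order is big-endian: the most significant byte of a multi-byte field is sent first. The short sketch below shows one way to produce it in Python with the standard struct module; the value 0x0102 is just an arbitrary example field.

```python
import struct

value = 0x0102  # arbitrary example of a 16-bit field

big = struct.pack("!H", value)     # "!" = network byte order (big-endian)
little = struct.pack("<H", value)  # little-endian: wrong order for DNS

print(big.hex())     # 0102
print(little.hex())  # 0201

# int.to_bytes is an equivalent alternative for a single field
assert value.to_bytes(2, "big") == big
```

Using the "!" prefix on every format string avoids depending on the byte order of the machine running the client.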

Example program invocations and output querying against Google's public DNS service at 8.8.8.8:

unix>  ./dns.py --type=A --name=www.pacific.edu --server=8.8.8.8
Sending request for www.pacific.edu, type A, to server 8.8.8.8, port 53
Server Response
---------------
Message ID: 49821
Response code: No error
Counts: Query 1, Answer 1, Authority 0, Additional 0
Question 1:
Name: www.pacific.edu
Type: A
Class: IN
Answer 1:
Name: 0xc00c
Type: A, Class: IN, TTL: 59
RDLength: 4 bytes
Addr: 138.9.110.12 (IPv4)


unix>  ./dns.py --type=AAAA --name=www.google.com --server=8.8.8.8
Sending request for www.google.com, type AAAA, to server 8.8.8.8, port 53
Server Response
---------------
Message ID: 64164
Response code: No error
Counts: Query 1, Answer 1, Authority 0, Additional 0
Question 1:
Name: www.google.com
Type: AAAA
Class: IN
Answer 1:
Name: 0xc00c
Type: AAAA, Class: IN, TTL: 299
RDLength: 16 bytes
Addr: 2607:f8b0:4005:802::1012 (IPv6)

Your program will be tested with the following domains, and perhaps others, as desired:

www.pacific.edu (IPv4 and IPv6)
www.google.com (IPv4 and IPv6)
cs.stanford.edu (IPv4)
bogus.stanford.edu (IPv4)

(Optional) Lab Report:
(1) How would you suggest improving this lab in future semesters?

Reference Information

The information given below is quite elaborate and for reference only. You should rely on your Wireshark packet study to formulate the query packets and analyze the response packets.

DNS Protocol - Overview

DNS messages are carried in UDP (User Datagram Protocol) packets on port 53. Although UDP is an unreliable protocol, a DNS client can implement a simple timeout/retransmit mechanism to query a server again in the case of packet loss. A DNS query consists of a single UDP request from the client followed by a single UDP reply from the server. (DNS also runs over TCP for larger requests/replies, but you do not need to implement this mode of operation in this lab.)
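
The timeout/retransmit idea can be sketched with the socket module as follows. This is an illustration under assumptions, not a required part of the lab: the function name send_with_retry, the 2-second timeout, and the limit of 3 attempts are all arbitrary choices.

```python
import socket

def send_with_retry(payload, server, port=53, timeout=2.0, attempts=3):
    """Send payload over UDP and return the first reply, retrying on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # recvfrom() now raises if no reply arrives
        for _ in range(attempts):
            sock.sendto(payload, (server, port))
            try:
                # 512 bytes is the classic size limit for DNS over UDP
                reply, _addr = sock.recvfrom(512)
                return reply
            except socket.timeout:
                continue  # request or reply was lost: send it again
    raise TimeoutError(f"no reply from {server}:{port} after {attempts} attempts")
```

A real client would also check that the Message ID in the reply matches the one it sent, so that a late reply to an earlier attempt is not mistaken for a different transaction.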

DNS messages follow a format with 5 sections:

Message Header (required)
Question Section (optional, but required in your request)
Answer Section (optional - the server will create this in its response to your query)
Authority Section (ignored for this lab)
Additional Section (ignored for this lab)

DNS Protocol - Message Header Section

The Message Header section must be present in all messages. It has the following format:

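Each row below is one 16-bit word. Bit 0 is the most significant bit and is transmitted first.

MessageID (bits 0-15)
QR (bit 0)	OPCODE (bits 1-4)	AA (bit 5)	TC (bit 6)	RD (bit 7)	RA (bit 8)	Reserved (bits 9-11)	RCODE (bits 12-15)
QDCount (# of items in Question Section)
ANCount (# of items in Answer Section)
NSCount (# of items in Authority Section)
ARCount (# of items in Additional Section)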

The Message Header fields are explained as follows:

Field	Size	Explanation
MessageID	2 bytes	Random 16-bit unsigned integer generated by requester. Responder copies number back in reply to identify transaction.
QR	1 bit	Query response bit. 0 = Query, 1 = Reply
OPCODE	4 bits	Identifies request/operation type. Value of 0 represents standard query.
AA	1 bit	Authoritative answer - valid in responses only. Value of 1 represents authoritative response.
TC	1 bit	Truncation. Bit is set if the message is too large to fit in a standard UDP packet.
RD	1 bit	Recursion desired. 0 = Recursion not desired, 1 = Recursion desired.
RA	1 bit	Recursion available - valid in responses only. 0 = Recursion not available, 1 = Recursion available.
Reserved	3 bits	Unused / reserved bits - set to zero
RCODE	4 bits	Identifies response type to the query. Valid in responses only. Values are: 0 = No error; 1 = Format error; 2 = Server failure; 3 = Name error; 4 = Not implemented; 5 = Refused
QDCount	2 bytes	Unsigned integer - Number of entries in question section (Typically, there is only 1 question per message, but the standard supports multiple questions)
ANCount	2 bytes	Unsigned integer - Number of entries in answer section
NSCount	2 bytes	Unsigned integer - Number of entries in authority section
ARCount	2 bytes	Unsigned integer - Number of entries in additional section
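
As an illustration of the field layout above, the sketch below packs a 12-byte query header with struct. The function name build_header and its parameters are hypothetical; compare the resulting bytes against your Wireshark capture rather than treating this as the reference format.

```python
import random
import struct

def build_header(message_id=None, recursion_desired=True, qdcount=1):
    """Pack the 12-byte DNS header for a standard query."""
    if message_id is None:
        message_id = random.randint(0, 0xFFFF)  # random 16-bit transaction ID
    qr, opcode, aa, tc, ra, reserved, rcode = 0, 0, 0, 0, 0, 0, 0
    rd = 1 if recursion_desired else 0
    # Bit 0 of the layout is the most significant bit of the 16-bit word
    flags = ((qr << 15) | (opcode << 11) | (aa << 10) | (tc << 9) |
             (rd << 8) | (ra << 7) | (reserved << 4) | rcode)
    # Six unsigned 16-bit fields in network byte order:
    # MessageID, flags, QDCount, ANCount, NSCount, ARCount
    return struct.pack("!HHHHHH", message_id, flags, qdcount, 0, 0, 0)

header = build_header(message_id=0x1234)
print(header.hex())  # 123401000001000000000000
```

In the printed bytes, 1234 is the Message ID, 0100 is the flags word with only RD set, and 0001 is a QDCount of 1.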

DNS Protocol - Question Section

A DNS message can contain one or more entries in its Question Section.

The Question Section fields are as follows:

Field	Size	Explanation
QName	Variable	The domain name being queried. See extended description below.
QType	2 bytes	Unsigned integer - The Resource Record being requested. Common values include: 1 = A record (IPv4); 28 = AAAA record (IPv6); (and many other record types not required for this lab...)
QClass	2 bytes	Unsigned integer - The Resource Record class being requested. A value of 1 means 'Internet'.

The QName field is best explained with an example. Take the domain name "www.mydomain.com". Split that domain into its three component parts:

www (length of 3)
mydomain (length of 8)
com (length of 3)

Then, encode each part separately, with the leading byte representing the length of the following letters:

0x03 0x77 0x77 0x77 (for "length of 3" followed by "www")
0x08 0x6D 0x79 0x64 0x6F 0x6D 0x61 0x69 0x6E (for "length of 8" followed by "mydomain")
0x03 0x63 0x6F 0x6D (for "length of 3" followed by "com")

Finally, add a closing byte of 0x00 to signify the end of the domain string.
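
The encoding steps above can be sketched in Python as follows. The function names encode_qname and build_question are hypothetical, and the 63-byte label limit comes from RFC 1035 rather than from this handout.

```python
import struct

def encode_qname(name):
    """Encode a dotted domain name as length-prefixed labels ending in 0x00."""
    out = b""
    for label in name.rstrip(".").split("."):
        encoded = label.encode("ascii")
        if not 1 <= len(encoded) <= 63:  # label length limit from RFC 1035
            raise ValueError(f"invalid label length in {name!r}")
        out += bytes([len(encoded)]) + encoded  # length byte, then the letters
    return out + b"\x00"  # closing byte marks the end of the name

def build_question(name, qtype, qclass=1):
    """QName followed by QType and QClass, both 16-bit big-endian."""
    return encode_qname(name) + struct.pack("!HH", qtype, qclass)

qname = encode_qname("www.mydomain.com")
print(qname)  # b'\x03www\x08mydomain\x03com\x00'
```

For example, build_question("www.mydomain.com", 28) would produce the Question Section for an AAAA lookup of the same name.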

DNS Protocol - Answer Section ("Resource Records")

The Answer Section of a DNS message can contain one or more resource records, each of which holds information about a specific domain. Each resource record has the following fields:

Field	Size	Explanation
Name	2 bytes	The domain name being queried. Unlike in the query, however, this field is a pointer, and contains the offset in bytes from the start of the DNS message to the domain name provided in the question section. (Depending on DNS servers, this field may also be variable length and provide the full domain name here).
Type	2 bytes	Unsigned integer - The Resource Record type being requested. Common values include: 1 = A record (IPv4); 28 = AAAA record (IPv6); (and many other record types not required for this lab...)
Class	2 bytes	Unsigned integer - The class being requested. A value of 1 means 'Internet'.
TTL	4 bytes	Unsigned integer - The time in seconds that a record may be cached. (TTL = "Time to Live")
RDLength	2 bytes	Unsigned integer - Length of the resource-record specific data in bytes
RData	Variable	Resource-record specific data (of length specified by RDLength). An IPv4 address is a 4-byte field. An IPv6 address is a 16-byte field.
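
The provided decode function in dns_tools.py already parses replies, so the sketch below is only meant to illustrate the record layout above. It assumes the Name field is a 2-byte pointer, where the top two bits are set and the remaining 14 bits hold the offset (0xc00c means offset 12, the first byte after the header). The function name parse_answer and the hand-built sample message are hypothetical.

```python
import ipaddress
import struct

def parse_answer(message, offset):
    """Parse one resource record whose Name field is a 2-byte pointer."""
    (name_field,) = struct.unpack_from("!H", message, offset)
    if (name_field & 0xC000) != 0xC000:
        raise ValueError("Name field is not a pointer")
    name_offset = name_field & 0x3FFF  # low 14 bits: offset of the name
    # Type (2), Class (2), TTL (4), data length (2), all big-endian
    rtype, rclass, ttl, rdlength = struct.unpack_from("!HHIH", message, offset + 2)
    start = offset + 12  # 2-byte name + 10 bytes of fixed fields
    rdata = message[start:start + rdlength]
    # A (1) and AAAA (28) records carry a raw 4- or 16-byte address
    address = str(ipaddress.ip_address(rdata)) if rtype in (1, 28) else None
    record = {"name_offset": name_offset, "type": rtype, "class": rclass,
              "ttl": ttl, "address": address}
    return record, start + rdlength

# Hand-built sample: zeroed header, one question, one A record
question = b"\x03www\x07pacific\x03edu\x00" + b"\x00\x01\x00\x01"
answer = bytes.fromhex("c00c" "0001" "0001" "0000003b" "0004" "8a096e0c")
message = bytes(12) + question + answer
answer_offset = 12 + len(question)

record, next_offset = parse_answer(message, answer_offset)
print(record["address"], record["ttl"])  # 138.9.110.12 59
```

The returned next_offset is where a following record would start, which is how a parser would walk through multiple answers.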

DNS Protocol - Authority Section

This section is ignored for this lab. You do not need to generate this section in your requests or parse this section in replies.

DNS Protocol - Additional Section

This section is ignored for this lab. You do not need to generate this section in your requests or parse this section in replies.

Resources

"DNS For Rocket Scientists": http://www.zytrax.com/books/dns/ (Specifically, chapter 15 on the DNS protocol format)
RFC 1034 and RFC 1035