Lab 9: Endianness

Overview

In this lab, you will learn to work with big and little endian data representations, which are important whenever data is being transferred between computer systems via the network or files. In addition, you will also measure the performance overhead of OS system calls like fread() (which reads data from disk), and learn to use these system calls in an efficient manner.

Lab - Getting Started

To begin this lab, start by obtaining the necessary boilerplate code. Enter the class repository:

unix>  cd ~/bitbucket/2013_spring_ecpe170_boilerplate/

Pull the latest version of the repository, and update your local copy of it:

unix>  hg pull
unix>  hg update

Copy the files you want from the class repository to your private repository:

unix>  cp -R ~/bitbucket/2013_spring_ecpe170_boilerplate/lab09/* ~/bitbucket/2013_spring_ecpe170/lab09

Enter your private repository now, specifically the lab09 folder:

unix>  cd ~/bitbucket/2013_spring_ecpe170/lab09

Add the new files to version control in your private repository:

unix>  hg add ../lab09

Commit the new files in your personal repository, so you can easily go back to the original starter code if necessary

unix>  hg commit -m "Starting Lab 9 with boilerplate code"

Push the new commit to the bitbucket.org website

unix>  hg push

You should see the following code:

Makefile that can compile both programs - a data writer, and a data reader
endian_writer.c / endian_writer.h - Data writer program
endian_reader.c / endian_reader.h - Data reader program
sample_data.tar.gz - Sample data for comparison

The data writer program writes a file containing records with bogus data. Each record in the data file has the following elements:

Size	Variable Data Type	Variable Name
1 byte	uint8_t	"my8"
1 byte	uint8_t	Empty space (so each record size is an even power of 2 bytes)
2 bytes	uint16_t	"my16"
4 bytes	uint32_t	"my32"
8 bytes	uint64_t	"my64"

Total size of each record: 16 bytes

The program starts with a record where every element is 1, and writes that record to disk. Then, it increments each element by 1, and writes that new record to disk. The process repeats (with elements wrapping back around to 0 when they overflow) for as many records as desired.

Compile the Endian Writer program. Note that this Makefile actually compiles two binaries, the writer program used now, and the reader program used later.

unix>  make

Use the program to write 1000 records:

unix>  ./endian_writer_program 1000

View the file in a hex editor, which displays each byte value. You can either use a GUI or command-line viewer, as shown below:

unix>  ghex data.dat &                 # GUI viewer option
unix>  hexdump -C data.dat | less      # Command-line viewer option

Lab Part 1 - Endian Independent Writer

For the first part of this lab, you will write data to disk in an endian-independent manner. Although the Endian Writer boilerplate program works, it is sloppy with regard to endianness. The endianness of the output file depends on the endianness of the computer the writer program is run on! (Hence, the file is not portable across computer systems). It is your job to fix it! After making the changes required below, I should be able to run your program on an x86 (little endian) or PowerPC (big endian) computer system and get exactly the same output.

Tasks:
(1) Augment the Endian Writer code to allow the program to detect whether it is run on a little endian or big endian machine. (See the function endian_identify())
(2) Augment the Endian Writer code to allow the program to always write the data file in big endian format regardless of whether it is run on a big endian or little endian machine!
(In practical terms, this means that if you run the program on a little endian x86 machine, you will need to byte swap the data before writing it to disk. If you run the program on a big endian machine, do not byte swap the data.)
(3) Check your code (including Makefile) into version control, and push to BitBucket.

Two references will be useful to you:

Intel Endianness White Paper - Useful background information and helpful code examples including byte swap macros. (Note that copying and pasting the code from the PDF may be a bit touchy depending on the PDF viewer. Also, note that the last macro should only be 5 lines, but the text has wrapped onto adjacent lines. For your convenience, I've copied just the macros out into a C header file.)
GNU Endian conversion functions - To use these functions, add a #define _BSD_SOURCE and include the file <endian.h> in your program. Function documentation is in the system man pages under man endian These functions are slightly more efficient than the generic Intel macros provided, because they contain processor-specific optimizations.

Sanity checking:

To test your results, the boilerpate code includes an example data file created with the command:

unix>  ./endian_writer_program 65535

This sample data file is in big endian format, and thus matches the output that your program should produce in Part 1.

You can compare this sample data file to your own by uncompressing it and using the diff command to find differences between the two files. (In typical Unix fashion, diff is *silent* if there are no differences to report. Silence is golden in this case...)

unix>  tar -xzf sample_data.tar.gz
unix>  diff sample_data_file.dat data.dat

If there *is* a difference, use the ghex editor to visually inspect the file and identify the problem

unix>  ghex sample_data_file.dat &
unix>  ghex data.dat &

Lab Part 2 - Endian Independent Reader

After writing data, it is now time to read data from disk in an endian-independent manner. As you might have guessed, the Endian Reader program reads a file containing records with bogus data. But once again, it is written in a sloppy manner with regards to endianness. It is your job to fix it, which should be very easy based on your work in part 1!

Tasks:
(1) Allow the Endian Reader program to detect whether it is run on a little endian or big endian machine.
(2) Allow the Endian Reader program to always read the big endian data file regardless of whether this program is run on a big endian or little endian machine!
(In practical terms, this means that if you run the program on a little endian x86 machine, you will need to byte swap the data after you read it from disk. If you run the program on a big endian machine, do not byte swap the data after reading it.)
(3) Check your code (including Makefile) into version control, and push to BitBucket.

After making these changes, I should be able to run your program on an x86 (little endian) or PowerPC (big endian) computer system and get exactly the same output.

For sanity checking, the last record written to the file in the first program should be identical to the last record read from the file in the second program.

Compile the Endian Reader Program:

unix>  make

The reader program takes one argument: The number of records to read with one system call. 0 is a special value, meaning read only one element at a time.

Use the program to read in the data file one element at a time, which is mode "0" (zero), the naive algorithm. Other modes will be described later.

unix>  ./endian_reader_program 0

The reader program self-checks all data it reads in, and will abort (with an error message) if the expected input value does not match the actual input value.

Lab Part 3 - System Call Overhead

For the third part of the lab, you will improve the performance (efficiency) of the data reading program. Take a close look at the read_record_naive() function - how much data does it retrieve from disk with any one system call to fread()? Not much, right? A general goal is always to do as much work as possible with as few system calls as possible.

To see this play out in practice, implement the following functions:

Function Name	Description
read_record_naive()	*Already implemented!* Reads data from disk one element at a time (i.e. 1 byte, then another 1 byte, then 2, 4, and 8)
read_record_1()	Reads data from disk one record at a time (16 bytes from disk in a single fread() call), and then splits it up into individual elements.
read_record_2()	Reads data from disk two records at a time (32 bytes from disk in a single fread() call). The first record is split up into individual elements and returned. The second record is saved (in memory) for a future call to read_record_2(). (Thus, fread() will only be used every other time that read_record_2() is called.)
read_record_n()	Reads data from the disk n records at a time (n*16 bytes from disk in a single fread() call), and saves the last n-1 records for later.

The boilerplate reader code already has hooks for all these functions in main(). For example, invoking endian_reader_program 0 at the command line will run the naïve function, while invoking endian_reader_program 1 will run read_record_1().

Once you are happy that all of your functions are performing the equivalent work, proceed to measuring their respective execution time on a large data set.

First, use the writer program to produce 50 million records (which will be a 762MB file). Note: If you are using an old computer, you must use a smaller file size. See note below regarding the reason why.

unix>  ./endian_writer_program 50000000

Then, run several variants of the reader program over the resulting data file, reading one element at a time, and then 1, 2, 4, 8, 16, 32, and 64 records at a time:

unix>  time ./endian_reader_program 0
unix>  time ./endian_reader_program 1
unix>  time ./endian_reader_program 2
unix>  time ./endian_reader_program 4
unix>  time ./endian_reader_program 8
unix>  time ./endian_reader_program 16
unix>  time ./endian_reader_program 32
unix>  time ./endian_reader_program 64

Note 1: This test is sensitive to file size. Linux tries to cache frequently accessed files in empty RAM to improve performance. This is NOT the same thing as the L1/L2/L3 cache, but is instead a cache of the hard disk stored in main memory (RAM). A ~700MB file should fit fine into this cache even given the limited memory allotted to your virtual machine. The disk will never be accessed after the file is initially loaded into the cache. As such, the reader program is almost exclusively limited by the performance of the fread() system call obtaining data from the disk cache. We do *not* want this test in Part 3 to be limited by the hard drive access speed.

Note 2: Your results may be noisy, due to background tasks and virtual machine overhead. If your graph doesn't show a clear trend, try closing all background tasks, repeating the experiments a few times and averaging the results.

Lab Report:
(1) Fill out the table below for a 50 million element data file (762MB)
(2) Provide a graph of the results in the table.
No credit will be given for sloppy graphs that lack X and Y axis labels, a legend, and a title.

Table 1: Read performance for a data file size of XYZ million elements

Records Read per fread() Call	Program Execution Time (sec)	User TIme (sec)	System Time (sec)
0 (naive approach, 1 item per call)
1
2
4
8
16
32
64

Lab Part 4 - Larger Data Files and OS Disk Cache

For the final exercise of this lab, you will repeat the same reader experiments as above, but with a much larger data file. You will also make use of the iotop utility, which interactively monitors disk activity in the same way that top interactively monitors CPU activity.

Run iotop (must be root):

unix>  sudo iotop -o -d 4

-o flag specifies only show processes that are using the disk
-d 4 flag specifies update the display every 4 seconds
Press q to exit when finished.

Lab Report:
(3) How much RAM is allocated to your virtual machine? Use the free -m system command, and look for the "total" field. If it's more than 3GB (3072MB), you'll want a bigger data file than 250 million elements below. Unlike before, in Part 4 now we want the data file to be larger than available RAM.
(4) Show the output produced by the iotop utility while at least one of the 50 million element reader experiments is being conducted (i.e. run both the reader program and iotop at the same time in different terminal windows).
(5) Repeat the reader experiment with a much larger data file: 250 million elements (3.7GB). Fill out the same table used previously, and provide a graph of the new results
Let LibreOffice auto-scale the graph. I'm interested in the shape, not the absolute values of the bars.
(6) Show the output of the iotop utility while at least one of the 250 million element experiments are being conducted
(7) Why does the graph for the 250 million element experiment look noticeably different than the graph for the 50 million element experiment?
Obviously, the bars are higher because the test takes longer. This question is concerned about why the shape of the curve is noticeably different. Your answer should speculate about the cache hit / cache miss rate of the disk cache in main memory. Then, support your speculation about whether or not the disk is being accessed by your iotop evidence.
(8) Why don't the user and system times add up to the total execution time in the 250 million element experiments?

Don't forget to delete the huge data.dat file when finished!

Lab Report - Wrapup:
(1) What was the best aspect of this lab?
(2) What was the worst aspect of this lab?
(3) How would you suggest improving this lab in future semesters?