Lab 2: C Programming (Language, Toolchain, and Makefiles)

Overview

This lab will give you hands-on experience with the C programming language, the development toolchain (pre-processor, compiler, assembler, linker), and automating the compilation process using Makefiles.

Lab - Getting Started

To begin this lab, obtain the necessary boilerplate code: download the zip file lab02.zip and extract it. It contains three folders, one for each part of the lab (part1, part2, and part3).

Log onto Linux and open a command prompt.

Lab Part 1 - Compiler Basics

Let's start with a simple program.

First, ensure you are in your personal repository. In your home directory (~), create a folder named 2022_spring_ecpe170. This folder will hold all of your labs for ECPE 170. Inside it, create a folder named lab02. You will then have the empty directory: "~/2022_spring_ecpe170/lab02".

unix>  cd ~/2022_spring_ecpe170/lab02

You already downloaded lab02.zip. Extract it if you have not done so. It has three folders: part1, part2, and part3. Copy all three directories to "~/2022_spring_ecpe170/lab02".

Now, enter the subdirectory for lab02/part1:

unix>  cd part1

Launch your favorite Linux text editor - gedit is the default for this class. In time, I will give you a glimpse into the vim editor, but let us use gedit for now. Use gedit to edit the file hello.c:

unix>  gedit hello.c &

Enter the following "Hello World" program written in the C programming language, and save it when finished.

hello.c:

#include <stdio.h>
int main(void)
{
  printf("Sawubona Mhlaba!\n"); /* Zulu for "hello, world" */
  return 0;
}

Compile it using GCC, an open-source compiler. (GCC stands for the "GNU Compiler Collection"; its manual page describes the gcc command as the "GNU project C and C++ compiler".)

unix>  gcc hello.c -o hello_program

This tells GCC to take the hello.c input file and preprocess+compile+assemble+link it into an executable file with the name hello_program.

Run your program:

unix>  ./hello_program
Sawubona Mhlaba!

Congrats, you're now an expert! Looks pretty easy, right?

Lab Part 2 - Toolchain / Multiple Source Files

Next, enter the "part2" directory of "lab02" in your private repository. Get a directory listing to see what files are present. You should see a demo program consisting of two source code files (main.c and file2.c) and two header files (main.h and file2.h).

Compile them all using a single GCC command, and run the resulting program.

unix> gcc main.c file2.c -o program
unix> ./program

Let's peel back the covers of what GCC is actually doing here. The simple GCC command ("compile my source code into a program") actually involves several discrete stages of processing: preprocessing, compiling, assembly, and linking. These are all implemented by separate tools and, except for the final linking stage, are done independently for each source code (.c) file. Thus, when doing C programming, the important thing to remember is that each file is compiled by itself, and then each resulting object file is linked together along with libraries (for features like printf()) to produce a single executable program.

Compile each .c file separately into its own .o object file. The (-c) argument configures GCC to only perform the first three steps of the process: pre-processing, compiling, and assembly. (i.e., no linking is done!)

unix>  gcc -c main.c
unix>  gcc -c file2.c

Get a directory listing to see the separate .o object files created by the compiler:

unix>  ls -ls

Use the file command to inspect the object file:

unix>  file main.o
main.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

The file command reports that the object file is in 64-bit ELF format for x86-64:

ELF = Executable and Linkable Format. It is a common file format to store programs in
64-bit = The program has been compiled for a processor that can manipulate data in 64-bit wide increments
x86-64 = Processor ISA (instruction set architecture)

x86 = "classic" Intel / AMD processors
x86-64 = "modern" Intel / AMD 64-bit processors

In order to combine these object files into a complete executable, the linker tool (ld) would be used. For example, to combine the files main.o, file2.o, and the standard C libraries (with the -lc flag), the following command could be used.

unix>  ld main.o file2.o -lc -o program

Although this command will produce a file named "program", the program will not actually be runnable. (If you try to run it with the command ./program, you will get a non-obvious "No such file or directory" error message).

Additional low-level software "glue" is required to produce a program binary that actually functions. Getting all the pieces needed to produce a runnable program requires a command like the following. "collect2" is just a wrapper program around the "ld" linker. (Note: You do not need to actually get this command working, because it will be slightly different on different Linux distributions.)

For 64-bit OS, Ubuntu 20.04 (w/GCC 9.3.0):

/usr/lib/gcc/x86_64-linux-gnu/9/collect2 -plugin /usr/lib/gcc/x86_64-linux-gnu/9/liblto_plugin.so -plugin-opt=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper -plugin-opt=-fresolution=/tmp/ccoOOpOd.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --sysroot=/ --build-id --eh-frame-hdr -m elf_x86_64 --hash-style=gnu --as-needed -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie -z now -z relro -o program /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/Scrt1.o /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/9/crtbeginS.o -L/usr/lib/gcc/x86_64-linux-gnu/9 -L/usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/9/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/9/../../.. main.o file2.o -lgcc --push-state --as-needed -lgcc_s --pop-state -lc -lgcc --push-state --as-needed -lgcc_s --pop-state /usr/lib/gcc/x86_64-linux-gnu/9/crtendS.o /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/crtn.o

Once the full linking process is complete, you could run the finished program:

unix> ./program

Lab Part 3 - Makefiles

Based on the tutorial here.

Enter the "part3" directory of "lab02" in your private repository. Get a directory listing to see what files are present. You should see a demo program consisting of three source code files (main.c, factorial.c, and output.c) and one header file (functions.h).

Next, try to compile and run this program.

Waiting....

Waiting...

The last step is potentially hard, right? Some inconsiderate programmer bundled up their source code, but didn't include any instructions on how to compile it! Let's take a wild guess and use the normal GCC syntax:

unix>  gcc main.c output.c factorial.c -o factorial_program

Ok, it seemed to work with the default options. Run the program and verify its output is correct:

unix>  ./factorial_program 
Olá mundo!!
The factorial of 5 is 120

That wasn't so hard. But what if the source code had hundreds of files? Or used special non-default GCC settings to compile? Further, what if you change one source file out of hundreds - do you need to recompile everything? We need a way to automate and standardize this compilation process!

On Linux, a special utility called make is used to automate compiling programs. The make utility can automatically determine which pieces of a large program need to be recompiled, and issue the commands to recompile them.

The make utility is configured with a plain text file called a Makefile. If you run make, this utility will look for a file with the name "Makefile" (or "makefile") in your directory, and then follow its instructions. Or, if you have several makefiles (for a complicated project), you can tell the utility which configuration file to use with the command: make -f <filename>

Remember the GCC command you just used to compile the program? Let's accomplish the same thing with a Makefile. This file has a very specific format that must be followed exactly in order to function. The format is as follows:

target: dependencies
<tab> system command

Note that <tab> is literally the ***TAB*** key! You cannot substitute spaces there! This is the NUMBER ONE *ERROR* in writing Makefiles! (The TAB key is required at the start of every command line in a Makefile; no exceptions).

Makefile #1 - Basic Design:

Using your favorite text editor, create a new Makefile with the name "Makefile-1". Inside this file, enter the basic Makefile configuration for our example program. The configuration is:

all:
	gcc main.c output.c factorial.c -o factorial_program

Now, execute this Makefile, and then run the compiled program. Note that we use the (-f) option to specify a Makefile name, because our file has the name "Makefile-1", not the default "Makefile".

unix>  make -f Makefile-1
gcc main.c output.c factorial.c -o factorial_program
unix>  ./factorial_program 
Olá mundo!!
The factorial of 5 is 120

How did this Makefile work? The target is named "all". By convention, "all" is the first target in a Makefile, and make builds the first target when no other target is specified on the command line. There are no dependencies for target all (in our configuration), so make simply executes the system command specified, which is the GCC compilation line.

Keep Makefile-1 to put into your final ZIP file to submit through Canvas.

Lab report:
(1) Copy and paste in your functional Makefile-1

Makefile #2 - Using Dependencies:

Now, let's exploit dependencies to give our compilation process more flexibility. Dependencies are files (or other targets) that must be up to date *before* the system command (on the next line) is executed. Why do we use them? In C programming, you don't want to have to recompile every file every time. Rather, you only want to recompile the file that changed, and all files that depend on that file. This can be accomplished with dependencies in make.

Using your favorite text editor, create a new Makefile with the name "Makefile-2". Inside this file, enter this configuration:

all: factorial_program

factorial_program: main.o factorial.o output.o
	gcc main.o factorial.o output.o -o factorial_program

main.o: main.c
	gcc -c main.c

factorial.o: factorial.c
	gcc -c factorial.c

output.o: output.c
	gcc -c output.c

clean:
	rm -rf *.o factorial_program

Now, execute this Makefile, and then run the compiled program.

unix>  make -f Makefile-2
gcc -c main.c
gcc -c factorial.c
gcc -c output.c
gcc main.o factorial.o output.o -o factorial_program
unix>  ./factorial_program 
Olá mundo!!
The factorial of 5 is 120

Same output as before! But our Makefile now has two additions:

More flexibility - if only one source file changes, make won't recompile everything.
Housekeeping - make can clean up after itself by deleting temporary files (.o) or even the compiled program.

What happens if you run make again, without changing any of the source files?

unix>  make -f Makefile-2
 make: Nothing to be done for `all'.

Make tells you that no compilation is necessary, because nothing has changed.

Run the housekeeping portion to clean up the object files and compiled program:

unix>  make clean -f Makefile-2
rm -rf *.o factorial_program

Keep Makefile-2 to put into your final ZIP file to submit through Canvas.

Lab report:
(2) Copy and paste in your functional Makefile-2
(3) Describe - in detail - what happens when the command "make -f Makefile-2" is entered. How does make step through your Makefile to eventually produce the final result?

Makefile #3 - Using Variables and Comments:

The second Makefile was very redundant. Let's try to simplify it with some variables that can be reused, and add comments at the same time.

Using your favorite text editor, create a new Makefile with the name "Makefile-3". Inside this file, enter this configuration:

# The variable CC specifies which compiler will be used.
# (because different unix systems may use different compilers)
CC=gcc

# The variable CFLAGS specifies compiler options
#   -c :    Only compile (don't link)
#   -Wall:  Enable the most common warnings about lazy / dangerous C programming
CFLAGS=-c -Wall

# The final program to build
EXECUTABLE=factorial_program

# --------------------------------------------

all: $(EXECUTABLE)

$(EXECUTABLE): main.o factorial.o output.o
	$(CC) main.o factorial.o output.o -o $(EXECUTABLE)

main.o: main.c
	$(CC) $(CFLAGS) main.c

factorial.o: factorial.c
	$(CC) $(CFLAGS) factorial.c

output.o: output.c
	$(CC) $(CFLAGS) output.c

clean:
	rm -rf *.o $(EXECUTABLE)

Now, execute this Makefile, and then run the compiled program.

unix>  make -f Makefile-3
gcc -c -Wall main.c
gcc -c -Wall factorial.c
gcc -c -Wall output.c
gcc main.o factorial.o output.o -o factorial_program
unix>  ./factorial_program 
Olá mundo!!
The factorial of 5 is 120

Same output as before! But, our Makefile now uses the CC and CFLAGS variables to specify the compiler and compiler options in one place, simplifying changes. Note that you can pick any variable name you want, but CC and CFLAGS are widely used standards.

Keep Makefile-3 to put into your final ZIP file to submit through Canvas.

Lab report:
(4) Copy and paste in your functional Makefile-3

Run the housekeeping portion to clean up the object files and compiled program:

unix>  make clean -f Makefile-3
rm -rf *.o factorial_program

Makefile #4 - Professional-Level:

The third Makefile, although better, still wasted a lot of space describing each and every object file, even though each target does exactly the same thing! That would get tedious for a large program.

In addition, the third Makefile has a more subtle problem. What happens if the header file (functions.h) changes? Note that it is not (currently) listed as a dependency anywhere. Thus, make would not notice that it had changed, and would not re-compile your program when asked. (It would instead report that there was no work to do). Let's fix this!

Using your favorite text editor, create a new Makefile with the name "Makefile-4". Inside this file, enter this configuration:

# The variable CC specifies which compiler will be used.
# (because different unix systems may use different compilers)
CC=gcc

# The variable CFLAGS specifies compiler options
#   -c :    Only compile (don't link)
#   -Wall:  Enable the most common warnings about lazy / dangerous C programming
#  You can add additional options on this same line..
#  WARNING: NEVER REMOVE THE -c FLAG, it is essential to proper operation
CFLAGS=-c -Wall

# All of the .h header files to use as dependencies
HEADERS=functions.h

# All of the object files to produce as intermediary work
OBJECTS=main.o factorial.o output.o

# The final program to build
EXECUTABLE=factorial_program

# --------------------------------------------

all: $(EXECUTABLE)

$(EXECUTABLE): $(OBJECTS)
	$(CC) $(OBJECTS) -o $(EXECUTABLE)

%.o: %.c $(HEADERS)
	$(CC) $(CFLAGS) -o $@ $<

clean:
	rm -rf *.o $(EXECUTABLE)

Now, execute this Makefile, and then run the compiled program.

unix>  make -f Makefile-4
gcc -c -Wall -o main.o main.c
gcc -c -Wall -o factorial.o factorial.c
gcc -c -Wall -o output.o output.c
gcc main.o factorial.o output.o -o factorial_program
unix>  ./factorial_program 
Olá mundo!!
The factorial of 5 is 120

Same output as before!

A few notes on what has changed here:

A new variable - HEADERS - was added that lists all the .h files in the project. The object (.o) target line in the Makefile depends on HEADERS. (Thus, if a header file changes, all of the object files will be rebuilt. Perhaps overkill, but definitely safe!)
A new variable - OBJECTS - was added that lists all the intermediary .o files in the project. The final executable program target depends on all of the object files being built.
New syntax: %.o: %.c. This pattern rule applies to any target whose name ends in .o. Each target depends on the *corresponding* .c source code file (the % stands for the same text on both sides) and all of the .h header files specified in HEADERS. The rule then says that to generate the .o file, make needs to compile the .c file using the compiler defined in the CC variable and the options set in the CFLAGS variable. The output file (specified with the -o flag) uses the special symbol $@. This symbol is replaced with the name of the file on the left side of the : character, i.e. the object file name. The file to compile is named by the special symbol $<, which is replaced with the first item in the dependencies list, which is the corresponding .c source code file.

Keep Makefile-4 to put into your final ZIP file to submit through Canvas.

Lab report:
(5) Copy and paste in your functional Makefile-4
(6) Describe - in detail - what happens when the command "make -f Makefile-4" is entered. How does make step through your Makefile to eventually produce the final result?
(7) To use this Makefile in a future programming project (such as Lab 3...), what specific lines would you need to change?
(8) Take one screen capture of your code, clearly showing the "Part 3" source folder that contains all of your Makefiles, along with the original boilerplate code.

A final note on Makefiles. It's very rare to have multiple Makefiles in a single project. With only one Makefile, you don't need to specify the filename when running make anymore. Make a copy of your Makefile-4 with the simple name Makefile:

unix>  cp Makefile-4 Makefile

Now you can use your Makefile in the simplest manner possible. For example, clean your directory:

unix>  make clean
rm -rf *.o factorial_program

Compile your program:

unix>  make
gcc -c -Wall -o main.o main.c
gcc -c -Wall -o factorial.o factorial.c
gcc -c -Wall -o output.o output.c
gcc main.o factorial.o output.o -o factorial_program

(Optional) Lab Report:
(1) How would you suggest improving this lab in future semesters?

Submission:
(1) Check your full lab report into the lab02 directory
(2) Check your lab code (including all Makefiles) into the lab02 directory
(3) Submit the compressed zip file (lab02.zip) containing all required files and folders to Canvas.