HowTo

Write a C program using Multiple Files

Prof. Bartenstein

Overview

Why Use Multiple Files for a C Program?

When writing C programs for academic assignments or simple tasks, we often put all of our C code for the program in a single file. We know how to put multiple functions in a single file, and for simple programs, a single file is not only sufficient, but also the simplest solution.

As programs get more complicated and need more code, it is often useful to split the code into multiple files just to make it easier to manage. Splitting up code can also be required for code re-use. If the same code can be used in multiple programs, it makes sense to put the reusable code in its own file and keep the non-reusable, program-specific code in different files.

This web page gives more information on how to design, create, and build multi-file C programs.

Design Considerations for Multi-File Programs

Multiple files are used for two reasons: to organize the code into logical units and to enable re-use of the code. When designing a C program, you must decide whether multiple files would be useful, whether you can re-use code you have already written, and whether you might later re-use code you need to write for this program.

If you are using multiple files, it makes the most sense to collect all the functions that work with a single data construct or concept in a single file. For example, if I write a program to deal with lines on a Cartesian coordinate system, I might put all the code that deals with (x,y) coordinates in a single file. I might put code that handles lines in a separate file. Then I might put the user interface code and a main function in a third file.

Why have both a Header and a Code File?

It would be possible to collect all the code together by using:


	#include "line.c"
	#include "coordinate.c"

in the main file so that it would appear to the compiler as if all the code were in a single file. This is very seldom done for several reasons:

Global variables no longer apply to just the main file, but to all of the included files as well.
The code cannot be "right side up" because the functions need to be declared before they are used but, in order to be right side up, the functions should not be defined before they are used. If you include line.c, then you must include both the declaration and definition of the code in the same place.
The compiler gets a very big file to compile, which could make the compiler very slow.
And so on.

The accepted alternative is not to include all of the code. Instead, divide the code pertaining to a subset of the program into two files: a code file and a "header" file. For example, I might put the function definitions, data structures, and global variables into the code file like coordinate.c. I can then put all of the information required to invoke the functions, such as the function declarations, in the header file: coordinate.h.

If we use header and code files, then every file that wants to invoke functions from a different file can include the header file for those functions. For example, in our main file, we could code:


#include "line.h"
#include "coordinate.h"

Then, the main file can invoke all of the line and coordinate functions without having to include all of the function definitions, global variables, and data structures.
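Here is a deliberately tiny sketch of that idea. All three "files" are shown in one listing so that it compiles on its own, and the comments mark which file each part would live in. The names line_slope and report_slope are hypothetical, made up for this illustration, and report_slope stands in for code you would normally put in main.

```c
/* ---- line.h: the only part the main file needs to see ---- */

/* Slope of the line through (x1,y1) and (x2,y2).
   The line must not be vertical. */
double line_slope(double x1, double y1, double x2, double y2);

/* ---- tryLine.c: the main file ---- */
/* In a real program, the declaration above would arrive here
   through #include "line.h". */

/* Hypothetical stand-in for code that would live in main. */
double report_slope(void)
{
    /* This call compiles because line_slope has been declared,
       even though its definition has not been seen yet. */
    return line_slope(1.0, 1.0, 3.0, 5.0);
}

/* ---- line.c: the definition, normally compiled on its own ---- */
double line_slope(double x1, double y1, double x2, double y2)
{
    return (y2 - y1) / (x2 - x1);
}
```

Notice that the main file part never needs to see the body of line_slope. The declaration alone tells the compiler the return type and the argument types.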

Coding a Header File

Guarding Header Files from Duplicate Includes

Since a header (.h) file is designed to be included in other files, including other header files, and since including the same header file twice often causes C compiler errors, it would be nice to have a mechanism to avoid including a file a second time in a single compile. C macro processing gives us the capability to achieve this goal. For a header file named "coordinate.h", I would code the following guards:


#ifndef COORDINATE_H
#define COORDINATE_H

/* The header information goes here */

#endif

This mechanism chooses a macro variable, in this case COORDINATE_H, that is not likely to be used for any other reason. Macro variables are considered undefined until the compiler finds a #define statement for that variable. Therefore, the first time the coordinate.h file is included, the macro variable COORDINATE_H will not be defined, and the code between the #ifndef statement and the #endif statement will be included. However, the first thing that happens in this code is that the macro variable COORDINATE_H is defined to an empty value. Therefore, if the coordinate.h file is ever included again while compiling the same C file, the #ifndef COORDINATE_H test will fail, causing the compiler to skip to the #endif statement, conveniently skipping over a second inclusion of the header information.

I like the convention that uses a macro variable spelled exactly like the file name, but in all upper case (so it's clear it's a macro variable), and with illegal characters like periods replaced with underscores.
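To see the guard at work, here is a small sketch that imitates including coordinate.h twice by writing out its guarded contents twice in one file. The structure xy_pair and the function xy_sum are hypothetical, invented only for this demonstration. I put a structure definition in the "header" here simply because defining the same structure twice is a reliable compiler error, which makes the effect of the guard easy to see.

```c
/* ---- first inclusion of coordinate.h ---- */
#ifndef COORDINATE_H
#define COORDINATE_H
struct xy_pair { double x; double y; };   /* hypothetical header content */
#endif

/* ---- second inclusion of coordinate.h ---- */
#ifndef COORDINATE_H   /* COORDINATE_H is already defined, so the lines */
#define COORDINATE_H   /* up to the #endif are skipped this time        */
struct xy_pair { double x; double y; };
#endif

/* Hypothetical function that uses the type, to show it was defined once. */
double xy_sum(struct xy_pair p)
{
    return p.x + p.y;
}
```

If you delete the #ifndef, #define, and #endif lines, the compiler sees struct xy_pair defined twice and should reject the file.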

What Should a Header File Contain?

The header file needs to contain all the information required to invoke functions. This is often as follows:

Includes: Sometimes, we want to use types defined in other files. For example, functions that work with lines might want to use (x,y) coordinates as arguments. In this case, the line.h file may #include "coordinate.h" at the top of the header file (after the guard). Then, if there is a typedef in coordinate.h for coord, we can use a coord type in an argument to our line functions.
Macro Variables: Sometimes, we want to define macro variables that limit things like array sizes or the amount of space available for values. If we want our users to be aware of these limits, we can put #define statements in the header files, assuming our users will read the header files and make use of those limitations. Other times, we will assume the users of a function don't need to worry about these limits. In that case, it's probably better to put the #define statements in the code file.
Type Definitions: Often, we will design functions to take as arguments or return a data structure of a specific type. I find it very useful to include a typedef statement for the type of data the file works with in the header file. That way, that type defined by the typedef can be used as either an argument type or a return type.
Function Declarations: The most important part of the header file is the set of function declarations that specify the return type, the function name, and the function argument list (including the types of the arguments).
Function Documentation: Sometimes, it is convenient to put short user documentation for the functions declared in this file in comments in the header file. Then users of these functions have an easy place to look to learn how to use the functions.

Note that we often do not code function definitions, global variables, or even structure definitions in the header files. Those are often better kept in the code file.
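The pieces listed above might fit together as in the following sketch, which shows a possible coordinate.h and line.h in a single listing so that it compiles on its own. Every name here (coord_new, coord_dist2, coord_free, LINE_LABEL_MAX, line_new, line_free, and the structure tags) is hypothetical and chosen only for illustration. I am also assuming the typedefs name structures whose definitions are kept in the code files, so users pass pointers to them.

```c
/* ===== coordinate.h (sketch) ===== */
#ifndef COORDINATE_H
#define COORDINATE_H

/* Type definition: users see only the type name. The structure itself
   is assumed to be defined in coordinate.c. */
typedef struct coord_struct coord;

/* Function declarations, each with short user documentation. */

/* Create a coordinate at (x,y). Returns NULL if no memory is available. */
coord *coord_new(double x, double y);

/* Return the square of the distance between a and b. */
double coord_dist2(const coord *a, const coord *b);

/* Release a coordinate created by coord_new. */
void coord_free(coord *c);

#endif

/* ===== line.h (sketch) ===== */
#ifndef LINE_H
#define LINE_H

/* Includes: a real line.h would have  #include "coordinate.h"  here.
   In this single listing, the text above stands in for it. */

/* Macro variable users should know about: space reserved for a label. */
#define LINE_LABEL_MAX 32

typedef struct line_struct line;

/* Create a line from start to end. The label is copied and cut off
   after LINE_LABEL_MAX - 1 characters. Returns NULL on failure. */
line *line_new(const coord *start, const coord *end, const char *label);

/* Release a line created by line_new. */
void line_free(line *l);

#endif
```

Because line.h includes coordinate.h itself, a main file that includes both headers pulls in coordinate.h twice, and the guards are what make that harmless.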

Writing the Code File

The code file (.c file) in a multi-file program should include several things.

Includes: The associated header file should be #included. This allows the compiler to check the function declarations against the function definitions that appear below. If the function definitions below need to invoke any other functions that are not local to this file (including system library functions), the header files for those functions should be #included as well.
Structure Definitions: Structure definitions may be included in the code file. This allows all the functions defined in this file to access the fields of the structures.
Global Variables: Any global variables required within this file. Global variables should be avoided (they increase functional binding), but sometimes a global variable is the best solution.
Local Function Declarations: If any helper functions are required in this file that you don't expect a user of these functions to invoke, then, in order to avoid upside down code, these functions need to be declared at the top of the file.
External Function Definitions: The most important code in the code file is the set of definitions for the functions that are declared in the header file. This is where the implementations of those functions exist.
Internal Function Definitions: The code file must also include the function definitions of the functions that are only used within this file.
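A code file that follows this layout might look like the sketch below. All of the names (coord_new, coord_dist2, coord_live_count, coord_free, live_count, square) are hypothetical. To keep the listing compilable on its own, the declarations that would normally come from #include "coordinate.h" are written out at the top. The static keyword on the global variable and the helper function is my own addition to this sketch. It keeps those names private to this file.

```c
/* ===== coordinate.c (sketch) ===== */

/* Includes */
#include <stdlib.h>   /* malloc, free, NULL */

/* Stand-in for  #include "coordinate.h"  (hypothetical declarations). */
typedef struct coord_struct coord;
coord *coord_new(double x, double y);
double coord_dist2(const coord *a, const coord *b);
int coord_live_count(void);
void coord_free(coord *c);

/* Structure definitions: only this file can see the fields. */
struct coord_struct {
    double x;
    double y;
};

/* Global variables: number of coordinates currently allocated. */
static int live_count = 0;

/* Local function declarations: helpers users are not expected to call. */
static double square(double v);

/* External function definitions: the functions declared in the header. */
coord *coord_new(double x, double y)
{
    coord *c = malloc(sizeof *c);
    if (c != NULL) {
        c->x = x;
        c->y = y;
        live_count++;
    }
    return c;
}

double coord_dist2(const coord *a, const coord *b)
{
    /* square is used here before it is defined, which works
       because it was declared at the top of the file. */
    return square(a->x - b->x) + square(a->y - b->y);
}

int coord_live_count(void)
{
    return live_count;
}

void coord_free(coord *c)
{
    if (c != NULL) {
        free(c);
        live_count--;
    }
}

/* Internal function definitions */
static double square(double v)
{
    return v * v;
}
```

A caller that sees only the header can create, measure, and free coordinates, but cannot touch the x and y fields or the live_count variable directly.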

Building a Multi-File C Program

There are two ways to build a multi-file C program: the simple way, and a more complicated (but faster) way to compile. This section describes both.

Building a Multi-File C Program Simply

The simplest way to build a multi-file program is to specify all the C code files on the compiler command line. For example, if we have coordinate functions in coordinate.c, line functions in line.c, and a main function in tryLine.c, then we could invoke the gcc compiler using the command line:


	> gcc -g -Wall -std=c18 -o tryLine tryLine.c line.c coordinate.c

This tells the compiler to compile all three C files together, and link edit them to produce a single executable file called tryLine that contains all the functions from all three C files.

The advantage of this strategy is that it is relatively simple. The disadvantage is that every time one of the C files changes, you need to recompile all three. When there are just three files, that's not a big problem, but in the industry, it's not uncommon to create executables made up of hundreds of C files. Compiling all of those files may take several hours, and that is a long time to wait when you have changed just a single line of code.

Building a Multi-File C Program Quickly

When a C compiler compiles and builds a C program, it goes through several stages. The first stage is to compile a single file and turn that file into object (.o) code. The object code contains all the machine code instructions derived from a single C code file.

Once all C files have been turned into object code, the second step, called link editing the code, is to merge all the object files into a single executable file.

When using the simple strategy of specifying multiple C files on a single command line, the gcc compiler performs both the conversion to object code and the link edit in a single invocation. It is possible, using the -c option, to tell gcc to stop before the link edit and save the object code for each C file in a .o file instead. Once all the object files have been created, a separate gcc invocation that lists the .o files can link edit them together into an executable file.

The first time you build a program, the amount of work required is the same, no matter which strategy you use. All C files need to be converted to object files, and then all those object files need to be link edited together.

The real advantage of the more complicated strategy is after the first build. If you find an error in one file, you can edit that C file and retranslate that single file into object code, but you don't need to retranslate all the other C files in your program. If you saved the object code from those other files, all you need to do is link edit the old object code for the other files with your modified object code for the file you changed to produce an updated executable file. Imagine a production system with 300 C files. Now, instead of recompiling all 300 C files to fix a single line of code, all you have to do is create object code from a single C file, and re-link the result... something that takes seconds instead of hours!

The implementation of this strategy is different depending on your build strategy. Using a Makefile, it can be as simple as creating a make rule for the executable that specifies all the object code required. For example:


tryLine : tryLine.o line.o coordinate.o

There are often built-in Make rules to create a .o file from a .c file, and to create an executable file from a list of .o dependencies. However, there are some levels of Make and UNIX where this won't work, and it definitely loses the fact that, for example, line.o depends not only on line.c but also on line.h.
