HowTo: Write a C program using Multiple Files | Prof. Bartenstein
When writing C programs for academic assignments or simple tasks, we often put all of our C code for the program in a single file. We know how to put multiple functions in a single file, and for simple programs, a single file is not only sufficient, but also the simplest solution.
As programs get more complicated and need more code, it is often useful to split the code into multiple files just to make it easier to manage. Splitting up code can also be required for code re-use. If the same code can be used in multiple programs, it makes sense to put the reusable code in its own file and keep the non-reusable, program-specific code in different files.
This web page gives more information on how to design, create, and build multi-file C programs.
Multiple files are used for two reasons: to organize the code into logical units and to enable re-use of the code. When designing a C program, you must decide whether multiple files would be useful, whether you can re-use code you have already written, and whether code you need to write for this program might be re-used later.
If you are using multiple files, it makes the most sense to collect all the functions that work with a single data construct or concept in a single file. For example, if I write a program to deal with lines on a Cartesian coordinate system, I might put all the code that deals with (x,y) coordinates in a single file. I might put code that handles lines in a separate file. Then I might put the user interface code and a main function in a third file.
It would be possible to collect all the code together by using:
#include "line.c"
#include "coordinate.c"
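in the main file so that it would appear to the compiler as if all the code were in a single file. This is very seldom done, for several reasons. For one thing, a change to any of the included files forces the whole program to be recompiled. For another, if two files in the same program both included coordinate.c, its functions would be defined twice and the program would fail to link.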
The accepted alternative is not to include all of the code. Instead, divide the code for each part of the program into two files: a code file and a "header" file. For example, I might put the function definitions, data structures, and global variables into a code file such as coordinate.c. I can then put all of the information required to invoke the functions, such as the function declarations, in the header file coordinate.h. The main file then includes only the header files:
#include "line.h"
#include "coordinate.h"
Then, the main file can invoke all of the line and coordinate functions without having to include all of the function definitions, global variables, and data structures.
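As a rough sketch, the split might look like the following for the coordinate part of the program. The coord type and the functions coord_make and coord_dist_squared are hypothetical names chosen for illustration. Both files are shown in one listing, with comments marking which file each part would live in. Here the structure is defined in the header so that callers can pass and return coord values directly. That is a design choice, not a requirement.

```c
/* ---- Part that would go in coordinate.h (hypothetical) ---- */
#ifndef COORDINATE_H
#define COORDINATE_H

/* Hypothetical type for an (x,y) point on a Cartesian plane. */
typedef struct {
    double x;
    double y;
} coord;

/* Declarations only: enough for a caller to invoke the functions. */
coord coord_make(double x, double y);
double coord_dist_squared(coord a, coord b);

#endif /* COORDINATE_H */

/* ---- Part that would go in coordinate.c (hypothetical) ---- */
/* The real coordinate.c would start with #include "coordinate.h" so the
 * compiler can check each definition below against its declaration. */

/* Build a coord from its two components. */
coord coord_make(double x, double y) {
    coord c = { x, y };
    return c;
}

/* Squared straight-line distance between two points. Squaring avoids
 * sqrt, so no math library is needed. */
double coord_dist_squared(coord a, coord b) {
    double dx = b.x - a.x;
    double dy = b.y - a.y;
    return dx * dx + dy * dy;
}
```

A main function in tryLine.c would then need only #include "coordinate.h" to call these functions. For the points (0,0) and (3,4), coord_dist_squared returns 25.0.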
Since a header (.h) file is designed to be included in other files, including other header files, and since including the same header file twice often causes C compiler errors, it would be nice to have a mechanism to avoid including a file a second time in a compile. C macro processing gives us the capability to achieve this goal. For a header file named "coordinate.h", I would code the following guards:
#ifndef COORDINATE_H
#define COORDINATE_H
/* The header information goes here */
#endif
This mechanism chooses a macro variable, in this case COORDINATE_H, that is not likely to be used for any other reason. Macro variables are considered undefined until the preprocessor finds a #define statement for that variable. Therefore, the first time the coordinate.h file is included, the macro variable COORDINATE_H will not be defined, and the code between the #ifndef statement and the #endif statement will be included. However, the first thing that happens in this code is that the macro variable COORDINATE_H is defined to an empty value. Therefore, if the coordinate.h file is ever included again in the same compile, the #ifndef COORDINATE_H test will be false. The compiler then skips to the #endif statement, conveniently skipping over a second inclusion of the header information.
I like the convention that uses a macro variable spelled exactly like the file name, but in all upper case (so it's clear it's a macro variable), and with illegal characters like periods replaced with underscores.
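To see the guard at work, the sketch below pastes the body of a hypothetical coordinate.h twice. That is roughly what the preprocessor sees when the header is included twice in one compile, for example once directly and once through line.h. The names struct coord_s, coord, and line are made up for illustration. Because the second copy is skipped, the structure is defined only once and the file compiles. If the #ifndef/#define/#endif lines were removed from both copies, the compiler would report a redefinition of struct coord_s.

```c
/* First inclusion of the (hypothetical) coordinate.h:
 * COORDINATE_H is not yet defined, so this body is kept. */
#ifndef COORDINATE_H
#define COORDINATE_H
struct coord_s { double x; double y; };
typedef struct coord_s coord;
#endif /* COORDINATE_H */

/* Second inclusion of the same text: COORDINATE_H is now defined,
 * so everything up to #endif is skipped and struct coord_s is not
 * defined a second time. */
#ifndef COORDINATE_H
#define COORDINATE_H
struct coord_s { double x; double y; };
typedef struct coord_s coord;
#endif /* COORDINATE_H */

/* Following the same naming convention, line.h would use LINE_H. */
#ifndef LINE_H
#define LINE_H
typedef struct { coord start; coord end; } line;
#endif /* LINE_H */
```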
The header file needs to contain all the information required to invoke functions. This often includes the following:
- Any #include statements the header itself needs. For example, the line functions work with coordinates, so line.h would code #include "coordinate.h" at the top of the header file (after the guard). Then, if there is a typedef in coordinate.h for coord, we can use a coord type in an argument to our line functions.
- Limits the functions impose, such as maximum sizes. Sometimes we code these as #define statements in the header files, assuming our users will read the header files and make use of those limitations. Other times, we will assume the users of a function don't need to worry about these limits. In that case, it's probably better to put the #define statements in the code file.
- A typedef statement for the type of data the file works with. That way, the type defined by the typedef can be used as either an argument type or a return type.
- The declarations of the functions that users of the file may invoke.

Note that we often do not code function definitions, global variables, or even structure definitions in the header files. Those are often better kept in the code file.
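Putting those items together, a hypothetical line.h and line.c might look roughly like the sketch below. An abbreviated coordinate.h is inlined at the top so the listing compiles on its own. The names LINE_MAX_COORD, line_make, line_is_vertical, and line_is_valid are invented for illustration. The limit is placed in the header on the assumption that users need to know about it.

```c
/* ---- coordinate.h (abbreviated, hypothetical) ---- */
#ifndef COORDINATE_H
#define COORDINATE_H
typedef struct { double x; double y; } coord;
#endif /* COORDINATE_H */

/* ---- line.h (hypothetical) ---- */
#ifndef LINE_H
#define LINE_H
/* In the real header, #include "coordinate.h" would go here, after the
 * guard, so the coord type is available below. */

/* A limit users are expected to know about, so it lives in the header. */
#define LINE_MAX_COORD 1000.0

/* The type this file works with, usable as an argument or return type. */
typedef struct { coord start; coord end; } line;

/* Declarations only. */
line line_make(coord start, coord end);
int line_is_vertical(line l);
int line_is_valid(line l);
#endif /* LINE_H */

/* ---- line.c (hypothetical) ---- */
/* The real line.c would begin with #include "line.h". */

/* Helper used only inside line.c, so it is static and not declared in
 * the header. Returns 1 if v is within +/- LINE_MAX_COORD. */
static int within_limit(double v) {
    return v >= -LINE_MAX_COORD && v <= LINE_MAX_COORD;
}

line line_make(coord start, coord end) {
    line l = { start, end };
    return l;
}

/* A line is vertical when both endpoints share the same x value. */
int line_is_vertical(line l) {
    return l.start.x == l.end.x;
}

/* Returns 1 if every endpoint coordinate respects LINE_MAX_COORD. */
int line_is_valid(line l) {
    return within_limit(l.start.x) && within_limit(l.start.y)
        && within_limit(l.end.x) && within_limit(l.end.y);
}
```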
The code file (.c file) in a multi-file program should include several things:
- An #include of its own header file, for example #include "coordinate.h" at the top of coordinate.c. This allows the compiler to check the function declarations against the function definitions that appear below. If the function definitions below need to invoke any other functions that are not local to this file (including system library functions), the headers for those functions should be #included as well.
- The function definitions, along with the data structures and global variables they use.

There are two ways to build a multi-file C program: the simple way, and a more complicated (but faster) way to compile. This section describes both.
The simplest way to build a multi-file program is to specify all the C code files on the compiler command line. For example, if we have coordinate functions in coordinate.c, line functions in line.c, and a main function in tryLine.c, then we could invoke the gcc compiler using the command line:
> gcc -g -Wall -std=c18 -o tryLine tryLine.c line.c coordinate.c
This tells the compiler to compile all three C files together, and link edit them to produce a single executable file called tryLine that contains all the functions from all three C files.
The advantage of this strategy is that it is relatively simple. The disadvantage is that every time one of the C files changes, you need to recompile all three. When there are just three files, that's not a big problem, but in industry it's not uncommon to create executables made up of hundreds of C files. Compiling all of those files may take several hours, and that is a long time to wait when you have changed just a single line of code.
When a C compiler compiles and builds a C program, it goes through several stages. The first stage is to compile a single file and turn that file into object (.o) code. The object code contains all the machine code instructions derived from a single C code file.
Once all C files have been turned into object code, the second step, called link editing the code, is to merge all the object files into a single executable file.
When using the simple strategy of specifying multiple C files on a single command line, the gcc compiler performs both conversion to object code and link editing in a single invocation. It is possible, using the -c option, to stop gcc before the link edit and instead save the object code for each C file in a .o file. Once all the object files have been created, they can be link edited together into an executable file in a separate gcc invocation.
The first time you build a program, the amount of work required is the same, no matter which strategy you use. All C files need to be converted to object files, and then all those object files need to be link edited together.
The real advantage of the more complicated strategy is after the first build. If you find an error in one file, you can edit that C file and retranslate that single file into object code, but you don't need to retranslate all the other C files in your program. If you saved the object code from those other files, all you need to do is link edit the old object code for the other files with your modified object code for the file you changed to produce an updated executable file. Imagine a production system with 300 C files. Now, instead of recompiling all 300 C files to fix a single line of code, all you have to do is create object code from a single C file, and re-link the result... something that takes seconds instead of hours!
How you implement this strategy depends on your build tool. Using a Makefile, it can be as simple as creating a make rule for the executable that lists all the object files it requires. For example:
tryLine : tryLine.o line.o coordinate.o
There are often built-in Make rules to create a .o file from a .c file, and to create an executable file from a list of .o dependencies. However, there are some versions of Make and UNIX where this won't work. The built-in rules also do not capture the fact that, for example, line.o depends not only on line.c but also on line.h.