HowTo

Use gdb to Debug C programs

Prof. Bartenstein

The GNU debugger (gdb) is a very powerful debugging tool that allows you to investigate exactly how programs work or don't work. It takes a little effort to learn how to use gdb, but that effort will repay you for the rest of your life. Even if you end up in a career that does not use C code, the debugging concepts you learn by learning about gdb will be useful in many other environments!

On this page, the unix_command format is used for unix commands that can be typed at the UNIX prompt (which usually ends in ">", but can be configured.) The gdb_command format is used for gdb sub-commands that can be typed at the gdb prompt (gdb) when gdb is running.

Compiling code for gdb

C compilers in general, and the gcc compiler specifically, convert C code into executable files. To keep executable files small and efficient, the default is to discard most of the cross-reference information that ties the executable back to the C code. The gdb debugger will work with this minimal information, but its capability is very limited. To do anything useful with gdb, we need to tell the compiler to keep the extra cross-reference information, using the -g flag, or (to use the "most expressive style available") -ggdb.

Without -g or -ggdb, gdb will have information about function names, but no information about specific C lines of code. That means that you can set a breakpoint at the beginning of the function, but will not be able to step through that function, or see the values of variables.

If you use -g or -ggdb when compiling, then you will be able to see and breakpoint at specific lines of code as well as all the variables you use in your C program.

Invoking gdb

The gdb debugger is invoked with the gdb command. The gdb command has a wide variety of flags and command line options. Most often, we invoke gdb with the name of the executable file we want to debug. For example, if I have a helloWorld executable file in the current directory, I can run gdb using gdb helloWorld

When you invoke gdb with an executable file as an argument, gdb loads the executable program and reads all the cross-reference information associated with that program. Before running the program, gdb displays the (gdb) prompt and waits for you to enter a gdb sub-command that tells it what to do next.

When you get the prompt, you probably need to set one or more breakpoints (see Setting and Using breakpoints in gdb), and then run your program. To run your program, use the run command, followed by any command line arguments for your program. This causes gdb to start running your program from the beginning, using the command line arguments you specify to set argv and argc. On most platforms (except Cygwin and some other UNIX emulators), you can use UNIX redirection in your command line arguments. For example, you can use run -noprompt testarg <test.txt. This sets argc to 3, argv[1] to "-noprompt", argv[2] to "testarg", and redirects standard input to read from the test.txt file.

When you invoke the run command, gdb will run your program until a breakpoint is reached, your program ends, or you encounter a violation; whichever comes first.

Once your program is running, when it stops, you will get the gdb prompt (gdb) again. At that point, you can perform debugging tasks like looking at variables, setting more breakpoints, or stepping through your code. If you are stopped at a breakpoint, you can also invoke the continue command to continue running your code until the next breakpoint, or your program ends, or issues a violation; whichever comes first.

When you have finished debugging your program, use the quit command to exit gdb and return to the UNIX prompt.

General gdb command information

The gdb debugger is designed to make entering gdb sub-commands as simple as possible. Some of the features include:

Commands may be abbreviated to the shortest unambiguous set of letters. For example, the break command may be abbreviated to bre or br. In addition, gdb predefines one-letter abbreviations for the most common commands, so b also means break, even though other commands (such as backtrace) start with "b".
If you hit enter without typing anything, gdb will rerun the previous command for you. This is very useful when single stepping through code. For example, if you want to single step through a function, use the next command to make the first step, but then just hit enter over and over to continue running the next line of code.
The gdb command keeps a history of all of the commands you have used in this session (or at least the last few hundred.) You can recall a command from the history by hitting the up arrow. Once you are in the history, you can go backwards using the up arrow, or forwards using the down arrow.
If you forget the details of specific gdb commands, use the gdb help command, e.g. help break to get the details of the breakpoint command. The help command with no argument gives all the options for help.
When gdb stops at a breakpoint, it stops before the instruction at that breakpoint has been executed, but (as long as you have cross-reference information from -g) it will print the C instruction that is the next instruction to be executed.

Single stepping through code

There are two gdb commands designed to execute a single line of code, step and next. The only difference between step and next occurs when there is a function invocation on the line of code to be executed. If a function is invoked, the step command will step into the function and stop at the first line of the function. The next command will execute the lower level function until it returns and stop at the next line of code in the same function it started in. Use the step command to debug what happens in the lower level functions as well as the higher level functions. If you trust the lower level functions to work correctly, use the next command to skip over that level of detail. You may want to skip back and forth between step and next, depending on what is getting invoked. For example, you probably don't want to step into C library code, such as the code for a printf invocation, so if the line of code invokes printf, you probably want to use next for that line instead of step.

If you are in a lower level function, but don't care about what happens in that function, use the finish command. The finish command runs all the remaining instructions in the current function until that function returns to its caller. The finish command allows you to resume debugging the caller if you mistakenly stepped into a lower level function. (There is also a return command that returns immediately without executing the rest of the lower level function. This is not very useful.)

Most often, you will single step through code by starting with the step or next command, and then hitting enter to rerun that command to step through several lines of code. At each line of code, you may want to perform other debugging activities, such as looking at variable values, or setting breakpoints. If you do so, don't forget to retype the step or next command instead of just hitting enter.

When you are done single stepping, you may want to continue to the next breakpoint, or just quit out of gdb altogether.

Setting and Using breakpoints in gdb

It's almost impossible to use gdb without setting breakpoints. Breakpoints can be quite simple, but they can also be customized quite a bit. The simplest breakpoint identifies a line of code in your program. For example, the command break 23 tells gdb to set a breakpoint at line 23 of the current source file. If line 23 does not contain any code (for instance, it's an empty line, or a comment line), then gdb sets the breakpoint at the next line in the file that does have a C instruction. By default, gdb will stop before executing the line of code at the breakpoint every time it gets to that line. For example, if the breakpoint is inside a loop, gdb will stop at that line for every execution of the body of the loop.

GDB keeps a list of breakpoints, and assigns each breakpoint a number. When you set a breakpoint, gdb tells you which number breakpoint that is. That number can be used in other commands to do things like enable or disable the breakpoint, or delete it altogether.

There are several ways to identify the line to break at. The simplest way is to use the line number, but it is also possible to specify a file name of another .c file, followed by a colon, and the line number in that file. For example break list.c:8 to set a breakpoint at line 8 of file list.c. You may also specify a function name to set a breakpoint at the first instruction in that function. For example break main will set a breakpoint at the first instruction in the main function.

By default, gdb will stop every time it hits a breakpoint, but it is possible (and very powerful) to set a condition on the breakpoint. A conditional breakpoint will evaluate the condition each time the breakpoint is reached, but only stop if the condition is true. The condition is any C expression which evaluates to a boolean true or false value (non-zero or zero in C), and is specified using an "if" argument in the break command. For example break 23 if i>=59 tells gdb to set a breakpoint at line 23 of the current file. Every time line 23 is encountered, gdb evaluates the expression "i>=59". If the result is true, which means the value of the "i" variable is greater than or equal to 59, then gdb will stop and present the (gdb) prompt. If the result is false, i<59, then gdb keeps running the code. Conditional breakpoints allow you to set a breakpoint that only stops when the really interesting stuff occurs.

A similar feature of breakpoints is the capability to ignore the breakpoint a specified number of times. This can be set with the ignore command. For example, ignore 3 5 tells gdb to ignore breakpoint number 3 the next 5 times it is encountered, but stop when that breakpoint is encountered the sixth time.

Another very powerful feature of breakpoints is the capability to specify a list of commands to get executed when that breakpoint is encounterred. Use the command commands to start defining the commands that will be executed when the last defined breakpoint is hit. Then, type the commands themselves, followed by end to finish the set of commands. For example:

(gdb) commands
Type commands for breakpoint(s) 3, one per line.
End with a line saying just "end".
>print i
>continue
>end
(gdb)

This tells gdb that every time it reaches breakpoint 3 (the last breakpoint defined), it should print the value of the i variable, and then continue to the next breakpoint. Use this to keep track of the current value of the i variable in the loop without having to stop for each iteration of the loop.

The info breakpoints command will print the entire list of breakpoints, along with any statistics, conditions, or commands associated with those breakpoints. Use the delete breakpoints command to delete breakpoints.

Printing the values of Variables

When stopped at a breakpoint, gdb allows you to display the values of all of the variables available at that point in the program. The print command is the simplest way to do this. The print command interprets the C expression defined in the command, and writes the result to the terminal. Most often, the expression is just the name of a variable, as in print area to print the value of the "area" variable.

The gdb command almost always knows the type of the result, and uses a "pretty printer" to format the result so that it is easy to read. You can override the output format by specifying a format flag in the print command. For example, to print a value in hexadecimal, you can use print /x myVar. The GDB manual has an Output Formats section that has all the details.

The expression argument to the print command can be a more complicated expression, such as print n+3 or print mask<<3. In fact, the expression may be any valid C expression, including one that invokes functions in your program. This makes the print command powerful and very useful.

When you invoke the print command, gdb evaluates the expression to a value, saves that value in a gdb "pseudo-variable", and prints both the variable and the value to the screen. For example, if you type print area1+area2, gdb will add the values of the area1 and area2 variables, and print to the screen something like $4 = 58. In this case, the pseudo-variable is $4 and the value of the result is 58. You can then use this result in other print expressions. For example, to find the average of area1 and area2, you could then type print $4/2, which would produce $5 = 29. Each print statement creates its own unique pseudo-variable.

An alternative is to use the printf command. The printf command works much like the printf C library function. The first argument is a formatting string, and the remaining arguments specify the values to use to replace the tags in the format string. For example, you can use the command printf "Coordinate=(%d,%d)",x,y to print two values in an easy to read format on one line.

The Program Stack in GDB

When you run a C program under gdb, gdb invokes the main function first. The main function may invoke a sub-function, and that sub-function may, in turn, invoke a third level function, and so on. Eventually the lowest level function returns to its caller, which eventually returns to its caller, and so on, until you get back to the main function. When the main function returns, the program ends. UNIX keeps track of function calls by maintaining a "program stack". Every time a function is invoked, information about that invocation is pushed onto the bottom of the program stack (raising all the lower level function invocations up 1.) When a function returns to its caller, the function invocation information is popped from the bottom of the program stack. The main function is at the top of the stack.

The gdb program reveals information about the stack with the where command. The where command prints the entire stack, starting at the bottom of the stack - the currently executing function, identified as #0. The caller of the current function is #1, and its caller is #2, all the way up to the top of the stack, the main function. (This seems upside down, but is that way for historical reasons.)

By default, the context in gdb is the currently executing function at the bottom of the stack (#0). The gdb command allows you to change the context using the up and down commands. The up command changes the context to that of the caller. After moving up, you can look at the caller's local variables, but no longer have access to the current function's local variables. The down command moves the context back toward the currently executing function.

Another way to change the context is to use the finish command to finish the currently executing function and return to its caller.

Debugging the Underlying Assembler

The executable that GDB helps us debug contains machine language instructions, not high level code. The gdb command does an outstanding job of hiding the machine level details, and making it seem like we are actually executing C instructions instead of running at the machine level. With a little prodding, gdb can reveal the underlying infrastructure as well. This section describes the gdb commands and techniques to debug at the assembler level along with the C level.

Working with the x86 Assembler Instructions

The gdb disassemble command prints the underlying x86 instructions that implement the C instructions of the currently executing function. Use disassemble /m to include both C and x86 information.

The gdb next and step commands single step through C instruction lines. Similarly, the commands nexti and stepi step through one x86 instruction at a time. Like step and next, stepi steps into function calls, but nexti skips over a called function's instructions and does not stop until the function returns.

By default, stepi and nexti do not print the x86 instruction that is about to execute, but if you run set disassemble-next-line on, the next x86 instruction will get printed after each nexti or stepi.

Warning: some library code is "protected". That means that you are not allowed to see the x86 code that makes up those instructions. If you "stepi" into a protected library function, then you are debugging in the dark. Use the finish command to get back to unprotected code.

Since x86 instructions do not have C file line numbers, in order to set a breakpoint at a specific x86 instruction, we need to find the address of that instruction, and use the gdb command break *0xhex_address. The "*" indicates that this is not a line number, but an address, and the 0x specifies that what follows is a hexadecimal address, not a base 10 address.

Working with X86 Assembler Data

At the X86 level, variable names are rarely used. Instead, memory is referenced using one of the X86 addressing modes; almost always using registers and offsets. X86 also keeps many values in registers as well as memory.

It's easy to access register values in gdb because gdb defines pseudo-variables for each register (using a dollar-sign ($) prefix instead of a percent sign). You can use these pseudo-variables in a print statement to get register values. For example, to get the current value in the %rax register, use print $rax. To print the stack pointer in hex, use the /x format, as in print /x $rsp.

GDB also has an info registers command (which can be abbreviated to info reg), but that's not as useful.

You can use more sophisticated expressions such as print *(int *)($rbp-4) to print the integer that is 4 bytes before the location pointed to by the rbp register, but this quickly becomes complicated and not very useful. (The parentheses matter: in (int *)$rbp-4 the cast is applied first, so the subtraction moves back 4 integers, or 16 bytes, instead of 4 bytes.) A better alternative is to use the gdb "x" command to Examine Memory.

Examining Memory

The "x" command allows you to print information in memory using a format and interpretation of your own choosing. The general format of the "x" command is x /nfu address, where address is the location in memory to start the display, and where the nfu flags are defined as follows:

n

is the number of values to print (the default is 1). If this number is negative, values are printed backwards in memory.

f

is the format of each value, as follows:

x: hexadecimal
z: hexadecimal padded on the left with zeroes
d: two's complement binary expressed as a decimal integer
o: Octal
t: Binary
u: Raw binary expressed as an unsigned decimal
f: IEEE 754 floating point expressed as a decimal real number
a: Address expressed in hexadecimal
c: ASCII expressed as a single character
s: ASCII null terminated string
i: Variable length x86 instruction

u

specifies the unit size (or width) of each value; where:

b: byte (1 byte, 8 bits)
h: halfword (2 bytes, 16 bits)
w: word (4 bytes, 32 bits)
g: doubleword (8 bytes, 64 bits) Note that this last one is "g" as in "giant-word", not "q" as in "quad-word".

Unit size is not required for format options of c, s, or i because gdb already knows the size.

Here are some examples of the x command:

x /4i 0x1004011b5: Show 4 instructions starting at address 0x1004011b5.
x /4i $rip: Show 4 instructions starting at the instruction pointer
x /tw &age: Show the 32 bit binary value of the age variable
x /8cb argv[0]: Display the 8 single byte characters pointed to by argv[0]
x /d $rbp-0x20: Display the value 32 bytes before where the rbp register is pointing as a decimal value
x /6dg $rbp-64: Show a vector of 6 long (8 byte) decimal values starting at 64 bytes before the current value of the %rbp register
