Overview

Why use Format Strings?

One of the challenges in any programming language is producing output, such as messages to users, that contains a formatted, printable version of the values of variables in the program. Variables that contain character strings require little or no conversion, but variables that contain binary data are more challenging. The binary data must be converted from its underlying format (e.g. two's complement binary or IEEE 754 floating point) into a decimal representation, and the decimal representation must be converted into printable characters that can be included in the output.

Other languages meet this challenge with relatively sophisticated constructs, such as the cout stream in C++ or the print function in Python. Because the C language is quite a bit older than these examples, it uses the concept of a format string to do this conversion.

What are Format Strings?

A format string is a string of characters that defines the format of the output. Most characters in a format string are literal characters, copied verbatim to the output. However, the format string may also contain one or more format specifiers. A format specifier starts with a percent sign (%) and marks a place in the format string that should be replaced by the character representation of a value in your program. The printf (or printf-like) function reads the format string and replaces each format specifier with the character representation of a value passed to printf as an extra argument.

An example might help. Consider the C code:

   printf("Age is %d\n",age);

The format string is "Age is %d\n", which contains the format specifier %d. The %d tells printf to replace the %d with the decimal value of the next argument, which is a variable called age. If age has the value 19, printf expands the format string to "Age is 19\n" before writing it to standard output. The "\n" represents a single newline character, so "Age is 19" appears on its own line.

Where are Format Strings Used?

Many C library functions use format strings. There are two flavors: format strings used to produce output, which are the most common, and format strings used to scan input. Here's a list of the most useful format string functions:

printf: Converts a format string to an output string, and writes that output string to standard output.
fprintf: Converts a format string to an output string and writes it to the stream specified as the first argument.
sprintf: Converts a format string to an output string, and writes the result into memory at the location specified by the first argument.
scanf: Reads from standard input, uses the format string to convert the text into binary formats, and stores the results at the locations given by the remaining arguments to scanf.
fscanf: Reads from the input stream specified by the first argument, uses the format string to convert the text into binary formats, and stores the results at the locations given by the remaining arguments to fscanf.
sscanf: Reads from a string in memory specified by the first argument, uses the format string to convert the text into binary formats, and stores the results at the locations given by the remaining arguments to sscanf.

Format String Argument Lists

Almost all functions in the C language have a fixed number of arguments. Functions which use format strings are an exception to this rule. When you call a function that uses a format string, the number of arguments to the function depends on the number of format specifiers in the format string.

For example, if I want to create a formatted string that contains both the name of a student and the student's grade, I would use a printf statement such as printf("Student %s, grade %s\n",student,grade);. In this example, the first argument is the format string, which contains two format specifiers. Therefore, I need two more arguments to printf, in this case student and grade, for a total of three arguments. Compare that to the example from above, printf("Age is %d\n",age);, where there are only two arguments.

Format Specifiers

Format Specifiers in General

A format specifier always starts with a percent sign (%), followed by several optional fields that control the formatting of the result, and always ends with a single-character type field. The full syntax of a format specifier is:

   %[parameter][flags][width][.precision][length]type

Here each name stands for a value that you supply, and square brackets indicate an optional item.

The format specifier is used to define:

The data type of the input value
The kind of value to produce as output. For example, should an int value be converted into decimal, octal, or hexadecimal?
What is the minimum length of the result?
If the result has a decimal point, how many decimal places should be printed?
If the result is shorter than the minimum length, how should it be padded?
If the result is shorter than the minimum length, how should it be justified? Left justified or right justified?
Which parameter in the argument list should be used for this format specifier?

Obviously, to convey all of this information, format specifiers can become quite complex. However, usually defaults are good enough. Most of the specifications are only required if you are doing something complex like printing out a formatted table.

List of Format Specifier Types

The following table shows the most common format specifiers, the type of argument associated with each specifier, and what the result will be, including a formatted example.

Specifier	Arg Type	Conversion	Example
%d	int	ASCII decimal	"-19"
%u	unsigned int	ASCII decimal	"243"
%o	int	ASCII Octal	"363" or "37777777755"
%x	int	ASCII Hexadecimal (lowercase)	"f3" or "ffffffed"
%X	int	ASCII Hexadecimal (uppercase)	"F3" or "FFFFFFED"
%lx	long	ASCII Hexadecimal	"f3" or "ffffffffffffffed"
%hx	short	ASCII Hexadecimal	"f3" or "ffed"
%hhx	char	ASCII Hexadecimal	"f3" or "ed"
%p	type *	ASCII Hexadecimal	"7ffffffffe2c"
%f	double	ASCII Real Number - default is 6 digits precision	"3.141592"
%e	double	ASCII Scientific Notation	"3.141592e+00" or "6.022140e+23"
%g	double	ASCII Shorter of Real or Scientific Notation	"3.14159" or "6.02214e+23"
%c	char	Single ASCII character	"p"
%s	char *	Null terminated string	"Hello world!"
%%	--	Percent sign	"%"

Fixing Type Mismatch Compiler Warnings

It is very common to get type mismatch compiler warnings when using printf and printf-like functions. These warnings arise because each format specifier type expects a very specific type of input value, such as an unsigned int. In many cases the compiler will convert a value to the expected type without issuing a warning, but in many others it will issue one. The easiest way to fix these warnings is to add a length specification to the format specifier.

The following is a list of C data types, and the format length specifier required for that type.

char: hhd, hhx, or hho for decimal, hex, and octal, respectively.
short: hd, hx, or ho for decimal, hex, and octal, respectively.
int: d, x, or o for decimal, hex, and octal, respectively.
long int: ld, lx, or lo for decimal, hex, and octal, respectively.

Lengths are not required for float or double values because every float argument is promoted to double before printf receives it, and that promotion occurs without warning messages. The exception is long double, which requires the L length, as in %Lf.

Format String References

This web page covers the basics, but there's a lot more about format strings and format specifications out there. For more detail, here are some recommended web sites to look up:

Wikipedia printf format string: A great general overview with lots of detail
Linux manual page: printf: The UNIX manual page that describes printf and format strings.
w3resource C printf() tutorial: A complete tutorial that covers printf and format strings.
Alvin Alexander's printf cheat sheet: Nicely formatted to quickly find that printf capability you want to use.
