|
|
|
Writing Bug-Free C Code
A Programming Style That Automatically Detects Bugs in C Code
by Jerry Jongerius / January 1995
|
|
|
|
Chapter 3: Rock-Solid Base
Would you build a skyscraper without a proper, solid foundation? Of
course not. Would you build a large application without a
rock-solid base system? I wouldn't, and yet I get the feeling that
this is happening every day. Do you consider the standard C library
to be a rock-solid base? Before you answer this, I need to clarify
what I consider to be rock-solid.
A rock-solid function must first of all be bug-free itself. The
function must provide a clean, intuitive interface. What hope would
you ever have if you were constantly making mistakes in how a
function is called? Function names must clearly state what the
function does. What good is a function named DoIt()? The function
must detect and report invalid function arguments. The function
should be fault-tolerant. The program you are working on should not
crash simply because you called a function incorrectly.
|
Before you can write bug-free code, you must have
a bug-free, rock-solid base.
|
Do you now consider the standard C library to be a rock-solid base?
Many functions are, but many functions are not. Consider the heap
management routines in the C library, specifically, the free()
function. The free() function deallocates a block of memory that
was previously allocated through the malloc() function call.
What happens when you pass free() a completely random value, or a
value that you have already previously passed to free()? Your
program may bomb immediately. If it doesn't, the heap may be
corrupted. If it isn't, some memory may have been overwritten. The
point is that not all C library functions are rock-solid. Why not
first code a layer on top of the system base that is rock-solid?
|
Not all C library functions are rock-solid. A layer that is
rock-solid needs to be coded.
|
As you code your program, you need to consider the current program as
it stands as the base for whatever new features you are putting in.
Once done, this is the new base for the next set of features. As
you code, make sure that the current base is rock-solid; that it is
fault-tolerant and that it catches incorrect usage of functions. If
the base is not rock-solid, you need to make it rock-solid.
3.1 System Functions Contain Bugs
Your underlying operating system or development environment has bugs
in it. Since there is no such thing as a completely bug-free
system, try to find out as much as you can about the environment you
are working on. Try to obtain bug lists if they are available.
Microsoft has recently started the Microsoft Developer Network. It
is a program that is intended to get as much information and
technical resources as possible into the hands of developers. The
program distributes information in the form of a CD and a
Windows-based browser. I highly recommend this program to
developers. The Microsoft Developer Network can be reached
through the Internet at
http://msdn.microsoft.com
|
3.1.1 A MS-DOS Bug
An example of a bug that was found in the MS-DOS operating system
involves lost clusters. It was detected by an
automated testing procedure §7.24.
As the automated testing function would
run, invariably it would eventually run out of disk space, but the
function never created any file large enough to even come close to
running out of disk space. As it turned out, disk clusters were
being lost by the operating system on a regular basis and eventually
the disk would run out of free clusters. The only way to recover
the lost clusters was to run the CHKDSK program provided by the
operating system.
This bug is a confirmed bug in MS-DOS 3.2 and MS-DOS 3.3, the only
versions available when the bug was discovered. The bug does not
exist in MS-DOS 5.0 and later versions.
C program that shows MS-DOS version 3.3 lost cluster bug
#include <fcntl.h>
#include <sys\types.h>
#include <sys\stat.h>
#include <io.h>
#include <stdio.h>
int main(void)
{
char c;
int fh = creat( "it.tmp", S_IREAD|S_IWRITE );
lseek( fh, 81920, SEEK_SET );
write( fh, &c, 1 );
chsize( fh, 81920 );
lseek( fh, 122880, SEEK_SET );
write( fh, &c, 1 );
close( fh );
return 0;
} /* main */
|
Running this program repeatedly under MS-DOS 3.2 or MS-DOS 3.3 will
cause the lost cluster bug. Run CHKDSK to recover the lost clusters.
|
3.1.2 Windows Bugs
As any early Windows developer will tell you, Windows was a buggy
operating system. Over the course of five years, I accumulated over
six three-ring binders full of correspondence with Microsoft support
concerning bugs in the Windows system.
With the introduction of Windows 3.0, I noticed a significant drop in
the number of problems that I was reporting. I attribute this
reduction to the fact that Windows 3.0 was a protected-mode
operating system (previously it was not). Many types of errors
cause a CPU fault and the error can be pinpointed immediately.
However, the Windows 3.0 system was still a little shaky when it
came to DOS boxes, the emulation of a MS-DOS PC in a Windows window.
With the introduction of Windows 3.1, the number of bugs that I
report has dropped dramatically. I consider Windows 3.1 to be a
stable operating environment. In fact, on my development machine, I
stay in Windows all day and use DOS boxes to run my MS-DOS
applications. My machine may crash once a week. Not too bad, but
still room for improvement.
However, there is a particularly nasty bug I have found that still
exists in Windows 3.1. Luckily, the likelihood that you will ever
encounter the bug is almost nil. The bug is that sometimes the
Windows multitasker does not switch to the correct PSP when a
Windows application performs disk I/O. What this means is that a
file handle that should be correct is not, because the program is in
the open file context of another application!
A little background may be needed to explain why this can even occur.
When Windows multitasks applications, it has to keep a tremendous
amount of information around for each application. One piece of
that information is the open files context. In order to
significantly speed up task switching in Windows, the open files
context is not switched until the task actually performs an
operation that requires the open files context. Reading and writing
a disk file is an example of an operation that would cause a true
open files context switch. Since most applications rarely perform
disk I/O relative to the amount of CPU time they need, this delayed
context switch of the open files context speeds up multitasking.
The bug occurs when, in the course of multitasking applications, the
Windows kernel incorrectly thinks that it is truly in the open files
context of the currently running application, when, in fact, it is
in the open files context of another application. Microsoft has
been slow to fix this problem since the circumstances that cause the
bug to occur are extreme.
The only reason I found this bug in the first place is because of an
automated testing procedure I used to test the correctness of a new
module. This procedure caused almost continuous I/O to occur and
every once in a great while, an I/O would fail under low memory
conditions.
A Windows 3.1 bug. There is another bug that I found that anyone can
reproduce easily on any Windows 3.1 machine. First, run the
standard Notepad application. Go into the file open dialog and
click with the mouse on the OK button. Now press and hold down the
space bar. While continuing to hold down the space bar, press the
ESC key. This causes a General Protection Fault. The problem has
to do with the dialog manager inside the Windows kernel. The fault
occurs because both an IDOK and an IDCANCEL are being sent to the
dialog callback procedure when in reality, only one message should
be sent. This is a problem with all dialogs in all applications.
However, the application may or may not fault. It just depends upon
how the application was written to respond to the IDOK and IDCANCEL
messages. This bug is fixed in Windows 3.11.
|
3.1.3 What to Do
The point in demonstrating to you that bugs do exist in MS-DOS and
Windows is to emphasize that sometimes even system level functions
fail. Code that you think never fails is bound to fail sometime.
My reaction to having system level functions fail me has been to
provide another layer of code between my application code and the
system level functions that checks for assumptions that I am making.
At some point, you write code that makes an assumption. Consider
the following code.
Code with an assumption
int fh = open(....);
if (fh!=-1) {
close(fh);
}
|
Do you see what the assumption is? The assumption being made in this
code is that the close() function never fails. Well then, why not
assert this? The close() returns zero for success, otherwise
non-zero for failure.
|
Provide a code wrapper around all system calls.
|
The best solution is to provide a wrapper function around each and
every system call. Assert any assumptions that are being made
within this wrapper function. Placing the assert in the wrapper
function once instead of every place it is being called is a lot
less error prone.
3.2 Using Macros to Aid Porting
With all the different machine architectures that are in use today,
how in the world do you write code so that it can be ported easily?
C provides an excellent mechanism for conditional compilation, but
this is only a small part of the solution.
How do you handle segmented versus non-segmented architectures? What
about C and C++? There are slight differences between the two
languages.
One solution that works really well is to abstract out the
interdependencies between the environments into a set of macros so
that the code base does not have to change.
|
Use macros as an aid to porting so that your code base does not
change at all.
|
3.2.1 Segmented/Flat Architecture Macros
A number of #defines that provide a basis for further development are
as follows.
Porting aids for Microsoft C8 segmented architecture
#define FAR _far
#define NEAR _near
#define FASTCALL _fastcall
#define PASCAL _pascal
#define EXPORT _export
#define BASEDIN(seg) _based(_segname(#seg))
Porting aids for flat model programs
#define FAR
#define NEAR
#define FASTCALL
#define PASCAL
#define EXPORT
#define BASEDIN(seg)
|
FAR and NEAR. These are used to abstract out the near- and
far-segmented architecture. NEAR implies a 16-bit pointer and FAR
implies a 32-bit pointer. When porting to a non-segmented
architecture, these can be defined to be nothing.
FASTCALL and PASCAL. These are used to specify the calling
convention of a function primarily for optimization purposes. When
porting to a non-segmented architecture, these can be defined to be
nothing.
EXPORT. This define is applicable to Windows DLL programming.
Otherwise, it can be defined to be nothing.
BASEDIN. This define is primarily used by the CSCHAR macro to place
character strings within a code segment primarily for optimization
purposes. When porting to a non-segmented architecture, it can be
defined to be nothing.
In most cases, these macros are used in other macros or in typedef's
so that the code base is not cluttered up. For example, to declare
a far pointer to a char so that it works equally well under a
segmented and non-segmented architecture, you could do the following.
Using a far pointer to a char, not a good idea
char FAR*lpMem;
|
However, using char FAR* will just clutter up all the source files
with FAR. The solution is to use a typedef to declare what a far
pointer to a char is once.
LPSTR typedef, a better idea
typedef char FAR*LPSTR;
|
Now LPSTR should be used instead of char FAR*. The concept of trying
to hide how something works provides an abstraction that aids
porting and allows for clean source code.
|
Try to hide how something works to provide an abstraction that aids
|
3.2.2 Using EXTERNC
If you are coding under C++, name mangling can sometimes be a
problem. This happens under Windows DLL coding when a .def file is
used. Functions that are exported must be specified in the .def
file, but name mangling can make it almost impossible to type in the
names manually. Luckily, C++ provides a solution in the form of a
linkage specification. The #define's you can use are as follows.
EXTERNC macro
/*--- EXTERNC depends upon C/C++ ---*/
#ifdef __cplusplus
#define EXTERNC extern "C"
#else
#define EXTERNC
#endif
|
Under C++, EXTERNC gets defined to be extern "C". Under C, EXTERNC
gets defined to be nothing. EXTERNC is used in function prototypes
as follows.
Using EXTERNC
EXTERNC type APIENTRY FunctionName( argument-types );
|
You need to use EXTERNC only in the prototype for a function, not in
the source code where the function is actually written. This is how
Microsoft C8 works.
3.3 Using WinAssert() Statements
The WinAssert() statement is the classic assertion statement with a
few twists. Why use assertion statements? The key reason is to
verify that decisions and assumptions made at design-time are
working correctly at run-time. There is a difference between
WinAssert() and
CompilerAssert() §2.1.4.
Both check design-time assumptions, but CompilerAssert() provides
the check at compile-time, whereas WinAssert() provides the check
at run-time.
|
Assertion statements provide run-time checking of design-time
assumptions.
|
Microsoft C8 assert() macro, do not use
#define assert(exp) \
( (exp) ? (void) 0 : _assert(#exp, __FILE__, __LINE__) )
|
An assert macro is provided by the Microsoft C8 library in assert.h.
It takes any boolean expression. Nothing happens when the assert
macro is true. If the assert macro is false, however, the _assert()
function is called with a string pointer to the text of the boolean
expression, a string pointer to the text of the source file and an
integer line number where the error occurred. Usage of the
stringizing operator (#) is described in
§2.2.7
. This information
is then formatted and displayed by _assert().
A problem with this macro is that it ends up placing too many strings
in the default data segment. One easy solution is to remove the
#exp argument, which is turning the boolean expression into text.
After all, the file and line number are all that are needed to look
up the boolean expression. Also, every time the assert() macro is
used, a new string __FILE__ is created. Some compilers are able to
optimize these multiple references into one reference, but why not
just fix the problem? My solution to the problem is to declare a
short stub function at the top of each source file which references
__FILE__. WinAssert() then calls this stub function with the
current line number.
There is an additional problem that is specific to the Windows
programming environment. In Windows DLLs, it is possible to declare
a function that, when called, does not switch to the DLLs data
segment, but instead keeps the current caller's data segment. If
you were to use an assertion statement in one of these functions,
the string pointer to the filename would be incorrect. The solution
is to declare the __FILE__ string to be a code segment
(CSCHAR §2.1.8)
variable. This way, it does not matter what data
segment is current.
|
An interesting twist that has been added to WinAssert() is that it
supports writing code that is fault-tolerant. If a design-time
assumption has failed, should you really be executing a section of
code? I say no! The WinAssert() statement may be followed by a
semicolon or by a block of code. The block of code will be executed
only if the assertion succeeds.
|
Use WinAssert() on a block of code to produce code that is
fault-tolerant.
|
WinAssert(), non-fault-tolerant syntax
WinAssert(expression);
WinAssert(), fault-tolerant syntax
WinAssert(expression) {
(block of code)
}
|
The WinAssert() is implemented through a set of macros as follows.
WinAssert() implementation
#define USEWINASSERT CSCHAR szSRCFILE[]=__FILE__; \
BOOL static NEAR _DoWinAssert( int nLine ) { \
ReportWinAssert(szSRCFILE, nLine); \
WinAssert(nLine); \
return(FALSE); \
}
#define AssertError _DoWinAssert(__LINE__)
#define WinAssert(exp) if (!(exp)) {AssertError;} else
|
If WinAssert() is used in a source file, USEWINASSERT must appear at
the top of the source file somewhere.
In addition to the WinAssert() macro, an AssertError() macro is
provided for those times that you want to unconditionally force an
error to be reported.
The reporting process of an assertion failure starts by calling a
function that is local (private) to the source file. The function
is _DoWinAssert() and the argument is the line number where the
failure occurred. The body of _DoWinAssert() is straightforward
except for the inclusion of WinAssert(nLine). Since the line number
is never zero, this appears to have no purpose. This trick forces
_DoWinAssert() to be compiled into the module, even if there are no
references to the function in the rest of the file. Otherwise,
Microsoft C8 removes the unreferenced function.
Another subtle problem is that if _DoWinAssert() is declared to be a
LOCAL function (described in Chapter 6),
the optimizing Microsoft C8
compiler will not build a stack frame for this function. For this
reason, it has the static NEAR attributes instead of the LOCAL
attribute, which allows the stack frame to be built.
In addition to these defines, WinAssert() requires that
ReportWinAssert() be defined somewhere. I define it in a DLL so
that the function needs to be coded only once.
ReportWinAssert() function prototype
EXTERNC void APIENTRY ReportWinAssert( LPSTR, int );
|
Once done, any other application has access to it. ReportWinAssert()
allows you to display the assertion error in whatever way is
appropriate at your organization. In my ReportWinAssert(), I log
the filename, line number and stack trace to a log file and issue a
system modal message box requesting that the user report the error.
See §A.7 for example implementations.
Using WinAssert()
#include <app.h>
USEWINASSERT
.
.
.
void LOCAL TestingWinAssert( int nValue )
{
WinAssert(nValue>0) {
...
}
} /* TestingWinAssert */
|
One of the key things you must remember is that the argument to
WinAssert() must have absolutely no side effect on any variables.
It must only reference variables.
WinAssert(), used incorrectly
WinAssert((x/=2)>0);
WinAssert(), used correctly
x /= 2;
WinAssert(x>0);
|
This is in case a policy of removing assertion statements from the
code before releasing the product is enforced. While I do not
recommend that you remove assertion statements, you still want to
play it safe. You do not want to end up accidentally removing code
that is needed to make your program run correctly.
3.4 Naming Conventions
One of the most important aspects of programming is the use of a
consistent naming convention. Without one, your program ends up
being just a jumble of various techniques and hence hard to
understand. With a naming convention, your program is more readable
and easier to understand and maintain.
I will describe the naming conventions that I have used to code a
large application that have worked quite well for me.
3.4.1 Naming Macros
Macro names should always be in uppercase and may contain optional
underscore characters. For macros that take arguments, I prefer not
to use the underscore character anywhere (e.g., NUMSTATICELS()).
For macros that define constant numeric values, underscore
characters are OK (e.g., MAX_BUFFER_SIZE).
|
Macro names should be in uppercase.
|
Macro names in uppercase stand out and draw attention to where they
are located. Some macros that are universally used throughout
almost all code are allowed to be in mixed upper- and lowercase. An
example of a macro like this is the
WinAssert() macro §3.3.
There are many times that a set of macros contain a common
subexpression. When this happens, I create another macro that
contains the common sub-expression. The sole purpose of this type
of macro is that it is to be used by other macros and not in the
source code. A naming convention I use to help me remember that the
macro is private to other macros is to name it with a leading
underscore character.
|
Macros beginning with an underscore are to be used only in other
macros, not explicitly in source code.
|
3.4.2 Naming Data Types
I can remember the difficulty I had coming up with good data type
names when I first started to code. I was using mixed upper- and
lowercase for data type names and variable names. However, it
became harder and harder to read the program. I always ended up
wanting the variable name to be spelled the same as the data type
name but could not do this, so I ended up calling it something
different which made the program hard to understand.
The convention that I finally settled upon is that all data types
should be in uppercase. The variable names can then be spelled the
same, but in mixed upper- and lowercase. This convention may at
first seem awkward, but in practice I have found that it works well.
|
New data types should be in uppercase.
|
Data type names should also avoid using the underscore character.
This is because macro names may use the underscore character and it
is best to avoid any possible confusion or ambiguity over whether or
not an uppercase name is a data type name or macro name.
3.4.3 Declaring Data Types
New data types must be declared with a typedef statement. While it
is possible to use a macro to create what looks like a new data
type, it is not a true data type and is subject to subtle coding
problems.
|
New data types must be declared with a typedef statement, not a macro
definition.
|
Consider the data type PSTR, shown below, which is a character
pointer.
Using typedef to create a new data type
typedef char*PSTR;
Using macros to create a new data type, a bad practice
#define PSTR char*
Using the new PSTR data type
PSTR pA, pB;
|
In the above example, what is the type of pA and what is the type of
pB? In the case of using typedef to create the new data type, the
type of pA and pB is a character pointer, which is as expected.
However, in the case of using the macro to create the new data type,
the type of pA is a character pointer and the type of pB is a
character. This is because PSTR pA, pB really represents char *pA,
pB which is not the same as char *pA, *pB.
This example shows the danger in using macros to declare new data
types in the system. Therefore, you should avoid using macros to
declare new data types.
3.4.4 Naming Variables
All variables should be named using the Hungarian variable naming
convention with mixed upper- and lowercase text and no underscore
characters.
|
Variables should be named using Hungarian notation.
|
The Hungarian naming convention states that you should prefix all
variable names with a short lowercase abbreviation for the data type
of the variable name. (See Table 3-1).
| Prefix | Data Type |
| a | array of given type |
| b | BOOL: true/false value |
| by | BYTE |
| c | char |
| dw | DWORD |
| h | handle or abstract pointer |
| l | long |
| lp | long pointer |
| n | int |
| p | pointer |
| w | WORD |
Table 3-1: Hungarian Notation Prefixes
For example, nSize is an integer, bOk is a BOOL and hIcon is an
abstract handle. Prefixes may be combined to produce a more
descriptive prefix. An example would be lpnCount, which is a long
pointer to an integer and lpanCounts, which is a long pointer to an
array of integers.
The advantage of Hungarian notation is that you are much more likely
to catch a simple programming problem early in the coding cycle,
even before you compile. An example would be nNewIndex = lIndex+10.
Just by glancing at this you can see that the left-hand side is an
integer and the right-hand side is a long integer. This may or may
not be what you intended, but the fact that this can be deduced
without seeing the original data declarations is a powerful concept.
|
Hungarian notation allows you to know a variable's data type without
seeing the data declaration.
|
The Hungarian notation handles all built-in data types, but what
about derived types? A technique that I have found useful is to
select an (uppercase) data type name that has a natural mixed upper-
and lowercase name.
An example from the Windows system is the HICON
data type and the hIcon variable name. As another example, let's
consider a queue entry data type called LPQUEUEENTRY. A variable
name for this could be lpQueueEntry.
|
This convention works great for short data type names like HICON, but
not so well for long data type names like LPQUEUEENTRY. The
resulting variable name lpQueueEntry is just too long to be
convenient. In this case, an abbreviation like lpQE should be used.
However, make sure that lpQE is not also an abbreviation for
another data type in your system.
Whatever technique you use to derive variable names from data type
names is fine provided that there is only one derivation technique
used in your entire program. A bad practice would be to use lpQE in
one section of code and lpQEntry in another section of code.
|
A data type must have a single and unique variable derivation.
|
3.4.5 Naming Functions
Functions should be named using the module/verb/noun convention in
mixed upper- and lowercase text with no underscore characters.
|
Functions should be named using the module/verb/noun convention.
|
Suppose you have just started a project from scratch. There is only
one source file and the number of functions in it is limited. You
are naming functions whatever you feel like and coding is
progressing rapidly. Two months go by and you are working on a new
function that needs to call a specialized memory copy routine you
wrote last month. You start to type in the function name, but then
you hesitate. Did you call the function CopyMem() or MemCopy()?
You do not remember, so you look it up real quick.
This actually happened to me and the solution was simple. Follow the
Microsoft Windows example of naming functions using the verb/noun or
action/object technique. So, the function should have been called
CopyMem().
This solved my immediate problem, but not the long-term problem. It
wasn't long before I had thousands of function names, some with
similar sounding verb/noun names. My solution was to prefix the
verb/noun with the module name.
Suppose you have a module that interfaces with the disk operation
system of your environment. An appropriate module name would be Dos
and several possible function names are DosOpenFile(), DosRead(),
DosWrite() and DosCloseFile().
Module names should contain two to five characters, but an optimum
length for the module name is three to four characters. You can
almost always come up with a meaningful abbreviation for a module
that fits in three to four characters.
3.5 Chapter Summary
- A rock-solid layer needs to be built upon all system level interfaces
because system level interfaces contain bugs. You cannot assume
that they are bug-free. I have been burned too many times to trust
system level calls blindly.
- When you do make assumptions in calling system code, it is best to
WinAssert() these situations in a code wrapper. When a function
does fail and you assumed that it never would, you want to know
about it so that you can fix the problem and reevaluate your
assumption.
- Macros can be used as an abstraction layer to allow your code to be
ported without having to change your source files. Instead, just
change the macros contained in your include files and recompile.
The macro name in the source file is specifying what to do. The
macro body in the include file is specifying how to do it.
- It is a good idea to use WinAssert()'s generously in your code. The
WinAssert() provides run-time checking of design-time assumptions
and signals a design flaw when it occurs. Use the fault-tolerant
form of WinAssert() whenever possible.
- When programming in a large project, it is crucial that a consistent
naming convention be used throughout the entire project. This helps
prevent misunderstandings that produce buggy code.
- Macro names should be in uppercase. A macro beginning with an
underscore character is intended to be used only in other macros,
not explicitly in source code.
- New data types should be in uppercase and declared with a typedef
statement, not a macro definition.
- Variables should be named using the Hungarian notation. This
notation allows you to know the type of a variable without seeing
the data declaration.
- Functions should be named using the module/verb/noun convention.
Copyright © 1993-1995, 2002-2009 Jerry Jongerius
This book was previously published by Person Education, Inc.,
formerly known as Prentice Hall. ISBN: 0-13-183898-9
|
|