Writing Bug-Free C Code: Rock-Solid Base

Writing Bug-Free C Code
A Programming Style That Automatically Detects Bugs in C Code
by Jerry Jongerius / January 1995

<< Previous

Index

Next >>

Chapter 3: Rock-Solid Base

3.1 System Functions Contain Bugs
3.2 Using Macros to Aid Porting
3.3 Using WinAssert() Statements

3.4 Naming Conventions
3.5 Chapter Summary

Would you build a skyscraper without a proper, solid foundation? Of course not. Would you build a large application without a rock-solid base system? I wouldn't, and yet I get the feeling that this is happening every day. Do you consider the standard C library to be a rock-solid base? Before you answer this, I need to clarify what I consider to be rock-solid.

A rock-solid function must first of all be bug-free itself. The function must provide a clean, intuitive interface. What hope would you ever have if you were constantly making mistakes in how a function is called? Function names must clearly state what the function does. What good is a function named DoIt()? The function must detect and report invalid function arguments. The function should be fault-tolerant. The program you are working on should not crash simply because you called a function incorrectly.

Before you can write bug-free code, you must have a bug-free, rock-solid base.

Do you now consider the standard C library to be a rock-solid base? Many functions are, but many functions are not. Consider the heap management routines in the C library, specifically, the free() function. The free() function deallocates a block of memory that was previously allocated through the malloc() function call.

What happens when you pass free() a completely random value, or a value that you have already previously passed to free()? Your program may bomb immediately. If it doesn't, the heap may be corrupted. If it isn't, some memory may have been overwritten. The point is that not all C library functions are rock-solid. Why not first code a layer on top of the system base that is rock-solid?

Not all C library functions are rock-solid. A layer that is rock-solid needs to be coded.

As you code your program, you need to consider the current program as it stands as the base for whatever new features you are putting in. Once done, this is the new base for the next set of features. As you code, make sure that the current base is rock-solid; that it is fault-tolerant and that it catches incorrect usage of functions. If the base is not rock-solid, you need to make it rock-solid.

3.1 System Functions Contain Bugs

Your underlying operating system or development environment has bugs in it. Since there is no such thing as a completely bug-free system, try to find out as much as you can about the environment you are working on. Try to obtain bug lists if they are available.

Microsoft has recently started the Microsoft Developer Network. It is a program that is intended to get as much information and technical resources as possible into the hands of developers. The program distributes information in the form of a CD and a Windows-based browser. I highly recommend this program to developers. The Microsoft Developer Network can be reached through the Internet at http://msdn.microsoft.com

3.1.1 A MS-DOS Bug

An example of a bug that was found in the MS-DOS operating system involves lost clusters. It was detected by an automated testing procedure §7.24. As the automated testing function would run, invariably it would eventually run out of disk space, but the function never created any file large enough to even come close to running out of disk space. As it turned out, disk clusters were being lost by the operating system on a regular basis and eventually the disk would run out of free clusters. The only way to recover the lost clusters was to run the CHKDSK program provided by the operating system.

This bug is a confirmed bug in MS-DOS 3.2 and MS-DOS 3.3, the only versions available when the bug was discovered. The bug does not exist in MS-DOS 5.0 and later versions.

C program that shows MS-DOS version 3.3 lost cluster bug
#include <fcntl.h>
#include <sys\types.h>
#include <sys\stat.h>
#include <io.h>
#include <stdio.h>

int main(void)
{
    char c;
    int fh = creat( "it.tmp", S_IREAD|S_IWRITE );
    lseek( fh, 81920, SEEK_SET );
    write( fh, &c, 1 );
    chsize( fh, 81920 );
    lseek( fh, 122880, SEEK_SET );
    write( fh, &c, 1 );
    close( fh );
    return 0;

} /* main */

Running this program repeatedly under MS-DOS 3.2 or MS-DOS 3.3 will cause the lost cluster bug. Run CHKDSK to recover the lost clusters.

3.1.2 Windows Bugs

As any early Windows developer will tell you, Windows was a buggy operating system. Over the course of five years, I accumulated over six three-ring binders full of correspondence with Microsoft support concerning bugs in the Windows system.

With the introduction of Windows 3.0, I noticed a significant drop in the number of problems that I was reporting. I attribute this reduction to the fact that Windows 3.0 was a protected-mode operating system (previously it was not). Many types of errors cause a CPU fault and the error can be pinpointed immediately. However, the Windows 3.0 system was still a little shaky when it came to DOS boxes, the emulation of a MS-DOS PC in a Windows window.

With the introduction of Windows 3.1, the number of bugs that I report has dropped dramatically. I consider Windows 3.1 to be a stable operating environment. In fact, on my development machine, I stay in Windows all day and use DOS boxes to run my MS-DOS applications. My machine may crash once a week. Not too bad, but still room for improvement.

However, there is a particularly nasty bug I have found that still exists in Windows 3.1. Luckily, the likelihood that you will ever encounter the bug is almost nil. The bug is that sometimes the Windows multitasker does not switch to the correct PSP when a Windows application performs disk I/O. What this means is that a file handle that should be correct is not, because the program is in the open file context of another application!

A little background may be needed to explain why this can even occur. When Windows multitasks applications, it has to keep a tremendous amount of information around for each application. One piece of that information is the open files context. In order to significantly speed up task switching in Windows, the open files context is not switched until the task actually performs an operation that requires the open files context. Reading and writing a disk file is an example of an operation that would cause a true open files context switch. Since most applications rarely perform disk I/O relative to the amount of CPU time they need, this delayed context switch of the open files context speeds up multitasking.

The bug occurs when, in the course of multitasking applications, the Windows kernel incorrectly thinks that it is truly in the open files context of the currently running application, when, in fact, it is in the open files context of another application. Microsoft has been slow to fix this problem since the circumstances that cause the bug to occur are extreme.

The only reason I found this bug in the first place is because of an automated testing procedure I used to test the correctness of a new module. This procedure caused almost continuous I/O to occur and every once in a great while, an I/O would fail under low memory conditions.

A Windows 3.1 bug. There is another bug that I found that anyone can reproduce easily on any Windows 3.1 machine. First, run the standard Notepad application. Go into the file open dialog and click with the mouse on the OK button. Now press and hold down the space bar. While continuing to hold down the space bar, press the ESC key. This causes a General Protection Fault. The problem has to do with the dialog manager inside the Windows kernel. The fault occurs because both an IDOK and an IDCANCEL are being sent to the dialog callback procedure when in reality, only one message should be sent. This is a problem with all dialogs in all applications. However, the application may or may not fault. It just depends upon how the application was written to respond to the IDOK and IDCANCEL messages. This bug is fixed in Windows 3.11.

3.1.3 What to Do

The point in demonstrating to you that bugs do exist in MS-DOS and Windows is to emphasize that sometimes even system level functions fail. Code that you think never fails is bound to fail sometime.

My reaction to having system level functions fail me has been to provide another layer of code between my application code and the system level functions that checks for assumptions that I am making. At some point, you write code that makes an assumption. Consider the following code.

Code with an assumption
int fh = open(....);
if (fh!=-1) {
    close(fh);
    }

Do you see what the assumption is? The assumption being made in this code is that the close() function never fails. Well then, why not assert this? The close() returns zero for success, otherwise non-zero for failure.

Provide a code wrapper around all system calls.

The best solution is to provide a wrapper function around each and every system call. Assert any assumptions that are being made within this wrapper function. Placing the assert in the wrapper function once instead of every place it is being called is a lot less error prone.

3.2 Using Macros to Aid Porting

With all the different machine architectures that are in use today, how in the world do you write code so that it can be ported easily? C provides an excellent mechanism for conditional compilation, but this is only a small part of the solution.

How do you handle segmented versus non-segmented architectures? What about C and C++? There are slight differences between the two languages.

One solution that works really well is to abstract out the interdependencies between the environments into a set of macros so that the code base does not have to change.

Use macros as an aid to porting so that your code base does not change at all.

3.2.1 Segmented/Flat Architecture Macros

A number of #defines that provide a basis for further development are as follows.

Porting aids for Microsoft C8 segmented architecture
#define FAR _far
#define NEAR _near
#define FASTCALL _fastcall
#define PASCAL _pascal
#define EXPORT _export
#define BASEDIN(seg) _based(_segname(#seg))


Porting aids for flat model programs
#define FAR
#define NEAR
#define FASTCALL
#define PASCAL
#define EXPORT
#define BASEDIN(seg)

FAR and NEAR. These are used to abstract out the near- and far-segmented architecture. NEAR implies a 16-bit pointer and FAR implies a 32-bit pointer. When porting to a non-segmented architecture, these can be defined to be nothing.

FASTCALL and PASCAL. These are used to specify the calling convention of a function primarily for optimization purposes. When porting to a non-segmented architecture, these can be defined to be nothing.

EXPORT. This define is applicable to Windows DLL programming. Otherwise, it can be defined to be nothing.

BASEDIN. This define is primarily used by the CSCHAR macro to place character strings within a code segment primarily for optimization purposes. When porting to a non-segmented architecture, it can be defined to be nothing.

In most cases, these macros are used in other macros or in typedef's so that the code base is not cluttered up. For example, to declare a far pointer to a char so that it works equally well under a segmented and non-segmented architecture, you could do the following.

Using a far pointer to a char, not a good idea
char FAR*lpMem;

However, using char FAR* will just clutter up all the source files with FAR. The solution is to use a typedef to declare what a far pointer to a char is once.

LPSTR typedef, a better idea
typedef char FAR*LPSTR;

Now LPSTR should be used instead of char FAR*. The concept of trying to hide how something works provides an abstraction that aids porting and allows for clean source code.

Try to hide how something works to provide an abstraction that aids

3.2.2 Using EXTERNC

If you are coding under C++, name mangling can sometimes be a problem. This happens under Windows DLL coding when a .def file is used. Functions that are exported must be specified in the .def file, but name mangling can make it almost impossible to type in the names manually. Luckily, C++ provides a solution in the form of a linkage specification. The #define's you can use are as follows.

EXTERNC macro
/*--- EXTERNC depends upon C/C++ ---*/
#ifdef __cplusplus
#define EXTERNC extern "C"
#else
#define EXTERNC
#endif

Under C++, EXTERNC gets defined to be extern "C". Under C, EXTERNC gets defined to be nothing. EXTERNC is used in function prototypes as follows.

Using EXTERNC
EXTERNC type APIENTRY FunctionName( argument-types );

You need to use EXTERNC only in the prototype for a function, not in the source code where the function is actually written. This is how Microsoft C8 works.

3.3 Using WinAssert() Statements

The WinAssert() statement is the classic assertion statement with a few twists. Why use assertion statements? The key reason is to verify that decisions and assumptions made at design-time are working correctly at run-time. There is a difference between WinAssert() and CompilerAssert() §2.1.4. Both check design-time assumptions, but CompilerAssert() provides the check at compile-time, whereas WinAssert() provides the check at run-time.

Assertion statements provide run-time checking of design-time assumptions.

Microsoft C8 assert() macro, do not use
#define assert(exp) \
  ( (exp) ? (void) 0 : _assert(#exp, __FILE__, __LINE__) )

An assert macro is provided by the Microsoft C8 library in assert.h. It takes any boolean expression. Nothing happens when the assert macro is true. If the assert macro is false, however, the _assert() function is called with a string pointer to the text of the boolean expression, a string pointer to the text of the source file and an integer line number where the error occurred. Usage of the stringizing operator (#) is described in §2.2.7 . This information is then formatted and displayed by _assert().

A problem with this macro is that it ends up placing too many strings in the default data segment. One easy solution is to remove the #exp argument, which is turning the boolean expression into text. After all, the file and line number are all that are needed to look up the boolean expression. Also, every time the assert() macro is used, a new string __FILE__ is created. Some compilers are able to optimize these multiple references into one reference, but why not just fix the problem? My solution to the problem is to declare a short stub function at the top of each source file which references __FILE__. WinAssert() then calls this stub function with the current line number.

There is an additional problem that is specific to the Windows programming environment. In Windows DLLs, it is possible to declare a function that, when called, does not switch to the DLLs data segment, but instead keeps the current caller's data segment. If you were to use an assertion statement in one of these functions, the string pointer to the filename would be incorrect. The solution is to declare the __FILE__ string to be a code segment (CSCHAR §2.1.8) variable. This way, it does not matter what data segment is current.

An interesting twist that has been added to WinAssert() is that it supports writing code that is fault-tolerant. If a design-time assumption has failed, should you really be executing a section of code? I say no! The WinAssert() statement may be followed by a semicolon or by a block of code. The block of code will be executed only if the assertion succeeds.

Use WinAssert() on a block of code to produce code that is fault-tolerant.

WinAssert(), non-fault-tolerant syntax
WinAssert(expression);


WinAssert(), fault-tolerant syntax
WinAssert(expression) {
    (block of code)
    }

The WinAssert() is implemented through a set of macros as follows.

WinAssert() implementation
#define USEWINASSERT CSCHAR szSRCFILE[]=__FILE__;   \
  BOOL static NEAR _DoWinAssert( int nLine ) {      \
    ReportWinAssert(szSRCFILE, nLine);              \
    WinAssert(nLine);                               \
    return(FALSE);                                  \
    }
#define AssertError _DoWinAssert(__LINE__)
#define WinAssert(exp) if (!(exp)) {AssertError;} else

If WinAssert() is used in a source file, USEWINASSERT must appear at the top of the source file somewhere.

In addition to the WinAssert() macro, an AssertError() macro is provided for those times that you want to unconditionally force an error to be reported.

The reporting process of an assertion failure starts by calling a function that is local (private) to the source file. The function is _DoWinAssert() and the argument is the line number where the failure occurred. The body of _DoWinAssert() is straightforward except for the inclusion of WinAssert(nLine). Since the line number is never zero, this appears to have no purpose. This trick forces _DoWinAssert() to be compiled into the module, even if there are no references to the function in the rest of the file. Otherwise, Microsoft C8 removes the unreferenced function.

Another subtle problem is that if _DoWinAssert() is declared to be a LOCAL function (described in Chapter 6), the optimizing Microsoft C8 compiler will not build a stack frame for this function. For this reason, it has the static NEAR attributes instead of the LOCAL attribute, which allows the stack frame to be built.

In addition to these defines, WinAssert() requires that ReportWinAssert() be defined somewhere. I define it in a DLL so that the function needs to be coded only once.

ReportWinAssert() function prototype
EXTERNC void APIENTRY ReportWinAssert( LPSTR, int );

Once done, any other application has access to it. ReportWinAssert() allows you to display the assertion error in whatever way is appropriate at your organization. In my ReportWinAssert(), I log the filename, line number and stack trace to a log file and issue a system modal message box requesting that the user report the error. See §A.7 for example implementations.

Using WinAssert()
#include <app.h>
USEWINASSERT
  .
  .
  .
void LOCAL TestingWinAssert( int nValue )
{
    WinAssert(nValue>0) {
        ...
        }

} /* TestingWinAssert */

One of the key things you must remember is that the argument to WinAssert() must have absolutely no side effect on any variables. It must only reference variables.

WinAssert(), used incorrectly
WinAssert((x/=2)>0);


WinAssert(), used correctly
x /= 2;
WinAssert(x>0);

This is in case a policy of removing assertion statements from the code before releasing the product is enforced. While I do not recommend that you remove assertion statements, you still want to play it safe. You do not want to end up accidentally removing code that is needed to make your program run correctly.

3.4 Naming Conventions

One of the most important aspects of programming is the use of a consistent naming convention. Without one, your program ends up being just a jumble of various techniques and hence hard to understand. With a naming convention, your program is more readable and easier to understand and maintain.

I will describe the naming conventions that I have used to code a large application that have worked quite well for me.

3.4.1 Naming Macros

Macro names should always be in uppercase and may contain optional underscore characters. For macros that take arguments, I prefer not to use the underscore character anywhere (e.g., NUMSTATICELS()). For macros that define constant numeric values, underscore characters are OK (e.g., MAX_BUFFER_SIZE).

Macro names should be in uppercase.

Macro names in uppercase stand out and draw attention to where they are located. Some macros that are universally used throughout almost all code are allowed to be in mixed upper- and lowercase. An example of a macro like this is the WinAssert() macro §3.3.

There are many times that a set of macros contain a common subexpression. When this happens, I create another macro that contains the common sub-expression. The sole purpose of this type of macro is that it is to be used by other macros and not in the source code. A naming convention I use to help me remember that the macro is private to other macros is to name it with a leading underscore character.

Macros beginning with an underscore are to be used only in other macros, not explicitly in source code.

3.4.2 Naming Data Types

I can remember the difficulty I had coming up with good data type names when I first started to code. I was using mixed upper- and lowercase for data type names and variable names. However, it became harder and harder to read the program. I always ended up wanting the variable name to be spelled the same as the data type name but could not do this, so I ended up calling it something different which made the program hard to understand.

The convention that I finally settled upon is that all data types should be in uppercase. The variable names can then be spelled the same, but in mixed upper- and lowercase. This convention may at first seem awkward, but in practice I have found that it works well.

New data types should be in uppercase.

Data type names should also avoid using the underscore character. This is because macro names may use the underscore character and it is best to avoid any possible confusion or ambiguity over whether or not an uppercase name is a data type name or macro name.

3.4.3 Declaring Data Types

New data types must be declared with a typedef statement. While it is possible to use a macro to create what looks like a new data type, it is not a true data type and is subject to subtle coding problems.

New data types must be declared with a typedef statement, not a macro definition.

Consider the data type PSTR, shown below, which is a character pointer.

Using typedef to create a new data type
typedef char*PSTR;


Using macros to create a new data type, a bad practice
#define PSTR char*


Using the new PSTR data type
PSTR pA, pB;

In the above example, what is the type of pA and what is the type of pB? In the case of using typedef to create the new data type, the type of pA and pB is a character pointer, which is as expected. However, in the case of using the macro to create the new data type, the type of pA is a character pointer and the type of pB is a character. This is because PSTR pA, pB really represents char *pA, pB which is not the same as char *pA, *pB.

This example shows the danger in using macros to declare new data types in the system. Therefore, you should avoid using macros to declare new data types.

3.4.4 Naming Variables

All variables should be named using the Hungarian variable naming convention with mixed upper- and lowercase text and no underscore characters.

Variables should be named using Hungarian notation.

The Hungarian naming convention states that you should prefix all variable names with a short lowercase abbreviation for the data type of the variable name. (See Table 3-1).

Prefix Data Type

a array of given type

b BOOL: true/false value

by BYTE

c char

dw DWORD

h handle or abstract pointer

l long

lp long pointer

n int

p pointer

w WORD

Table 3-1: Hungarian Notation Prefixes

For example, nSize is an integer, bOk is a BOOL and hIcon is an abstract handle. Prefixes may be combined to produce a more descriptive prefix. An example would be lpnCount, which is a long pointer to an integer and lpanCounts, which is a long pointer to an array of integers.

The advantage of Hungarian notation is that you are much more likely to catch a simple programming problem early in the coding cycle, even before you compile. An example would be nNewIndex = lIndex+10. Just by glancing at this you can see that the left-hand side is an integer and the right-hand side is a long integer. This may or may not be what you intended, but the fact that this can be deduced without seeing the original data declarations is a powerful concept.

Hungarian notation allows you to know a variable's data type without seeing the data declaration.

The Hungarian notation handles all built-in data types, but what about derived types? A technique that I have found useful is to select an (uppercase) data type name that has a natural mixed upper- and lowercase name.

An example from the Windows system is the HICON data type and the hIcon variable name. As another example, let's consider a queue entry data type called LPQUEUEENTRY. A variable name for this could be lpQueueEntry.

This convention works great for short data type names like HICON, but not so well for long data type names like LPQUEUEENTRY. The resulting variable name lpQueueEntry is just too long to be convenient. In this case, an abbreviation like lpQE should be used. However, make sure that lpQE is not also an abbreviation for another data type in your system.

Whatever technique you use to derive variable names from data type names is fine provided that there is only one derivation technique used in your entire program. A bad practice would be to use lpQE in one section of code and lpQEntry in another section of code.

A data type must have a single and unique variable derivation.

3.4.5 Naming Functions

Functions should be named using the module/verb/noun convention in mixed upper- and lowercase text with no underscore characters.

Functions should be named using the module/verb/noun convention.

Suppose you have just started a project from scratch. There is only one source file and the number of functions in it is limited. You are naming functions whatever you feel like and coding is progressing rapidly. Two months go by and you are working on a new function that needs to call a specialized memory copy routine you wrote last month. You start to type in the function name, but then you hesitate. Did you call the function CopyMem() or MemCopy()? You do not remember, so you look it up real quick.

This actually happened to me and the solution was simple. Follow the Microsoft Windows example of naming functions using the verb/noun or action/object technique. So, the function should have been called CopyMem().

This solved my immediate problem, but not the long-term problem. It wasn't long before I had thousands of function names, some with similar sounding verb/noun names. My solution was to prefix the verb/noun with the module name.

Suppose you have a module that interfaces with the disk operation system of your environment. An appropriate module name would be Dos and several possible function names are DosOpenFile(), DosRead(), DosWrite() and DosCloseFile().

Module names should contain two to five characters, but an optimum length for the module name is three to four characters. You can almost always come up with a meaningful abbreviation for a module that fits in three to four characters.

3.5 Chapter Summary

A rock-solid layer needs to be built upon all system level interfaces because system level interfaces contain bugs. You cannot assume that they are bug-free. I have been burned too many times to trust system level calls blindly.
When you do make assumptions in calling system code, it is best to WinAssert() these situations in a code wrapper. When a function does fail and you assumed that it never would, you want to know about it so that you can fix the problem and reevaluate your assumption.
Macros can be used as an abstraction layer to allow your code to be ported without having to change your source files. Instead, just change the macros contained in your include files and recompile. The macro name in the source file is specifying what to do. The macro body in the include file is specifying how to do it.
It is a good idea to use WinAssert()'s generously in your code. The WinAssert() provides run-time checking of design-time assumptions and signals a design flaw when it occurs. Use the fault-tolerant form of WinAssert() whenever possible.
When programming in a large project, it is crucial that a consistent naming convention be used throughout the entire project. This helps prevent misunderstandings that produce buggy code.
Macro names should be in uppercase. A macro beginning with an underscore character is intended to be used only in other macros, not explicitly in source code.
New data types should be in uppercase and declared with a typedef statement, not a macro definition.
Variables should be named using the Hungarian notation. This notation allows you to know the type of a variable without seeing the data declaration.
Functions should be named using the module/verb/noun convention.