Writing Bug-Free C Code: General Tips

Writing Bug-Free C Code
A Programming Style That Automatically Detects Bugs in C Code
by Jerry Jongerius / January 1995

Chapter 7: General Tips

7.1 Design It Right the First Time
7.2 Good Design Beats an Optimizing Compiler
7.3 Evolutionary Coding
7.4 Set Goals
7.5 Code What to Do, not How to Do It
7.6 Virtually No Global Variables
7.7 Loop Variables
7.8 Use the Highest Compiler Warning Level
7.9 Use "static" to Localize Knowledge
7.10 Place Variables in the Block Needed
7.11 Arrays on the Stack
7.12 Pointers Contain Valid Addresses or NULL
7.13 Avoid Type Casting Whenever Possible
7.14 Use sizeof() on Variables, not Types

7.15 Avoid Deeply Nested Blocks
7.16 Keep Functions Small
7.17 Releasing Debugging Code in the Product
7.18 Stack Trace Support
7.19 Functions Have a Single Point of Exit
7.20 Do Not Use the Goto Statement
7.21 Write Bulletproof Functions
7.22 Writing Portable Code
7.23 Memory Models
7.24 Automated Testing Procedures
7.25 Documentation Tools
7.26 Source-Code Control Systems
7.27 Monochrome Screen
7.28 Techniques for Debugging Timing Sensitive Code

No matter how good the tools that help you detect bugs in your programs are, the goal of every programmer should be to avoid bugs in the first place. Debugging tools serve a purpose, but relying on them to catch your programming mistakes does not make you a better programmer. What makes you a better programmer is learning from your mistakes.

A key to writing bug-free code is learning from your mistakes.

How do you rate your own ability to produce bug-free code? Whatever your answer is, how did you rate yourself last year? How do you expect to rate yourself next year?

The point of this exercise is to stress to you the importance of improving your coding techniques year after year. The goal is to be able to write more code next year than last year and to write it with fewer bugs.

7.1 Design It Right the First Time

If there is one thing that I have learned over the years that I would emphasize to a programmer just starting out, it is that old code rarely dies; it just stays around. This is especially true in large projects. And all successful small projects end up turning into large projects.

As time goes on in a large project, more and more code layers are built upon existing code layers. It does not take long before coding is no longer being done to the operating system level or windowing environment layer but to the layers that were coded on top of these system layers.

Now suppose that a year later someone discovers that the implementation of a low-level layer is causing performance problems. All too often management decides to live with the performance problem in favor of using their programmers' time to put new features into the product. You are forced by management to live with a lot of the coding decisions you make over the years, so your decisions had better be good!

Management may allow you to re-engineer a module to make it better and faster, but consider the lost time. The time that is attributed to the module is the original design time plus the time to re-engineer the module. Designing a module right the first time saves time.

It is not feasible to continually re-engineer old modules. If this does happen due to poor design decisions made upfront, a project will come to a halt.

Designing a module right the first time saves time.

My advice is to design a module right the first time because you'll rarely get the chance to re-engineer the module.

7.2 Good Design Beats an Optimizing Compiler

An optimizing compiler helps a program run faster, but in all cases, a good design makes a program run fast. A great design produces the fastest program. A little more time spent on a programming problem generally results in a better design, which can make a program run significantly faster.

Over the last few years, I think all of us at one time or another saw a well-known product upgrade getting hammered by the trade press for how sluggishly the upgrade performed. Do not let this happen to your product.

And by all means, evaluate your competition. If your product is two to three times slower than the competition and this comes out in a magazine review, do you think potential customers will buy your product?

7.3 Evolutionary Coding

How do you code a module? Do you spend days working feverishly on a module and have it all come together that last half day? Do you code one piece of a module, moving on to the next piece only when you are certain that the piece you just wrote is working?

Over the years I've tried both techniques and I can tell you from experience that coding a module a piece at a time is much easier. It also helps isolate bugs. If you wait until the end to put all the pieces together, where is the problem? By coding one piece at a time, the problem is more than likely in the piece you are working on and not in some piece you have already finished and tested.

7.3.1 Build a Framework

Start coding a module by building a framework that is plugged into a system almost immediately. The goal in creating this framework is to get the module completely stubbed out so that it compiles.

Let's use the Dos module as an example and assume that the module interface has already been designed. The first step in creating the framework is to create the global include file. Next, create a source file that contains all of the sections of a module, but none of the guts. Namely, create the module and APIENTRY comment blocks without any comments, declare the class without any members and write all the APIENTRY functions with no code in the function bodies.

At this point, I place return statements in all functions, so that each function can return an error value. For example, I have DosOpenFile() and DosCloseFile() return NULL and DosRead() and DosWrite() return zero.

The framework is now complete. Compile the module and correct any errors that show up. Although no real code has been written, the framework provides you with a clear goal.
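As a rough illustration of what such a stubbed-out framework might look like, here is a minimal sketch. The handle type HDOSFILE and all four signatures are assumptions made for this example; the real Dos module interface is defined elsewhere and may differ. The comment blocks and the APIENTRY decoration are left out to keep the sketch self-contained.

```c
#include <stddef.h>

/* Hypothetical handle type and signatures, assumed for illustration only. */
typedef struct DosFile *HDOSFILE;

/* Create method: stubbed to report failure until it is implemented */
HDOSFILE DosOpenFile(const char *pszName)
{
    (void)pszName;
    return NULL;
}

/* Destroy method: stubbed, always hands back NULL */
HDOSFILE DosCloseFile(HDOSFILE hFile)
{
    (void)hFile;
    return NULL;
}

/* Stubbed: reports that zero bytes were read */
size_t DosRead(HDOSFILE hFile, void *pBuffer, size_t nBytes)
{
    (void)hFile; (void)pBuffer; (void)nBytes;
    return 0;
}

/* Stubbed: reports that zero bytes were written */
size_t DosWrite(HDOSFILE hFile, const void *pBuffer, size_t nBytes)
{
    (void)hFile; (void)pBuffer; (void)nBytes;
    return 0;
}
```

Every function compiles and returns its error value, so calling code can be written against the module before any real logic exists.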

7.3.2 Code the Create and Destroy Methods

After the framework is compiling successfully, you are ready to start implementing the module. Where should you start? The functions that I always tackle first are the create and destroy methods. This allows me to write some simple code that uses and tests the module.

In the case of the Dos module, the DosOpenFile() and DosCloseFile() functions are implemented first, followed by some test code. This test code allows verification that (1) opening an existing file works, (2) trying to open a file that does not exist fails and (3) calling DosCloseFile() works properly and frees allocated memory.

Most modules are not as simple as the Dos module was in determining the data items to add to the class structure. This is fine. Simply fill in the class structure with whatever data items are required by the create and destroy method function. When implementing other method functions, add data items to the class structure as they are needed.

7.3.3 Code the Other Methods

Once the create and destroy methods are working correctly, it is time to start implementing the other method functions. The best strategy is to implement a method function, compile it and then test it by writing test code.

In the case of the Dos module, you may decide to implement DosRead() first. After doing so, you could then create a test file with a known data set and attempt to have the test code read this data set. This helps validate the DosRead() code. Likewise for the DosWrite() function. Write out some information to a file and verify that it was indeed written.

The order in which you code the method functions is totally up to you. After you gain experience using this technique, you will end up picking an order that allows for test code to be written easily.

7.4 Set Goals

Goal setting is always important to accomplishing any task you set out to do, but it is especially important that programmers have clearly defined goals. The problem is that all too often a programmer gets sucked into a project that is 90 percent done and weeks or months later the project is still only 90 percent done.

7.4.1 The 90 Percent Done Syndrome

The problem with programming without a clear, well-defined goal is that the project never gets completed.

Programming without a goal is like a sailboat without a sail. You drift.

I have fallen into this trap myself. You work on items that relate to the ninety percent that is already done. You may have found a new feature to add and so you work on it, but the feature does not help you complete the project. Or you have found a better way to implement an algorithm, so you spend time recoding the algorithm. While finding a better algorithm is great, it only delays the project from being completed.

The module framework helps you set a clearly defined goal. Once the framework is in place, all the APIENTRY functions are stubbed out, just waiting to be completed. Once these APIENTRY functions are completely written and tested, you have reached your goal and the module is finished.

7.5 Code What to Do, not How to Do It

The goal of this technique is to avoid spreading knowledge about how something is done throughout the entire project. By moving this knowledge to one function, you are isolating the knowledge. Now any code that calls this function is specifying what must be done, leaving the how to the function.

Never reinvent the wheel. Always code what to do, not how to do it.

It is hard to discipline yourself to use this technique, but the payback is well worth the extra effort.

Code that uses this technique becomes shorter and more compact. What used to take ten lines now takes five. This leads to a program that is easier to code, maintain and change.

A good analogy is building blocks. You are building something starting from scratch so you start out by making a few blocks. You use these building blocks to make the something. Now let's say that you want to build something else. The key is that you don't start totally from scratch because you've already built some of the basic building blocks.

Always keep your eyes open for a new building block to implement.

7.6 Virtually No Global Variables

In an object-based system, global variables (variables known to more than one source file) are almost never needed. The reasoning is simple. Objects, by design, are created and destroyed by method functions, so the objects themselves are never static. An advantage of object-based systems that dynamically allocate objects is that they are never limited by some predefined number. Instead, they are limited only by the amount of memory that is available.

Global variables are rarely referenced by the method functions of an object because method functions act upon an object, not global data. Remember that an object's handle is passed in as the first argument to a method function.

So why are global variables ever needed? Sometimes to directly support a class of objects. A prime example is if all objects of a class require access to a read-only resource. When creating the first object for this class, the resource is read into memory and a handle to the resource is stored in a global variable. When destroying the last object for this class, the memory associated with the resource is freed.

Global variables to support individual object instances should not be allowed. Global variables that support a class of objects as a whole are permitted. Because these global variables support a class, their use is limited to the source file that declares the class. This can be enforced by using static on these variables, ensuring that the variables are not visible in other source files.
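The first-object/last-object pattern described above might be sketched as follows. The Widget class, its method names and the stand-in "resource" (a small string rather than data read from disk) are all hypothetical and exist only to show the shape of the technique.

```c
#include <stdlib.h>
#include <string.h>

/* File-scope statics support the class as a whole and are
   invisible to every other source file. */
static char *s_pSharedResource = NULL;  /* read-only data shared by all objects */
static int s_nObjects = 0;              /* number of live objects of this class */

typedef struct Widget { int nValue; } Widget;   /* hypothetical class */

Widget *WidgetCreate(int nValue)
{
    Widget *pWidget = malloc(sizeof(*pWidget));
    if (pWidget != NULL && s_nObjects == 0) {
        /* First object: load the shared resource (stand-in for a file read) */
        s_pSharedResource = malloc(16);
        if (s_pSharedResource != NULL) {
            strcpy(s_pSharedResource, "shared data");
        } else {
            free(pWidget);
            pWidget = NULL;
        }
    }
    if (pWidget != NULL) {
        pWidget->nValue = nValue;
        ++s_nObjects;
    }
    return pWidget;
}

Widget *WidgetDestroy(Widget *pWidget)
{
    if (pWidget != NULL) {
        free(pWidget);
        if (--s_nObjects == 0) {
            /* Last object: release the shared resource */
            free(s_pSharedResource);
            s_pSharedResource = NULL;
        }
    }
    return NULL;
}

/* Reports whether the shared resource is currently loaded */
int WidgetResourceLoaded(void)
{
    return s_pSharedResource != NULL;
}
```

Because both variables are static, no other source file can come to depend on them, which keeps the knowledge of the shared resource inside the class.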

7.7 Loop Variables

Variables used to control for loops should not be used outside the loop itself. The LOOP() macro (see §2.2.5) was designed to enforce this rule.

Loop variables should not be used outside the loop itself.

This rule makes code easier to maintain. While you are writing a function, you understand it thoroughly and are unlikely to make a mistake in using the loop variable after the loop has exited. However, when you or one of your coworkers modifies the loop next year, the assumptions under which the looping variable is used could easily be invalidated.

It is better to be safe than sorry.
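One way to have the compiler enforce this rule is to wrap the loop in its own block so that the loop variable does not exist once the loop ends. The SCOPED_LOOP and SCOPED_ENDLOOP macros below are hypothetical stand-ins written for this sketch; the actual LOOP() macro is defined in §2.2.5 and may be implemented differently.

```c
/* Hypothetical macros, assumed for illustration only.
   The extra braces give 'loop' a scope that ends with the loop. */
#define SCOPED_LOOP(nCount) { int loop; int nLoopMax_ = (nCount); \
    for (loop = 0; loop < nLoopMax_; ++loop)
#define SCOPED_ENDLOOP }

int SumFirst(const int *pValues, int nCount)
{
    int nSum = 0;
    SCOPED_LOOP(nCount) {
        nSum += pValues[loop];
        } SCOPED_ENDLOOP
    /* 'loop' is out of scope here, so using it would not compile */
    return nSum;
}
```

If someone later tries to read the loop variable after the loop, the mistake shows up as a compile error instead of a run-time surprise.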

7.8 Use the Highest Compiler Warning Level

Today's compilers are pretty smart, but only if you tell them to be. By default, most compilers use a low warning level and use more advanced error checking only if instructed to do so. This is almost always done to remain compatible with older code. You can imagine the number of support calls a compiler vendor would get if the old code that used to compile fine all of a sudden started producing a lot of warning messages under a new release of the compiler.

For more information on setting the compiler warning level in Microsoft C8, see §2.1.3.

7.9 Use "static" to Localize Knowledge

Using static can sometimes be confusing because it appears to mean different things in different contexts. A simple rule that helps clear things up is that using static assigns permanent storage to an object and limits the visibility of the object to the scope in which it is defined. There are three basic ways in which static can be used.

Static function declaration. You have already seen static used in this case with the LOCAL macro (see §6.6.7). Since a function already has permanent storage, this part of the rule is redundant. The scope of a function is file scope, so static limits the visibility of the function to the rest of the source file in which it is defined.

External static variables. When a variable is defined at file scope, it already has permanent storage and is visible to all source files. Using static makes the variable invisible to other source files. Again, the variable definition already has permanent storage, so this part of the rule is redundant.

Internal static variables. When a variable is defined within a function, it is called an automatic variable. Automatic variables have no persistence across function calls. In other words, if a value is assigned to an automatic variable in a block, that value is lost when the block is exited. In almost all cases, this is the desired behavior. However, using static assigns permanent storage to the variable so that a value assigned to the variable is not lost. The scope of the variable is also limited to the block in which it is defined. An internal static variable is just like an external static variable except that the visibility of the variable is limited to a block.

It is also important to understand how an initialized static variable behaves. Consider the following code.

An example in using static
void Testing( void )
{
    LOOP(3) {
        static int nTest1=100;
        int nTest2=100;
        printf( "nTest1=%d, nTest2=%d\n", nTest1, nTest2 );
        ++nTest1;
        ++nTest2;
        } ENDLOOP

} /* Testing */


Output from calling Testing() three times
nTest1=100, nTest2=100
nTest1=101, nTest2=100
nTest1=102, nTest2=100
nTest1=103, nTest2=100
nTest1=104, nTest2=100
nTest1=105, nTest2=100
nTest1=106, nTest2=100
nTest1=107, nTest2=100
nTest1=108, nTest2=100

As you can see from this example, nTest1 is initialized only once, while nTest2 is initialized on each pass through the block. The rule is that the initialization of any static variables takes place at compile-time and that the initialization of automatic variables takes place at run-time.

Static variables are initialized once at compile-time. Automatic variables are initialized as needed at run-time.

7.10 Place Variables in the Block Needed

How many times have you tracked down a bug only to realize that you used an automatic variable before it was properly initialized, or used the variable well after it should have been, when it no longer contained an appropriate value? While this may not happen to you as you write the function, it becomes a lot more likely when you go back to the code at a later date and modify it.

The solution to this problem is to define variables only in the innermost scope in which they are needed. A new scope is created any time you use an opening brace {. The scope is terminated by a closing brace }. This means that your if statements, while statements, and so on create new scopes. Variables defined within a scope are visible only within that scope. As soon as the scope ends, so does your access to the variable. Consider the following code fragment.

Limiting the scope of a variable
LOOP(strlen(pString)) {
    char c=pString[loop];
    ...
    } ENDLOOP

In this example, c is visible only within the loop. As soon as the loop exits, c is no longer visible. In fact, it can be defined and reused in another scope.

Define variables in the scope in which they are needed.

In C89, variables can be defined only at the beginning of a scope, before the main body of code. In C++ (and in C99 and later), variables can be defined wherever a statement is valid.

A useful #define for both C and C++ is the NewScope define.

NewScope define
#define NewScope

NewScope is a syntactical place holder (defined to be nothing) that allows a new scope to be introduced into a program. It also allows for a natural indentation style.

Using NewScope
void APIENTRY Function( args )
{
    /*--- Comment ---*/
    (code block)
    /*--- Using NewScope ---*/
    NewScope {
        type var;
        (code block that uses var)
        }

} /* Function */

NewScope is useful in both C and C++ because it allows variables to be created that are private to a block. As soon as the block exits, the variables declared in the block are no longer visible and cannot be referenced.

7.11 Arrays on the Stack

When using arrays that are declared on the stack, you must be careful not to return a pointer to one of these arrays back to the calling function. Consider the following example.

A program with a subtle bug
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

char *myitoa( int nNumber )
{
    char buffer[80];
    sprintf( buffer, "%d", nNumber );
    return (buffer);
}

int main(void)
{
    printf( "Number = %s\n", myitoa(234) );
    return 0;
}

This program contains a subtle bug in that myitoa() is returning the address of buffer, an array on the stack. It is subtle because despite this bug, the program still works properly!

It is a bug to return the address of an array on the stack to a calling function because, after the function returns, the array lies in a part of the stack that no function owns. The trouble is that the code works until the array gets overwritten by another function call, and even making a function call is no guarantee that buffer will be overwritten.

This bug is exactly like keeping around a pointer to a block of memory that was freed. Unless the memory is reused and overwritten, accessing an object through the pointer will continue to work. As soon as the block of memory gets overwritten, you have a subtle problem to track down.
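One common way to avoid the problem is to have the caller supply the storage, so that the function never hands out an address from its own stack frame. The sketch below uses a hypothetical name, myitoa_buf(), and assumes a C99 library for snprintf(); it is one possible fix, not the only one.

```c
#include <stdio.h>

/* Hypothetical replacement for myitoa(): the caller owns the buffer,
   so the returned pointer stays valid after the function returns. */
char *myitoa_buf(int nNumber, char *pBuffer, size_t nSize)
{
    snprintf(pBuffer, nSize, "%d", nNumber);
    return pBuffer;
}

/* Usage: the array lives in the caller's frame for as long as it is needed */
void PrintNumber(int nNumber)
{
    char buffer[32];
    printf("Number = %s\n", myitoa_buf(nNumber, buffer, sizeof(buffer)));
}
```

The array is still on the stack, but it now belongs to the function that uses the result, so it cannot be overwritten while it is still being read.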

7.12 Pointers Contain Valid Addresses or NULL

The reasoning behind this is simple. It helps find bugs. If all pointers contain either a valid address or NULL, then accessing the valid address does not cause the program to fault. However, accessing the NULL pointer under most protected environments causes the program to fault and you have found your bug.

Consider what would happen if you have a pointer to a memory object, the memory object is freed and the pointer is not set to NULL. The pointer still contains the old address, an address to an invalid object. However, the address itself under most environments is still valid and accessing the memory does not cause your program to fault. This is a problem and the source of a lot of bugs.

The reason for being able to access the memory of a freed object is that most memory managers simply add the memory back into the pool of free memory. Rarely do they actually mark this memory as being inaccessible to the program. It is time-consuming to communicate with the operating system on every allocation and deallocation request. Instead, a memory pool is maintained. Only when this pool is exhausted does the heap manager ask the operating system for more memory.

To help enforce this policy, macros that interface with the memory manager should be designed and used exclusively. An example of this is NEWOBJ() or FREE() for class objects (see §4.5).
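As a minimal sketch of the idea, a macro can free a block and set the pointer to NULL in a single step. SAFE_FREE() below is a hypothetical name chosen for this example; it is not the FREE() macro from §4.5, which works with the class object system.

```c
#include <stdlib.h>

/* Hypothetical macro: release the memory, then NULL the pointer so it
   never holds the address of a freed block. */
#define SAFE_FREE(p) do { free(p); (p) = NULL; } while (0)

/* Returns 1 when the pointer is NULL after being freed */
int DemoSafeFree(void)
{
    char *pName = malloc(32);
    SAFE_FREE(pName);
    return pName == NULL;
}
```

If all deallocation goes through a macro like this, a stale pointer is NULL rather than a plausible-looking address, and a later dereference is far more likely to fault where the mistake is made.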

7.13 Avoid Type Casting Whenever Possible

Type casting, by design, bypasses the type checking of the compiler. In essence, type casting tells the compiler that you know what you are doing and not to complain about it. The problem with this is that your environment may change: you may be porting the software to another platform, changing the memory model of the program or upgrading to a new revision of the compiler.

For whatever reason, your environment has changed. When you recompile your program, you run the risk of missing warning messages. This is because the behavior of the statement in which you are using the type cast may have changed, but the type cast masks the behavior change.

Unnecessary type casting is especially common in mixed-model programming under Windows. To make matters worse, Microsoft's own sample code sets a bad example because it is littered with totally unnecessary type casts. Consider the following code fragment.

An example of bad type casts
MSG msg;
while (GetMessage((LPMSG)&msg, NULL, 0, 0)) {
    TranslateMessage((LPMSG)&msg);
    DispatchMessage((LPMSG)&msg);
    }

Microsoft Windows programmers recognize this code as the main message loop for an application. All messages that are placed in an application's message queue are dispatched by this message loop.

The problem with this code is that all three type casts to LPMSG are totally unnecessary. The code works great without the type casts. The prototypes for GetMessage(), TranslateMessage() and DispatchMessage() all indicate that they take LPMSG as an argument. The data type of &msg is PMSG due to the mixed-model environment. I can only suppose that the programmer thought that PMSG must be type cast into what the functions expected, an LPMSG. This is simply not the case. In mixed-model programming, the compiler promotes a near pointer to the object to a far pointer to the object in all cases that a far pointer to the object is expected. In other words, the compiler is implicitly performing the type cast for you.

7.13.1 Mixed-Model Programming Implicit Type Cast Warning

In mixed-model programming there exists a subtle problem if you write code that allows NULL to be passed through as one of the argument pointers. Consider the following code.

GetCpuSpeed(), demonstrating implicit type cast problem
int APIENTRY GetCpuSpeed( LPSTR lpBuffer )
{
    int nSpeed=(calculation);
    if (lpBuffer) {
        (fill buffer with text description of speed)
        }
    return (nSpeed);

} /* GetCpuSpeed */

GetCpuSpeed() always returns the CPU speed as an int, but as an option it also creates a text description of the CPU speed in the provided buffer if the buffer pointer is non-NULL. Now what happens when you call GetCpuSpeed()?

Calling GetCpuSpeed()
PSTR pBuffer=NULL;
int nSpeed1=GetCpuSpeed(NULL);
int nSpeed2=GetCpuSpeed(pBuffer);

In both cases you want only the integer speed and not the text description. In the first case, GetCpuSpeed(NULL) behaves as expected. However, in the second case, GetCpuSpeed(pBuffer) fills in a text buffer. The problem is that pBuffer is a near pointer and that GetCpuSpeed() expects a far pointer. No matter what value is contained in pBuffer, it is considered a valid pointer and the type cast of a near pointer to a far pointer (implicitly by the compiler or explicitly by you) uses the data segment value as the segment for the far pointer.

In other words, when the pBuffer near pointer is converted to a far pointer, the offset is zero, but the segment value is nonzero, so the resulting far pointer no longer compares equal to NULL.

From experience, I have found that correctly writing code in situations like this is too problematic. My solution to this problem has been to move away from mixed-model pointers and stick with far pointers.

7.14 Use sizeof() on Variables, not Types

How do you use sizeof() in your code? Do you typically use sizeof() with a variable name or a data type? While at first glance the distinction may not seem to matter that much, at a deeper level it matters a lot. Consider the following example.

Using sizeof(), a bad example
int nVar;
...
DumpHex( &nVar, sizeof(int) );

DumpHex() is a general purpose routine that will dump out an arbitrary byte range of memory. The first argument is a pointer to a block of memory and the second argument is the number of bytes to dump.

Can you spot a possible problem in this example? The sizeof() in this example is operating on the int data type and not on the variable nVar. What if nVar needs to be changed in the future to a long data type? Well, sizeof(int) would have to be changed to sizeof(long). A better way to use sizeof() is as follows.

Using sizeof(), a good example
int nVar;
...
DumpHex( &nVar, sizeof(nVar) );

In this new example, sizeof() now operates on nVar. This allows DumpHex() to work correctly no matter what the data type of nVar is. If the type of nVar changes, we will not have to hunt down in the code where the old data type was explicitly used.
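To make the point concrete, here is a small sketch. This DumpHex() is a hypothetical implementation written for the example (it also returns the byte count, which the routine described above need not do), and lCounter is an invented variable whose type is imagined to have changed from int to long.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical DumpHex(): prints nBytes of memory as hex digits and
   returns the number of bytes dumped. */
size_t DumpHex(const void *pData, size_t nBytes)
{
    const unsigned char *pByte = pData;
    size_t i;
    for (i = 0; i < nBytes; ++i) {
        printf("%02X ", pByte[i]);
    }
    printf("\n");
    return nBytes;
}

/* The variable's type was changed from int to long; the call below
   needed no edit because sizeof() operates on the variable. */
size_t DumpCounter(void)
{
    long lCounter = 0x1234;
    return DumpHex(&lCounter, sizeof(lCounter));
}
```

Had the call been written with sizeof(int), it would still compile after the type change and would quietly dump the wrong number of bytes on platforms where int and long differ in size.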

7.15 Avoid Deeply Nested Blocks

There are times when, for any number of reasons, you end up writing code that is deeply nested. Consider the following example.

Deeply nested code
void DeepNestFunction( void )
{
    if (test1) {
        (more code)
        if (test2) {
            (more code)
            if (test3) {
                (more code)
                if (test4) {
                    (more code)
                    }
                }
            }
        }

} /* DeepNestFunction */

While this nesting is only four deep, I've had times when it would have gone ten deep. When nesting gets too deep, the code becomes harder to read and understand. There are two basic solutions to this problem.

Unroll the tests. The first solution is to create a boolean variable that maintains the current success or failure status and to constantly retest it as follows.

Unrolling the deep nesting
void UnrollingDeepNesting( void )
{
    BOOL bVar=(test1);
    if (bVar) {
        (more code)
        bVar = (test2);
        }
    if (bVar) {
        (more code)
        bVar = (test3);
        }
    ...

} /* UnrollingDeepNesting */

Call another function. The second solution is to package the innermost tests into another function and to call that function instead of performing the tests directly.
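A sketch of this second solution follows. The task (pulling the value out of a "name=value" settings line) and the names FindValue() and ParseSetting() are invented for illustration. The innermost tests live in a helper function, so the caller never nests more than two levels deep.

```c
#include <stddef.h>
#include <string.h>

/* Innermost tests packaged into their own function: returns the text
   after '=' when the line has a non-empty name and value, else NULL. */
static const char *FindValue(const char *pszLine)
{
    const char *pszEquals = strchr(pszLine, '=');
    const char *pszValue = NULL;
    if (pszEquals != NULL && pszEquals != pszLine && pszEquals[1] != '\0') {
        pszValue = pszEquals + 1;
    }
    return pszValue;
}

/* Outer function: the remaining tests stay shallow and readable */
const char *ParseSetting(const char *pszLine)
{
    const char *pszValue = NULL;
    if (pszLine != NULL) {
        if (pszLine[0] != '#') {    /* skip comment lines */
            pszValue = FindValue(pszLine);
        }
    }
    return pszValue;
}
```

The helper also gives the inner tests a name, which documents what they are checking.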

7.16 Keep Functions Small

The primary reason to keep functions small is that it helps you manage and understand a programming problem better. If you have to go back to the code a year later to modify it, you may have forgotten the small details and have to relearn how the code works. It sure helps if functions are small.

As a general rule, try to keep functions manageable by restricting their length to one page. Most of the time functions are smaller than a page and sometimes they are a page or two. Having a function that spans five pages is unacceptable.

As a general rule, try to keep functions under one page.

If a function starts to get too large, step back a moment and try to break the function into smaller functions. The functions should make sense on their own. Remember to treat a function as a method that transitions an object from one valid state to another valid state. Try to come up with well-defined, discrete actions and write functions that perform these actions.

7.17 Releasing Debugging Code in the Product

Is there such a thing as a debug build and a retail build of your product? Should there be? No, I do not think so! Let me explain why. I believe that any released application should be able to be thrown into debug mode on the fly at any time.

In the applications that I develop, I have a global boolean variable called bDebugging that is either FALSE or TRUE. I place what I consider to be true debugging code within an if statement that checks bDebugging. This is usually done for debugging code that adds a lot of execution overhead. For debug code that does not add much overhead, I just include the code and do not bother with bDebugging.

The benefit of doing this is that there is only one build of your product. That way, if a customer is running into a severe problem with your product, you can instruct the customer how to run your product in debug mode and quite possibly find the problem quickly.
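A minimal sketch of a run-time debug switch is shown below. Only the bDebugging variable comes from the description above; the "-debug" command-line flag, InitDebugMode() and ProcessItems() are assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

int bDebugging = 0;     /* FALSE by default; switched on at run time */

/* Hypothetical: turn on debug mode when "-debug" is on the command line */
void InitDebugMode(int argc, char **argv)
{
    int i;
    for (i = 1; i < argc; ++i) {
        if (strcmp(argv[i], "-debug") == 0) {
            bDebugging = 1;
        }
    }
}

/* Processes the items and returns the number of debug lines written.
   The costly tracing runs only when bDebugging is set. */
int ProcessItems(const int *pItems, int nCount)
{
    int nLogged = 0;
    int i;
    for (i = 0; i < nCount; ++i) {
        if (bDebugging) {
            fprintf(stderr, "item[%d]=%d\n", i, pItems[i]);
            ++nLogged;
        }
        /* normal processing of pItems[i] would go here */
    }
    return nLogged;
}
```

The same executable serves both purposes: a customer who hits a problem can be asked to rerun the program with the switch on.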

I do not consider WinAssert() and VERIFY() to be debugging code. In the programs that I write, WinAssert() and VERIFY() are not switchable by bDebugging. Instead, they are always on. The reasoning is simple. Would you like to know a bug's filename and line number in your program or would you just like to know that your program crashed somewhere, location unknown?

WinAssert() and VERIFY() are not debugging code.

If you object to the users of your product seeing assertion failures and run-time object verification failures, I recommend that you instead silently record the error in a log file. By doing this, you will have some record of what went wrong in the program when the customer calls you.

In a program that ships with WinAssert() and VERIFY() on, the program alerts the user to the exact filename and line number of a problem. If the fault-tolerant syntax is used, the program continues to run. Oftentimes, just knowing that a program failed at this one spot is enough to scrutinize that section of the code and find the problem.

It is important that the fault-tolerant forms of WinAssert() and VERIFY() be used. Doing so ensures that the program continues to run after a fault.

The fault-tolerant forms of WinAssert() and VERIFY() should always be used.
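The following sketch shows the general shape of an always-on check that records the filename and line number in a log file and lets the program keep running. SOFT_ASSERT(), ReportFault() and the log file name faults.log are hypothetical; this is not the implementation of WinAssert() or VERIFY(), whose fault-tolerant syntax is defined elsewhere in the book.

```c
#include <stdio.h>

static int s_nFaults = 0;   /* number of failed checks so far */

/* Appends the failure to a log file (name assumed for this sketch) and
   returns 0 so that the macro below evaluates to false. */
static int ReportFault(const char *pszExpr, const char *pszFile, int nLine)
{
    FILE *pLog = fopen("faults.log", "a");
    if (pLog != NULL) {
        fprintf(pLog, "%s(%d): check failed: %s\n", pszFile, nLine, pszExpr);
        fclose(pLog);
    }
    ++s_nFaults;
    return 0;
}

/* Hypothetical always-on check: nonzero when the expression holds */
#define SOFT_ASSERT(expr) \
    ((expr) ? 1 : ReportFault(#expr, __FILE__, __LINE__))

/* The guarded code is skipped on failure and the program continues */
int SafeDivide(int nNum, int nDen)
{
    int nResult = 0;
    if (SOFT_ASSERT(nDen != 0)) {
        nResult = nNum / nDen;
    }
    return nResult;
}

int FaultCount(void)
{
    return s_nFaults;
}
```

Because the check doubles as the condition of an if statement, the code that depends on it is skipped when the check fails, which is what keeps the program running after a fault.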

Sometimes a filename and line number are not enough to track down a problem. At times like these, a stack trace is highly desirable.

7.18 Stack Trace Support

A key debugging tool that I use for tracking down problems in my code is the stack trace dump from a fault. Sometimes only knowing the filename and line number of a fault is not enough to track down a problem. In these cases, the call history that led up to the problem is often enough.

For example, I once had a program that was faulting at a specific filename and line number. This code was examined thoroughly but no problem could be found. So, a stack trace of the fault was obtained from the customer, which assisted me in pinpointing the problem immediately. As it turned out, a newly added feature had caused a reentrancy problem to occur in old code.

Most development environments today provide sophisticated tools that allow the developer to quickly pinpoint problems in their code. What do you do when a customer calls up with a fault that you cannot reproduce? The customer is certainly not running the development environment that you are running.

My solution to this problem is to add full stack trace capabilities into the application itself. At every point in the stack trace, a filename and line number are obtained.

I build stack trace support into my applications.

Unfortunately, the solution is specific to the underlying environment, so I cannot give a general solution, but I will do my best to describe the technique that I use.

7.18.1 Implementing Stack Trace Support

Obtaining filename and line number information. The most important piece of information to which access is needed is debugging filename and line number information. Under a Microsoft development environment, obtaining this information is done in two steps. The first step is to tell the compiler to generate line number information. Under Microsoft C8, this is done with the /Zd command line option which results in the .obj files containing the line number information. The second step is instructing the Microsoft segmented executable linker to produce a .map file. The /map command line option is used.

Translate the filename and line number information. The .map file contains the filename and line number information in a human readable form. A program needs to be written that takes this .map text information and translates it into a form that is easily readable by the stack trace code.

Walking the Stack. This is the toughest part because it is so specific to the platform that you are using. If walking the stack is not provided as a service by the operating system, you may want to consider walking the stack yourself. This is what I do. Luckily, Windows now provides stack walking support through the TOOLHELP.DLL system library. For Windows, the functions of interest are StackTraceFirst(), StackTraceCSIPFirst() and StackTraceNext().

Mapping addresses to filename and line numbers. As you walk the stack, the only piece of information available to you is an address. Somehow you need to map this back to the information you stored in the binary file representation of the .map file. Again, this is specific to the environment you are working on. Under 16-bit protected-mode Windows, far addresses are really composed of a selector and offset. The trick is to map the selector back to a segment number because the segment number is what is specified in the .map file. This is done in two steps. The first step is to map the selector to a global memory handle by using GlobalHandle(), a Windows kernel function. The second step is then to map this global memory handle to a segment number by calling GlobalEntryHandle(), which is provided by TOOLHELP.DLL. You can now look up the filename and line number.
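The final lookup can be sketched as a binary search over a table sorted by segment number and offset. This is only a minimal sketch under stated assumptions, not the code I ship: the LineRecord layout, the sample table and the LookupLine() name are hypothetical, and the selector is assumed to have already been translated into a segment number.

```c
#include <stddef.h>

/* Hypothetical record produced by translating the .map file:
   one entry per source line, sorted by (segment, offset).   */
typedef struct {
    unsigned    segment;   /* segment number as listed in the .map file */
    unsigned    offset;    /* offset at which the line's code begins    */
    const char *filename;  /* source file that owns the line            */
    int         line;      /* source line number                        */
} LineRecord;

/* Sample data for illustration only */
static const LineRecord sampleTable[] = {
    { 1, 0x0000, "main.c",  10 },
    { 1, 0x0012, "main.c",  11 },
    { 1, 0x0040, "main.c",  15 },
    { 2, 0x0000, "heap.c", 100 },
    { 2, 0x0030, "heap.c", 104 }
};

/* Return the last record in the given segment whose offset is less
   than or equal to the fault offset, or NULL if there is none.     */
static const LineRecord *LookupLine( const LineRecord *table, size_t count,
                                     unsigned segment, unsigned offset )
{
    const LineRecord *best = NULL;
    size_t lo = 0;
    size_t hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const LineRecord *rec = &table[mid];
        if ((rec->segment < segment) ||
            ((rec->segment == segment) && (rec->offset <= offset))) {
            if (rec->segment == segment) {
                best = rec;          /* candidate; keep looking right */
                }
            lo = mid + 1;
            }
        else {
            hi = mid;
            }
        }
    return (best);

} /* LookupLine */
```

With the sample table, an address of segment 1, offset 0x0020 falls between the records for lines 11 and 15, so the lookup reports main.c line 11. An address in a segment that has no records yields NULL, which the stack trace code can print as a raw hex address instead.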

You now have superior stack trace support built right into your application. It is superior because the stack trace gives filename and line numbers instead of the hex offsets usually given by system level stack traces.

7.18.2 Enhancements

If you implement stack trace support in your application, I have some enhancements to suggest to you. I would highly recommend that you first get the basic stack trace support working before tackling these enhancements.

Hooking CPU faults. In protected memory environments, if your program accesses memory that does not belong to it, the program faults and the operating system halts the program. If possible, try to hook into this fault and produce a stack trace! For Windows, TOOLHELP.DLL provides the InterruptRegister() and InterruptUnRegister() functions that allow programs to hook their own faults. This requires some assembly language programming.

Hooking parameter errors. Under Windows, the kernel is performing error checks on the parameters being passed to Windows API calls. It is possible to hook into the bad parameter notification chain. This is done by using TOOLHELP.DLL, which provides NotifyRegister() and NotifyUnRegister().

Displaying function arguments. As you walk the stack, try to parse what arguments were passed to the function along with the filename and line number. This is tricky but it is doable and well worth the effort. Most faults that cause stack traces are caused by an invalid argument in some function call. Spotting this in the stack trace then becomes easy.

7.18.3 The Benefits

I have implemented full stack trace support along with hooking CPU faults, hooking Windows kernel parameter errors and displaying function arguments. What are the benefits? Great customer relations! In most cases, a stack trace is enough to track down a problem. In other words, I can track down a problem without first having to reproduce the problem. Customers begin to trust that a reported problem will get fixed and you end up with a robust product that the customer begins to trust and rely upon.

7.19 Functions Have a Single Point of Exit

This has more to do with writing functions that are easily maintainable than anything else. If a function has one entry point, a flow of control and one exit point, the function is easier to understand than a function with multiple exit points.

It also helps eliminate buggy code because using a return in the middle of a function implies an algorithm that does not have a straightforward flow of control. The algorithm should be redesigned so that there is only one exit point.

In a sense, a return in the middle of a function is just like using a goto statement. Instead of transferring control back to the caller at the end of the function, control is being passed back from the middle of the function.
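As a small illustration, here is a search function written with a single exit point. It is only a sketch; the FindValue() name and its arguments are hypothetical. The loop condition carries the "stop when found" logic, so no return is needed in the middle of the function.

```c
#include <stddef.h>

/* Returns the index of the first occurrence of value in array,
   or -1 if the value is absent or the array pointer is NULL.   */
static int FindValue( const int *array, int count, int value )
{
    int found = -1;   /* result; stays -1 until a match is seen */
    int i;
    if (array != NULL) {
        for (i=0; (i<count) && (found<0); ++i) {
            if (array[i] == value) {
                found = i;
                }
            }
        }
    return (found);   /* the one and only exit point */

} /* FindValue */
```

Because every path reaches the same return statement, cleanup or logging code added later needs to be written only once.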

7.20 Do Not Use the Goto Statement

I agree with the majority opinion that goto statements should be avoided. Functions with goto statements are hard to maintain.

7.21 Write Bulletproof Functions

Who is responsible for making sure that a function gets called and used properly? Is it up to the programmer? Or is it up to the function that gets called? Let's face it, programmers make mistakes. So anything that can be done on the part of the function to ensure that it is being used properly aids the programmer in finding problems in the program.

Consider a GetTextOfMonth() function. It takes as an argument a month, zero through eleven inclusive, and returns a long pointer to a three-character string description of the month. A naturally simple solution is as follows.

GetTextOfMonth() function, no error checking
LPSTR APIENTRY GetTextOfMonth( int nMonth )
{
    CSCHAR TextOfMonths[12][4] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
        };
    return (TextOfMonths[nMonth]);

} /* GetTextOfMonth */

The problem with this code is what happens when the input nMonth is outside the proper range of zero to eleven. The returned pointer points to something, but if treated as a string, it is more than likely much longer than a three-character string. If this string is then formatted into a buffer, for example by sprintf(), the buffer has a high likelihood of being overrun, trashing memory beyond the end of the buffer and causing even more problems that need to be tracked down.

The solution is to make GetTextOfMonth() completely bulletproof so that any value passed into it returns a pointer to a three-character string. One possible solution is as follows.

GetTextOfMonth() function, with error checking
LPSTR APIENTRY GetTextOfMonth( int nMonth )
{
    CSCHAR szBADMONTH[]="???";
    LPSTR lpMonth=szBADMONTH;
    WinAssert((nMonth>=0) && (nMonth<12)) {
        CSCHAR TextOfMonths[12][4] = {
            "Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
            };
        lpMonth = TextOfMonths[nMonth];
        }
    return (lpMonth);

} /* GetTextOfMonth */

Notice how a fault-tolerant form of WinAssert() is being used. This ensures that a full stack trace is logged if the input parameter is invalid.

Should this fault-tolerant code ever be removed? Once you get the code working that uses GetTextOfMonth(), you know there are no bugs, right? No, I do not think so! Do you know that your entire program is bug-free? There could be a bug in some totally unrelated part of the program that is causing a memory overwrite. If it just happens to overwrite a month number that you have stored in memory, you are in big trouble once again. Or what happens when you go back a year later to modify the code that uses GetTextOfMonth()? You may introduce a subtle bug.

The best way to write a bug-free program is to keep all the defenses up at all times. At least this way, you will know when there is a problem in your program.

You may not know what the problem is, but just knowing that there is a bug is important information for maintaining a quality product.

The best way to write a bug-free program is to keep all the defenses up at all times.

7.22 Writing Portable Code

C is so successful because it is so flexible. It is flexible, that is, for the compiler writer, because many key issues are left to the compiler writer to decide. This was done so that each implementation of C could take advantage of how a particular machine architecture works.

For example, what is the sign of the remainder upon integer division? How many bytes are there in an int or long or short? Are members of a structure padded to an alignment boundary? Does a zero-length file actually exist? What is the ordering of bytes within an int, long or short?

Most compilers provide a chapter or two in their documentation on how they have implemented these and many more implementation-defined behaviors.

If writing portable code is important to you, I would suggest that you thoroughly read these chapters and adopt programming methodologies that avoid implementation dependent behavior.
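The following sketch shows two things under stated assumptions. First, a few of the questions above can be answered by the program itself at run time. Second, one way to sidestep byte-order dependence is to store multi-byte values one byte at a time in a fixed order. All of the names used here (Padded, IsLittleEndian(), ReportImplementation(), StoreU32LE(), LoadU32LE()) are hypothetical, and fixing on little-endian storage is an arbitrary choice for the example.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* The compiler may insert padding between c and l */
struct Padded { char c; long l; };

/* Non-zero if the least significant byte of an int is stored first */
static int IsLittleEndian( void )
{
    unsigned int probe = 1;
    return (*(unsigned char *)&probe == 1);
}

/* Print how this particular compiler answers some of the questions */
static void ReportImplementation( void )
{
    printf( "bits per char       : %d\n", CHAR_BIT );
    printf( "sizeof short/int/long: %u/%u/%u\n", (unsigned)sizeof(short),
            (unsigned)sizeof(int), (unsigned)sizeof(long) );
    printf( "offset of Padded.l  : %u\n",
            (unsigned)offsetof(struct Padded, l) );
    printf( "-7 %% 2             : %d\n", -7 % 2 );
    printf( "byte order          : %s\n",
            IsLittleEndian() ? "little-endian" : "big-endian" );
}

/* Store the low 32 bits of value in a fixed byte order,
   independent of the byte order of the host machine.    */
static void StoreU32LE( unsigned char *buf, unsigned long value )
{
    buf[0] = (unsigned char)(value & 0xFFUL);
    buf[1] = (unsigned char)((value >> 8) & 0xFFUL);
    buf[2] = (unsigned char)((value >> 16) & 0xFFUL);
    buf[3] = (unsigned char)((value >> 24) & 0xFFUL);
}

/* Read back a value written by StoreU32LE() */
static unsigned long LoadU32LE( const unsigned char *buf )
{
    return ( (unsigned long)buf[0]
           | ((unsigned long)buf[1] << 8)
           | ((unsigned long)buf[2] << 16)
           | ((unsigned long)buf[3] << 24) );
}
```

A file written with StoreU32LE() reads back identically through LoadU32LE() on any machine, whereas writing the raw bytes of a long ties the file format to one compiler and one CPU.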

7.23 Memory Models

Due to the segmented architecture of the Intel CPU, compiler vendors provide the option of creating a program in one of four basic memory models.

The small memory model. This model allows for less than 64K of data and less than 64K of code. This model is great for quick and dirty utility programs. It also produces the fastest program since there is never any need to reload a segment register.

The compact memory model. This model allows for more than 64K of data and less than 64K of code. It is ideal for small programs that crunch through a lot of data. The program is still fast, but there is a slight speed penalty for accessing far data.

The medium memory model. This memory model allows for less than 64K of data and more than 64K of code. This is the memory model that Microsoft recommends using for programming in Windows. It allows for lots of code and a small amount of data. By using mixed-model programming, you can gain access to more than 64K of data.

The large memory model. This allows for more than 64K of data and more than 64K of code. It is the memory model that most closely matches flat memory model architectures.

These memory models are complicated by the fact that there is something called mixed-model programming. The four basic memory models essentially provide default code and data pointer attributes. A code or data pointer is either a 16-bit near pointer or a 32-bit segment/offset far pointer. These near and far attributes are only defaults. Mixed-model programming allows the near/far attributes to be specified on a pointer-by-pointer basis.

My advice to you is to use the large memory model, unless you are ultra concerned with speed. The industry is moving away from segmented architectures toward flat memory model programming, where there are no segments.

Use the large memory model. This will aid porting to flat memory model architectures.

By using the large memory model now, you ease the eventual porting of your software to the flat memory model.

7.24 Automated Testing Procedures

An automated testing procedure is a function that is designed to automatically test your code for you -- code that you think is already bug-free. A key part of most testing procedures is their use of random number generator class objects.

Let's suppose that you have just implemented a B-Tree disk-based storage heap for fast access to your database. How are you going to really test it? You could code examples that use the B-Tree class in order to test edge conditions. This is a good idea anyway, but what do you code next?

A solution that I have found useful and highly effective is to use a random number generator to create data sets that are then thrown at the module to be tested. A random number generator is useful because if a problem is discovered, it can be absolutely recreated by using the same random number seed.

A random number generator is an important part of an automated testing procedure.

In the case of the B-Tree code, you could randomly decide to add or delete a record from the tree and you could randomly vary the size of the records being added. You could also decide to randomly restart the test. As you add records into the database and read them back, how do you verify the integrity of the randomly sized record? One slick technique is to use another random number generator to fill in the record with random values. The slick part is that all you need to save around in memory to validate the record when it is read back in is the random number seed that was used to generate the random record, not the entire record itself.
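The seed trick can be sketched as follows. This is a minimal illustration rather than my production test code: the Rng type, the simple linear congruential generator and the FillRecord() and VerifyRecord() names are all hypothetical. A private generator is used instead of rand() so that the same seed produces the same bytes on every compiler.

```c
#include <stddef.h>

/* Hypothetical random number generator object */
typedef struct { unsigned long state; } Rng;

static void RngSeed( Rng *rng, unsigned long seed )
{
    rng->state = seed & 0xFFFFFFFFUL;
}

/* Simple linear congruential generator, kept to 32 bits of state */
static unsigned RngNext( Rng *rng )
{
    rng->state = (rng->state * 1103515245UL + 12345UL) & 0xFFFFFFFFUL;
    return ((unsigned)((rng->state >> 16) & 0x7FFFUL));
}

/* Fill a record with the byte sequence generated from seed */
static void FillRecord( unsigned long seed, unsigned char *rec, size_t len )
{
    Rng rng;
    size_t i;
    RngSeed( &rng, seed );
    for (i=0; i<len; ++i) {
        rec[i] = (unsigned char)(RngNext(&rng) & 0xFF);
        }
}

/* Regenerate the sequence from seed and compare it against the
   record read back; returns non-zero if every byte matches.    */
static int VerifyRecord( unsigned long seed, const unsigned char *rec,
                         size_t len )
{
    Rng rng;
    size_t i;
    int ok = 1;
    RngSeed( &rng, seed );
    for (i=0; (i<len) && ok; ++i) {
        if (rec[i] != (unsigned char)(RngNext(&rng) & 0xFF)) {
            ok = 0;
            }
        }
    return (ok);
}
```

The test driver then needs to remember only the seed and the length of each record it has stored. When a record is read back, VerifyRecord() detects any byte that the module under test has corrupted.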

Another big advantage of using random number generators is that given enough time, they can test virtually all cases and code paths. It is a lot like throwing darts. If you keep on throwing a dart at the dart board, the center target is eventually hit. It is only a matter of time. The question is not if the target is hit, but when.

What you are doing with the automated testing procedure is taking a module that is considered to be bug-free and subjecting it to a torture test of random events over time. Assuming there is a bug in the module, the automated testing procedure will find it eventually.

If there is a bug in a module, an automated testing procedure will eventually find it.

It is important that the automated testing procedure be written so that it is capable of generating all types of conditions and not just the normal set of conditions. You want to make sure that all the code in the module gets tested.

Do automated testing procedures actually work? Yes! They are what turned up the MS-DOS lost cluster bug and the Windows task-switching bug described in §3.1. I was putting my own code through a rigorous automated testing procedure and every once in a while the underlying kernel would fail. I guess the use of probability theory pays off.

7.25 Documentation Tools

Programmers hate to write documentation, but they love to write code and most programmers are willing to comment their code to some degree. An even bigger problem is that even if documentation does exist, it is more than likely out of date because it hasn't been maintained to reflect code changes.

My solution to this problem is to accept the fact that external documentation is not going to be produced directly. Instead, I am going to produce it indirectly.

By having all programmers follow a common documentation style in the entire project, it is possible to write a program that scans all source files and produces documentation.

I use markers in comment blocks to assist me in parsing my comments. For example, module comment blocks begin with /*pm, APIENTRY function comment blocks begin with /*pf and LOCAL function comment blocks begin with /*p.
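Recognizing the markers takes very little code. The sketch below is not the AUTODOC source; the DocKind enumeration and the function names are hypothetical, and it assumes that a marker sits at the very start of a line and is followed by white space or the end of the line. The longer markers are tested first because /*p is a prefix of the other two.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical classification of a comment block */
typedef enum { DOC_NONE, DOC_MODULE, DOC_APIENTRY, DOC_LOCAL } DocKind;

/* Non-zero if line starts with marker and the marker is followed
   by white space or the end of the string (an assumption).       */
static int MarkerMatches( const char *line, const char *marker )
{
    size_t n = strlen( marker );
    return ((strncmp( line, marker, n ) == 0) &&
            ((line[n] == '\0') || isspace( (unsigned char)line[n] )));
}

/* Decide what kind of comment block, if any, begins on this line */
static DocKind ClassifyLine( const char *line )
{
    DocKind kind = DOC_NONE;
    if (MarkerMatches( line, "/*pm" )) {
        kind = DOC_MODULE;
        }
    else if (MarkerMatches( line, "/*pf" )) {
        kind = DOC_APIENTRY;
        }
    else if (MarkerMatches( line, "/*p" )) {
        kind = DOC_LOCAL;
        }
    return (kind);

} /* ClassifyLine */
```

A scanner built on this reads each source file line by line, and whenever ClassifyLine() returns something other than DOC_NONE it copies the text up to the closing */ into the documentation output.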

In practice, this works great. The AUTODOC program that I use scans all source files and produces a Microsoft Quick Help file as output. The Brief editor that I use supports Quick Help. I now have instant access to all APIENTRY function documentation at the touch of a key.

7.26 Source-Code Control Systems

If you are not already using a source-code control system, I would highly recommend that you get one. I like them because they give me access to the source as it existed all the way back to day one. It is also essential for tracking down problems in released software. You may end up with two or three different versions of your software that are all in active use. The source-code control system gives you easy access to the source of any particular version of your software.

Most source-code control systems follow a get and put methodology. Getting a source file gives the "getter" editing privileges to the source. When changes are complete, the source is put back.

Before I put back any source, I produce a difference file and review all the changes that I have made to the source. On more than one occasion this has saved me from including a silly programming bug.

Always review changes before checking source code back in.

7.26.1 Revision Histories

An important part of maintaining software is keeping an accurate log of what changes were made to a module and why. Rather than keeping this information in the source file itself, I prefer to use the source-code control system.

In the source-code control system that I use, a put will prompt me to enter a description of the changes that I have made to the source file.

The entire revision history is available at any time and is maintained by the source-code control system. In modules that get changed a lot, this technique keeps around the full revision history without cluttering up the source file.

7.27 Monochrome Screen

7.27.1 The Windows Developer

A monochrome monitor is a must for the Windows-based developer. You can configure the debugging kernel to send debug messages to either a serial communications port or the monochrome monitor. This is configured in the DBWIN.EXE program, which is provided with Microsoft C8. A monochrome monitor is preferred because it is a lot faster when you get a lot of debug messages at once.

In addition to the system generating debug messages, the programmer can generate them as well by calling OutputDebugString(). The prototype for it is as follows.

OutputDebugString() prototype in windows.h (v3.1)
void WINAPI OutputDebugString(LPCSTR);

OutputDebugString() should not be called in the final release of your software. You do not want messages going to your customer's communication port. In my code, I control this by calling OutputDebugString() only if I detect that the debugging kernel is running. To detect if you are running under debug Windows, use the GetSystemMetrics(SM_DEBUG) call. It returns zero under retail Windows and a non-zero value under debug Windows.

Another benefit of using a monochrome screen is that most Windows debugging tools have an option to run on the monochrome screen. This way you can see the debug screen and your main screen at the same time.

7.27.2 The MS-DOS Developer

Just as in Windows, most MS-DOS debugging tools have an option to run on the monochrome screen. What do you do if you want to send messages to the monochrome screen?

An OutputDebugString() that can be used by MS-DOS programmers is as follows.

OutputDebugString() for MS-DOS programmers
void APIENTRY OutputDebugString( LPSTR lpS )
{
    LPSTR lpScreen=(LPSTR)0xB0000000; /* base of mono screen    */
    int nPos=0;                       /* for walking lpS string */
    int loop;                         /* column being written   */
    /*--- Scroll monochrome screen up one line (regions overlap) ---*/
    _fmemmove( lpScreen, lpScreen+2*80, 2*80*24 );
    /*--- Place new line down in 25'th line ---*/
    for (loop=0; loop<80; ++loop) {
        lpScreen[2*(80*24+loop)] = (lpS[nPos]?lpS[nPos++]:' ');
        }

} /* OutputDebugString */

The monochrome screen is memory mapped and is located at segment 0xB000. Every character on a monochrome screen is actually composed of 2 display bytes. One byte is the character to display and the other byte contains attribute information such as blinking, inverted, and so on.

This code works by first scrolling the monochrome screen up one line by performing a memory copy. Next, the string is placed into line 25 of the monochrome screen. The string is padded with spaces to the end of the line to make sure that the previous contents of the twenty-fifth line are completely overwritten.
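To experiment with the scrolling logic on a machine that has no monochrome adapter, the same steps can be run against an ordinary array. The sketch below is a simulation only: the monoScreen array stands in for the video memory at segment 0xB000, and the SimOutputDebugString() and SimCharAt() names are hypothetical. It uses memmove() because the source and destination of the scroll overlap.

```c
#include <stddef.h>
#include <string.h>

#define MONO_COLS       80   /* characters per line             */
#define MONO_ROWS       25   /* lines on the screen             */
#define BYTES_PER_CELL  2    /* character byte + attribute byte */

/* Stand-in for the memory-mapped monochrome screen */
static char monoScreen[MONO_ROWS * MONO_COLS * BYTES_PER_CELL];

static void SimOutputDebugString( const char *s )
{
    size_t pos = 0;   /* for walking the input string */
    int col;
    /*--- Scroll up one line; the regions overlap, so use memmove ---*/
    memmove( monoScreen, monoScreen + BYTES_PER_CELL*MONO_COLS,
             BYTES_PER_CELL*MONO_COLS*(MONO_ROWS-1) );
    /*--- Write the string into the last line, space padded ---*/
    for (col=0; col<MONO_COLS; ++col) {
        monoScreen[BYTES_PER_CELL*(MONO_COLS*(MONO_ROWS-1)+col)] =
            (s[pos] ? s[pos++] : ' ');
        }
}

/* Character byte currently shown at the given row and column */
static char SimCharAt( int row, int col )
{
    return (monoScreen[BYTES_PER_CELL*(row*MONO_COLS+col)]);
}
```

After two calls, the first string has moved up to the twenty-fourth line and the second string occupies the twenty-fifth, just as on the real screen.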

7.28 Techniques for Debugging Timing Sensitive Code

Application code should never have any timing dependencies. However, system level or interrupt code will more than likely have timing constraints. An example is an interrupt handler for a synchronous communications protocol. These drivers can be especially hard to debug because there is always communications traffic on the line and the protocol itself is timing sensitive. Using OutputDebugString() to help you debug the code wastes too much time and affects the timing sensitive code you want to debug, so an alternative is needed.

7.28.1 PutMonoChar() Function for MS-DOS

One technique that I have used successfully to debug timing sensitive code is to write a few informative characters directly into the monochrome screen video memory, in effect displaying a message on the monochrome monitor. For example, PutMonoChar() places a character at a specific row and column on the monochrome screen.

PutMonoChar(), for MS-DOS
void APIENTRY PutMonoChar( int nRow, int nCol, char c )
{
    if ((nRow>=0) && (nRow<25) && (nCol>=0) && (nCol<80)) {
        *(LPSTR)(0xB0000000+2*(nRow*80+nCol)) = c;
        }

} /* PutMonoChar */

PutMonoChar() works by first checking that the input nRow and nCol are within the bounds of the screen. It then writes the character directly into monochrome screen video memory.

The advantage of using PutMonoChar() as opposed to OutputDebugString() for debug messages is that it is so much faster and is unlikely to adversely affect the timing sensitive code you want to debug. This is because PutMonoChar() is just placing one character down instead of OutputDebugString(), which is placing an entire line down and scrolling the entire monochrome screen.