Writing Bug-Free C Code
A Programming Style That Automatically Detects Bugs in C Code
by Jerry Jongerius / January 1995
Chapter 7: General Tips
No matter how good your bug-detection tools are, the goal of every
programmer should be to avoid bugs in the first place. Debugging
tools serve a purpose, but relying on them to catch your programming
mistakes does not make you a better programmer. What makes you a
better programmer is learning from your mistakes.
|
A key to writing bug-free code is learning from your mistakes.
|
How do you rate your own ability to produce bug-free code? Whatever
your answer is, how did you rate yourself last year? How do you
expect to rate yourself next year?
The point of this exercise is to stress to you the importance of
improving your coding techniques year after year. The goal is to be
able to write more code next year than last year and to write it
with fewer bugs.
7.1 Design It Right the First Time
If there is one thing I have learned over the years that I would
emphasize to a programmer just starting out, it is that old code
rarely dies; it just stays around and around. This is especially
true in large projects, and all successful small projects end up
turning into large projects.
As time goes on in a large project, more and more code layers are
built upon existing code layers. It does not take long before new
code is no longer written against the operating system or windowing
environment layer, but against the layers that were built on top of
these system layers.
Now suppose that a year later someone discovers that the
implementation of a low-level layer is causing performance problems.
All too often management decides to live with the performance
problem in favor of spending their programmers' time putting new
features into the product. You are forced by management to live
with a lot of the coding decisions you make over the years, so your
decisions had better be good!
Management may allow you to re-engineer a module to make it better
and faster, but consider the lost time. The time that is attributed
to the module is the original design time plus the time to
re-engineer the module. Designing a module right the first time
saves time.
It is not feasible to continually re-engineer old modules. If this
does happen due to poor design decisions made upfront, a project
will come to a halt.
|
Designing a module right the first time saves time.
|
My advice is to design a module right the first time because you'll
rarely get the chance to re-engineer the module.
7.2 Good Design Beats an Optimizing Compiler
An optimizing compiler helps a program run faster, but it is good
design that makes a program run fast, and a great design produces the
fastest program. A little more time spent on a programming problem
generally results in a better design, which can make a program run
significantly faster.
Over the last few years, I think all of us at one time or another saw
a well-known product upgrade getting hammered by the trade press for
how sluggishly the upgrade performed. Do not let this happen to
your product.
And by all means, evaluate your competition. If your product is two
to three times slower than the competition and this comes out in a
magazine review, do you think potential customers will buy your
product?
7.3 Evolutionary Coding
How do you code a module? Do you spend days working feverishly on a
module and have it all come together in the last half day? Do you
code one piece of a module, moving on to the next piece only when
you are certain that the piece you just wrote is working?
Over the years I've tried both techniques and I can tell you from
experience that coding a module a piece at a time is much easier.
It also helps isolate bugs. If you wait until the end to put all
the pieces together, where is the problem? By coding one piece at a
time, the problem is more than likely in the piece you are working
on and not in some piece you have already finished and tested.
7.3.1 Build a Framework
Start coding a module by building a framework that is plugged into a
system almost immediately. The goal in creating this framework is
to get the module completely stubbed out so that it compiles.
Let's use the Dos module as an example and assume that the module
interface has already been designed. The first step in creating the
framework is to create the global include file. Next, create a
source file that contains all of the sections of a module, but none
of the guts. Namely, create the module and APIENTRY comment blocks
without any comments, declare the class without any members and
write all the APIENTRY functions with no code in the function bodies.
At this point, I place return statements in all functions, so that
each function can return an error value. For example, I have
DosOpenFile() and DosCloseFile() return NULL and DosRead() and
DosWrite() return zero.
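A minimal sketch of such a framework in standard C follows. The argument lists, the HDOS handle type and the (void) statements that silence unused-parameter warnings are assumptions made for illustration; the real Dos module interface is designed elsewhere and uses APIENTRY, which is omitted here so the sketch compiles on its own.

```c
#include <stddef.h>

/* Hypothetical handle type. The class structure is left incomplete
   for now; members are added as the methods are written. */
typedef struct tagDOS *HDOS;

/* Stub: will open a file. Returns the error value (NULL) for now. */
HDOS DosOpenFile(const char *pName, const char *pMode)
{
    (void)pName;
    (void)pMode;
    return NULL;
}

/* Stub: will close the file and free the object; returns NULL. */
HDOS DosCloseFile(HDOS hDos)
{
    (void)hDos;
    return NULL;
}

/* Stub: will return the number of bytes read; zero for now. */
size_t DosRead(HDOS hDos, void *pBuffer, size_t nBytes)
{
    (void)hDos;
    (void)pBuffer;
    (void)nBytes;
    return 0;
}

/* Stub: will return the number of bytes written; zero for now. */
size_t DosWrite(HDOS hDos, const void *pBuffer, size_t nBytes)
{
    (void)hDos;
    (void)pBuffer;
    (void)nBytes;
    return 0;
}
```

Every function already returns its error value, so any test code written against the framework compiles and runs; it simply reports failure until the real code is filled in.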
The framework is now complete. Compile the module and correct any
errors that show up. Although no real code has been written, the
framework provides you with a clear goal.
7.3.2 Code the Create and Destroy Methods
After the framework is compiling successfully, you are ready to start
implementing the module. Where should you start? The functions
that I always tackle first are the create and destroy methods. This
allows me to write some simple code that uses and tests the module.
In the case of the Dos module, the DosOpenFile() and DosCloseFile()
functions are implemented first, followed by some test code. This
test code allows verification that (1) opening an existing file
works, (2) trying to open a file that does not exist fails and (3)
calling DosCloseFile() works properly and frees allocated memory.
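The create and destroy methods, together with test code for the three checks above, might look like the following sketch. It assumes a Dos class built on the C stdio library with a two-argument DosOpenFile(); both are illustrative assumptions, not the actual Dos module. Confirming that the memory was really freed needs a heap-checking tool and is not shown.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical class structure: only what create/destroy need so far. */
typedef struct tagDOS {
    FILE *pFile;
} DOS, *HDOS;

/* Create method: returns a new handle, or NULL if the file cannot be opened. */
HDOS DosOpenFile(const char *pName, const char *pMode)
{
    HDOS hDos = malloc(sizeof(*hDos));
    if (hDos) {
        hDos->pFile = fopen(pName, pMode);
        if (!hDos->pFile) {
            free(hDos);
            hDos = NULL;
        }
    }
    return hDos;
}

/* Destroy method: closes the file, frees the object and returns NULL
   so the caller can write hDos = DosCloseFile(hDos). */
HDOS DosCloseFile(HDOS hDos)
{
    if (hDos) {
        fclose(hDos->pFile);
        free(hDos);
    }
    return NULL;
}

/* Test code for checks (1) to (3); returns nonzero if all pass. */
int TestDosCreateDestroy(const char *pExisting, const char *pMissing)
{
    HDOS hDos = DosOpenFile(pExisting, "rb");
    int bPassed = (hDos != NULL);                                 /* (1) */
    hDos = DosCloseFile(hDos);                                    /* (3) */
    bPassed = bPassed && (hDos == NULL);
    bPassed = bPassed && (DosOpenFile(pMissing, "rb") == NULL);   /* (2) */
    return bPassed;
}
```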
In most modules, deciding which data items to add to the class
structure is not as simple as it was for the Dos module. This is
fine. Simply fill in the class structure with whatever data items
the create and destroy method functions require. When implementing
other method functions, add data items to the class structure as
they are needed.
7.3.3 Code the Other Methods
Once the create and destroy methods are working correctly, it is time
to start implementing the other method functions. The best strategy
is to implement a method function, compile it and then test it by
writing test code.
In the case of the Dos module, you may decide to implement DosRead()
first. After doing so, you could then create a test file with a
known data set and attempt to have the test code read this data set.
This helps validate the DosRead() code. Likewise for the
DosWrite() function. Write out some information to a file and
verify that it was indeed written.
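A sketch of DosRead(), DosWrite() and the known-data-set test is shown below. As before, the stdio-based class structure and the function signatures are assumptions for illustration; the block repeats minimal open and close functions so that it compiles on its own.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical Dos class built on stdio; not the actual module design. */
typedef struct tagDOS {
    FILE *pFile;
} DOS, *HDOS;

HDOS DosOpenFile(const char *pName, const char *pMode)
{
    HDOS hDos = malloc(sizeof(*hDos));
    if (hDos) {
        hDos->pFile = fopen(pName, pMode);
        if (!hDos->pFile) {
            free(hDos);
            hDos = NULL;
        }
    }
    return hDos;
}

HDOS DosCloseFile(HDOS hDos)
{
    if (hDos) {
        fclose(hDos->pFile);
        free(hDos);
    }
    return NULL;
}

/* Returns the number of bytes actually read (0 on error). */
size_t DosRead(HDOS hDos, void *pBuffer, size_t nBytes)
{
    return hDos ? fread(pBuffer, 1, nBytes, hDos->pFile) : 0;
}

/* Returns the number of bytes actually written (0 on error). */
size_t DosWrite(HDOS hDos, const void *pBuffer, size_t nBytes)
{
    return hDos ? fwrite(pBuffer, 1, nBytes, hDos->pFile) : 0;
}

/* Test code: write a known data set, read it back and compare. */
int TestDosReadWrite(const char *pName)
{
    static const char szData[] = "known data set 0123456789";
    char buffer[sizeof(szData)];
    int bPassed = 0;

    HDOS hDos = DosOpenFile(pName, "wb");
    if (hDos) {
        bPassed = (DosWrite(hDos, szData, sizeof(szData)) == sizeof(szData));
        hDos = DosCloseFile(hDos);
    }
    if (bPassed) {
        hDos = DosOpenFile(pName, "rb");
        bPassed = (hDos != NULL)
               && (DosRead(hDos, buffer, sizeof(buffer)) == sizeof(szData))
               && (memcmp(buffer, szData, sizeof(szData)) == 0);
        hDos = DosCloseFile(hDos);
    }
    return bPassed;
}
```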
The order in which you code the method functions is totally up to
you. After you gain experience using this technique, you will end
up picking an order that allows for test code to be written easily.
7.4 Set Goals
Goal setting is always important to accomplishing any task you set
out to do, but it is especially important that programmers have
clearly defined goals. The problem is that all too often a
programmer gets sucked into a project that is 90 percent done and
weeks or months later the project is still only 90 percent done.
7.4.1 The 90 Percent Done Syndrome
The problem with programming without a clear, well-defined goal is
that the project never gets completed.
|
Programming without a goal is like a sailboat without a sail. You
drift.
|
I have fallen into this trap myself. You work on items that relate
to the 90 percent that is already done. You may have found a
new feature to add and so you work on it, but the feature does not
help you complete the project. Or you have found a better way to
implement an algorithm, so you spend time recoding the algorithm.
While finding a better algorithm is great, it only delays the
project from being completed.
The module framework helps you set a clearly defined goal. Once the
framework is in place, all the APIENTRY functions are stubbed out,
just waiting to be completed. Once these APIENTRY functions are
completely written and tested, you have reached your goal and the
module is finished.
7.5 Code What to Do, not How to Do It
The goal of this technique is to avoid spreading knowledge about how
something is done throughout the entire project. By moving this
knowledge to one function, you are isolating the knowledge. Now any
code that calls this function is specifying what must be done,
leaving the how to the function.
|
Never reinvent the wheel. Always code what to do, not how to do it.
|
It is hard to discipline yourself to use this technique, but the
payback is well worth the extra effort.
Code that uses this technique becomes shorter and more compact. What
used to take ten lines now takes five. This leads to a program that
is easier to code, maintain and change.
A good analogy is building blocks. When you build something from
scratch, you start by making a few blocks and then use those blocks
to build it. Now suppose you want to build something else. The key
is that you don't start totally from scratch, because you have
already made some of the basic building blocks.
Always keep your eyes open for a new building block to implement.
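As a small illustration, the hypothetical building block below knows how to skip leading white space, so the functions that call it only state what they want to know about a line. The function names are invented for this sketch.

```c
#include <ctype.h>

/* Building block: knows HOW to skip leading white space. */
const char *SkipWhiteSpace(const char *pString)
{
    while (*pString && isspace((unsigned char)*pString)) {
        ++pString;
    }
    return pString;
}

/* Callers say WHAT they want: is the line empty apart from white space? */
int IsBlankLine(const char *pLine)
{
    return *SkipWhiteSpace(pLine) == '\0';
}

/* Is the first non-blank character a ';' comment marker? */
int IsCommentLine(const char *pLine)
{
    return *SkipWhiteSpace(pLine) == ';';
}
```

If the definition of white space ever changes, only SkipWhiteSpace() has to be touched.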
7.6 Virtually No Global Variables
In an object-based system, global variables (variables known to more
than one source file) are almost never needed. The reasoning is
simple. Objects, by design, are created and destroyed by method
functions, so the objects themselves are never static. An advantage
of object-based systems that dynamically allocate objects is that
they are never limited by some predefined number. Instead, they are
limited only by the amount of memory that is available.
Global variables are rarely referenced by the method functions of an
object because method functions act upon an object, not global data.
Remember that an object's handle is passed in as the first argument
to a method function.
So why are global variables ever needed? Sometimes to directly
support a class of objects. A prime example is if all objects of a
class require access to a read-only resource. When creating the
first object for this class, the resource is read into memory and a
handle to the resource is stored in a global variable. When
destroying the last object for this class, the memory associated
with the resource is freed.
Global variables to support individual object instances should not be
allowed. Global variables that support a class of objects as a
whole are permitted. Because these global variables support a
class, their use is limited to the source file that declares the
class. This can be enforced by using static on these variables,
ensuring that the variables are not visible in other source files.
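The sketch below shows the shared read-only resource pattern with class-wide static variables. The WIDGET class, its functions and the calloc() that stands in for reading the resource into memory are all hypothetical; the book's own class macros are not used here.

```c
#include <stdlib.h>

#define TABLE_SIZE 256   /* hypothetical size of the shared resource */

/* Class-wide globals: static, so no other source file can see them. */
static unsigned char *pSharedTable = NULL;  /* shared read-only resource */
static int nObjectCount = 0;                /* live WIDGET objects */

typedef struct tagWIDGET {
    int nId;
} WIDGET, *HWIDGET;

HWIDGET WidgetCreate(int nId)
{
    HWIDGET hWidget = NULL;
    if (nObjectCount == 0) {
        /* first object of the class: load the shared resource */
        pSharedTable = calloc(TABLE_SIZE, 1);
    }
    if (pSharedTable) {
        hWidget = malloc(sizeof(*hWidget));
        if (hWidget) {
            hWidget->nId = nId;
            ++nObjectCount;
        }
    }
    if (nObjectCount == 0) {
        /* creation failed and no object is using the resource */
        free(pSharedTable);
        pSharedTable = NULL;
    }
    return hWidget;
}

HWIDGET WidgetDestroy(HWIDGET hWidget)
{
    if (hWidget) {
        free(hWidget);
        if (--nObjectCount == 0) {
            /* last object of the class: release the shared resource */
            free(pSharedTable);
            pSharedTable = NULL;
        }
    }
    return NULL;
}

/* Accessors so test code can observe the class-wide state. */
int WidgetCount(void) { return nObjectCount; }
int WidgetResourceLoaded(void) { return pSharedTable != NULL; }
```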
7.7 Loop Variables
Variables used to control for loops should not be used outside the
loop itself. The LOOP() macro (see §2.2.5) was designed to enforce
this rule.
|
Loop variables should not be used outside the loop itself.
|
This rule makes code easier to maintain. While you are writing a
function, you understand it thoroughly and are unlikely to make a
mistake in using the loop variable after the loop has exited.
However, when you or one of your coworkers modifies the loop next
year, the assumptions under which the looping variable is used could
easily be invalidated.
It is better to be safe than sorry.
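In C99 and later (and in C++), a variable declared in the for statement itself is scoped to the loop, which enforces the same rule without a macro. In the hypothetical function below, the answer is kept in its own variable instead of being read from the loop variable after the loop exits.

```c
/* Returns the index of the first negative value, or -1 if there is none.
   The result lives in nFound, not in the loop variable i. */
int FindFirstNegative(const int *pValues, int nCount)
{
    int nFound = -1;
    for (int i = 0; i < nCount; ++i) {   /* i exists only inside the loop */
        if (pValues[i] < 0) {
            nFound = i;
            break;
        }
    }
    return nFound;
}
```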
7.8 Use the Highest Compiler Warning Level
Today's compilers are pretty smart, but only if you tell them to be.
By default, most compilers use a low warning level and use more
advanced error checking only if instructed to do so. This is almost
always done to remain compatible with older code. You can imagine
the number of support calls a compiler vendor would get if the old
code that used to compile fine all of a sudden started producing a
lot of warning messages under a new release of the compiler.
For more information on setting the compiler warning level in
Microsoft C8, see §2.1.3.
|
7.9 Use "static" to Localize Knowledge
Using static can sometimes be confusing because it appears to mean
different things in different contexts. A simple rule that helps
clear things up is that using static assigns permanent storage to an
object and limits the visibility of the object to the scope in which
it is defined. There are three basic ways in which static can be
used.
Static function declaration. You have already seen static used in
this case with the LOCAL macro (see §6.6.7). Since a function
already has permanent storage, this part of the rule is redundant.
A function is defined at file scope, so static limits the visibility
of the function to the rest of the source file in which it is
defined.
External static variables. When a variable is defined at file scope,
it already has permanent storage and can be referenced from other
source files through an extern declaration. Using static makes the
variable invisible to other source files. Again, the variable
definition already has permanent storage, so this part of the rule
is redundant.
Internal static variables. A variable defined within a function
without static is called an automatic variable. Automatic variables
have no persistence across function calls. In other
words, if a value is assigned to an automatic variable in a block,
that value is lost when the block is exited. In almost all cases,
this is the desired behavior. However, using static assigns
permanent storage to the variable so that a value assigned to the
variable is not lost. The scope of the variable is also limited to
the block in which it is defined. An internal static variable is
just like an external static variable except that the visibility of
the variable is limited to a block.
It is also important to understand how an initialized static variable
behaves. Consider the following code.
An example of using static
void Testing( void )
{
    LOOP(3) {
        static int nTest1=100;
        int nTest2=100;
        printf( "nTest1=%d, nTest2=%d\n", nTest1, nTest2 );
        ++nTest1;
        ++nTest2;
    } ENDLOOP
} /* Testing */
Output from calling Testing() three times
nTest1=100, nTest2=100
nTest1=101, nTest2=100
nTest1=102, nTest2=100
nTest1=103, nTest2=100
nTest1=104, nTest2=100
nTest1=105, nTest2=100
nTest1=106, nTest2=100
nTest1=107, nTest2=100
nTest1=108, nTest2=100
|
As you can see from this example, nTest1 is initialized only once,
while nTest2 is initialized on each pass through the block. The
rule is that the initialization of any static variables takes place
at compile-time and that the initialization of automatic variables
takes place at run-time.
|
Static variables are initialized once at compile-time. Automatic
variables are initialized as needed at run-time.
|
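A common use of an internal static variable is a counter that must survive between calls. In this hypothetical sketch, nNextId is initialized only once, while nCount in the contrasting function is initialized again on every call.

```c
/* Returns 1, 2, 3, ... on successive calls: nNextId is static,
   so it is initialized once and keeps its value between calls. */
int NextId(void)
{
    static int nNextId = 1;
    return nNextId++;
}

/* For contrast: nCount is automatic, so it starts at 1 on every call. */
int AlwaysOne(void)
{
    int nCount = 1;
    return nCount++;
}
```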
7.10 Place Variables in the Block Needed
How many times have you tracked down a bug only to realize that you
used an automatic variable before it was properly initialized, or
used the variable well after it should have been, when it no longer
contained an appropriate value? While this may not happen to you as
you write the function, it becomes a lot more likely when you go
back to the code at a later date and modify it.
The solution to this problem is to define variables only in the
innermost scope in which they are needed. A new scope is created
any time you use an opening brace {, and the scope is terminated by
the matching closing brace }. This means that the braced bodies of
your if statements, while statements, and so on create new scopes.
Variables defined within a scope are visible only within that scope;
as soon as the scope ends, so does your access to the variable.
Consider the following code fragment.
Limiting the scope of a variable
LOOP(strlen(pString)) {
    char c=pString[loop];
    ...
} ENDLOOP
|
In this example, c is visible only within the loop. As soon as the
loop exits, c is no longer visible. In fact, the name c can be
defined again and reused in another scope.
|
Define variables in the scope in which they are needed.
|
In ANSI C (C89), variables can be defined only at the beginning of a
scope, before the main body of code. In C++, variables can be
defined wherever a statement is valid; C99 later gave C the same
freedom.
A useful #define for both C and C++ is the NewScope define.
NewScope define
#define NewScope
|
NewScope is a syntactic placeholder (defined to be nothing) that
allows a new scope to be introduced into a program. It also allows
for a natural indentation style.
Using NewScope
void APIENTRY Function( args )
{
    /*--- Comment ---*/
    (code block)

    /*--- Using NewScope ---*/
    NewScope {
        type var;
        (code block that uses var)
    }

} /* Function */
|
NewScope is useful in both C and C++ because it allows variables to
be created that are private to a block. As soon as the block exits,
the variables declared in the block are no longer visible and cannot
be referenced.
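Here is a small runnable sketch of NewScope in use. ReverseString() is a hypothetical function; the point is that nLeft, nRight and c exist only inside the block that needs them.

```c
#include <string.h>

#define NewScope    /* syntactic placeholder: expands to nothing */

/* Reverses a string in place. */
void ReverseString(char *pString)
{
    size_t nLength = strlen(pString);

    /*--- Swap characters, working in from both ends ---*/
    NewScope {
        size_t nLeft = 0;
        size_t nRight = (nLength > 0) ? nLength - 1 : 0;
        while (nLeft < nRight) {
            char c = pString[nLeft];
            pString[nLeft] = pString[nRight];
            pString[nRight] = c;
            ++nLeft;
            --nRight;
        }
    }
    /* nLeft, nRight and c are not visible here */
}
```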
7.11 Arrays on the Stack
When using arrays that are declared on the stack, you must be careful
not to return a pointer to one of these arrays back to the calling
function. Consider the following example.
A program with a subtle bug
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

char *myitoa( int nNumber )
{
    char buffer[80];
    sprintf( buffer, "%d", nNumber );
    return (buffer);
}

int main(void)
{
    printf( "Number = %s\n", myitoa(234) );
    return 0;
}
|
This program contains a subtle bug: myitoa() returns the address of
buffer, an array on the stack. The bug is subtle because, despite
it, the program may still appear to work properly!
It is a bug to return the address of an array on the stack to a
calling function because, after the function returns, the array lies
in a part of the stack that no function owns; using that address is
undefined behavior. In practice, the code keeps working until the
array gets overwritten, typically by a later function call, although
making a function call is no guarantee that buffer will be
overwritten.
This bug is exactly like keeping a pointer to a block of memory after
that block has been freed. Unless the memory is reused and
overwritten, accessing an object through the pointer will continue
to work. As soon as the block of memory gets overwritten, you have a
subtle problem to track down.
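One common fix, sketched below, is to let the caller supply the buffer so the result's lifetime belongs to the caller. The extra parameters are an assumption of this sketch, and it relies on snprintf() from C99 to guard against overflowing the buffer.

```c
#include <stdio.h>

/* The caller owns the buffer, so the result outlives the call.
   snprintf() never writes more than nBufferSize bytes. */
char *myitoa(int nNumber, char *pBuffer, size_t nBufferSize)
{
    snprintf(pBuffer, nBufferSize, "%d", nNumber);
    return pBuffer;
}
```

The original main() would then declare char buffer[16]; and call myitoa(234, buffer, sizeof(buffer)). Declaring buffer static inside myitoa() would also remove the bug, but every call would then share, and overwrite, the same buffer.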
7.12 Pointers Contain Valid Addresses or NULL
The reasoning behind this is simple. It helps find bugs. If all
pointers contain either a valid address or NULL, then accessing the
valid address does not cause the program to fault. However,
accessing the NULL pointer under most protected environments causes
the program to fault and you have found your bug.
Consider what would happen if you have a pointer to a memory object,
the memory object is freed and the pointer is not set to NULL. The
pointer still contains the old address, an address to an invalid
object. However, the address itself under most environments is
still valid and accessing the memory does not cause your program to
fault. This is a problem and the source of a lot of bugs.
The reason for being able to access the memory of a freed object is
that most memory managers simply add the memory back into the pool
of free memory. Rarely do they actually mark this memory as being
inaccessible to the program. It is time-consuming to communicate
with the operating system on every allocation and deallocation
request. Instead, a memory pool is maintained. Only when this pool
is exhausted does the heap manager ask the operating system for more
memory.
To help enforce this policy, macros that interface with the memory
manager should be designed and used exclusively. Examples are
NEWOBJ() and FREE() for class objects (see §4.5).
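The general idea can be sketched with a hypothetical SAFE_FREE() macro built on the standard free(); it is not the book's FREE() macro, whose definition is given in §4.5. Because free(NULL) does nothing, releasing an already-released pointer is harmless.

```c
#include <stdlib.h>

/* Hypothetical macro: release the memory, then set the pointer to NULL
   so it never holds the address of a freed block. */
#define SAFE_FREE(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical object that owns a block of memory. */
typedef struct tagBUFFER {
    char *pData;
    size_t nSize;
} BUFFER;

int BufferAlloc(BUFFER *pBuffer, size_t nSize)
{
    pBuffer->pData = malloc(nSize);
    pBuffer->nSize = pBuffer->pData ? nSize : 0;
    return pBuffer->pData != NULL;
}

void BufferRelease(BUFFER *pBuffer)
{
    SAFE_FREE(pBuffer->pData);   /* pData is NULL afterward */
    pBuffer->nSize = 0;
}
```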
7.13 Avoid Type Casting Whenever Possible
Type casting, by design, bypasses the type checking of the compiler.
In essence, type casting tells the compiler that you know what you
are doing and not to complain about it. The problem is that your
environment may change: you may port the software to another
platform, change the memory model of the program or upgrade to a new
revision of the compiler.
Whatever the reason, once your environment has changed, you run the
risk of missing important warning messages when you recompile. The
behavior of the statement that contains the type cast may have
changed, but the type cast masks the change.
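A classic C illustration is the result of malloc(). In C, a void pointer converts implicitly to any object pointer type, so no cast is needed. Under a C89 compiler, writing (int *)malloc(...) can hide a missing #include <stdlib.h>: the compiler then assumes malloc() returns int, and the cast silences the diagnostic that would otherwise appear. NewIntArray() is a hypothetical example.

```c
#include <stdlib.h>

/* No cast on malloc(): void * converts to int * implicitly in C. */
int *NewIntArray(size_t nCount)
{
    int *pArray = malloc(nCount * sizeof(*pArray));
    if (pArray) {
        for (size_t i = 0; i < nCount; ++i) {
            pArray[i] = 0;
        }
    }
    return pArray;
}
```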
Unnecessary type casts are especially common in mixed-model
programming under Windows. To make matters worse, Microsoft's own
sample code sets a bad example because it is littered with totally
unnecessary type casts. Consider the following code fragment.
An example of bad type casts
MSG msg;
while (GetMessage((LPMSG)&msg, NULL, 0, 0)) {
    TranslateMessage((LPMSG)&msg);
    DispatchMessage((LPMSG)&msg);
}
|
Microsoft Windows programmers recognize this code as the main message
loop for an application. All messages that are placed in an
application's message queue are dispatched by this message loop.
The problem with this code is that all three type casts to LPMSG are
totally unnecessary. The code works great without the type casts.
The prototypes for GetMessage(), TranslateMessage() and
DispatchMessage() all indicate that they take LPMSG as an argument.
The data type of &msg is PMSG due to the mixed-model environment. I
can only suppose that the programmer thought that PMSG must be type
cast into what the functions expected, an LPMSG. This is simply not
the case. In mixed-model programming, the compiler promotes a near
pointer to the object to a far pointer to the object in all cases
that a far pointer to the object is expected. In other words, the
compiler is implicitly performing the type cast for you.
|
7.13.1 Mixed-Model Programming Implicit Type Cast Warning
In mixed-model programming there exists a subtle problem if you write
code that allows NULL to be passed through as one of the argument
pointers. Consider the following code.
GetCpuSpeed(), demonstrating implicit type cast problem
int APIENTRY GetCpuSpeed( LPSTR lpBuffer )
{
    int nSpeed=(calculation);
    if (lpBuffer) {
        (fill buffer with text description of speed)
    }
    return (nSpeed);
} /* GetCpuSpeed */
|
GetCpuSpeed() always returns the CPU speed as an int, but as an
option it also creates a text description of the CPU speed in the
provided buffer if the buffer pointer is non-NULL. Now what happens
when you call GetCpuSpeed()?
Calling GetCpuSpeed()
PSTR pBuffer=NULL;
int nSpeed1=GetCpuSpeed(NULL);
int nSpeed2=GetCpuSpeed(pBuffer);
|
In both cases you want only the integer speed and not the text
description. In the first case, GetCpuSpeed(NULL) behaves as
expected. However, in the second case, GetCpuSpeed(pBuffer) fills
in a text buffer. The problem is that pBuffer is a near pointer and
that GetCpuSpeed() expects a far pointer. No matter what value is
contained in pBuffer, it is considered a valid pointer and the type
cast of a near pointer to a far pointer (implicitly by the compiler
or explicitly by you) uses the data segment value as the segment for
the far pointer.
In other words, when the pBuffer near pointer is converted to a far
pointer, the offset is zero but the segment is the (nonzero) data
segment, so the resulting far pointer is not NULL.
From experience, I have found that writing correct code in
situations like this is too error-prone. My solution to this
problem has been to move away from mixed-model pointers and stick
with far pointers.
|
7.14 Use sizeof() on Variables, not Types
How do you use sizeof() in your code? Do you typically use sizeof()
with a variable name or a data type? While at first glance the
distinction may not seem to matter that much, at a deeper level it
matters a lot. Consider the following example.
Using sizeof(), a bad example
int nVar;
...
DumpHex( &nVar, sizeof(int) );
|
DumpHex() is a general purpose routine that will dump out an
arbitrary byte range of memory. The first argument is a pointer to
a block of memory and the second argument is the number of bytes to
dump.
Can you spot a possible problem in this example? The sizeof() in
this example is operating on the int data type and not on the
variable nVar. What if nVar needs to be changed in the future to a
long data type? Well, sizeof(int) would have to be changed to
sizeof(long). A better way to use sizeof() is as follows.
Using sizeof(), a good example
int nVar;
...
DumpHex( &nVar, sizeof(nVar) );
|
In this new example, sizeof() now operates on nVar. This allows
DumpHex() to work correctly no matter what the data type of nVar is.
If the type of nVar changes, we do not have to hunt through the code
for places where the old data type was used explicitly.
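The same habit carries over to pointers and arrays: sizeof(*pRecords) follows whatever type pRecords points to, and a hypothetical NUMELEMENTS() macro derives an array's element count from the array itself. Note that NUMELEMENTS() is only correct for true arrays; applied to a pointer, such as an array parameter inside a function, it silently gives the wrong answer. RECORD and ClearRecords() are also invented for this sketch.

```c
#include <string.h>

/* Hypothetical record type for illustration. */
typedef struct tagRECORD {
    long lId;
    char szName[32];
} RECORD;

/* Number of elements in a true array (not a pointer). */
#define NUMELEMENTS(array) (sizeof(array) / sizeof((array)[0]))

/* sizeof(*pRecords) tracks the pointer's type, not a hard-coded type. */
void ClearRecords(RECORD *pRecords, size_t nCount)
{
    memset(pRecords, 0, nCount * sizeof(*pRecords));
}
```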
7.15 Avoid Deeply Nested Blocks
There are times when, for any number of reasons, you end up writing
code that is deeply nested. Consider the following example.
Deeply nested code
void DeepNestFunction( void )
{
    if (test1) {
        (more code)
        if (test2) {
            (more code)
            if (test3) {
                (more code)
                if (test4) {
                    (more code)
                }
            }
        }
    }
} /* DeepNestFunction */
|
While this nesting is only four deep, I've had times when it would
have gone ten deep. When nesting gets too deep, the code becomes
harder to read and understand. There are two basic solutions to
this problem.
Unroll the tests. The first solution is to create a boolean variable
that maintains the current success or failure status and to
constantly retest it as follows.
Unrolling the deep nesting
void UnrollingDeepNesting( void )
{
    BOOL bVar=(test1);
    if (bVar) {
        (more code)
        bVar = (test2);
    }
    if (bVar) {
        (more code)
        bVar = (test3);
    }
    ...
} /* UnrollingDeepNesting */
|
Call another function. The second solution is to package the
innermost tests into another function and to call that function
instead of performing the tests directly.
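Both solutions appear in the following hypothetical validator for "name=value" settings. IsValidSetting() unrolls its tests with a single status variable, and the name check is packaged into its own function. The bool type comes from C99's <stdbool.h>, standing in for the BOOL used above.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Second solution: a group of tests packaged into its own function.
   A name must start with a letter and contain only letters and digits. */
static bool IsValidName(const char *pName, size_t nLength)
{
    bool bOk = (nLength > 0) && isalpha((unsigned char)pName[0]);
    for (size_t i = 1; bOk && i < nLength; ++i) {
        bOk = isalnum((unsigned char)pName[i]) != 0;
    }
    return bOk;
}

/* First solution: each stage runs only if every earlier stage
   succeeded, so the nesting never goes deeper than one level. */
bool IsValidSetting(const char *pSetting)
{
    const char *pEquals = NULL;
    bool bOk = (pSetting != NULL);

    if (bOk) {
        pEquals = strchr(pSetting, '=');
        bOk = (pEquals != NULL);
    }
    if (bOk) {
        bOk = IsValidName(pSetting, pEquals - pSetting);
    }
    if (bOk) {
        bOk = (pEquals[1] != '\0');   /* the value must not be empty */
    }
    return bOk;
}
```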
7.16 Keep Functions Small
The primary reason to keep functions small is that it helps you
manage and understand a programming problem better. If you have to
go back to the code a year later to modify it, you may have
forgotten the small details and have to relearn how the code works.
It sure helps if functions are small.
As a general rule, try to keep functions manageable by restricting
their length to one page. Most of the time functions are smaller
than a page and sometimes they are a page or two. Having a function
that spans five pages is unacceptable.
|
As a general rule, try to keep functions under one page.
|
If a function starts to get too large, step back a moment and try to
break the function into smaller functions. The functions should
make sense on their own. Remember to treat a function as a method
that transitions an object from one valid state to another valid
state. Try to come up with well-defined, discrete actions and write
functions that perform these actions.
7.17 Releasing Debugging Code in the Product
Is there such a thing as a debug build and a retail build of your
product? Should there be? No, I do not think so! Let me explain
why. I believe that any released application should be able to be
thrown into debug mode on the fly at any time.
In the applications that I develop, I have a global boolean variable
called bDebugging that is either FALSE or TRUE. I place what I
consider to be true debugging code within an if statement that
checks bDebugging. This is usually done for debugging code that
adds a lot of execution overhead. For debug code that does not add
much overhead, I just include the code and do not bother with
bDebugging.
The benefit of doing this is that there is only one build of your
product. That way, if a customer is running into a severe problem
with your product, you can instruct the customer how to run your
product in debug mode and quite possibly find the problem quickly.
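A minimal sketch of the run-time switch follows. The "/debug" command-line option, the counter that stands in for an expensive self-check, and ProcessRequest() are all hypothetical; the point is that the same build runs with or without the debugging code.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One global switch instead of a separate debug build. */
bool bDebugging = false;

/* Hypothetical counter standing in for an expensive self-check. */
long nSelfChecks = 0;

/* Hypothetical: turn debugging on when started with /debug. */
void ParseDebugOption(int argc, char *argv[])
{
    for (int i = 1; i < argc; ++i) {
        if (strcmp(argv[i], "/debug") == 0) {
            bDebugging = true;
        }
    }
}

int ProcessRequest(int nRequest)
{
    if (bDebugging) {
        /* costly debugging code runs only in debug mode */
        ++nSelfChecks;
        fprintf(stderr, "ProcessRequest(%d)\n", nRequest);
    }
    return nRequest * 2;   /* stands in for the real work */
}
```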
I do not consider WinAssert() and VERIFY() to be debugging code. In
the programs that I write, WinAssert() and VERIFY() are not
switchable by bDebugging. Instead, they are always on. The
reasoning is simple. Would you like to know a bug's filename and
line number in your program or would you just like to know that your
program crashed somewhere, location unknown?
|
WinAssert() and VERIFY() are not debugging code.
|
If you object to the users of your product seeing assertion failures
and run-time object verification failures, I recommend that you
instead silently record the error in a log file. By doing this, you
will have some record of what went wrong in the program when the
customer calls you.
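The idea of silently recording a failure and carrying on can be sketched with a hypothetical LOGASSERT() macro. This is not the book's WinAssert() or its fault-tolerant syntax; it only shows the principle: log the filename and line number, then skip the code that depends on the failed condition.

```c
#include <stdio.h>

/* Hypothetical log destination; a shipping program might fopen() a
   log file here. NULL means write to stderr. */
FILE *pAssertLog = NULL;
int nAssertFailures = 0;

/* Record where the check failed, then let the program keep running. */
void LogAssertFailure(const char *pFileName, int nLine)
{
    FILE *pLog = pAssertLog ? pAssertLog : stderr;
    ++nAssertFailures;
    fprintf(pLog, "Assertion failed: %s, line %d\n", pFileName, nLine);
    fflush(pLog);
}

/* Evaluates to 1 if expr is true; otherwise logs the failure and
   evaluates to 0, so the caller can skip the dependent code. */
#define LOGASSERT(expr) \
    ((expr) ? 1 : (LogAssertFailure(__FILE__, __LINE__), 0))

int SafeDivide(int nNumerator, int nDenominator)
{
    int nResult = 0;
    if (LOGASSERT(nDenominator != 0)) {
        nResult = nNumerator / nDenominator;
    }
    return nResult;
}
```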
In a program that ships with WinAssert() and VERIFY() on, the program
alerts the user to the exact filename and line number of a problem.
If the fault-tolerant syntax is used, the program continues to run.
Oftentimes, just knowing that a program failed at this one spot is
enough to scrutinize that section of the code and find the problem.
It is important that the fault-tolerant forms of WinAssert() and
VERIFY() be used. Doing so ensures that the program continues to
run after a fault.
|
The fault-tolerant forms of WinAssert() and VERIFY() should always be
used.
|
Sometimes a filename and line number are not enough to track down a
problem. At times like these, a stack trace is highly desirable.
7.18 Stack Trace Support
A key debugging tool that I use for tracking down problems in my code
is the stack trace dump produced at a fault. Sometimes knowing only
the filename and line number of a fault is not enough to track down
a problem. In these cases, the call history that led up to the
problem is often enough.
For example, I once had a program that was faulting at a specific
filename and line number. This code was examined thoroughly but no
problem could be found. So, a stack trace of the fault was obtained
from the customer, which assisted me in pinpointing the problem
immediately. As it turned out, a newly added feature had caused a
reentrancy problem to occur in old code.
Most development environments today provide sophisticated tools that
allow the developer to quickly pinpoint problems in their code.
What do you do when a customer calls up with a fault that you cannot
reproduce? The customer is certainly not running the development
environment that you are running.
My solution to this problem is to add full stack trace capabilities
into the application itself. At every point in the stack trace, a
filename and line number are obtained.
|
I build stack trace support into my applications.
|
Unfortunately, the solution is specific to the underlying
environment, so I cannot give a general solution, but I will do my
best to describe the technique that I use.
7.18.1 Implementing Stack Trace Support
Obtaining filename and line number information. The most important
piece of information to which access is needed is debugging filename
and line number information. Under a Microsoft development
environment, obtaining this information is done in two steps. The
first step is to tell the compiler to generate line number
information. Under Microsoft C8, this is done with the /Zd command
line option which results in the .obj files containing the line
number information. The second step is instructing the Microsoft
segmented executable linker to produce a .map file. The /map
command line option is used.
Translate the filename and line number information. The .map file
contains the filename and line number information in a human
readable form. A program needs to be written that takes this .map
text information and translates it into a form that is easily
readable by the stack trace code.
Walking the Stack. This is the toughest part because it is so
specific to the platform that you are using. If walking the stack
is not provided as a service by the operating system, you may want
to consider walking the stack yourself. This is what I do.
Luckily, Windows now provides stack walking support through the
TOOLHELP.DLL system library. For Windows, the functions of interest
are StackTraceFirst(), StackTraceCSIPFirst() and StackTraceNext().
Mapping addresses to filename and line numbers. As you walk the
stack, the only piece of information available to you is an address.
Somehow you need to map this back to the information you stored in
the binary file representation of the .map file. Again, this is
specific to the environment you are working on. Under 16-bit
protected-mode Windows, far addresses are really composed of a
selector and offset. The trick is to map the selector back to a
segment number because the segment number is what is specified in
the .map file. This is done in two steps. The first step is to map
the selector to a global memory handle by using GlobalHandle(). The
second step is then to map this global memory handle to a segment
number by calling GlobalEntryHandle(). Both functions are provided
by TOOLHELP.DLL. You can now look up the filename and line number.
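Once the line-number table is sorted by segment and offset, the lookup is a binary search for the last record at or before the address, within the same segment. A sketch, assuming a hypothetical record of line number, segment number and offset; the name LookupLine() is also hypothetical, and an address past the end of the last function in a segment is not detected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: one entry per source line that generated code. */
typedef struct {
    unsigned short wLine;
    unsigned short wSeg;
    unsigned short wOffset;
} LINEREC;

/* Return the line number whose code contains wSeg:wOffset, or 0 if
 * no record in that segment starts at or before wOffset.
 * recs[] must be sorted by segment, then by offset. */
unsigned short LookupLine( const LINEREC *recs, int nRecs,
                           unsigned short wSeg, unsigned short wOffset )
{
    int lo = 0;
    int hi = nRecs - 1;
    unsigned short wLine = 0;
    assert(recs != NULL || nRecs == 0);
    /*--- Find the last record whose address is <= wSeg:wOffset ---*/
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if ((recs[mid].wSeg < wSeg) ||
            ((recs[mid].wSeg == wSeg) && (recs[mid].wOffset <= wOffset))) {
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    /*--- hi indexes that record, or is -1 if there is none ---*/
    if ((hi >= 0) && (recs[hi].wSeg == wSeg)) {
        wLine = recs[hi].wLine;
    }
    return (wLine);
} /* LookupLine */
```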
You now have superior stack trace support built right into your
application. It is superior because the stack trace gives filenames
and line numbers instead of the hex offsets usually given by
system-level stack traces.
|
7.18.2 Enhancements
If you implement stack trace support in your application, I have some
enhancements to suggest to you. I would highly recommend that you
first get the basic stack trace support working before tackling
these enhancements.
Hooking CPU faults. In protected memory environments, if your
program accesses memory that does not belong to it, the program
faults and the operating system halts the program. If possible, try
to hook into this fault and produce a stack trace! For Windows,
TOOLHELP.DLL provides the InterruptRegister() and
InterruptUnRegister() functions that allow programs to hook their
own faults. This requires some assembly language programming.
Hooking parameter errors. Under Windows, the kernel performs error
checks on the parameters passed to Windows API calls. It is possible
to hook into the bad parameter notification chain by using
TOOLHELP.DLL, which provides NotifyRegister() and
NotifyUnRegister().
Displaying function arguments. As you walk the stack, try to parse
what arguments were passed to the function along with the filename
and line number. This is tricky but it is doable and well worth the
effort. Most faults that cause stack traces are caused by an
invalid argument in some function call. Spotting this in the stack
trace then becomes easy.
|
7.18.3 The Benefits
I have implemented full stack trace support along with hooking CPU
faults, hooking Windows kernel parameter errors and displaying
function arguments. What are the benefits? Great customer
relations! In most cases, a stack trace is enough to track down a
problem. In other words, I can track down a problem without first
having to reproduce it. Customers come to trust that a reported
problem will get fixed, and you end up with a robust product that
they rely upon.
7.19 Functions Have a Single Point of Exit
This has more to do with writing functions that are easily
maintainable than anything else. If a function has one entry point,
a single flow of control and one exit point, it is easier to
understand than a function with multiple exit points.
It also helps eliminate buggy code, because a return in the middle
of a function implies an algorithm that does not have a
straightforward flow of control. Such an algorithm should be
redesigned so that there is only one exit point.
In a sense, a return in the middle of a function is just like using a
goto statement. Instead of transferring control back to the caller
at the end of the function, control is being passed back from the
middle of the function.
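As a small illustration (a hypothetical FindValue() function, not one from the original text), the search loop below records its result in a local variable and lets every path fall through to the single return at the bottom, instead of returning from inside the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the first nValue in pArray[], or -1.
 * Single point of exit: every path reaches the one return below, so
 * a final check or cleanup added later runs on every path. */
int FindValue( const int *pArray, int nCount, int nValue )
{
    int nIndex = -1;    /* result; -1 means "not found" */
    int loop;
    assert(pArray != NULL || nCount == 0);
    /*--- The loop condition, not a return, ends the search ---*/
    for (loop = 0; (loop < nCount) && (nIndex < 0); ++loop) {
        if (pArray[loop] == nValue) {
            nIndex = loop;
        }
    }
    return (nIndex);
} /* FindValue */
```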
7.20 Do Not Use the Goto Statement
I agree with the majority opinion that goto statements should be
avoided. Functions with goto statements are hard to maintain.
7.21 Write Bulletproof Functions
Who is responsible for making sure that a function gets called and
used properly? Is it up to the programmer? Or is it up to the
function that gets called? Let's face it, programmers make
mistakes. So anything that can be done on the part of the function
to ensure that it is being used properly aids the programmer in
finding problems in the program.
Consider a GetTextOfMonth() function. It takes as an argument a
month, zero through eleven inclusive, and returns a long pointer to
a three-character string description of the month. The natural,
simple solution is as follows.
GetTextOfMonth() function, no error checking
LPSTR APIENTRY GetTextOfMonth( int nMonth )
{
    CSCHAR TextOfMonths[12][4] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    return (TextOfMonths[nMonth]);
} /* GetTextOfMonth */
|
The problem with this code is what happens when the input nMonth is
not in the proper range of zero to eleven. The returned pointer
still points to something, but if it is treated as a string, that
string is more than likely much longer than three characters. If
the string is being formatted into a buffer with sprintf(), the
buffer has a high likelihood of being overrun, trashing memory beyond
its end and causing even more problems that need to be tracked down.
The solution is to make GetTextOfMonth() completely bulletproof so
that any value passed into it returns a pointer to a three-character
string. One possible solution is as follows.
GetTextOfMonth() function, with error checking
LPSTR APIENTRY GetTextOfMonth( int nMonth )
{
    CSCHAR szBADMONTH[]="???";
    LPSTR lpMonth=szBADMONTH;
    WinAssert((nMonth>=0) && (nMonth<12)) {
        CSCHAR TextOfMonths[12][4] = {
            "Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
        };
        lpMonth = TextOfMonths[nMonth];
    }
    return (lpMonth);
} /* GetTextOfMonth */
|
Notice how a fault-tolerant form of WinAssert() is being used. This
ensures that a full stack trace is logged if the input parameter is
invalid.
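For readers without WinAssert(), the same fault-tolerant pattern can be sketched in standard C. The REPORT_FAILURE macro here is a hypothetical stand-in: it only prints the file, line and failed expression to stderr rather than logging a full stack trace, and the function returns const char * instead of LPSTR.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a fault-tolerant assert: report the
 * failure, then let the caller continue with a safe value. */
#define REPORT_FAILURE(expr) \
    fprintf(stderr, "%s(%d): check failed: %s\n", __FILE__, __LINE__, #expr)

const char *GetTextOfMonth( int nMonth )
{
    static const char szBadMonth[] = "???";
    static const char TextOfMonths[12][4] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    const char *lpMonth = szBadMonth;   /* safe default for bad input */
    if ((nMonth >= 0) && (nMonth < 12)) {
        lpMonth = TextOfMonths[nMonth];
    } else {
        REPORT_FAILURE((nMonth >= 0) && (nMonth < 12));
    }
    assert(strlen(lpMonth) == 3);       /* always a three-character string */
    return (lpMonth);
} /* GetTextOfMonth */
```

Every input, valid or not, yields a three-character string, so a caller formatting the result into a fixed-size buffer cannot overrun it because of a bad month number.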
Should this fault-tolerant code ever be removed? Once you get the
code working that uses GetTextOfMonth(), you know there are no bugs,
right? No, I do not think so! Do you know that your entire program
is bug-free? There could be a bug in some totally unrelated part of
the program that is causing a memory overwrite. If it just happens
to overwrite a month number that you have stored in memory, you are
in big trouble once again. Or what happens when you go back a year
later to modify the code that uses GetTextOfMonth()? You may
introduce a subtle bug.
The best way to write a bug-free program is to keep all the defenses
up at all times. At least this way, you will know when there is a
problem in your program.
You may not know what the problem is, but just knowing that there is
a bug is important information for maintaining a quality product.
The best way to write a bug-free program is to keep all the defenses
up at all times.
7.22 Writing Portable Code
C is so successful because it is so flexible; flexible, that is, to
the compiler writer, because many key issues are left to the
compiler writer to specify. This was done so that each
implementation of C could take advantage of how a particular machine
architecture works.
For example, what is the sign of the remainder upon integer division?
How many bytes are there in an int or long or short? Are members
of a structure padded to an alignment boundary? Does a zero-length
file actually exist? What is the ordering of bytes within an int,
long or short?
Most compilers provide a chapter or two in their documentation on how
they have implemented these and many more implementation-defined
behaviors.
If writing portable code is important to you, I would suggest that
you thoroughly read these chapters and adopt programming
methodologies that avoid implementation-defined behavior.
7.23 Memory Models
Due to the segmented architecture of the Intel CPU, compiler vendors
provide the option of creating a program in one of four basic memory
models.
The small memory model. This model allows for less than 64K of data
and less than 64K of code. This model is great for quick and dirty
utility programs. It also produces the fastest program since there
is never any need to reload a segment register.
The compact memory model. This model allows for more than 64K of
data and less than 64K of code. It is ideal for small programs that
crunch through a lot of data. The program is still fast, but there
is a slight speed penalty for accessing far data.
The medium memory model. This memory model allows for less than 64K
of data and more than 64K of code. This is the memory model that
Microsoft recommends using for programming in Windows. It allows
for lots of code and a small amount of data. By using mixed-model
programming, you can gain access to more than 64K of data.
The large memory model. This allows for more than 64K of data and
more than 64K of code. It is the memory model that most closely
matches flat memory model architectures.
These memory models are complicated by the fact that there is
something called mixed-model programming. The four basic memory
models essentially provide default code and data pointer attributes.
A code or data pointer is either a 16-bit near pointer (an offset
only) or a 32-bit far pointer (a 16-bit segment plus a 16-bit
offset). These near and far attributes are only
defaults. Mixed-model programming allows the near/far attributes to
be specified on a pointer-by-pointer basis.
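To make the far pointer layout concrete, the sketch below models a far pointer as an unsigned long, with the segment (a selector, in protected mode) in the high 16 bits and the offset in the low 16 bits. The helper names are hypothetical. The physical address formula, segment times 16 plus offset, applies only in real mode such as MS-DOS; it also shows why two different far pointers can refer to the same byte.

```c
#include <assert.h>

/* Pack a segment and offset into a 32-bit far pointer value. */
unsigned long MakeFar( unsigned int wSeg, unsigned int wOff )
{
    assert((wSeg <= 0xFFFFu) && (wOff <= 0xFFFFu));
    return ((unsigned long)wSeg << 16) | (unsigned long)wOff;
}

unsigned int FarSeg( unsigned long dwFar ) { return (unsigned int)((dwFar >> 16) & 0xFFFFu); }
unsigned int FarOff( unsigned long dwFar ) { return (unsigned int)(dwFar & 0xFFFFu); }

/* Real mode only: the CPU forms a 20-bit physical address as
 * segment * 16 + offset. Under protected-mode Windows the high word
 * is a selector and this formula does not apply. */
unsigned long RealModePhysical( unsigned long dwFar )
{
    return ((unsigned long)FarSeg(dwFar) << 4) + FarOff(dwFar);
}
```

For example, the far pointer 0xB0000000 (segment 0xB000, offset 0) maps to physical address 0xB0000 in real mode, and so does segment 0xAFFF with offset 0x0010.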
My advice to you is to use the large memory model unless speed is
your overriding concern. The industry is moving away from
segmented architectures toward flat memory model programming, where
there are no segments.
|
Use the large memory model. This will aid porting to flat memory
model architectures.
|
By using the large memory model now, you ease the eventual porting of
your software to the flat memory model.
|
7.24 Automated Testing Procedures
An automated testing procedure is a function that is designed to
test your code for you automatically, exercising code that you think
is already bug-free. A key part of most testing procedures is their
use of random number generator class objects.
Let's suppose that you have just implemented a B-Tree disk-based
storage heap for fast access to your database. How are you going to
really test it? You could code examples that use the B-Tree class
in order to test edge conditions. This is a good idea anyway, but
what do you code next?
A solution that I have found useful and highly effective is to use a
random number generator to create data sets that are then thrown at
the module to be tested. A random number generator is useful
because if a problem is discovered, it can be recreated exactly by
using the same random number seed.
|
A random number generator is an important part of an automated
testing procedure.
|
In the case of the B-Tree code, you could randomly decide to add or
delete a record from the tree and you could randomly vary the size
of the records being added. You could also decide to randomly
restart the test. As you add records into the database and read
them back, how do you verify the integrity of the randomly sized
record? One slick technique is to use another random number
generator to fill in the record with random values. The slick part
is that all you need to keep in memory to validate the record when
it is read back in is the random number seed that was used to
generate the random record, not the entire record itself.
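The seed trick can be sketched as a pair of functions: one fills a record from a generator seeded with a per-record seed, and the other regenerates the same bytes and compares. Only the seed and the record length need to be remembered. The function names and the simple linear congruential step used here are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One step of a 32-bit linear congruential generator. */
static unsigned long NextState( unsigned long dwState )
{
    return (dwState * 1664525UL + 1013904223UL) & 0xFFFFFFFFUL;
}

/* Fill pRec[0..nLen-1] with bytes derived from dwSeed. */
void FillRecord( unsigned char *pRec, size_t nLen, unsigned long dwSeed )
{
    unsigned long dwState = dwSeed & 0xFFFFFFFFUL;
    size_t loop;
    assert(pRec != NULL || nLen == 0);
    for (loop = 0; loop < nLen; ++loop) {
        dwState = NextState(dwState);
        pRec[loop] = (unsigned char)(dwState >> 24);  /* high bits are the most random */
    }
} /* FillRecord */

/* Return 1 if pRec holds exactly what FillRecord() would write for
 * dwSeed, else 0. Regenerates the bytes instead of storing a copy. */
int VerifyRecord( const unsigned char *pRec, size_t nLen, unsigned long dwSeed )
{
    unsigned long dwState = dwSeed & 0xFFFFFFFFUL;
    size_t loop;
    int bOk = 1;
    assert(pRec != NULL || nLen == 0);
    for (loop = 0; (loop < nLen) && bOk; ++loop) {
        dwState = NextState(dwState);
        bOk = (pRec[loop] == (unsigned char)(dwState >> 24));
    }
    return (bOk);
} /* VerifyRecord */
```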
Another big advantage of using random number generators is that given
enough time, they can test virtually all cases and code paths. It
is a lot like throwing darts. If you keep on throwing a dart at the
dart board, the center target is eventually hit. It is only a
matter of time. The question is not if the target is hit, but when.
What you are doing with the automated testing procedure is taking a
module that is considered to be bug-free and subjecting it to a
torture test of random events over time. Assuming there is a bug in
the module, the automated testing procedure will find it eventually.
|
If there is a bug in a module, an automated testing procedure will
eventually find it.
|
It is important that the automated testing procedure be written so
that it is capable of generating all types of conditions and not
just the normal set of conditions. You want to make sure that all
the code in the module gets tested.
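A compact torture test might look like the sketch below. The module under test is a hypothetical sorted-array integer set; each random step adds or removes a random key, and after every step the set is checked against a trivially correct reference model (one flag per key) and for sorted order. The small key range makes duplicate adds and removals of missing keys common, so the unusual paths get exercised along with the normal ones. All names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAXKEY 64   /* keys are 0..MAXKEY-1; small so collisions are frequent */

/* Module under test (hypothetical): a set of ints in a sorted array. */
typedef struct { int aKeys[MAXKEY]; int nCount; } INTSET;

static int SetFind( const INTSET *p, int nKey )    /* index, or -1 */
{
    int loop, nIndex = -1;
    for (loop = 0; (loop < p->nCount) && (nIndex < 0); ++loop) {
        if (p->aKeys[loop] == nKey) nIndex = loop;
    }
    return (nIndex);
}

void SetAdd( INTSET *p, int nKey )
{
    if (SetFind(p, nKey) < 0) {
        int i = p->nCount;
        while ((i > 0) && (p->aKeys[i-1] > nKey)) {  /* shift larger keys up */
            p->aKeys[i] = p->aKeys[i-1];
            --i;
        }
        p->aKeys[i] = nKey;
        ++p->nCount;
    }
}

void SetRemove( INTSET *p, int nKey )
{
    int nIndex = SetFind(p, nKey);
    if (nIndex >= 0) {
        memmove(&p->aKeys[nIndex], &p->aKeys[nIndex+1],
                (size_t)(p->nCount - nIndex - 1) * sizeof(int));
        --p->nCount;
    }
}

/* Run nOps random operations driven by dwSeed. Returns the number of
 * the first failing step, or 0 if every check passed. */
long TortureTest( unsigned long dwSeed, long nOps )
{
    INTSET set;
    char bRef[MAXKEY];          /* reference model: bRef[k] != 0 if k is in the set */
    unsigned long dwState = dwSeed & 0xFFFFFFFFUL;
    long lStep, lFailed = 0;
    set.nCount = 0;
    memset(bRef, 0, sizeof bRef);
    for (lStep = 1; (lStep <= nOps) && (lFailed == 0); ++lStep) {
        int nKey, k, nExpected = 0;
        dwState = (dwState * 1664525UL + 1013904223UL) & 0xFFFFFFFFUL;
        nKey = (int)((dwState >> 16) % MAXKEY);
        if ((dwState >> 31) & 1) { SetAdd(&set, nKey);    bRef[nKey] = 1; }
        else                     { SetRemove(&set, nKey); bRef[nKey] = 0; }
        /*--- Compare membership and count against the reference model ---*/
        for (k = 0; k < MAXKEY; ++k) {
            nExpected += bRef[k];
            if ((SetFind(&set, k) >= 0) != (bRef[k] != 0)) lFailed = lStep;
        }
        if (set.nCount != nExpected) lFailed = lStep;
        /*--- Check the module's own invariant: strictly ascending keys ---*/
        for (k = 1; k < set.nCount; ++k) {
            if (set.aKeys[k-1] >= set.aKeys[k]) lFailed = lStep;
        }
    }
    return (lFailed);
} /* TortureTest */
```

TortureTest() returns the step number of the first failure, so a failing run can be replayed with the same seed and stopped just before that step.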
Do automated testing procedures actually work? Yes! They are what
turned up the MS-DOS lost cluster bug and the Windows task-switching
bug described in §3.1.
I was putting my own code through a rigorous automated testing
procedure, and every once in a while the underlying kernel would
fail. I guess the use of probability theory pays off.
|
7.25 Documentation Tools
Programmers hate to write documentation. They love to write code,
and most programmers are willing to comment their code to some
degree. An even bigger problem is that even when documentation does
exist, it is more than likely out of date because it has not been
maintained to reflect code changes.
My solution to this problem is to accept the fact that external
documentation is not going to be produced directly. Instead, I am
going to produce it indirectly.
By having all programmers follow a common documentation style in the
entire project, it is possible to write a program that scans all
source files and produces documentation.
I use markers in comment blocks to assist me in parsing my comments.
For example, module comment blocks begin with /*pm, APIENTRY
function comment blocks begin with /*pf and LOCAL function comment
blocks begin with /*p.
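A scanner like the one described can start by classifying each line by its marker. This sketch assumes a marker starts the line (after optional blanks) and is followed by white space or the end of the line, so that a comment such as /*printf... is not mistaken for a LOCAL block. The rest of the comment format, and the names DOCKIND and ClassifyLine(), are assumptions.

```c
#include <assert.h>
#include <string.h>

/* Kinds of documentation comment recognized by this sketch. */
typedef enum { DOC_NONE, DOC_MODULE, DOC_API, DOC_LOCAL } DOCKIND;

/* True if pLine begins with pMarker followed by white space or end. */
static int StartsWithMarker( const char *pLine, const char *pMarker )
{
    size_t nLen = strlen(pMarker);
    return (strncmp(pLine, pMarker, nLen) == 0) &&
           ((pLine[nLen] == '\0') || (pLine[nLen] == ' ') ||
            (pLine[nLen] == '\t') || (pLine[nLen] == '\n'));
}

/* Report which kind of documentation block, if any, a line opens. */
DOCKIND ClassifyLine( const char *pLine )
{
    DOCKIND kind = DOC_NONE;
    assert(pLine != NULL);
    while ((*pLine == ' ') || (*pLine == '\t')) {
        ++pLine;                              /* skip leading blanks */
    }
    if (StartsWithMarker(pLine, "/*pm")) {
        kind = DOC_MODULE;                    /* module comment block */
    } else if (StartsWithMarker(pLine, "/*pf")) {
        kind = DOC_API;                       /* APIENTRY function */
    } else if (StartsWithMarker(pLine, "/*p")) {
        kind = DOC_LOCAL;                     /* LOCAL function */
    }
    return (kind);
} /* ClassifyLine */
```

From here, a documentation generator would collect the comment text up to the closing delimiter and emit it in whatever help format the editor understands.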
In practice, this works great. The AUTODOC program that I use scans
all source files and produces a Microsoft Quick Help file as
output. The Brief editor that I use supports Quick Help. I now
have instant access to all APIENTRY function documentation at the
touch of a key.
7.26 Source-Code Control Systems
If you are not already using a source-code control system, I would
highly recommend that you get one. I like them because they give me
access to the source as it existed all the way back to day one. A
source-code control system is also essential for tracking down
problems in released software.
You may end up with two or three different versions of your software
that are all in active use. The source-code control system gives
you easy access to the source of any particular version of your
software.
Most source-code control systems follow a get and put methodology.
Getting a source file gives the "getter" editing privileges to the
source. When changes are complete, the source is put back.
Before I put back any source, I produce a difference file and review
all the changes that I have made to the source. On more than one
occasion this has saved me from including a silly programming bug.
|
Always review changes before checking source code back in.
|
7.26.1 Revision Histories
An important part of maintaining software is keeping an accurate log
of what changes were made to a module and why. Rather than keeping
this information in the source file itself, I prefer to use the
source-code control system.
In the source-code control system that I use, a put will prompt me to
enter a description of the changes that I have made to the source
file.
The entire revision history is available at any time and is
maintained by the source-code control system. In modules that get
changed a lot, this technique keeps around the full revision history
without cluttering up the source file.
7.27 Monochrome Screen
7.27.1 The Windows Developer
A monochrome monitor is a must for the Windows-based developer. You
can configure the debugging kernel to send debug messages to either
a serial communications port or the monochrome monitor. This is
configured in the DBWIN.EXE program, which is provided with
Microsoft C8. The monochrome monitor is preferred over the serial
port because it is much faster when many debug messages arrive at once.
In addition to the system generating debug messages, the programmer
can generate them as well by calling OutputDebugString(). The
prototype for it is as follows.
OutputDebugString() prototype in windows.h (v3.1)
void WINAPI OutputDebugString(LPCSTR);
|
OutputDebugString() should not be called in the final release of your
software. You do not want messages going to your customer's
communications port. In my code, I control this by calling
OutputDebugString() only if I detect that the debugging kernel is
running. To detect if you are running under debug Windows, use the
GetSystemMetrics(SM_DEBUG) call. It returns zero under retail
Windows and a non-zero value under debug Windows.
Another benefit of using a monochrome screen is that most Windows
debugging tools have an option to run on the monochrome screen.
This way you can see the debug screen and your main screen at the
same time.
7.27.2 The MS-DOS Developer
Just as in Windows, most MS-DOS debugging tools have an option to run
on the monochrome screen. What do you do if you want to send
messages to the monochrome screen?
An OutputDebugString() that can be used by MS-DOS programmers is as
follows.
OutputDebugString() for MS-DOS programmers
void APIENTRY OutputDebugString( LPSTR lpS )
{
    LPSTR lpScreen=(LPSTR)0xB0000000; /* base of mono screen */
    int nPos=0;                       /* for walking lpS string */
    int loop;                         /* column being written */
    /*--- Scroll monochrome screen up one line (regions overlap) ---*/
    _fmemmove( lpScreen, lpScreen+2*80, 2*80*24 );
    /*--- Place new line down in 25th line ---*/
    for (loop=0; loop<80; ++loop) {
        lpScreen[2*(80*24+loop)] = (lpS[nPos]?lpS[nPos++]:' ');
    }
} /* OutputDebugString */
|
The monochrome screen is memory mapped and is located at segment
0xB000. Every character on a monochrome screen is actually composed
of 2 display bytes. One byte is the character to display and the
other byte contains attribute information such as blinking,
inverted, and so on.
This code works by first scrolling the monochrome screen with a
memory copy. Next, the string is placed into line 25 of the
monochrome screen, padded with spaces at the end so that the
previous contents of the twenty-fifth line are overwritten.
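The scrolling logic can be exercised without a monochrome adapter by pointing it at an ordinary array laid out like the screen: 80 columns by 25 rows, two bytes per cell, character first. This portable sketch (ScrollAndPrint() is a hypothetical name) uses memmove() because the source and destination overlap, and it writes only the character bytes, so the attribute bytes are left alone.

```c
#include <assert.h>
#include <string.h>

#define ROWS 25
#define COLS 80

/* pScreen models the text buffer: 2 bytes per cell, character at even
 * offsets, attribute at odd offsets. Strings over 80 chars are cut. */
void ScrollAndPrint( unsigned char *pScreen, const char *pS )
{
    int nPos = 0;   /* index into pS; stops advancing at the terminator */
    int loop;
    assert((pScreen != NULL) && (pS != NULL));
    /*--- Move rows 2..25 up to rows 1..24 ---*/
    memmove(pScreen, pScreen + 2*COLS, 2*COLS*(ROWS-1));
    /*--- Write row 25, padding with spaces past the end of pS ---*/
    for (loop = 0; loop < COLS; ++loop) {
        pScreen[2*(COLS*(ROWS-1) + loop)] =
            (unsigned char)(pS[nPos] ? pS[nPos++] : ' ');
    }
} /* ScrollAndPrint */
```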
|
7.28 Techniques for Debugging Timing Sensitive Code
Application code should never have any timing dependencies. However,
system level or interrupt code will more than likely have timing
constraints. An example is an interrupt handler for a synchronous
communications protocol. These drivers can be especially hard to
debug because there is always communications traffic on the line and
the protocol itself is timing sensitive. Using OutputDebugString()
to help you debug the code wastes too much time and affects the
timing sensitive code you want to debug, so an alternative is needed.
7.28.1 PutMonoChar() Function for MS-DOS
One technique that I have used successfully to debug timing sensitive
code is to write a few informative characters directly into the
monochrome screen video memory, in effect displaying a message on
the monochrome monitor. For example, PutMonoChar() places a
character at a specific row and column on the monochrome screen.
PutMonoChar(), for MS-DOS
void APIENTRY PutMonoChar( int nRow, int nCol, char c )
{
    if ((nRow>=0) && (nRow<25) && (nCol>=0) && (nCol<80)) {
        *(LPSTR)(0xB0000000+2*(nRow*80+nCol)) = c;
    }
} /* PutMonoChar */
|
PutMonoChar() works by first checking that nRow and nCol fall within
the 25-row by 80-column screen. It then writes the character
directly into monochrome screen video memory.
The advantage of using PutMonoChar() instead of OutputDebugString()
for debug messages is that it is much faster and is unlikely to
adversely affect the timing sensitive code you want to debug.
PutMonoChar() places a single character, whereas OutputDebugString()
places an entire line and scrolls the entire monochrome screen.
|
Copyright © 1993-1995, 2002-2008 Jerry Jongerius
This book was previously published by Pearson Education, Inc.,
formerly known as Prentice Hall. ISBN: 0-13-183898-9
|
|