No matter how good the tools that help you detect bugs in your programs are, the goal of every programmer should be to avoid bugs in the first place. Debugging tools serve a purpose, but relying on tools to catch your programming mistakes does not make you a better programmer. What makes you a better programmer is learning from your mistakes. How do you rate your own ability to produce bug-free code? Whatever your answer is, how did you rate yourself last year? How do you expect to rate yourself next year? The point of this exercise is to stress the importance of improving your coding techniques year after year. The goal is to be able to write more code next year than last year and to write it with fewer bugs.

7.1 Design It Right the First Time

If there is one thing that I have learned over the years that I would emphasize to a programmer just starting out, it is that old code rarely dies; it just stays around and around. This is especially true in large projects, and all successful small projects end up turning into large projects. As time goes on in a large project, more and more code layers are built upon existing code layers. It does not take long before coding is no longer being done to the operating system or windowing environment layer but to the layers that were coded on top of these system layers. Now suppose that a year later someone discovers that the implementation of a low-level layer is causing performance problems. All too often management decides to live with the performance problem in favor of using their programmers' time to put new features into the product. You are forced by management to live with a lot of the coding decisions you make over the years, so your decisions had better be good! Management may allow you to re-engineer a module to make it better and faster, but consider the lost time. The time attributed to the module is the original design time plus the time to re-engineer the module.
Designing a module right the first time saves time. It is not feasible to continually re-engineer old modules; if that starts happening because of poor design decisions made up front, a project will come to a halt. My advice is to design a module right the first time because you'll rarely get the chance to re-engineer it.

7.2 Good Design Beats an Optimizing Compiler

An optimizing compiler helps a program run faster, but in all cases it is a good design that makes a program run fast. A great design produces the fastest program. A little more time spent on a programming problem generally results in a better design, which can make a program run significantly faster. Over the last few years, I think all of us at one time or another have seen a well-known product upgrade get hammered by the trade press for how sluggishly it performed. Do not let this happen to your product. And by all means, evaluate your competition. If your product is two to three times slower than the competition and this comes out in a magazine review, do you think potential customers will buy your product?

7.3 Evolutionary Coding

How do you code a module? Do you spend days working feverishly on a module and have it all come together in the last half day? Or do you code one piece of a module, moving on to the next piece only when you are certain that the piece you just wrote is working? Over the years I've tried both techniques, and I can tell you from experience that coding a module a piece at a time is much easier. It also helps isolate bugs. If you wait until the end to put all the pieces together, where is the problem? By coding one piece at a time, the problem is more than likely in the piece you are working on and not in some piece you have already finished and tested.

7.3.1 Build a Framework

Start coding a module by building a framework that is plugged into the system almost immediately. The goal in creating this framework is to get the module completely stubbed out so that it compiles.
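As a sketch, a fully stubbed-out framework for a small file module might look like the following. The handle type and prototypes here are assumptions for illustration; only the idea of empty bodies returning error values comes from the text.

```c
#include <stddef.h>

/* Hypothetical opaque object handle for the module */
typedef struct FileClass *HFILE;

/* Every method is stubbed to return an error value so the
   module compiles and links before any real code is written. */
HFILE DosOpenFile(const char *pFilename)
{
    (void)pFilename;
    return NULL;               /* stub: always fails for now */
}

HFILE DosCloseFile(HFILE hFile)
{
    (void)hFile;
    return NULL;
}

size_t DosRead(HFILE hFile, void *pBuf, size_t nBytes)
{
    (void)hFile; (void)pBuf; (void)nBytes;
    return 0;                  /* stub: zero bytes read */
}

size_t DosWrite(HFILE hFile, const void *pBuf, size_t nBytes)
{
    (void)hFile; (void)pBuf; (void)nBytes;
    return 0;                  /* stub: zero bytes written */
}
```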
Let's use the Dos module as an example and assume that the module interface has already been designed. The first step in creating the framework is to create the global include file. Next, create a source file that contains all of the sections of a module, but none of the guts. Namely, create the module and APIENTRY comment blocks without any comments, declare the class without any members and write all the APIENTRY functions with no code in the function bodies. At this point, I place return statements in all functions so that each function can return an error value. For example, I have DosOpenFile() and DosCloseFile() return NULL and DosRead() and DosWrite() return zero. The framework is now complete. Compile the module and correct any errors that show up. Although no real code has been written, the framework provides you with a clear goal.

7.3.2 Code the Create and Destroy Methods

After the framework is compiling successfully, you are ready to start implementing the module. Where should you start? The functions that I always tackle first are the create and destroy methods. This allows me to write some simple code that uses and tests the module. In the case of the Dos module, the DosOpenFile() and DosCloseFile() functions are implemented first, followed by some test code. This test code allows verification that (1) opening an existing file works, (2) trying to open a file that does not exist fails and (3) calling DosCloseFile() works properly and frees allocated memory. Most modules are not as simple as the Dos module when it comes to determining the data items to add to the class structure. This is fine. Simply fill in the class structure with whatever data items are required by the create and destroy method functions. When implementing other method functions, add data items to the class structure as they are needed.

7.3.3 Code the Other Methods

Once the create and destroy methods are working correctly, it is time to start implementing the other method functions.
The best strategy is to implement a method function, compile it and then test it by writing test code. In the case of the Dos module, you may decide to implement DosRead() first. After doing so, you could then create a test file with a known data set and attempt to have the test code read this data set. This helps validate the DosRead() code. Likewise for the DosWrite() function: write out some information to a file and verify that it was indeed written. The order in which you code the method functions is totally up to you. After you gain experience using this technique, you will end up picking an order that allows for test code to be written easily.

7.4 Set Goals

Goal setting is always important to accomplishing any task you set out to do, but it is especially important that programmers have clearly defined goals. The problem is that all too often a programmer gets sucked into a project that is 90 percent done, and weeks or months later the project is still only 90 percent done.

7.4.1 The 90 Percent Done Syndrome

The problem with programming without a clear, well-defined goal is that the project never gets completed. I have fallen into this trap myself. You work on items that relate to the 90 percent that is already done. You may have found a new feature to add and so you work on it, but the feature does not help you complete the project. Or you have found a better way to implement an algorithm, so you spend time recoding the algorithm. While finding a better algorithm is great, it only delays the project from being completed. The module framework helps you set a clearly defined goal. Once the framework is in place, all the APIENTRY functions are stubbed out, just waiting to be completed. Once these APIENTRY functions are completely written and tested, you have reached your goal and the module is finished.

7.5 Code What to Do, not How to Do It

The goal of this technique is to avoid spreading knowledge about how something is done throughout the entire project.
By moving this knowledge into one function, you are isolating the knowledge. Now any code that calls this function is specifying what must be done, leaving the how to the function. It is hard to discipline yourself to use this technique, but the payback is well worth the extra effort. Code that uses this technique becomes shorter and more compact. What used to take ten lines now takes five. This leads to a program that is easier to code, maintain and change. A good analogy is building blocks. You are building something from scratch, so you start out by making a few blocks and then use those building blocks to construct it. Now let's say that you want to build something else. The key is that you don't start totally from scratch, because you've already built some of the basic building blocks. Always keep your eyes open for a new building block to implement.

7.6 Virtually No Global Variables

In an object-based system, global variables (variables known to more than one source file) are almost never needed. The reasoning is simple. Objects, by design, are created and destroyed by method functions, so the objects themselves are never static. An advantage of object-based systems that dynamically allocate objects is that they are never limited by some predefined number. Instead, they are limited only by the amount of memory that is available. Global variables are rarely referenced by the method functions of an object because method functions act upon an object, not global data. Remember that an object's handle is passed in as the first argument to a method function. So why are global variables ever needed? Sometimes to directly support a class of objects. A prime example is when all objects of a class require access to a read-only resource. When creating the first object for this class, the resource is read into memory and a handle to the resource is stored in a global variable.
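The create and destroy sides of this pattern can be sketched as follows. The object type and names are hypothetical, and a simple malloc() stands in for actually reading the resource into memory.

```c
#include <stdlib.h>

/* Class-wide statics: visible only in this source file */
static int nObjects = 0;                /* objects currently alive   */
static char *pSharedResource = NULL;    /* class-wide read-only data */

typedef struct { int nId; } Obj;

Obj *ObjCreate(void)
{
    Obj *pObj = (Obj *)malloc(sizeof *pObj);
    if (pObj && nObjects++ == 0)
        pSharedResource = (char *)malloc(64);  /* first object: load it */
    return pObj;
}

void ObjDestroy(Obj *pObj)
{
    free(pObj);
    if (--nObjects == 0) {                     /* last object: free it  */
        free(pSharedResource);
        pSharedResource = NULL;
    }
}

/* Shows the resource appearing and disappearing with the objects */
int ClassDemo(void)
{
    Obj *pA = ObjCreate();
    Obj *pB = ObjCreate();
    int bLoaded = (nObjects == 2 && pSharedResource != NULL);
    ObjDestroy(pA);
    ObjDestroy(pB);
    return bLoaded && nObjects == 0 && pSharedResource == NULL;
}
```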
When destroying the last object for this class, the memory associated with the resource is freed. Global variables that support individual object instances should not be allowed; global variables that support a class of objects as a whole are permitted. Because these global variables support a class, their use is limited to the source file that declares the class. This can be enforced by using static on these variables, ensuring that the variables are not visible in other source files.

7.7 Loop Variables

Variables used to control for loops should not be used outside the loop itself. The LOOP() macro (see §2.2.5) was designed to enforce this rule. This rule makes code easier to maintain. While you are writing a function, you understand it thoroughly and are unlikely to make a mistake in using the loop variable after the loop has exited. However, when you or one of your coworkers modifies the loop next year, the assumptions under which the loop variable is used could easily be invalidated. It is better to be safe than sorry.

7.8 Use the Highest Compiler Warning Level

Today's compilers are pretty smart, but only if you tell them to be. By default, most compilers use a low warning level and use more advanced error checking only if instructed to do so. This is almost always done to remain compatible with older code. You can imagine the number of support calls a compiler vendor would get if old code that used to compile fine all of a sudden started producing a lot of warning messages under a new release of the compiler.
7.9 Use "static" to Localize Knowledge

Using static can sometimes be confusing because it appears to mean different things in different contexts. A simple rule that helps clear things up is that static assigns permanent storage to an object and limits the visibility of the object to the scope in which it is defined. There are three basic ways in which static can be used.

Static function declarations. You have already seen static used in this way with the LOCAL macro (see §6.6.7). Since a function already has permanent storage, that part of the rule is redundant. The scope of a function is file scope, so static limits the visibility of the function to the rest of the source file in which it is defined.

External static variables. When a variable is defined at file scope, it already has permanent storage and is visible to all source files. Using static makes the variable invisible to other source files. Again, the variable already has permanent storage, so that part of the rule is redundant.

Internal static variables. When a variable is defined within a function, it is called an automatic variable. Automatic variables have no persistence across function call invocations. In other words, if a value is assigned to an automatic variable in a block, that value is lost when the block is exited. In almost all cases, this is the desired behavior. However, using static assigns permanent storage to the variable so that a value assigned to the variable is not lost. The scope of the variable is also limited to the block in which it is defined. An internal static variable is just like an external static variable except that its visibility is limited to a block.

It is also important to understand how an initialized static variable behaves. Consider the following code.
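The behavior in question can be sketched like this; the nTest1 and nTest2 names come from the discussion that follows, while the counting function wrapped around them is an assumption.

```c
/* nTest1 is initialized once; nTest2 is re-initialized on every call */
int CountCalls(void)
{
    static int nTest1 = 0;   /* permanent storage, initialized one time */
    int nTest2 = 0;          /* automatic, initialized on each entry    */

    ++nTest1;
    ++nTest2;                /* nTest2 is always 1 at this point        */
    return nTest1 - nTest2;  /* returns 0, 1, 2, ... on successive calls */
}
```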
As you can see from this example, nTest1 is initialized only once, while nTest2 is initialized on each pass through the block. The rule is that the initialization of static variables takes place at compile time, while the initialization of automatic variables takes place at run time.
7.10 Place Variables in the Block Needed

How many times have you tracked down a bug only to realize that you used an automatic variable before it was properly initialized, or used the variable well after it should have been used, when it no longer contained an appropriate value? While this may not happen to you as you write the function, it becomes a lot more likely when you go back to the code at a later date and modify it. The solution to this problem is to define variables only in the innermost scope in which they are needed. A new scope is created any time you use an opening brace {, and the scope is terminated by the closing brace }. This means that your if statements, while statements and so on create new scopes. Variables defined within a scope are visible only within that scope. As soon as the scope ends, so does your access to the variable. Consider the following code fragment.
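A sketch of such a fragment, with a variable c that lives only inside the loop (the surrounding function is an assumption):

```c
/* Counts decimal digits in a string; c exists only inside the loop body */
int CountDigits(const char *psz)
{
    int nDigits = 0;
    while (*psz) {
        char c = *psz++;               /* c is scoped to this block */
        if (c >= '0' && c <= '9')
            ++nDigits;
    }
    /* c cannot be referenced here; its scope ended with the loop */
    return nDigits;
}
```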
In this example, c is visible only within the loop. As soon as the loop exits, c is no longer visible. In fact, it can be defined and reused in another scope. In standard C, variables can be defined only at the beginning of a scope, before the main body of code. In C++, variables can be defined wherever a statement is valid. A useful #define for both C and C++ is the NewScope define.
NewScope is a syntactic placeholder (defined to be nothing) that allows a new scope to be introduced into a program. It also allows for a natural indentation style.
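A minimal sketch of the define and its use (the surrounding function is an assumption):

```c
/* NewScope expands to nothing; it simply documents that the brace
   which follows introduces a new scope on purpose. */
#define NewScope

int ScopeDemo(void)
{
    int nResult = 0;
    NewScope
    {
        int nTemp = 5;          /* private to this block */
        nResult = nTemp * 2;
    }
    /* nTemp is no longer visible here */
    return nResult;
}
```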
NewScope is useful in both C and C++ because it allows variables to be created that are private to a block. As soon as the block exits, the variables declared in the block are no longer visible and cannot be referenced.

7.11 Arrays on the Stack

When using arrays that are declared on the stack, you must be careful not to return a pointer to one of these arrays to the calling function. Consider the following example.
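The example under discussion might look like the following sketch; myitoa() is named in the text, but this body is an assumption. The temporary pointer hides the mistake from compilers that would otherwise warn about returning buffer directly.

```c
#include <stdio.h>

/* Converts an int to text -- with a subtle bug */
char *myitoa(int nValue)
{
    char buffer[20];                  /* array on the stack        */
    char *psz = buffer;

    sprintf(buffer, "%d", nValue);
    return psz;                       /* BUG: returns stack memory */
}
```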
This program contains a subtle bug in that myitoa() is returning the address of buffer, an array on the stack. It is subtle because despite this bug, the program still works properly! It is a bug to return the address of a stack array to a calling function because after the function returns, the array is in a part of the stack that no function owns. The problem with this bug is that the code will work unless the array gets overwritten by making another function call, although making a function call is no guarantee that buffer will be overwritten. This bug is exactly like keeping a pointer around to a block of memory that has been freed. Unless the memory is reused and overwritten, accessing an object through the memory pointer will continue to work. As soon as the block of memory gets overwritten, you have a subtle problem to track down.

7.12 Pointers Contain Valid Addresses or NULL

The reasoning behind this rule is simple: it helps find bugs. If all pointers contain either a valid address or NULL, then accessing the valid address does not cause the program to fault, while accessing the NULL pointer under most protected environments causes the program to fault, and you have found your bug. Consider what happens if you have a pointer to a memory object, the memory object is freed and the pointer is not set to NULL. The pointer still contains the old address, an address to an invalid object. However, the address itself under most environments is still valid, and accessing the memory does not cause your program to fault. This is a problem and the source of a lot of bugs. The reason you can still access the memory of a freed object is that most memory managers simply add the memory back into the pool of free memory. Rarely do they actually mark this memory as being inaccessible to the program. It is time-consuming to communicate with the operating system on every allocation and deallocation request. Instead, a memory pool is maintained.
Only when this pool is exhausted does the heap manager ask the operating system for more memory. To help enforce this policy, macros that interface with the memory manager should be designed and used exclusively. An example of this is NEWOBJ() or FREE() for class objects (see §4.5).

7.13 Avoid Type Casting Whenever Possible

Type casting, by design, bypasses the type checking of the compiler. In essence, a type cast tells the compiler that you know what you are doing and not to complain about it. The problem is that your environment may change: you may be porting the software to another platform, changing the memory model of the program or upgrading to a new revision of the compiler. For whatever reason, your environment has changed. When you recompile your program, you run the risk of missing warning messages, because the behavior of the statement in which you are using the type cast may have changed, but the type cast masks the behavior change.
7.14 Use sizeof() on Variables, not Types

How do you use sizeof() in your code? Do you typically use sizeof() with a variable name or with a data type? While at first glance the distinction may not seem to matter much, at a deeper level it matters a lot. Consider the following example.
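The example might look like this sketch; DumpHex() is named in the text, but its body and the surrounding code are assumptions.

```c
#include <stdio.h>
#include <stddef.h>

/* Dumps an arbitrary byte range of memory in hex */
void DumpHex(const void *pMem, size_t nBytes)
{
    const unsigned char *p = (const unsigned char *)pMem;
    size_t i;
    for (i = 0; i < nBytes; ++i)
        printf("%02x ", p[i]);
    printf("\n");
}

size_t DumpVar(void)
{
    int nVar = 42;
    DumpHex(&nVar, sizeof(int));   /* sizeof() names the type...  */
    return sizeof(int);            /* ...not the variable nVar    */
}
```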
DumpHex() is a general-purpose routine that dumps out an arbitrary byte range of memory. The first argument is a pointer to a block of memory and the second argument is the number of bytes to dump. Can you spot a possible problem in this example? The sizeof() in this example is operating on the int data type and not on the variable nVar. What if nVar needs to be changed in the future to a long data type? Then sizeof(int) would have to be changed to sizeof(long). A better way to use sizeof() is as follows.
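The improved call can be sketched the same way, with sizeof() now applied to the variable (again, DumpHex() and the surrounding code are assumptions):

```c
#include <stdio.h>
#include <stddef.h>

/* Dumps an arbitrary byte range of memory in hex */
void DumpHex(const void *pMem, size_t nBytes)
{
    const unsigned char *p = (const unsigned char *)pMem;
    size_t i;
    for (i = 0; i < nBytes; ++i)
        printf("%02x ", p[i]);
    printf("\n");
}

size_t DumpVar(void)
{
    int nVar = 42;
    DumpHex(&nVar, sizeof(nVar));  /* tracks nVar's type automatically   */
    return sizeof(nVar);           /* still correct if nVar becomes long */
}
```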
In this new example, sizeof() operates on nVar. This allows DumpHex() to work correctly no matter what the data type of nVar is. If the type of nVar changes, we will not have to hunt down where in the code the old data type was explicitly used.

7.15 Avoid Deeply Nested Blocks

There are times when, for any number of reasons, you end up writing code that is deeply nested. Consider the following example.
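A sketch of four-deep nesting; the helper functions are hypothetical stand-ins for real work.

```c
/* Hypothetical steps, each reporting success (1) or failure (0) */
static int OpenSource(void)  { return 1; }
static int OpenDest(void)    { return 1; }
static int ReadRecord(void)  { return 1; }
static int WriteRecord(void) { return 1; }

/* Each step nests inside the previous one */
int CopyRecord(void)
{
    int bOk = 0;
    if (OpenSource()) {
        if (OpenDest()) {
            if (ReadRecord()) {
                if (WriteRecord()) {
                    bOk = 1;
                }
            }
        }
    }
    return bOk;
}
```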
While this nesting is only four deep, I've had times when it would have gone ten deep. When nesting gets too deep, the code becomes harder to read and understand. There are two basic solutions to this problem.

Unroll the tests. The first solution is to create a boolean variable that maintains the current success or failure status and to constantly retest it, as follows.
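Unrolled, the same logic can be sketched with a status variable that is retested before each step (the helper functions are again hypothetical):

```c
/* Hypothetical steps, each reporting success (1) or failure (0) */
static int OpenSource(void)  { return 1; }
static int OpenDest(void)    { return 1; }
static int ReadRecord(void)  { return 1; }
static int WriteRecord(void) { return 1; }

/* One level of nesting: bOk carries the success/failure status */
int CopyRecord(void)
{
    int bOk;

    bOk = OpenSource();
    if (bOk) bOk = OpenDest();
    if (bOk) bOk = ReadRecord();
    if (bOk) bOk = WriteRecord();
    return bOk;
}
```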
Call another function. The second solution is to package the innermost tests into another function and to call that function instead of performing the tests directly.

7.16 Keep Functions Small

The primary reason to keep functions small is that doing so helps you manage and understand a programming problem better. If you have to go back to the code a year later to modify it, you may have forgotten the small details and have to relearn how the code works. It sure helps if functions are small. As a general rule, try to keep functions manageable by restricting their length to one page. Most of the time functions are smaller than a page, and sometimes they are a page or two. Having a function that spans five pages is unacceptable. If a function starts to get too large, step back a moment and try to break the function into smaller functions. The functions should make sense on their own. Remember to treat a function as a method that transitions an object from one valid state to another valid state. Try to come up with well-defined, discrete actions and write functions that perform these actions.

7.17 Releasing Debugging Code in the Product

Is there such a thing as a debug build and a retail build of your product? Should there be? No, I do not think so! Let me explain why. I believe that any released application should be able to be thrown into debug mode on the fly at any time. In the applications that I develop, I have a global boolean variable called bDebugging that is either FALSE or TRUE. I place what I consider to be true debugging code within an if statement that checks bDebugging. This is usually done for debugging code that adds a lot of execution overhead. For debug code that does not add much overhead, I just include the code and do not bother with bDebugging. The benefit of doing this is that there is only one build of your product.
That way, if a customer is running into a severe problem with your product, you can instruct the customer how to run your product in debug mode and quite possibly find the problem quickly. I do not consider WinAssert() and VERIFY() to be debugging code. In the programs that I write, WinAssert() and VERIFY() are not switchable by bDebugging. Instead, they are always on. The reasoning is simple. Would you like to know a bug's filename and line number in your program, or would you just like to know that your program crashed somewhere, location unknown? If you object to the users of your product seeing assertion failures and run-time object verification failures, I recommend that you instead silently record the error in a log file. By doing this, you will have some record of what went wrong in the program when the customer calls you. In a program that ships with WinAssert() and VERIFY() on, the program alerts the user to the exact filename and line number of a problem. If the fault-tolerant syntax is used, the program continues to run. Oftentimes, just knowing that a program failed at this one spot is enough to scrutinize that section of the code and find the problem. It is important that the fault-tolerant forms of WinAssert() and VERIFY() be used. Doing so ensures that the program continues to run after a fault. Sometimes a filename and line number are not enough to track down a problem. At times like these, a stack trace is highly desirable.

7.18 Stack Trace Support

A key debugging tool that I use for tracking down problems in my code is the stack trace dump from a fault. Sometimes knowing only the filename and line number of a fault is not enough to track down a problem. In these cases, the call history that led up to the problem is often enough. For example, I once had a program that was faulting at a specific filename and line number. This code was examined thoroughly, but no problem could be found.
So, a stack trace of the fault was obtained from the customer, which allowed me to pinpoint the problem immediately. As it turned out, a newly added feature had caused a reentrancy problem in old code. Most development environments today provide sophisticated tools that allow developers to quickly pinpoint problems in their code. But what do you do when a customer calls up with a fault that you cannot reproduce? The customer is certainly not running the development environment that you are running. My solution to this problem is to add full stack trace capabilities into the application itself. At every point in the stack trace, a filename and line number are obtained. Unfortunately, the solution is specific to the underlying environment, so I cannot give a general solution, but I will do my best to describe the technique that I use.
7.18.3 The Benefits

I have implemented full stack trace support along with hooking CPU faults, hooking Windows kernel parameter errors and displaying function arguments. What are the benefits? Great customer relations! In most cases, a stack trace is enough to track down a problem. In other words, I can track down a problem without first having to reproduce it. Customers begin to trust that a reported problem will get fixed, and you end up with a robust product that they can rely upon.

7.19 Functions Have a Single Point of Exit

This rule has more to do with writing functions that are easily maintainable than anything else. If a function has one entry point, a single flow of control and one exit point, the function is easier to understand than a function with multiple exit points. It also helps eliminate buggy code, because using a return in the middle of a function implies an algorithm that does not have a straightforward flow of control. The algorithm should be redesigned so that there is only one exit point. In a sense, a return in the middle of a function is just like using a goto statement. Instead of transferring control back to the caller at the end of the function, control is being passed back from the middle of the function.

7.20 Do Not Use the Goto Statement

I agree with the majority opinion that goto statements should be avoided. Functions with goto statements are hard to maintain.

7.21 Write Bulletproof Functions

Who is responsible for making sure that a function gets called and used properly? Is it up to the programmer? Or is it up to the function that gets called? Let's face it, programmers make mistakes. So anything that can be done on the part of the function to ensure that it is being used properly aids the programmer in finding problems in the program. Consider a GetTextOfMonth() function.
It takes as an argument a month, zero through eleven inclusive, and returns a long pointer to a three-character string description of the month. A naturally simple solution is as follows.
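The naive version might look like the following sketch; the table layout is an assumption, and the missing range check is the point.

```c
#include <string.h>

/* Returns a three-character month name -- with no range checking */
char *GetTextOfMonth(int nMonth)
{
    static char *MonthText[] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    return MonthText[nMonth];  /* out-of-range nMonth reads past the table */
}
```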
The only problem with this code is what happens when the input nMonth is not in the proper range of zero to eleven. The returned pointer points to something, but if treated as a string, it is more than likely much longer than a three-character string. If this string is used in a printf() statement, the resultant buffer has a high likelihood of being overrun, trashing memory beyond the end of the buffer and causing even more problems that need to be tracked down. The solution is to make GetTextOfMonth() completely bulletproof so that any value passed into it returns a pointer to a three-character string. One possible solution is as follows.
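A sketch of the bulletproof version. The real fault-tolerant WinAssert() logs a full stack trace and continues; the simplified stand-in here is an assumption, as is the NUMSTATICELS() element-count macro.

```c
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for the book's fault-tolerant WinAssert():
   the guarded block runs only when the assertion holds; on failure
   the location is reported and execution continues. */
#define WinAssert(exp) \
    if (!(exp)) { printf("Assert failed: %s(%d)\n", __FILE__, __LINE__); } else

#define NUMSTATICELS(a) (sizeof(a) / sizeof((a)[0]))

char *GetTextOfMonth(int nMonth)
{
    static char *MonthText[] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    char *pszMonth = "???";   /* safe three-character default */

    WinAssert(nMonth >= 0 && nMonth < (int)NUMSTATICELS(MonthText)) {
        pszMonth = MonthText[nMonth];
    }
    return pszMonth;          /* always a three-character string */
}
```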
Notice how a fault-tolerant form of WinAssert() is being used. This ensures that a full stack trace is logged if the input parameter is invalid. Should this fault-tolerant code ever be removed? Once you get the code working that uses GetTextOfMonth(), you know there are no bugs, right? No, I do not think so! Do you know that your entire program is bug-free? There could be a bug in some totally unrelated part of the program that is causing a memory overwrite. If it just happens to overwrite a month number that you have stored in memory, you are in big trouble once again. Or what happens when you go back a year later to modify the code that uses GetTextOfMonth()? You may introduce a subtle bug. The best way to write a bug-free program is to keep all the defenses up at all times. At least this way, you will know when there is a problem in your program. You may not know what the problem is, but just knowing that there is a bug is important information for maintaining a quality product.

7.22 Writing Portable Code

C is so successful because it is so flexible. Flexible, that is, to the compiler writer, because many key issues are left to the compiler writer to specify how they should work. This was done so that each implementation of C could take advantage of how a particular machine architecture works. For example, what is the sign of the remainder upon integer division? How many bytes are there in an int, long or short? Are members of a structure padded to an alignment boundary? Does a zero-length file actually exist? What is the ordering of bytes within an int, long or short? Most compilers provide a chapter or two in their documentation on how they have implemented these and many more implementation-defined behaviors.
If writing portable code is important to you, I suggest that you thoroughly read these chapters and adopt programming methodologies that avoid implementation-dependent behavior.
7.24 Automated Testing Procedures

An automated testing procedure is a function that is designed to automatically test your code for you -- code that you think is already bug-free. A key part of most testing procedures is their use of random number generator class objects. Let's suppose that you have just implemented a B-Tree disk-based storage heap for fast access to your database. How are you going to really test it? You could code examples that use the B-Tree class in order to test edge conditions. This is a good idea anyway, but what do you code next? A solution that I have found useful and highly effective is to use a random number generator to create data sets that are then thrown at the module to be tested. A random number generator is useful because if a problem is discovered, it can be recreated exactly by using the same random number seed. In the case of the B-Tree code, you could randomly decide to add or delete a record from the tree and you could randomly vary the size of the records being added. You could also decide to randomly restart the test. As you add records into the database and read them back, how do you verify the integrity of a randomly sized record? One slick technique is to use another random number generator to fill in the record with random values. The slick part is that all you need to keep around in memory to validate the record when it is read back in is the random number seed that was used to generate the record, not the entire record itself. Another big advantage of using random number generators is that, given enough time, they can test virtually all cases and code paths. It is a lot like throwing darts. If you keep on throwing darts at the dart board, the center target is eventually hit. It is only a matter of time. The question is not if the target is hit, but when.
What you are doing with the automated testing procedure is taking a module that is considered to be bug-free and subjecting it to a torture test of random events over time. Assuming there is a bug in the module, the automated testing procedure will find it eventually. It is important that the automated testing procedure be written so that it is capable of generating all types of conditions and not just the normal set of conditions. You want to make sure that all the code in the module gets tested.
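The seed-only verification trick described above can be sketched with the standard srand()/rand() functions standing in for the book's random number generator objects (the function names are hypothetical):

```c
#include <stdlib.h>
#include <stddef.h>

/* Fills a record with pseudo-random bytes derived from a seed */
void FillRecord(unsigned char *pRec, size_t nBytes, unsigned nSeed)
{
    size_t i;
    srand(nSeed);
    for (i = 0; i < nBytes; ++i)
        pRec[i] = (unsigned char)(rand() & 0xff);
}

/* Re-runs the same sequence; only the seed must be remembered */
int VerifyRecord(const unsigned char *pRec, size_t nBytes, unsigned nSeed)
{
    size_t i;
    srand(nSeed);
    for (i = 0; i < nBytes; ++i)
        if (pRec[i] != (unsigned char)(rand() & 0xff))
            return 0;          /* record was corrupted */
    return 1;                  /* record is intact      */
}

int SeedDemo(void)
{
    unsigned char record[32];
    FillRecord(record, sizeof(record), 42u);
    return VerifyRecord(record, sizeof(record), 42u);
}
```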
7.25 Documentation Tools

Programmers hate to write documentation, but they love to write code, and most programmers are willing to comment their code to some degree. An even bigger problem is that even when documentation does exist, it is more than likely out of date because it hasn't been maintained to reflect code changes. My solution to this problem is to accept the fact that external documentation is not going to be produced directly. Instead, I am going to produce it indirectly. By having all programmers follow a common documentation style throughout the entire project, it is possible to write a program that scans all source files and produces documentation. I use markers in comment blocks to assist in parsing my comments. For example, module comment blocks begin with /*pm, APIENTRY function comment blocks begin with /*pf and LOCAL function comment blocks begin with /*p. In practice, this works great. The AUTODOC program that I use scans all source files and produces a Microsoft Quick Help file as output. The Brief editor that I use supports Quick Help, so I now have instant access to all APIENTRY function documentation at the touch of a key.

7.26 Source-Code Control Systems

If you are not already using a source-code control system, I highly recommend that you get one. I like them because they give me access to the source as it existed all the way back to day one. A source-code control system is also essential for tracking down problems in released software. You may end up with two or three different versions of your software that are all in active use, and the source-code control system gives you easy access to the source of any particular version. Most source-code control systems follow a get and put methodology. Getting a source file gives the "getter" editing privileges to the source. When changes are complete, the source is put back. Before I put back any source, I produce a difference file and review all the changes that I have made.
On more than one occasion this has saved me from including a silly programming bug.

7.26.1 Revision Histories

An important part of maintaining software is keeping an accurate log of what changes were made to a module and why. Rather than keeping this information in the source file itself, I prefer to use the source-code control system. In the source-code control system that I use, a put prompts me to enter a description of the changes that I have made to the source file. The entire revision history is available at any time and is maintained by the source-code control system. In modules that get changed a lot, this technique keeps the full revision history around without cluttering up the source file.
This book was previously published by Pearson Education, Inc., formerly known as Prentice Hall. ISBN: 0-13-183898-9