Would you build a skyscraper without a proper, solid foundation? Of course not. Would you build a large application without a rock-solid base system? I wouldn't, and yet I get the feeling that this is happening every day. Do you consider the standard C library to be a rock-solid base? Before you answer this, I need to clarify what I consider to be rock-solid.
A rock-solid function must first of all be bug-free itself. The function must provide a clean, intuitive interface. What hope would you ever have if you were constantly making mistakes in how a function is called? Function names must clearly state what the function does. What good is a function named DoIt()? The function must detect and report invalid function arguments. The function should be fault-tolerant. The program you are working on should not crash simply because you called a function incorrectly.
Do you now consider the standard C library to be a rock-solid base? Many functions are, but many functions are not. Consider the heap management routines in the C library, specifically, the free() function. The free() function deallocates a block of memory that was previously allocated through the malloc() function call.
What happens when you pass free() a completely random value, or a value that you have already previously passed to free()? Your program may bomb immediately. If it doesn't, the heap may be corrupted. If it isn't, some memory may have been overwritten. The point is that not all C library functions are rock-solid. Why not first code a layer on top of the system base that is rock-solid?
As you code your program, you need to consider the current program as it stands as the base for whatever new features you are putting in. Once done, this is the new base for the next set of features. As you code, make sure that the current base is rock-solid; that it is fault-tolerant and that it catches incorrect usage of functions. If the base is not rock-solid, you need to make it rock-solid.
3.1 System Functions Contain Bugs
Your underlying operating system or development environment has bugs in it. Since there is no such thing as a completely bug-free system, try to find out as much as you can about the environment you are working on. Try to obtain bug lists if they are available.
3.1.3 What to Do
The point in demonstrating to you that bugs do exist in MS-DOS and Windows is to emphasize that sometimes even system level functions fail. Code that you think never fails is bound to fail sometime.
My reaction to having system level functions fail me has been to provide another layer of code between my application code and the system level functions that checks for assumptions that I am making. At some point, you write code that makes an assumption. Consider the following code.
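The original listing is not reproduced here. As a minimal sketch of the kind of code being discussed, assume a small helper that opens a file, reads it and closes it again. The name CountFileBytes() and the POSIX headers are assumptions made for illustration; they are not taken from the original text.

```c
#include <fcntl.h>    /* open() */
#include <unistd.h>   /* read(), close(); POSIX headers assumed */

/* Hypothetical helper: returns the number of bytes in a file,
   or -1 if the file cannot be opened. */
long CountFileBytes(const char *pszPath)
{
    char achBuffer[512];
    long lTotal = 0;
    int nRead;
    int fh = open(pszPath, O_RDONLY);

    if (fh == -1) {
        return -1L;
    }
    while ((nRead = (int)read(fh, achBuffer, sizeof(achBuffer))) > 0) {
        lTotal += nRead;
    }
    close(fh);   /* return value ignored: the hidden assumption */
    return lTotal;
}
```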
Do you see what the assumption is? The assumption being made in this code is that the close() function never fails. Well then, why not assert this? The close() function returns zero for success and non-zero for failure.
The best solution is to provide a wrapper function around each and every system call. Assert any assumptions that are being made within this wrapper function. Placing the assert in the wrapper function once instead of every place it is being called is a lot less error prone.
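A wrapper of this kind might look like the following sketch. FileClose() is a hypothetical name chosen for illustration, the standard assert() stands in for whatever assertion macro the project uses, and POSIX headers are assumed.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>   /* close(); POSIX header assumed */

/* Hypothetical wrapper around close(). The assumption that close()
   succeeds is asserted here once instead of at every call site. */
int FileClose(int fh)
{
    int nResult = close(fh);
    assert(nResult == 0);
    return nResult;
}
```

Application code then calls FileClose() everywhere it would have called close(), so a failing close is noticed no matter which caller triggered it.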
3.2 Using Macros to Aid Porting
With all the different machine architectures that are in use today, how in the world do you write code so that it can be ported easily? C provides an excellent mechanism for conditional compilation, but this is only a small part of the solution.
How do you handle segmented versus non-segmented architectures? What about C and C++? There are slight differences between the two languages.
One solution that works really well is to abstract out the interdependencies between the environments into a set of macros so that the code base does not have to change.
3.2.1 Segmented/Flat Architecture Macros
A number of #defines that provide a basis for further development are as follows.
FAR and NEAR. These are used to abstract out the near- and far-segmented architecture. NEAR implies a 16-bit pointer and FAR implies a 32-bit pointer. When porting to a non-segmented architecture, these can be defined to be nothing.
FASTCALL and PASCAL. These are used to specify the calling convention of a function primarily for optimization purposes. When porting to a non-segmented architecture, these can be defined to be nothing.
EXPORT. This define is applicable to Windows DLL programming. Otherwise, it can be defined to be nothing.
BASEDIN. This define is primarily used by the CSCHAR macro to place character strings within a code segment primarily for optimization purposes. When porting to a non-segmented architecture, it can be defined to be nothing.
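For a flat (non-segmented) target, the list above reduces to a handful of empty definitions. The sketch below shows only that case; the segmented definitions depend on compiler-specific keywords and are left out. Giving BASEDIN a segment-name parameter is an assumption, and AddOne() is a hypothetical function used only to show a macro in place.

```c
/* Flat-architecture definitions: every macro expands to nothing. */
#define FAR
#define NEAR
#define FASTCALL
#define PASCAL
#define EXPORT
#define BASEDIN(seg)   /* assumed to take a segment name */

/* With the empty definitions the qualifier vanishes and this
   compiles as ordinary C. */
int FASTCALL AddOne(int nValue)
{
    return nValue + 1;
}
```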
In most cases, these macros are used in other macros or in typedefs so that the code base is not cluttered up. For example, to declare a far pointer to a char so that it works equally well under a segmented and non-segmented architecture, you could declare it as char FAR*.
However, using char FAR* will just clutter up all the source files with FAR. The solution is to use a typedef to declare what a far pointer to a char is once.
Now LPSTR should be used instead of char FAR*. The concept of trying to hide how something works provides an abstraction that aids porting and allows for clean source code.
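As a small sketch of the typedef approach, assume a flat target where FAR is defined to be nothing. TextLength() is a hypothetical function that exists only to show LPSTR in a signature.

```c
#include <stddef.h>

#define FAR                 /* flat target assumed: expands to nothing */

typedef char FAR *LPSTR;    /* a far pointer to a char, declared once */

/* Hypothetical function: counts the characters before the terminator. */
size_t TextLength(LPSTR lpText)
{
    size_t cch = 0;
    while (lpText[cch] != '\0') {
        cch++;
    }
    return cch;
}
```

Only the typedef line mentions FAR, so moving between architectures means changing one definition rather than every declaration.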
3.2.2 Using EXTERNC
If you are coding under C++, name mangling can sometimes be a problem. This happens under Windows DLL coding when a .def file is used. Functions that are exported must be specified in the .def file, but name mangling can make it almost impossible to type in the names manually. Luckily, C++ provides a solution in the form of a linkage specification. The #define's you can use are as follows.
Under C++, EXTERNC gets defined to be extern "C". Under C, EXTERNC gets defined to be nothing. EXTERNC is used in function prototypes as follows.
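The original listings are not reproduced here. A sketch consistent with the description uses the standard __cplusplus predefined macro to choose the definition; AddInts() is a hypothetical function used only for illustration.

```c
#ifdef __cplusplus
#define EXTERNC extern "C"   /* C++: suppress name mangling */
#else
#define EXTERNC              /* C: nothing needed */
#endif

/* The prototype carries EXTERNC. */
EXTERNC int AddInts(int nLeft, int nRight);

/* The definition does not repeat it. */
int AddInts(int nLeft, int nRight)
{
    return nLeft + nRight;
}
```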
You need to use EXTERNC only in the prototype for a function, not in the source code where the function is actually written. This is how Microsoft C8 works.
3.3 Using WinAssert() Statements
The WinAssert() statement is the classic assertion statement with a few twists. Why use assertion statements? The key reason is to verify that decisions and assumptions made at design-time are working correctly at run-time. There is a difference between WinAssert() and CompilerAssert() §2.1.4. Both check design-time assumptions, but CompilerAssert() provides the check at compile-time, whereas WinAssert() provides the check at run-time.
An assert macro is provided by the Microsoft C8 library in assert.h. It takes any boolean expression. Nothing happens when the expression is true. If the expression is false, however, the _assert() function is called with a string pointer to the text of the boolean expression, a string pointer to the name of the source file and an integer line number where the error occurred. This information is then formatted and displayed by _assert(). Usage of the stringizing operator (#) is described in §2.2.7.
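The vendor's header is not reproduced here. To illustrate the general shape of such a macro, the following sketch uses hypothetical names, MYASSERT and MyAssertFail(), and records the failure instead of terminating the program so that the behavior is easy to observe. It is not the Microsoft implementation.

```c
#include <stdio.h>

static int g_nFailures = 0;              /* hypothetical failure counter */
static const char *g_pszLastExpr = "";   /* text of the last failed expression */

/* Hypothetical stand-in for the library's reporting function. */
void MyAssertFail(const char *pszExpr, const char *pszFile, int nLine)
{
    g_nFailures++;
    g_pszLastExpr = pszExpr;
    fprintf(stderr, "Assertion failed: %s, file %s, line %d\n",
            pszExpr, pszFile, nLine);
}

/* #exp turns the expression into a string literal; __FILE__ and
   __LINE__ identify the call site. */
#define MYASSERT(exp) \
    ((exp) ? (void)0 : MyAssertFail(#exp, __FILE__, __LINE__))
```

Each use of MYASSERT contributes its own expression string and its own reference to __FILE__, which is the source of the string overhead discussed next.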
A problem with this macro is that it ends up placing too many strings in the default data segment. One easy solution is to remove the #exp argument, which is turning the boolean expression into text. After all, the file and line number are all that are needed to look up the boolean expression. Also, every time the assert() macro is used, a new string __FILE__ is created. Some compilers are able to optimize these multiple references into one reference, but why not just fix the problem? My solution to the problem is to declare a short stub function at the top of each source file which references __FILE__. WinAssert() then calls this stub function with the current line number.
An interesting twist that has been added to WinAssert() is that it supports writing code that is fault-tolerant. If a design-time assumption has failed, should you really be executing a section of code? I say no! The WinAssert() statement may be followed by a semicolon or by a block of code. The block of code will be executed only if the assertion succeeds.
The WinAssert() is implemented through a set of macros as follows.
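The original macro listing is not reproduced here. The following is a portable sketch of the mechanism this section describes, under several assumptions: plain int and static replace the BOOL and NEAR attributes, ReportWinAssert() is given an assumed signature and a trivial body so the example is self-contained, and SafeDivide() is a hypothetical caller.

```c
#include <stdio.h>

static int g_nReported = 0;   /* hypothetical counter for the demo */

/* Assumed signature; the text only requires that this exist somewhere. */
void ReportWinAssert(const char *pszFile, int nLine)
{
    g_nReported++;
    fprintf(stderr, "Assertion failed: %s, line %d\n", pszFile, nLine);
}

/* Unconditionally report an error at the current line. */
#define AssertError _DoWinAssert(__LINE__)

/* The trailing else lets a semicolon or a block follow the macro;
   a block runs only when the assertion succeeds. */
#define WinAssert(exp) if (!(exp)) { AssertError; } else

/* One copy of __FILE__ and one stub per source file. The name follows
   the text; ISO C reserves identifiers of this form, so a port may
   prefer to rename it. */
#define USEWINASSERT \
    static const char szSRCFILE[] = __FILE__; \
    static int _DoWinAssert(int nLine) { \
        ReportWinAssert(szSRCFILE, nLine); \
        WinAssert(nLine); \
        return 0; \
    }

USEWINASSERT

/* Hypothetical caller: the division is skipped if the assertion fails. */
int SafeDivide(int nNumerator, int nDenominator)
{
    int nResult = 0;
    WinAssert(nDenominator != 0) {
        nResult = nNumerator / nDenominator;
    }
    return nResult;
}
```

In this sketch a failed assertion is reported and the guarded block is bypassed, so SafeDivide() returns its default value instead of dividing by zero.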
If WinAssert() is used in a source file, USEWINASSERT must appear at the top of the source file somewhere.
In addition to the WinAssert() macro, an AssertError() macro is provided for those times that you want to unconditionally force an error to be reported.
The reporting process of an assertion failure starts by calling a function that is local (private) to the source file. The function is _DoWinAssert() and the argument is the line number where the failure occurred. The body of _DoWinAssert() is straightforward except for the inclusion of WinAssert(nLine). Since the line number is never zero, this assertion never fails and appears to have no purpose. However, because WinAssert() itself expands into a call to _DoWinAssert(), this trick forces _DoWinAssert() to be compiled into the module, even if there are no references to the function in the rest of the file. Otherwise, Microsoft C8 removes the unreferenced function.
Another subtle problem is that if _DoWinAssert() is declared to be a LOCAL function (described in Chapter 6), the optimizing Microsoft C8 compiler will not build a stack frame for this function. For this reason, it has the static NEAR attributes instead of the LOCAL attribute, which allows the stack frame to be built.
In addition to these defines, WinAssert() requires that ReportWinAssert() be defined somewhere. I define it in a DLL so that the function needs to be coded only once.
Once done, any other application has access to it. ReportWinAssert() allows you to display the assertion error in whatever way is appropriate at your organization. In my ReportWinAssert(), I log the filename, line number and stack trace to a log file and issue a system modal message box requesting that the user report the error. See §A.7 for example implementations.
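As a rough sketch of a reporting function, the version below writes the filename and line number to stderr and appends them to a log file. The signature, the helper FormatWinAssert() and the log file name assert.log are all assumptions made for illustration. The stack trace and the system modal message box are platform-specific and are omitted.

```c
#include <stdio.h>

/* Hypothetical helper: builds the message text and returns its length. */
int FormatWinAssert(char *pszOut, size_t cbOut,
                    const char *pszFile, int nLine)
{
    return snprintf(pszOut, cbOut, "Assertion failed: %s, line %d",
                    pszFile, nLine);
}

/* Assumed signature: source file name and line number of the failure. */
void ReportWinAssert(const char *pszFile, int nLine)
{
    char szMessage[256];
    FILE *pLog;

    FormatWinAssert(szMessage, sizeof(szMessage), pszFile, nLine);
    fprintf(stderr, "%s\n", szMessage);

    pLog = fopen("assert.log", "a");   /* hypothetical log file name */
    if (pLog != NULL) {
        fprintf(pLog, "%s\n", szMessage);
        fclose(pLog);
    }
}
```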
One of the key things you must remember is that the argument to WinAssert() must have absolutely no side effect on any variables. It must only reference variables.
This is in case a policy of removing assertion statements from the code before releasing the product is enforced. While I do not recommend that you remove assertion statements, you still want to play it safe. You do not want to end up accidentally removing code that is needed to make your program run correctly.
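The danger can be shown with the standard assert() macro, which is removed entirely when NDEBUG is defined. The sketch below uses assert() rather than WinAssert() so that it is self-contained; both function names are hypothetical.

```c
#include <assert.h>

/* Risky: the decrement lives inside the assertion, so it disappears
   together with the assertion when NDEBUG is defined. */
int TakeOneRisky(int *pnRemaining)
{
    assert(--(*pnRemaining) >= 0);
    return *pnRemaining;
}

/* Safer: the assertion only references the variable. */
int TakeOneSafe(int *pnRemaining)
{
    --(*pnRemaining);
    assert(*pnRemaining >= 0);
    return *pnRemaining;
}
```

With assertions enabled the two functions behave the same. With assertions removed, TakeOneRisky() silently stops decrementing, while TakeOneSafe() keeps working.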
3.4 Naming Conventions
One of the most important aspects of programming is the use of a consistent naming convention. Without one, your program ends up being just a jumble of various techniques and hence hard to understand. With a naming convention, your program is more readable and easier to understand and maintain.
I will describe the naming conventions that I used to code a large application and that have worked quite well for me.
3.4.1 Naming Macros
Macro names should always be in uppercase and may contain optional underscore characters. For macros that take arguments, I prefer not to use the underscore character anywhere (e.g., NUMSTATICELS()). For macros that define constant numeric values, underscore characters are OK (e.g., MAX_BUFFER_SIZE).
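A short sketch of both styles follows. The macro names come from the text, but their definitions here are assumptions, and the array and functions are hypothetical.

```c
#include <stddef.h>

#define MAX_BUFFER_SIZE 256                            /* constant: underscores OK */
#define NUMSTATICELS(a) (sizeof(a) / sizeof((a)[0]))   /* takes an argument: no underscores */

static const int anPrimes[] = { 2, 3, 5, 7, 11 };
static char achLine[MAX_BUFFER_SIZE];

/* Hypothetical function: sums a static array without hard-coding its size. */
long SumPrimes(void)
{
    long lSum = 0;
    size_t i;
    for (i = 0; i < NUMSTATICELS(anPrimes); i++) {
        lSum += anPrimes[i];
    }
    return lSum;
}

/* Hypothetical function: reports the capacity of the line buffer. */
size_t LineCapacity(void)
{
    return NUMSTATICELS(achLine);
}
```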
Macro names in uppercase stand out and draw attention to where they are located. Some macros that are universally used throughout almost all code are allowed to be in mixed upper- and lowercase. An example of a macro like this is the WinAssert() macro §3.3.
There are many times that a set of macros contain a common subexpression. When this happens, I create another macro that contains the common sub-expression. The sole purpose of this type of macro is that it is to be used by other macros and not in the source code. A naming convention I use to help me remember that the macro is private to other macros is to name it with a leading underscore character.
3.4.2 Naming Data Types
I can remember the difficulty I had coming up with good data type names when I first started to code. I was using mixed upper- and lowercase for data type names and variable names. However, it became harder and harder to read the program. I always ended up wanting the variable name to be spelled the same as the data type name but could not do this, so I ended up calling it something different which made the program hard to understand.
The convention that I finally settled upon is that all data types should be in uppercase. The variable names can then be spelled the same, but in mixed upper- and lowercase. This convention may at first seem awkward, but in practice I have found that it works well.
Data type names should also avoid using the underscore character. This is because macro names may use the underscore character and it is best to avoid any possible confusion or ambiguity over whether or not an uppercase name is a data type name or macro name.
3.4.3 Declaring Data Types
New data types must be declared with a typedef statement. While it is possible to use a macro to create what looks like a new data type, it is not a true data type and is subject to subtle coding problems.
Consider the data type PSTR, which is a character pointer. It can be declared either with a typedef (typedef char *PSTR;) or with a macro (#define PSTR char *), and then used in the declaration PSTR pA, pB.
In this declaration, what is the type of pA and what is the type of pB? In the case of using typedef to create the new data type, the type of pA and pB is a character pointer, which is as expected. However, in the case of using the macro to create the new data type, the type of pA is a character pointer and the type of pB is a character. This is because PSTR pA, pB really represents char *pA, pB which is not the same as char *pA, *pB.
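The difference can be made observable with sizeof. In the sketch below the macro version is renamed PSTR_MACRO, a hypothetical name, only so that both forms can live in one file; the two function names are also hypothetical.

```c
#include <stddef.h>

typedef char *PSTR;         /* a true data type */
#define PSTR_MACRO char *   /* macro look-alike (hypothetical name) */

/* Returns 1: both variables are character pointers. */
int TypedefSizesMatch(void)
{
    PSTR pA, pB;
    return sizeof(pA) == sizeof(pB);
}

/* Returns 0 on typical platforms: the declaration expands to
   char *pA, pB; so pB is a plain char. */
int MacroSizesMatch(void)
{
    PSTR_MACRO pA, pB;
    return sizeof(pA) == sizeof(pB);
}
```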
This example shows the danger in using macros to declare new data types in the system. Therefore, you should avoid using macros to declare new data types.
3.4.4 Naming Variables
All variables should be named using the Hungarian variable naming convention with mixed upper- and lowercase text and no underscore characters.
The Hungarian naming convention states that you should prefix all variable names with a short lowercase abbreviation for the data type of the variable (see Table 3-1).
For example, nSize is an integer, bOk is a BOOL and hIcon is an abstract handle. Prefixes may be combined to produce a more descriptive prefix. An example would be lpnCount, which is a long pointer to an integer and lpanCounts, which is a long pointer to an array of integers.
The advantage of Hungarian notation is that you are much more likely to catch a simple programming problem early in the coding cycle, even before you compile. An example would be nNewIndex = lIndex+10. Just by glancing at this you can see that the left-hand side is an integer and the right-hand side is a long integer. This may or may not be what you intended, but the fact that this can be deduced without seeing the original data declarations is a powerful concept.
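A brief sketch using the prefixes mentioned above. The function names are hypothetical, and the explicit cast is one way of showing that the narrowing from long to int was intended.

```c
/* The prefixes alone show that a long is being narrowed to an int. */
int NextIndex(long lIndex)
{
    int nNewIndex;
    nNewIndex = (int)(lIndex + 10);   /* narrowing made explicit */
    return nNewIndex;
}

/* lpanCounts: long pointer to an array of integers; nCount: an integer. */
long SumCounts(const int *lpanCounts, int nCount)
{
    long lSum = 0;
    int nIndex;
    for (nIndex = 0; nIndex < nCount; nIndex++) {
        lSum += lpanCounts[nIndex];
    }
    return lSum;
}
```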
The Hungarian notation handles all built-in data types, but what about derived types? A technique that I have found useful is to select an (uppercase) data type name that has a natural mixed upper- and lowercase name. For example, the data type HICON yields the variable name hIcon and LPQUEUEENTRY yields lpQueueEntry.
This convention works great for short data type names like HICON, but not so well for long data type names like LPQUEUEENTRY. The resulting variable name lpQueueEntry is just too long to be convenient. In this case, an abbreviation like lpQE should be used. However, make sure that lpQE is not also an abbreviation for another data type in your system.
Whatever technique you use to derive variable names from data type names is fine provided that there is only one derivation technique used in your entire program. A bad practice would be to use lpQE in one section of code and lpQEntry in another section of code.
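As a sketch of the derivation, assume a hypothetical queue entry structure; its members and the CountEntries() function are invented for illustration. The variable name lpQE is the single abbreviation used for LPQUEUEENTRY throughout.

```c
#include <stddef.h>

/* Hypothetical structure; uppercase type names, no underscores. */
typedef struct tagQUEUEENTRY {
    int nPriority;
    struct tagQUEUEENTRY *lpNext;
} QUEUEENTRY, *LPQUEUEENTRY;

/* Hypothetical function: walks the list and counts its entries. */
int CountEntries(LPQUEUEENTRY lpQE)
{
    int nCount = 0;
    while (lpQE != NULL) {
        nCount++;
        lpQE = lpQE->lpNext;
    }
    return nCount;
}
```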
3.4.5 Naming Functions
Functions should be named using the module/verb/noun convention in mixed upper- and lowercase text with no underscore characters.
Suppose you have just started a project from scratch. There is only one source file and the number of functions in it is limited. You are naming functions whatever you feel like and coding is progressing rapidly. Two months go by and you are working on a new function that needs to call a specialized memory copy routine you wrote last month. You start to type in the function name, but then you hesitate. Did you call the function CopyMem() or MemCopy()? You do not remember, so you look it up real quick.
This actually happened to me and the solution was simple. Follow the Microsoft Windows example of naming functions using the verb/noun or action/object technique. So, the function should have been called CopyMem().
This solved my immediate problem, but not the long-term problem. It wasn't long before I had thousands of function names, some with similar sounding verb/noun names. My solution was to prefix the verb/noun with the module name.
Suppose you have a module that interfaces with the disk operating system of your environment. An appropriate module name would be Dos and several possible function names are DosOpenFile(), DosRead(), DosWrite() and DosCloseFile().
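To show how the names read in practice, the sketch below implements the four functions as thin layers over the standard C file routines. The signatures and the use of stdio are assumptions; the text names the functions but does not define them.

```c
#include <stdio.h>

/* Module Dos: every function carries the module prefix,
   followed by a verb and, where helpful, a noun. */

FILE *DosOpenFile(const char *pszPath, const char *pszMode)
{
    return fopen(pszPath, pszMode);
}

/* Returns the number of bytes actually read. */
size_t DosRead(FILE *pFile, void *pvBuffer, size_t cbBuffer)
{
    return fread(pvBuffer, 1, cbBuffer, pFile);
}

/* Returns the number of bytes actually written. */
size_t DosWrite(FILE *pFile, const void *pvData, size_t cbData)
{
    return fwrite(pvData, 1, cbData, pFile);
}

/* Returns zero on success. */
int DosCloseFile(FILE *pFile)
{
    return fclose(pFile);
}
```

With the prefix in place, every function that belongs to the module sorts together in a listing and its origin is clear at the call site.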
Module names should contain two to five characters, but an optimum length for the module name is three to four characters. You can almost always come up with a meaningful abbreviation for a module that fits in three to four characters.
3.5 Chapter Summary
This book was previously published by Pearson Education, Inc.,
formerly known as Prentice Hall. ISBN: 0-13-183898-9