Tectrix Coding Style Guide

From: Thatcher

To: Software team

Date: December 30, 1996

This guide codifies a variety of basic practices. Some of these practices are techniques that I’ve found have improved my programming ability, and should do the same for you. Other practices are just conventions to make it easier for everyone to work together. I expect everyone to spend some time reading this guide and work on applying it to your own coding.

Goals

Our code should be readable, maintainable, and correct, in that order. If the code isn't readable, then it's hard to know if it's correct. If it's not maintainable, then it's hard to make it correct. And the code has to be correct, or we’re hosed. Efficiency is always nice, and sometimes required, but it generally doesn’t take precedence over the first three goals.

To make code that's readable, maintainable, and correct, it helps to think through the design carefully, give it the appropriate amount of structure, and adhere to a neat and consistent style. As you develop new code or edit old code, you should be thinking about how what you’re doing fits into the larger picture. You should make appropriate design changes if necessary. If something’s wrong with the design, try to avoid just patching it over because it’s "quicker"; consider the overall quality of the system. Otherwise, entropy will catch up with us.

Modules

This is a quote from the notes for my undergraduate software engineering course:

What is a Module?

Intuitively, a module is a cohesive component. Problem decomposition consists of defining modules. Each module possesses certain defined input and output behavior. The purpose of a module is to encapsulate data and operations under a single unit with a clean interface to the rest of the system. Ideally, a module accomplishes one thing: it might hide data, or it might abstract a set of operations.

(from The Mythical Man-Month Revisited, James Coggins)

C/C++ books often define "module" as a single source file that goes into a program. I prefer to define "module" in the "cohesive component" sense, and just say "source file" when I mean a single file containing source code. Used in this way, there are all kinds of things that can be thought of as modules. "Module" is the abstract term that can refer to a well-designed library, a set of functions contained in several source files which has a clean interface, a single C++ class, or even a single function. Modules can, and often do, contain sub-modules.

The most important programming guideline I can give is: think in terms of modules. Every time you create or edit a piece of code, you should ask yourself, "what modules is this piece of code a part of, and how does it fit into the purpose of those modules?"

In Tectrix code, modules are packaged in various ways depending on their size and relationship to the project. These include: libraries, related groups of source files, single source files, C++ classes, groups of functions, and single functions. All of these modules follow the guideline of public interface, private implementation. All external access to a module should go through its public interface.

Libraries

If the module is a library, it has a name, a directory in which the code is located, and a header file. The directory has the same name as the module. The header file has the module name, plus the extension .HPP. Since we still use DOS tools a lot, the name should fit in 8 characters or fewer. The name relates to the purpose of the module and should be easy to remember and type.

The header file contains a definition of the public interface to the module, including a declaration of all externally callable functions, all classes and types used in the interface, and any required constant and symbol definitions. Functions, classes, types, etc. not integral to the public interface are not declared in the header file. Functions, classes, etc. shared within the module, but not part of the public interface, can instead be defined in the file PRIVATE.HPP.

The names of public functions in a module are prefixed with the name of the module or an abbreviation of it. For example, public functions belonging to the RENDER library have names like render_Open, render_SetClipbox, render_Draw, etc.

Typically, such a library will define the functions prefix_Open and prefix_Close. Before using the module, programs should call the Open function. Before exiting, they should call the Close function.

example:

The CyberGear Run Time (CGRT) library of Sweeney Town, St. Benjamin, etc. has a subdirectory called RENDER containing RENDER.HPP which defines the public interface, plus files that make up the private implementation, including PRIVATE.HPP, RENDER.CPP, VIEW.CPP, SCREEN.CPP (an example of a sub-module), a makefile, etc.
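
To make the conventions above concrete, here is a minimal sketch of a tiny library-style module. The COUNTER module, its file names, and all four counter_ functions are hypothetical, invented only for illustration; they are not part of any Tectrix library. Both "files" are shown in one listing so the sketch compiles on its own.

```cpp
// ---- COUNTER.HPP (hypothetical): the public interface ----

void	counter_Open();
void	counter_Close();
void	counter_Add(int amount);
int	counter_GetTotal();


// ---- COUNTER.CPP (hypothetical): the private implementation ----

// Private to the module; not declared in COUNTER.HPP.
static bool	IsOpen = false;
static int	Total = 0;


void	counter_Open()
// Prepares the module for use and resets the total. Call this before
// any other counter_ function.
{
	IsOpen = true;
	Total = 0;
}


void	counter_Close()
// Shuts the module down. Call this before the program exits.
{
	IsOpen = false;
}


void	counter_Add(int amount)
// Adds the given amount to the running total. Does nothing if the
// module isn't open.
{
	if (IsOpen) {
		Total += amount;
	}
}


int	counter_GetTotal()
// Returns the running total.
{
	return Total;
}
```

Callers see only the four prefixed functions; IsOpen and Total can change or disappear without anyone outside the module noticing.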

Collections of source files

If the module consists of a collection of source files, but isn’t packaged into a library, its public interface may be defined in the header file for an enclosing module (e.g. the sfx_ module is a sub-module of the SOUND library, and its public interface is defined in SOUND.HPP), or it may have its own header file. Modules whose code is contained in a single source file follow the same rules. If there is a group of source files, each source file can usually be classified as implementing a separate sub-module.

C++ classes

A C++ class is another kind of module. Often, the class has a header file and a source file containing the implementation. Other times, the class is a part of a larger module that has a header file, and the class is defined in the larger module’s header file. Or, a class may be private to a module, and not appear in the module’s public interface.

example:

The RMode class is a type of referee that keeps track of the major mode that a machine is in. Its class definition is in RMODE.HPP, and its implementation is in RMODE.CPP. The vector class is defined in GEOMETRY.HPP, and implemented in VECTOR.CPP, a part of the GEOMETRY library.

Groups of functions

Sometimes a module will be made up of a group of several functions that are private to a single source file. This kind of module is analogous to a library, but its scope is much more narrowly defined. In this case, the public interface to the module is defined by a set of prototypes, and the code is defined later. Alternatively, the functions might just appear before they are used -- here the separation between interface and implementation is not as explicit, but it is still there; the implementation is in the function body while the interface is defined by its declaration.

example:

In GEOMETRY\BITMAP.CPP, the source file containing the implementation of the bitmap class, there is a small sub-module, consisting of two functions. The prototypes (public interface) appear early in the source file:

// Special bitmap allocate/free functions.
card8*	BitmapAllocate(int xsize, int ysize);
void	BitmapFree(card8 *bmap, int xsize, int ysize);

and the actual function definitions (private implementation) occur at the end of the source file. In this case, the module is much less formally defined than in other examples -- it doesn’t even have an explicit name. But, it is still a module, and follows the rule of public interface, private implementation.
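
Here is a rough sketch of how such a source file might be laid out end to end. Only the two prototypes come from the example above. The card8 typedef, the CountClearedPixels caller, and both function bodies are assumptions made up for illustration; they are not the actual BITMAP.CPP code.

```cpp
#include <stdlib.h>

// Assumption: card8 is an unsigned 8-bit type.
typedef unsigned char	card8;

// Special bitmap allocate/free functions.
card8*	BitmapAllocate(int xsize, int ysize);
void	BitmapFree(card8 *bmap, int xsize, int ysize);


int	CountClearedPixels(int xsize, int ysize)
// Hypothetical caller, standing in for the rest of the source file.
// Allocates a bitmap, counts its zero-valued pixels, and frees it.
{
	card8*	bmap = BitmapAllocate(xsize, ysize);
	if (bmap == NULL) {
		return 0;
	}

	int	count = 0;
	for (int i = 0; i < xsize * ysize; i++) {
		if (bmap[i] == 0) {
			count++;
		}
	}

	BitmapFree(bmap, xsize, ysize);
	return count;
}


// Illustrative bodies only, placed at the end of the file.

card8*	BitmapAllocate(int xsize, int ysize)
// Returns a zero-filled buffer of xsize * ysize bytes, or NULL if the
// size is invalid or the allocation fails.
{
	if (xsize <= 0 || ysize <= 0) {
		return NULL;
	}
	return (card8*) calloc((size_t) xsize * (size_t) ysize, 1);
}


void	BitmapFree(card8 *bmap, int xsize, int ysize)
// Frees a buffer returned by BitmapAllocate. The sizes are unused in
// this sketch.
{
	(void) xsize;
	(void) ysize;
	free(bmap);
}
```

The caller in the middle of the file needs nothing but the two prototypes, so the bodies at the bottom can be rewritten freely.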

Naming modules

As mentioned above, the name of a module packaged as a library should be a descriptive, easy to type word that’s no longer than 8 characters. The names of modules packaged in groups of one or more source files follow the same general guidelines. The names of C++ classes generally fall into one of two categories: They either have long, descriptive names with internal capitalization, like AbstractBike, or they have shorter, lowercase names, potentially including underscores, like vector or line_segments. Classes in the first category are usually higher level classes. Classes in the second category are usually lower-level utility data types.

Writing Readable Code

Don’t try to hand-optimize source code to make it more compact. Some C programmers (perhaps misled by the K&R book) like to cram lots of operations into one line of code. We strongly prefer easy readability over compactness. For example, we write:

	// Advance the buffer end and terminate the string there.
	BufferEnd++;
	Data[BufferEnd] = '\0';

and not:

	Data[++BufferEnd] = '\0';

Remember that the latter coding is no faster at run-time. I’ve never seen an instance where the Watcom object code was improved by such minor optimizations. Similarly, this code:

	x0 = BigStructure[index].x0;
	y0 = BigStructure[index].y0;
	x1 = BigStructure[index].x1;
	y1 = BigStructure[index].y1;
	index++;

may look wasteful, but the compiler does the right thing. Modern compilers can actually generate worse code if the source is made more compact by introducing an alias (a pointer or reference).
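
In case "introducing an alias" is unclear, here is a small sketch of the two forms side by side. The Box type and both function names are hypothetical, chosen just for this illustration. The two functions compute the same thing; the point is only what each looks like in source.

```cpp
// Hypothetical element type for the BigStructure array.
struct Box {
	int	x0, y0, x1, y1;
};


int	SumPlain(const Box* BigStructure, int index)
// Indexes the array on every line, in the style this guide prefers.
// Returns the sum of the four coordinates.
{
	int	x0 = BigStructure[index].x0;
	int	y0 = BigStructure[index].y0;
	int	x1 = BigStructure[index].x1;
	int	y1 = BigStructure[index].y1;
	return x0 + y0 + x1 + y1;
}


int	SumWithAlias(const Box* BigStructure, int index)
// The "compact" form: a reference aliases the array element.
// Returns the same sum as SumPlain.
{
	const Box&	b = BigStructure[index];
	return b.x0 + b.y0 + b.x1 + b.y1;
}
```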

Lesson: Stick with clear, boring, obvious code. Convey your coolness in your comments.

Writing Reliable Code

(Taken from a note of Thatcher’s, November 1996)

Always initialize class data members to some value in the constructor. If you don't have a preferred initial value, set pointers to NULL and integers to 0, etc., so at least the program will (mis)behave repeatably.
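
A minimal sketch of what this looks like in practice. The ListNode and NodeList types are hypothetical, made up for this example.

```cpp
#include <stddef.h>

// Hypothetical node type for the example.
struct ListNode {
	int	Value;
	ListNode*	Next;
};


class NodeList {
public:
	NodeList();
	ListNode*	GetHead() const;
	int	GetCount() const;

private:
	ListNode*	Head;
	int	Count;
};


NodeList::NodeList()
// Gives every data member a known starting value, so a list that is
// never filled in still behaves the same way on every run.
{
	Head = NULL;
	Count = 0;
}


ListNode*	NodeList::GetHead() const
// Returns the first node, or NULL if the list is empty.
{
	return Head;
}


int	NodeList::GetCount() const
// Returns the number of nodes in the list.
{
	return Count;
}
```

Without the two assignments in the constructor, Head and Count would hold whatever happened to be in memory, and a bug that depends on them could come and go between runs.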

Writing "Good Code"

Code Complete is chock full of wisdom about the definition of "good code", and advice on how to write it. It’s worth your while to look through this book and think about its recommendations. Writing Solid Code is another Microsoft Press book that’s worth checking out.

Documentation

Data structures, code structure, identifiers, and program statements are all forms of documentation. Comments are just another, supporting, form of documentation. Ideally, the program statements themselves should be readable enough so that someone can get the meaning without too much trouble (that’s not always the case in practice, but it’s a worthy goal). Comments should serve as a synopsis and explanation of the intent of a section of program statements. If you find yourself detailing at great length how a section of program works, then maybe you should figure out how to re-write the code more clearly.

Comments should generally be written in complete sentences. Sometimes it makes sense to make a short comment that's a fragment or just a word. Use your judgment.

Comments are part of the code! If you need to edit a piece of code, read the comments as well as the program statements themselves before proceeding. Edit the comments as well as the program statements, if necessary. Errors in comments are bugs, and should be treated as such.

As much as possible, comments should be physically near the program statements they describe, and not excessively long. Otherwise, they can be less readable and harder to maintain.

Typographical Guidelines

This is a list of rules that should help keep our code readable and familiar.

Source file headers

Like this:

// filename.cpp	-creator mm/dd/yy	Copyright Tectrix

// Description of what this file is for. Doesn’t have to be
// very long.

e.g.:

// main.cpp	-thatcher 10/6/94 Copyright Tectrix

// Contains the main() function of our VR runtime system, plus
// some helper functions.

Names

Names are an important part of the code, and are worth a little effort. This applies to maintenance as well; if you change the meaning or usage of something, make sure its name reflects the current meaning. This applies to everything: project names, source files, types, libraries, functions, variables.

Stand-alone functions, class member functions, class data members, type names: use internal capitalization and avoid abbreviations when reasonable:

FinalNull
NavigationController
SetControlParameters()
UserIconCount
ProcessUpdatePacket()

Public functions attached to a major module or library should have the module prefix prepended:

util_Error();
render_Flip();
geo_AddObject();

Limited scope variable names can have short, lowercase names when appropriate.

{
	int	i;
	for (i=0; i<10; i++) {
		ColorArray[i] = 0;
	}
}

Function definitions

This is the basic format for the definition of a function in a source file. For the corresponding declaration in a header file, just use the first line (with a semicolon after it).

void	SetPixel(int x, int y, int color)
// This function sets a pixel at the given (x,y) location to the given
// color. If (x,y) is off the screen, then no pixel is set.
// (0,0) is at the lower-left corner of the screen; (319, 199) is at the
// upper right.
// This function works only in VGA mode 13h.
{
	if (x >= 0 && x < 320 && y >= 0 && y < 200) {
		// Point is on the screen. Calculate the byte offset, and
		// store the given color value there. ScreenBase is assumed
		// to point to the start of video memory.
		unsigned char*	pixel = ScreenBase + (199 - y) * 320 + x;
		*pixel = (unsigned char) color;
	}
}

I usually write the header comment after I’ve named the function and before I type any code. Often just writing the comment clarifies the purpose of the function and helps improve the design.

Keep this description up to date! Change it as you change the contents, name, or parameters of the function.

The // method of commenting is generally preferred over the /*...*/ method, but the second method occasionally makes more sense.

Indentation

Use tabs; pick your tab size using epsilon. The "official" (i.e. the one I use) tab width is 8 spaces per tab. I use 8 spaces because it makes the flow of the program statements very obvious, and I’ve found it discourages confusing, bug-prone control structures.

Bracketing and indentation

Control blocks: you can either do this:

	while (some condition) {
		do some stuff;
	}

or this:

	while (some condition)
	{
		do some stuff;
	}

Generally the first style is preferred, but you should use the second style if the conditional doesn’t fit on a single line.

Switch statements: I prefer to do this:

	switch (variable) {
	case 1:
		do some stuff;
		break;
	case 2:
		do some other stuff;
		break;
	}

Instead of the more traditional:

	switch (variable) {
		case 1:
			do some stuff;
			break;
		....
	}

because it keeps the indentation somewhat under control, and makes at least as much syntactic sense.

Use blank lines to separate conceptual units of code. Generally it’s nice to have a comment above these chunks, but use your judgment.

More Miscellaneous Guidelines

* One tab between declaration specifier and variable name:

int	i, j;
float	f;
char	*FinalNull;
NavigationController	*Controller;

* One space after each comma:

	in = fopen("filename.txt", "r");

* Two blank lines between function bodies or other major line groupings:

int	AddOne(int x)
// Returns x + 1.
{
	return(x + 1);
}


int	AddTwo(int x)
// Returns x + 2.
{
	return(x + 2);
}

* Blank space after language keywords. No blank space between function name and its parameter list.

	if (p == NULL) {
		printf("Pointer is null.\n");
	} else {
		printf("Pointer is linked to %p.\n", (void*) p->Next);
		p = p->Next;
	}
	for (i = 0; i < 20; i++) {
		printf("%d! = %d\n", i, factorial(i));
	}

* Spaces around operators, usually. Sometimes it's OK to leave the spaces out.

	i++;
	i = i + j;
	if (i < 1 && j > i) {
		j = i + 1;
		for (k=0; k < j; k++) {
			j += k;
		}
	}