Pismere Coding Standards

by Danilo Almeida

Revision 0.4 - This is a draft. Please send feedback to pismere-team@mit.edu for discussion.

Comments that need to be addressed are stuck into the draft are in red.

TBD: exception handling, actual sample MIT copyright.

NOTE: This first cut has been cobbled together from GNU and Athena coding standards. It still needs a bunch of re-org work. Please feel free to send suggestions. Currently, we only deal with C. We need to add C++-specific stuff.

Formatting Formatting Your Source Code
Header Files Header Files
C Pre-Processor Using the C Pre-Processor
Comments Commenting Your Work
Syntactic Conventions Clean Use of C Constructs
Names Naming Variables and Functions
Software Engineering Software engineering practices
Namespaces Symbol namespaces
Portability (3rd party) Portability considerations for code we get from third parties


Formatting Your Source Code

If possible, try to have no source code lines longer than 79 characters. Do not store tabs in a line after any non-tab character. Tab stops must be set to 8 characters in your editor. That should be independent from the indentation setting. The indentation setting should be 4 characters.

QUESTION: Do we want to outlaw tabs altogether?

Putting Emacs to Work for You

You can use Emacs to help you format your source code. First, you add the emacs locker by running \\tremont\locker\addlocker emacs from a prompt. Then, from that prompt, start up Emacs with the command-line: runemacs.cmd (the .cmd is important).

You can then hit TAB to indent the current line or select an entire region (like, say, the entire buffer) and do M-x indent-region. The site-start.el in the site-lisp directory of Emacs shows how we set up our Emacs to do the proper indenting (and syntax highlighting).

Back to Formatting

It is important to put the open-brace that starts the body of a C function in column zero, and avoid putting any other open-brace or open-parenthesis or open-bracket in column zero. Several tools look for open-braces in column zero to find the beginnings of C functions. These tools will not work on code not formatted that way.

It is also important for function definitions to start the name of the function in column zero. This helps people to search for function definitions, and may also help certain tools recognize them.

So as to make it easy to read, parse, and modify argument lists, we list the arguments one per line, indented at 4 characters. This makes it so that we don't have to worry about splitting arguments, etc. Thus, the proper format is this:

static char *
concat(
    char *s1,
    char *s2
    )
{
    ...
}

Put a comment explaining the function before the function definition. The format is decribed in the Commenting Your Work.

Write prototypes for all of your functions. Include argument names in your prototypes, to help the reader understand what they mean. Use the formatting described above. You should simply be able to copy the function header as the prototype. Leave the variable names in the prototype so as to make the prototypes more self-documenting (and to make it so you can just simply copy-paste when updating them).

When declaring local variables, only declare one variable per line. This makes teh code easier to read.

For the body of the function, we prefer code formatted like this:

    if (x < foo(y, z))
    {
        haha = bar[4] + 5;
    }
    else
    {
        while (z)
        {
            haha += foo(z, z);
            z--;
        }
        return ++x + bar();
    }

In an if-else clause, we always put the enclosing curly braces to reduce the possibility of error and to make the code easier to read. On a singleton if statement, the curly braces are optional.

We find it easier to read a program when it has spaces before the open parenthesis and after the commas. Especially after the commas. However, when making a function call, do not put a space before the open parenthesis. Do put spaces before the open parenthesis after "while", "if", or "for", though.

When you split an expression into multiple lines, split it after an operator, not after one. Also include extra ()s to make the operator precedence more explicit. This helps make the code easier to read. Here is the right way:

    if (foo_this_is_long && and_so_is_this_variable && (bar > win(x, y, z)) &&
        remaining_condition)

Try to avoid having two operators of different precedence at the same level of indentation. For example, don't write this:

    mode = (inmode[j] == VOIDmode ||
            GET_MODE_SIZE(outmode[j]) > GET_MODE_SIZE(inmode[j]) ?
            outmode[j] : inmode[j]);

Instead, use extra parentheses so that the indentation shows the nesting:

    mode = ((inmode[j] == VOIDmode) ||
            (GET_MODE_SIZE(outmode[j]) > GET_MODE_SIZE(inmode[j]))) ?
            outmode[j] : inmode[j]);

Insert extra parentheses so that Emacs will indent the code properly. For example, the following indentation looks nice if you do it by hand, but Emacs would mess it up:

    v = rup->ru_utime.tv_sec*1000 + rup->ru_utime.tv_usec/1000 +
        rup->ru_stime.tv_sec*1000 + rup->ru_stime.tv_usec/1000;

But adding a set of parentheses solves the problem:

    v = (rup->ru_utime.tv_sec*1000 + rup->ru_utime.tv_usec/1000 +
         rup->ru_stime.tv_sec*1000 + rup->ru_stime.tv_usec/1000);

Format do-while statements like this:

    do
    {
        a = foo(a);
    }
    while (a > 0);

When writing return statements, do not put parentheses around the return value; just write "return foo;".


Header Files

Each header file should check for and define a unique symbol to prevent it from being included multiple times during compilation. For example:

#ifndef __NameOfHeaderFile_ExtensionOfHeaderFile__
#define __NameOfHeaderFile_ExtensionOfHeaderFile__
...
#endif /* __NameOfHeaderFile_ExtensionOfHeaderFile__ */

If your header file name is not particularly unique, you can prefix it with the name of the compoment to which it belongs. Or you might add some unique string to the end.


Using the C Pre-Processor

When defining macros, use all-caps names. If you cannot do that for some reason, append _macro at the end of the macro. This will help people debugging your code to identify what are macros and what are real symbols.

Make sure you comment any pre-processor symbols or macros that you define.

Never put an #ifdef around the start or end of a control block. That will screw up automagic indenting tools. So, never do something like this:

#ifdef FOO
    if (foo_func(x))
#else
    if (normal_func(x))
#endif

Instead, if you really need to do this, use:

    if (
#ifdef FOO
        foo_test(x)
#else
        normal_test(x)
#endif
        )

Avoid sprinkling your code with #ifdefs. Instead, use an abstraction for the functionality you wish to conditionally use. For example, instead of having the above code fragment sprikled thorughout a source file, use the following:

#ifdef FOO
#define TEST(x) foo_test(x)
#else
#define TEST(x) normal_test(x)
#endif

...

    if (TEST(x))

Under Win32, _WIN32 is always defined by the compiler. Test for that rather than WIN32.

The pismere source tree use NODEBUG in Makefiles to check whether we are building debug or not. If NODEBUG is set, we are building release binaries. If NODEBUG is not set, we are building debug binaries. When debug binaries are built, our Makefiles define _DEBUG. For release, we define NDEBUG. Only check for _DEBUG. We do this to resemble the Platform SDK definitions.


Commenting Your Work

Each C source and header file should begin with an MIT copyright statement, a comment describing what the file does. MIT copyright statements may be omitted for source code which is not distributed outside of MIT.

Every program should start with a comment saying briefly what it is for. Example: fmt - filter for simple filling of text.

Please write the comments in a program in English, because English is the one language that nearly all programmers in all countries can read. If you do not write English well, please write comments in English as well as you can, then ask other people to help rewrite them. If you can't write comments in English, please find someone to work with you and translate your comments into English.

Please put a comment on each function saying what the function does, what sorts of arguments it gets, and what the possible values of arguments mean and are used for. It is not necessary to duplicate in words the meaning of the C argument declarations, if a C type is being used in its customary fashion. If there is anything nonstandard about its use (such as an argument of type char * which is really the address of the second character of a string, not the first), or any possible values that would not work the way one would expect (such as, that strings containing newlines are not guaranteed to work), be sure to say so.

Also explain the significance of the return value, if there is one.

Please put two spaces after the end of a sentence in your comments, so that the Emacs sentence commands will work. Also, please write complete sentences and capitalize the first word. If a lower-case identifier comes at the beginning of a sentence, don't capitalize it! Changing the spelling makes it a different identifier. If you don't like starting a sentence with a lower case letter, write the sentence differently (e.g., "The identifier lower-case is ...").

The comment on a function is much clearer if you use the argument names to speak about the argument values. The variable name itself should be lower case, but write it in upper case when you are speaking about the value rather than the variable itself. Thus, "the inode number NODE_NUM" rather than "an inode".

Here is a simple example:

/*
 *  concat
 *
 *    Concatenates two strings and returns a pointer to the new string.
 *
 *  Parameters:
 *
 *    s1 - Pointer to a null-terminated string.
 *
 *    s2 - Pointer to a null-terminated string.
 *
 *  Returns:
 *
 *    A new null-terminated string that consists of the first string 
 *    followed by the second string.  If both s1 and s2 are null, null
 *    is returned.  If only one of them is null (or an empty string), the 
 *    returned string only contains the non-null string.
 *
 *  Remarks:
 *
 *    Use free_string to free the returned string.
 *
 *  Implementation:
 *
 *    We use a clever algorithm where...
 *
 */

Another big-block style you can use is

/******************************************************************************
 *
 *  concat
 *
 *  ...
 *
 *****************************************************************************/

Try to avoid in-function comments, explaining things up-front when appropriate. Do put comments inline when things may be tricky.

There should be a comment on each static variable as well, like this:

/* Nonzero means truncate lines in the display;
   zero means continue them.  */
int truncate_lines;

Every #endif should have a comment, except in the case of short conditionals (just a few lines) that are not nested. The comment should state the condition of the conditional that is ending, including its sense. #else should have a comment describing the condition and sense of the code that follows. For example:

#ifdef FOO
  ...
#else /* !FOO */
  ...
#endif /* !FOO */
#ifdef FOO
  ...
#endif /* FOO */

but, by contrast, write the comments this way for a #ifndef:

#ifndef FOO
  ...
#else /* FOO */
  ...
#endif /* FOO */
#ifndef FOO
  ...
#endif /* !FOO */

Clean Use of C Constructs

Please explicitly declare all arguments to functions. Don't omit them just because they are ints.

Declarations of external functions and functions to appear later in the source file should all go in one place near the beginning of the file (somewhere before the first function definition in the file), or else should go in a header file. Don't put extern declarations inside functions.

It used to be common practice to use the same local variables (with names like tem) over and over for different values within one function. Instead of doing this, it is better declare a separate local variable for each distinct purpose, and give it a name which is meaningful. This not only makes programs easier to understand, it also facilitates optimization by good compilers. You can also move the declaration of each local variable into the smallest scope that includes all its uses. This makes the program even cleaner.

Don't use local variables or parameters that shadow global identifiers.

Don't declare multiple variables in one declaration that spans lines. Start a new declaration on each line, instead. Only declare one variable per line. For example, instead of:

int    foo,
       bar;
or this:
int foo, bar;
write this:
int foo;
int bar;
(If they are global variables, each should have a comment preceding it anyway.)

When you have an if-else statement nested in another if statement, always put braces around the if-else. Thus, never write like this:

    if (foo)
        if (bar)
            win();
        else
            lose();
always like this:
    if (foo)
    {
        if (bar)
	{
            win();
	} else {
           lose();
	}
    }

If you have an if statement nested inside of an else statement, either write else if on one line, like this,

    if (foo)
    {
        ...
    }
    else if (bar)
    {
        ...
with its then-part indented like the preceding then-part, or write the nested if within braces like this:
    if (foo)
    {
        ...
    } else {
        if (bar)
            ...
    }

Don't declare both a structure tag and variables or typedefs in the same declaration. Instead, declare the structure tag separately and then use it to declare the variables or typedefs.

Try to avoid assignments inside if-conditions. For example, don't write this:

    if ((foo = (char *) malloc(sizeof *foo)) == 0)
        fatal("virtual memory exhausted");

instead, write this:

    foo = (char *) malloc(sizeof *foo);
    if (foo == 0)
        fatal("virtual memory exhausted");

This makes the program easier to debug in a debugger.

When using a switch statement:

    switch(foo) {
    case A:
    {
	...
	break;
    }
    case B:
    {
	...
	break;
    }
    default:
    {
	...
    } 
    } /* swtich */

Don't make the program ugly to placate lint. Please don't insert any casts to void. Zero without a cast is perfectly fine as a null pointer constant, except when calling a varargs function.


Naming Variables and Functions

The names of global variables and functions in a program serve as comments of a sort. So don't choose terse names -- instead, look for names that give useful information about the meaning of the variable or function. In a program, names should be English, like other comments.

Local variable names can be shorter, because they are used only within one context, where (presumably) comments explain their purpose.

Please use underscores to separate words in a name, so that the Emacs word commands can be useful within them. Stick to lower case; reserve upper case for macros and enum constants, and for name-prefixes that follow a uniform convention.

For example, you should use names like ignore_space_change_flag; don't use names like iCantReadThis.

Actually, in mixed case names are not so bad. However, never use names with multiple capital letters together.

For example, never use something like DCEIsAGoodAcronym. Rather, use something like dce_is_a_good_acronym or DceIsGoodAcronym. (In fact, you probably want something less verbose in the first place.)

Variables that indicate whether command-line options have been specified should be named according the meaning of the option, not after the option-letter. A comment should state both the exact meaning of the option and its letter. For example,

/* Ignore changes in horizontal whitespace (-b).  */
int ignore_space_change_flag;

When you want to define names with constant integer values, use enum rather than #define. This makes debugging easier.

Use filenames that will go over to UNIX easily. Remember that UNIX is case-sensitive and that ls on UNIX has a default way of sorting things. In general, keep file names lowercase, except for informational files like ChangeLog, NEWS, README, TODO, etc.


Software engineering practices

The following software engineering practices are strongly encouraged:

The following software engineering practices are discouraged:


Symbol namespaces

If you are writing a library, you should pick a prefix for your library. You should ensure that all symbols which interact with the application's namespace (both at link time and on inclusion of the header file) begin with your prefix (or the all-caps form of your prefix, in the case of preprocessor symbols and enumeration constants) followed by an underscore. Symbols which are not intended for use by the application should begin with your prefix followed by two underscores.

For instance, if your prefix is "hes", then functions exported to the user should begin with "hes_" (e.g. "hes_resolve"). Functions used internally by the library should begin with "hes__" (e.g. "hes__resolve" for the internal resolver function). Preprocessor symbols intended for use by the application should begin with "HES_" (e.g. "HES_ER_OK"); preprocessor symbols not intended for use by the application should begin with "HES__" (e.g. "HES__HESIOD_H", used to protect hesiod.h from multiple inclusion).

Names of structures should begin with your prefix, but structure fields don't need to. Strictly speaking, structure fields interact with the user's namespace because the user might have "#define"'d them to something before including your header file, but applications generally shouldn't be "#define"ing lots of lowercase symbols, so this is not a real worry.


Portability considerations for code we get from third parties

The overriding principle for code obtained from third parties is to make as few changes as possible. A lot of third-party code has a very bad approach to portability, or cares about a lot of platforms we don't care about. You should attempt to follow the portability approach used by the rest of the program, such as it may be. Ideally, any changes you make should be made in such a manner that they can be incorporated into the source tree maintained by the third party.