This page is a list of C++ coding recommendations, added to on an as-needed basis by the Vesta developers.
Contents
Testing Pointer Validity
Don't do this:
T *x; ... assert(x);
Or this:
assert(!x);
assert is a function which takes an int argument. One some platforms, pointers are larger than integers. The above code can therefore result in a warning like "converting pointer to integer of smaller size". Instead you should do this:
assert(x != 0);
Or this:
assert(x == 0);
Depending on the compiler, an if test could have the same problem, so rather than this:
T *x; ... if(x) { ... }
Do this:
if(x != 0) { ... }
Assertions
It's good to use assert, but there are a few things you should keep in mind.
Assertions Shouldn't Have Side-effects
The expressions used in an assert should not have any side-effects. They shouldn't set variables, produce output, or otherwise do work. For example, don't do this:
Table<K,V>::Default t; ... assert(t.Get(key, /*OUT*/ value)); // use value
Instead, do this:
bool inTbl = t.Get(key, /*OUT*/ value); assert(inTbl); ...
Aside from being a good rule of style, it's important to remember that assertions can be disabled at compile time (by defining NDEBUG) to create a higher-performance executable. This means that you can't count on the expression in an assertion being evaluated.
Don't assert(false)
Instead of "assert(false);" use "abort();". (Be sure to print a descriptive error message first.)
First of all, assert(false); creates more complex code than abort();. abort(); is more efficient.
Secondly, assert(false); doesn't convey any useful information other than a source location in the error message. It's better to print a descriptive message about the problem. (An assertion with a real expression does convery some useful information.)
When using abort();, you may want to include source location information in your error message with the __FILE__ and __LINE__ pre-processor macros.
Keep Separate Assertions Separate
If you have multiple properties that you need to check, don't combine them in a single assertion with the && operator. Make separate assert calls for each. If you combine them and later get an assertion failure, you won't know which property was false. You get more precise assertion failures by keeping multiple assertions separate.
For example, suppose you have two pointers A and B which must both be non-NULL. If you do this:
assert((A != 0) && (B != 0));
You won't know which pointer was NULL when the assertion failed. So you'll get more precise information about what went wrong if you split this into two assertions:
assert(A != 0); assert(B != 0);
Of course not all complex logical assertions can be split up this way, just simple sequences &&-ed together. For example, if you have two properties P and Q, and P being true implies that Q must also be true, this might be written as an assertion like this:
assert(!P || Q);
This is totally different from the && case and can't be reduced.
Don't use an if just for an assertion
Rather than writing this:
if(X) assert(Y);
Use the implication form mentioned above:
assert(!X || Y);
The latter better supports recompiling with assertions disabled. The compiler should be smart enough to realize that the body of the if statement is unnecessary and remove it completely from generate code, but it's a little better to pull the test condition inside the assertion.
References vs. Pointers
Using a pointer means that it's possible for it to be a NULL pointer. "0" is in fact always an acceptable value to initialize a pointer variable or a pointer argument to a function. Any functions which take a pointer argument and assumes that it's non-zero should probably take a reference argument instead. (If not, it should include an assert that the argument is non-NULL.)
Using a reference implies that it will always refer to a valid object. (While it's technically possible to construct a reference to a NULL pointer, it requires more effort and it's unlikely to happen accidentally.) Function arguments which must always refer to a valid object would be better with a reference than a pointer.
Namespaces and namespace pollution
Don't do "using namespace"
C++ namespaces mean that some classes, functions, etc. have cumbersome names. For example, in ISO C++ the output stream class is known as std::ostream. In earlier versions of C++ it was known as simply ostream, which is easier to type.
Some people use this to revert to the old way:
using namespace std;
In general this is considered bad C++ coding practice, as it imports the entire std namespace into the top-level namespace. Instead, just bring in selected things which you need. For example:
using std::ostream; using std::endl; using std::hex; using std::setw; using std::setfill;
This avoids the significant namespace pollution that a whole-namespace import will cause.
Don't put "using" directives in header files
The using directive should be put in definition (.C) files but not header (.H) files. Putting them in header files causes anyone including the header file to get the same names imported into their namespace.
Instead, each definition file should just import those portions of other namepsaces which they need. That way imported names are not imposed upon every file including a header.
This does mean that header files need to use longer names (e.g. std::ostream).
Consider putting new names in a namespace
When defining a new type, function, or other global name, in library code, consider putting it in a namespace. This helps avoid possible name collisions with names from any other library the code may later be combined with.
Use Basics::uintN rather than BitN
There are several typedefs for unsigned integers at the global level which were used in the original Vesta code for bit-fields and unsigned integers:
Bit8 |
an 8-bit unsigned integer |
Bit16 |
a 16-bit unsigned integer |
Bit32 |
a 32-bit unsigned integer |
Bit64 |
a 64-bit unsigned integer |
There are equivalent types in the Basics:: namespace.
Basics::uint8 |
an 8-bit unsigned integer |
Basics::uint16 |
a 16-bit unsigned integer |
Basics::uint32 |
a 32-bit unsigned integer |
Basics::uint64 |
a 64-bit unsigned integer |
In general the latter should be preferred.
We retain the Bit names rather than modify all the code which uses them. It would be nice to phase them out eventually in favor of the names which are in the Basics:: namespace. Doing so would reduce the number of names at the global level. Also, "uint" is generally clearer to someone reading the code that "Bit".
Portability
64-bit Integer Constants
Rather than this:
const Basics::uint64 x = 0x0123456789abcdef;
You should use the CONST_INT_64 macro from Basics.H to make sure that your constant will work correctly with all compilers on all platforms:
const Basics::uint64 x = CONST_INT_64(0x0123456789abcdef);
(This might even have to change in the future to a different form to support more compilation environments.)
If possible avoid writing out such constants in a long form. For example, rather than this:
const Basics::uint64 x = CONST_INT_64(0xffffffffffffffff);
This would be a better choice:
const Basics::uint64 x = (~((Basics::uint64)0));
Memory Allocation
Allocation Macros
There are several memory allocation macros defined in Basics.H which can be used anywhere and must be used in any code ever linked into an application using the garbage collector. It's simplest to just use them everywhere.
The main reason these are important is to make it possible to instrument memory allocations and gather information about the origin of each block. This can be important when debugging certain kinds of problems.
There are three basic macros:
NEW(T) |
Use instead of new T |
NEW_CONSTR(T, (constructor_arg1, ...)) |
Use instead of new T(constructor_arg1, ...) |
NEW_ARRAY(T, size) |
Use instead of new T[size] |
And three "pointer-free" variants which mean that the caller promises that the allocated blocks contain no pointers to other heap blocks (which is important for garbage collection):
NEW_PTRFREE(T) |
Same as NEW(T) |
NEW_PTRFREE_CONSTR(T, (constructor_arg1, ...)) |
Same as NEW_CONSTR(T, (constructor_arg1, ...)) |
NEW_PTRFREE_ARRAY(T, size) |
Same as NEW_ARRAY(T, size) |
STL Allocator
When using the Standard Template Library classes (e.g. std::list, std::map) to store pointers in containers (e.g. std::vector<Foo *>), any code which is ever linked into an application using the garbage collector must use the custom allocator Basics::Allocator from the basics library. This class is defined in:
Instead of using this:
#include <list> ... std::list<Foo *>
You must do this:
#include <list> #include <BasicsAllocator.H> ... std::list<Foo *, Basics::Allocator<Foo *> >
This allocator must be used to force the blocks which make up the container to be in the garbage collector's heap. If you don't do this, the pointers stored in the STL containers may be in the normal heap, which may cause the garbage collector to be unable to find the references to them. This may cause the garbage collector to reclaim those blocks as garbage when they are still in use, leading to a crash or other problems that are very difficult to debug.
Efficiently Formatting Text Strings
Use Stack-Allocated Storage When Possible
Heap allocations are expensive, particularly in multi-threaded programs. This is especially a concern in long-running programs such as servers and even the Vesta evaluator. If you can use a stack-allocated buffer for formatting a text string, it will avoid lock contention and other performance issues associated with heap allocations. There are at least two ways you can do this:
Use snprintf. This requires that you know the maximum size of the string in advance, which isn't possible in all cases.
snprintf is similar to sprintf, but is a little safer because it will never over-run the end of the buffer.
Use Basics::OBufStream with a stack-allocated buffer. This allows you to start with a stack-allocated buffer that may be large enough in the common case. By default, Basics::OBufStream will re-allocate a new buffer from the heap if the current buffer is too small.
Consider Text::printf
The function Text::printf added in basics/38 allows you to format a string as the family of printf functions would but places the result in a Text instance. It determines the size of the resulting string automatically and avoids multiple heap allocations which become likely if a text string is built up gradually through multiple operations. If you can't use a stack-allocated buffer, Text::printf may be a good alternative.
Use Basics::OBufStream, not std::ostringstream
We specifically stopped using the class std::ostringstream from the standard C++ library because of its poor performance, at least in some implementation. We discovered that every time a write was performed to an ostringstream it would allocate a new heap block with enough space for the current contents plus the new bytes and copy in the existing contents plus the new bytes. This means there's one heap allocation per write to the stream.
The Basics::OBufStream works more like the now deprecated ostrstream class from older versions of the C++ standard library. It allocates a buffer with more space than is needed currently. When that space is exhausted it will allocate a larger buffer. This means that many writes to the stream may occur before the next heap block allocation.
Also, OBufStream allows the user to supply an initial buffer on the stack. This can be even more efficient in the common case by using no heap block allocations. We use this method in the repository server code when formatting transaction log entries.
Avoid lots of Text::operator+
The Text class provides + and += operators that allow one to append text values. While useful, this unfortunately has similar heap allocation characteristics to the ostringstream class: each + operation to append two Text values together will allocate a new heap block. So if you have something like this:
Text a, b; // ... Text message = "Description part 1: " + a + ", more description: " + b;
That will perform at least three different heap block allocations, each with an associated string copy.
Consider using either a Basics::OBufStream or Text::printf instead.