23 October 2013

How To Eliminate Bugs In Loops (Part 2)

Written by rfelten. Posted in C++

In Part 1 of this article, I outlined a strategy of adding scaffolding to programs to detect out-of-range indexes to fixed-size arrays within loops. In Part 2, I will expand on the techniques to include variable ranges, variable-size arrays, and algorithms.

Looping through a constant range of array elements.
If we are looping through a range of indexes that represent a subset of the array, we need to ensure that the range actually fits within the declared array size. If the range is constant, we can again use static_assert and eliminate problems at compile time. For example:

const unsigned MY_FIRST_MOON = 3;
const unsigned MY_FAVORITE_MOON = 16;
...
static_assert(MY_FAVORITE_MOON < SIZE(moonInfo));

for (unsigned i = MY_FIRST_MOON; i < MY_FAVORITE_MOON; i++)
{
    rocket->flyTo(moonInfo[i].moons[i]);
}

Note that if you use an int instead of unsigned for the index, you will have to check the lower bound to make sure it is not negative.
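A minimal sketch of that note, assuming a SIZE() macro like the one described in Part 1 (redefined here so the example stands alone). The array moonDistances, the two constants, and sumSelectedDistances are hypothetical names used only for illustration. With signed bounds, the lower bound, the ordering, and the upper bound each get a compile-time check.

```cpp
#include <cassert>
#include <cstddef>

// Assumed stand-in for the SIZE() macro from Part 1: element count of a fixed-size array.
#define SIZE(a) (sizeof(a) / sizeof((a)[0]))

// Hypothetical data, for illustration only.
static int moonDistances[20] = {};

const int FIRST_MOON = 3;   // signed on purpose
const int LAST_MOON = 16;   // one past the last moon visited

// With signed bounds, both ends of the range need a check.
static_assert(FIRST_MOON >= 0, "lower bound must not be negative");
static_assert(FIRST_MOON <= LAST_MOON, "range must not be reversed");
static_assert(static_cast<std::size_t>(LAST_MOON) <= SIZE(moonDistances),
              "upper bound must fit in the array");

// Sums the distances at indexes FIRST_MOON up to, but not including, LAST_MOON.
int sumSelectedDistances()
{
    int total = 0;
    for (int i = FIRST_MOON; i < LAST_MOON; i++)
    {
        total += moonDistances[i];
    }
    return total;
}
```

Changing FIRST_MOON to a negative value, or LAST_MOON to something larger than 20, should stop the build rather than the rocket.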

Looping through a variable range of elements.

In the above examples, the range was constant, and static_assert could be used at compile time. But if the range is computed at runtime, we need to use runtime checks. These checks do slow the program down, but because each one runs once outside the loop rather than on every array access inside it, the extra time stays constant no matter how many times we go through the loop. Consider the alternative: We are confident enough that we found all possible bugs during our own testing, so we never deliver code containing runtime checks to our unsuspecting customers (who are now standing under the arches after we remove the scaffolding). Maybe the program occasionally crashes or produces the wrong answer. “Hey, it’s a known issue, but we’re working on it. Can you send us some core dumps?”

For this example I am going to use what I will call runtime_assert to indicate a runtime check. Unlike static_assert, this will not be caught at compile time, but will detect an error when the program is running. Since this will remain in delivered code, you probably don’t want to use assert which will crash the program. C++ exceptions or other types of error handling are needed. Rather than give examples of different ways of providing run-time error processing, I’m just going to point out where the processing should be inserted, by assuming we have a macro called runtime_assert.

unsigned startMoon = (computed value somewhere in the program);
unsigned endMoon = (computed value somewhere in the program);
...
runtime_assert(endMoon < SIZE(moonInfo));

for (unsigned i = startMoon; i < endMoon; i++)
{
    rocket->flyTo(moonInfo[i].moons[i]);
}
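The article deliberately leaves the error-handling mechanism open. Purely as one possible sketch, runtime_assert could be a macro that throws a C++ exception, so the program can report the problem instead of crashing. The helper sumRange and its parameters are hypothetical; it only shows the check being made once, before the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// One possible runtime_assert (an assumption, not the article's definition):
// throws instead of aborting, so the caller can decide how to recover.
#define runtime_assert(cond)                                           \
    do {                                                               \
        if (!(cond)) {                                                 \
            throw std::runtime_error("runtime_assert failed: " #cond); \
        }                                                              \
    } while (false)

// Hypothetical helper: sums values[startIndex] up to, but not including,
// values[endIndex]. The range is validated once, outside the loop.
int sumRange(const int* values, std::size_t count,
             std::size_t startIndex, std::size_t endIndex)
{
    runtime_assert(startIndex <= endIndex);
    runtime_assert(endIndex <= count);

    int total = 0;
    for (std::size_t i = startIndex; i < endIndex; i++)
    {
        total += values[i];
    }
    return total;
}
```

A caller would wrap the call in try/catch and handle std::runtime_error in whatever way suits the product: log it, show a message, or retry with a corrected range.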

Computed subscripts
Examine the subscripts used in the loop. For most loops, the indexes to the arrays run through a given range, usually incremented by one. But many times, the subscripts to an array might be a computed value. In this case, you might be able to determine in advance the minimum and maximum array index that will be used in the loop, and check to make sure it won’t go beyond the size of the array before you enter the loop.

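// The largest index written is (SIZE(allNumbers) - 1) / 2, so round up in case the size is odd.
static_assert(SIZE(evenNumbers) >= (SIZE(allNumbers) + 1) / 2);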
for (unsigned i = 0; i < SIZE(allNumbers); i++)
{
    evenNumbers[i / 2] = allNumbers[i]  / 2;
}

In the worst case, it may be necessary to check the index each time it is used inside the loop. This is because the computed indexes cannot be determined in advance, or the data used in the computations are coming from an outside source, for example being read from a file. Even though it will cost extra processing time on every pass through the loop, I would advise that you check the index value anyway for two reasons. 1) You are already performing a computation, reading a file, or getting the index from an unpredictable source, so checking its range will only add a little more time to the time already used in determining the new index value. 2) Unlike using fixed range index values, you are even more likely to have an out-of-range condition when computing an index or reading a file, and if you don’t check for it, you have dangerously left valuable scaffolding out of your program.
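As a hedged illustration of checking an index on every use, the sketch below reads indexes from a stream (standing in for a file) and validates each one before touching the vector. The function name countVisits and the policy of skipping and counting bad indexes are assumptions for the example; throwing an error would be an equally reasonable choice.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Reads moon indexes from input and increments the matching visit counter.
// Every index comes from an outside source, so each one is range-checked
// before it is used. Returns the number of indexes that were rejected.
std::size_t countVisits(std::istream& input, std::vector<int>& visitsPerMoon)
{
    std::size_t rejected = 0;
    long long index = 0;
    while (input >> index)
    {
        if (index < 0 ||
            static_cast<unsigned long long>(index) >= visitsPerMoon.size())
        {
            rejected++;   // out of range: skip it rather than corrupt memory
            continue;
        }
        visitsPerMoon[static_cast<std::size_t>(index)]++;
    }
    return rejected;
}
```

For example, feeding it a std::istringstream containing "0 2 2 7 -1 1" with a three-element vector rejects 7 and -1 and counts the other four values.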

Variable length arrays using std::vector
It is highly recommended that you use a container class for variable length arrays. The STL std::vector is the container of choice. In this case, if you are looping through the entire vector, and you are only accessing one vector, you don’t have to add any scaffolding. You can get the correct size of the vector by using its size function.

const int varNumMoons = (compute number of moons);
std::vector<MoonClass> moons(varNumMoons);
...
for (std::size_t i = 0; i < moons.size(); i++) // note we don't need to use varNumMoons
{
    rocket->flyToMoon(moons[i]);
}

If you have multiple vectors, you will need to add a runtime check.

std::vector<MoonClass> moons(varNumMoons);
std::vector<bool> isMoonVisited(possiblyDifferentVarNumMoons);
...
runtime_assert(isMoonVisited.size() <= moons.size());

for (std::size_t i = 0; i < isMoonVisited.size(); i++) // SIZE() only works on fixed-size arrays
{
    if (!isMoonVisited[i])
    {
       rocket->flyTo(moons[i]);
       isMoonVisited[i] = true;
    }
}

Variable length arrays using keyword new.
For any number of reasons, programmers might not use std::vector for their dynamically allocated arrays. For example, you may be maintaining existing code that allocates dynamic arrays with new. Or somebody convinced your boss that vectors are too slow, and he forbade you to use them. Whatever the reason, we have to add some rickety scaffolding, but it’s better than nothing. The problem is that there is no way to determine the size of a dynamically allocated array from the pointer alone. The best we can do is allocate the array with a computed value, and define the loop index range using the same variable.

const int varNumMoons = (compute number of moons);
MoonClass* moons = new MoonClass[varNumMoons];
...
for (int i = 0; i < varNumMoons; i++) // we have to assume varNumMoons is the correct variable!
{
    rocket->flyToMoon(moons[i]);
}

If you have multiple dynamic arrays, and you intend the two variables to be the same size, you should define them using the same variable. But if they are different sizes, then we can get a lot of benefit from adding a runtime check, especially if the two variables are defined in different parts of the program.

const int varNumMoons = (compute number of moons);
MoonClass* moons = new MoonClass[varNumMoons];
...
const int numVisitedMoons = (compute number of moons to visit);
bool* isMoonVisited = new bool[numVisitedMoons](); // () initializes every element to false
...
runtime_assert(numVisitedMoons <= varNumMoons);

for (int i = 0; i < numVisitedMoons; i++)
{
    if (!isMoonVisited[i])
    {
       rocket->flyTo(moons[i]);
       isMoonVisited[i] = true;
    }
}

Arrays passed into functions
The best way to pass an array into a function is to pass in a reference to a std::vector. The function then knows the size of the container. But the traditional (and bug-causing) way of passing arrays into functions is to pass in a pointer to the array and a length value. It’s bug-causing because the function has no way to know whether the length value is correct. When calling functions with fixed-size arrays, you can at least do your part in avoiding bugs by passing in the length using the SIZE() macro. That guarantees that you’ve given the function the correct array size.

visitAllMoons(moons, SIZE(moons));

If there is more than one array that is being passed into the function, I recommend adding static_asserts for each extra array, to make sure that the lengths are compatible. When I say compatible, I don’t necessarily mean equal, although that may be true. You need to know what the function’s requirements are for the lengths of the arrays it accesses through its calling parameters. For example, one array may need to be at least twice the size of the other. For fixed-size arrays, use a static_assert to enforce the requirement. Use a runtime_assert if the requirement is not constant. If the function has only a single length value for multiple arrays, it is most likely assuming that the arrays are the same length, and you should enforce that assumption using asserts.

static_assert(SIZE(moons) == SIZE(isMoonVisited));
visitUnvisitedMoons(moons, isMoonVisited, SIZE(moons));

If you are the programmer designing the function, and you are looping through more than one array, consider having a separate length parameter for each array, and then add runtime checks inside the function to make sure the lengths passed in are compatible. There is no guarantee that the caller has given you arrays that are as big as he says, but at least you aren’t creating a bug inside your function. Any bug caused by an out-of-range index will be caused by the caller, not your function.

void visitUnvisitedMoons(MoonClass* moons, unsigned sizeMoons, bool* isMoonVisited, unsigned sizeIsVisited)
{
    // Every index used for moons must also be valid for isMoonVisited.
    runtime_assert(sizeMoons <= sizeIsVisited);
    for (unsigned i = 0; i < sizeMoons; i++)
    {
        if (!isMoonVisited[i])
        {
            rocket->flyTo(moons[i]);
            isMoonVisited[i] = true;
        }
    }
}

If you are writing both the calling code, and the function, then you should wear both hats and put the checks in twice, once when calling the function, and also inside the function. If you are willing to accept the overhead of calling a function, adding an extra runtime check shouldn’t add too much extra code. The redundancy is the scaffolding, so to avoid bugs in loops, this is your best protection.
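Wearing both hats might look something like the sketch below. Everything in it is an assumption for illustration: SIZE() is redefined as a stand-in for the Part 1 macro, runtime_assert is one throwing variant, Moon replaces MoonClass, and a flight log replaces the rocket so the example can stand alone.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Assumed stand-ins so the sketch is self-contained.
#define SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define runtime_assert(cond) \
    do { if (!(cond)) throw std::runtime_error("runtime_assert failed: " #cond); } while (false)

struct Moon { int id; };   // hypothetical stand-in for MoonClass

// Function hat: trust nothing about the lengths passed in.
// Logs the id of each newly visited moon and returns how many were visited.
std::size_t visitUnvisitedMoons(const Moon* moons, std::size_t sizeMoons,
                                bool* isMoonVisited, std::size_t sizeIsVisited,
                                std::vector<int>& flightLog)
{
    runtime_assert(sizeMoons <= sizeIsVisited);

    std::size_t newlyVisited = 0;
    for (std::size_t i = 0; i < sizeMoons; i++)
    {
        if (!isMoonVisited[i])
        {
            flightLog.push_back(moons[i].id);   // stands in for rocket->flyTo(moons[i])
            isMoonVisited[i] = true;
            newlyVisited++;
        }
    }
    return newlyVisited;
}

// Hypothetical fixed-size data owned by the caller.
static Moon knownMoons[4] = {{1}, {2}, {3}, {4}};
static bool knownMoonVisited[4] = {true, false, false, true};

// Caller hat: check the arrays at compile time and pass lengths via SIZE().
std::size_t visitKnownMoons(std::vector<int>& flightLog)
{
    static_assert(SIZE(knownMoons) == SIZE(knownMoonVisited),
                  "each moon needs exactly one visited flag");
    return visitUnvisitedMoons(knownMoons, SIZE(knownMoons),
                               knownMoonVisited, SIZE(knownMoonVisited), flightLog);
}
```

The two checks overlap on purpose. If someone later resizes one array, the static_assert fires at compile time. If someone calls the function from elsewhere with mismatched lengths, the runtime_assert catches it.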

We Must Not Have Loops

We started this discussion (in Part 1) with my professor’s blanket statement that “We must have loops.” Some people say that conventional wisdom is eventually replaced with the opposite. That isn’t (always) true, but sometimes it seems that way on the surface. New thinking in the C++ world says that we shouldn’t be writing loops, the opposite of “We Must Have Loops.” In Scott Meyers’s book “Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library,” Item 43 is “Prefer algorithm calls to hand-written loops.” But when you replace loops with algorithms, the algorithms use loops internally, examining every element in a given range.

If you give an algorithm the entire range of a container, a std::vector for example, the scaffolding is already built-in to the vector.

bool IsOdd (int i) { return ((i % 2) != 0); } // != 0 also catches negative odd numbers
std::vector<int> myvector;
std::vector<int>::iterator it = std::find_if (myvector.begin(), myvector.end(), IsOdd);

If the algorithm is given a sub-range of a container, you’d better make sure your iterators are valid. According to the STL, “Note that invalid parameters cause undefined behavior.” Well, we just love undefined behavior in our programs, don’t we? Our scaffolding also applies to iterators. Here’s a simple example.

int startIndex = (computed somewhere in the program);
int stopIndex = (computed somewhere in the program);
runtime_assert(startIndex >= 0);
runtime_assert(startIndex <= stopIndex);
runtime_assert(static_cast<std::size_t>(stopIndex) <= myvector.size()); // stopIndex is one past the last element searched
std::vector<int>::iterator it = std::find_if (myvector.begin() + startIndex, myvector.begin() + stopIndex, IsOdd);
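The same idea can be packaged as a small function. This is only a sketch: findFirstOdd is a hypothetical helper, runtime_assert is again assumed to be a throwing macro, and unsigned indexes are used so no separate negative check is needed. The indexes are validated before any iterator is formed, because even computing an iterator past the end of the vector is undefined behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Assumed stand-in for runtime_assert: throws so the caller can recover.
#define runtime_assert(cond) \
    do { if (!(cond)) throw std::runtime_error("runtime_assert failed: " #cond); } while (false)

bool isOdd(int value) { return (value % 2) != 0; }

// Searches values[startIndex] up to, but not including, values[stopIndex].
// Returns the index of the first odd element, or stopIndex if there is none.
std::size_t findFirstOdd(const std::vector<int>& values,
                         std::size_t startIndex, std::size_t stopIndex)
{
    // Validate the indexes before turning them into iterators.
    runtime_assert(startIndex <= stopIndex);
    runtime_assert(stopIndex <= values.size());

    const auto first = values.begin() + static_cast<std::ptrdiff_t>(startIndex);
    const auto last = values.begin() + static_cast<std::ptrdiff_t>(stopIndex);
    const auto found = std::find_if(first, last, isOdd);
    return static_cast<std::size_t>(found - values.begin());
}
```

Returning an index rather than an iterator is a design choice made here to keep the example simple; returning the iterator from std::find_if directly would work just as well.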

If you get into the habit of adding scaffolding in all the cases I have outlined in this blog, you should find that you spend a lot less time debugging your programs. You’d be surprised how many times you make a simple change and it won’t compile, or you run the program, and it crashes with an assert or error because of an indexing problem. You’ll appreciate being instantly informed of a bug the second you introduce it. You’ll also kick yourself if you spend a half-day or more debugging a problem that could have been found instantly if you had added these checks.