How To Eliminate Bugs In Loops (Part 2)

Written by rfelten. Posted in C++

C++ programmerIn part 1 of this article, I outlined a strategy of adding scaffolding to programs to detect out-of-range indexes to fixed sized arrays within loops. In Part 2, I will expand on the techniques to include variable ranges,  variable sized arrays, and algorithms.

Looping through a constant range of array elements.
If we are looping through a range of indexes that represent a subset of the array, we need to ensure that the range actually fits within the declared array size. If the range is constant, again we can can use our static_assert function, and eliminate problems at compile time. For example:

const unsigned MY_FIRST_MOON = 3;
const unsigned MY_FAVORITE_MOON = 16;
...
static_assert(MY_FAVORITE_MOON < SIZE(moonInfo));

for (unsigned i = MY_FIRST_MOON; i < MY_FAVORITE_MOON; i++)
{
    rocket->flyTo(moonInfo[i].moons[i];
}

Note that if you use an int instead of unsigned for the index, you will have to check the lower bound to make sure it is not negative.

Looping through a variable range of elements.

In the above examples, the range was constant, and static_assert could be used at compile time. But if the range is computed at runtime, we need to use runtime checks. These definitely slow down the program, but since this is done once outside the loop rather than each time we access the array inside the loop, the extra time remains constant regardless of how many times we go through the loop. Consider the alternative: We are confident enough that we found all possible bugs during our own testing, so we never deliver code containing runtime checks to our unsuspecting customers (who are now standing under the arches after we remove the scaffolding). Maybe the program occasionally crashes or produces the wrong answer. “Hey, it’s a known issue, but we’re working on it. Can you send us some code dumps?”

For this example I am going to use what I will call runtime_assert to indicate a runtime check. Unlike static_assert, this will not be caught at compile time, but will detect an error when the program is running. Since this will remain in delivered code, you probably don’t want to use assert which will crash the program. C++ exceptions or other types of error handling are needed. Rather than give examples of different ways of providing run-time error processing, I’m just going to point out where the processing should be inserted, by assuming we have a macro called runtime_assert.

unsigned startMoon = (computed value somewhere in the program);
unsigned endMoon = (computed value somewhere in the program);
...
runtime_assert(endMoon < SIZE(moonInfo));

for (unsigned i = startMoon; i < endMoon; i++)
{
    rocket->flyTo(moonInfo[i].moons[i];
}

Computed subscripts
Examine the subscripts used in the loop. For most loops, the indexes to the arrays run through a given range, usually incremented by one. But many times, the subscripts to an array might be a computed value. In this case, you might be able to determine in advance the minimum and maximum array index that will be used in the loop, and check to make sure it won’t go beyond the size of the array before you enter the loop.

static_assert(SIZE(evenNumbers) ==  SIZE(allNumbers / 2));
for (unsigned i = 0; i < SIZE(allNumbers); i++)
{
    evenNumbers[i / 2] = allNumbers[i]  / 2;
}

In the worst case, it may be necessary to check the index each time it is used inside the loop. This is because the computed indexes cannot be determined in advance, or the data used in the computations are coming from an outside source, for example being read from a file. Even though it will cost extra processing time each time the loop is executed, I would advise that you check the index value anyway for two reasons. 1) You are already performing a computation or reading a file, or getting the index from an unpredictable source, so checking it’s range will only add a little more time to the time already used in determining the new index value. 2) Unlike using fixed range index values, you are even more likely to have an out of range condition when computing an index or reading a file, and if you don’t check for it – you have dangerously left out valuable scaffolding in your program.

Variable length arrays using std::vector
It is highly recommended that you use a container class for variable length arrays. The STL std::vector is the container of choice. In this case, if you are looping through the entire vector, and you are only accessing one vector,  you don’t have to add any scaffolding. You can get the correct size of the vector by using it’s size function.

const int varNumMoons = (compute number of moons);
std::vector<MoonClass> moons(varNumMoons);
...
for (int i = 0; i < moons.size(); i++) // note we don't need to use varNumMoons
{
    rocket->flyToMoon(moons[i]);
}

If you have multiple vectors, you will need to add a runtime check.

stl::vector<MoonClass> moons(varNumMoons);
stl::vector<bool> isMoonVisited(possiblyDifferentVarNumMoons);
...
runtime_assert(isMoonVisited.size() <= moons.size());

for (int i = 0; i < SIZE(isMoonVisited); i++)
{     
    if (!isMoonVisited[i])
    {
       rocket->flyTo(moons[i]);
       isMoonVisited[i] = true;
    }
}

Variable length arrays using keyword new.
For any given number of reasons, programmers might not use std::vector for their dynamically allocated arrays. For example, you are maintaing existing code that uses dynamic arrays allocated through the use of new. Or somebody convinced your boss that vectors are too slow, and he forbid you to use them. Whatever the reason, we have to add some rickety scaffolding, but it’s better than nothing. The problem is that there is no way to determine the size of a dynamically allocated array. The best we can do is allocate the array with a computed value, and define the loop index range using the same variable.

const int varNumMoons = (compute number of moons);
MoonClass* moons = new MoonClass[varNumMoons];
...
for (int i = 0; i < varNumMoons; i++) // we have assume varNumMoons is correct variable!
{
    rocket->flyToMoon(moons[i]);
}

If you have multiple dynamic arrays, and you intend the two variables to be the same size, you should define them using the same variable. But  if they are different sizes, then we can get a lot of benefit from adding a runtime check, especially if the two variables are defined in different parts of the program.

const int varNumMoons = (compute number of moons);
MoonClass* moons = new MoonClass[varNumMoons];
...
const int numVisitedMoons = (compute number of moons to visit)
bool moons = new bool[numVisitedMoons];
...
runtime_assert(numVisitedMoons <= varNumMoons)

for (int i = 0; i < numVisitedMoons; i++)
{
    if (!isMoonVisited[i])
    {
       rocket->flyTo(moons[i]);
      isMoonVisited[i] = true;
    }
}

Arrays passed into functions
The best way to pass an array into a function is to pass in a reference to a std::vector. The function then knows the size of the container. But the traditional (and bug-causing) way of passing arrays into functions is to pass in a pointer to the array, and a length value. It’s bug-causing because the function has no way to know whether the length value is correct. When calling functions with fixed sized arrays, you can at least do your part in avoiding bugs by passing in the length using the SIZE() macro. That at least guarantees you that you’ve given the function the correct array size.

visitAllMoons(moons, SIZE(moons));

If there is more than one array that is being passed into the function, I recommend adding static_asserts for each extra array, to make sure that the lengths are compatible. When I say compatible, I don’t necessarily mean equal, although that may be true. You need to know what the functions’s requirements are for the lengths of the arrays it accesses through its calling parameters. For example, one array may need to be at least twice the size of the other. For fixed size arrays, use a static_assert to enforce the requirement. Use a runtime_assert if the requirement is not constant. If the function has only a single length value for multiple arrays, it is most likely assuming that the arrays are the same length, and you should enforce that assumption using asserts.

static_assert(SIZE(moons) == SIZE(isMoonVisited));
visitUnvisitedMoons(moons, &isMoonVisitied, SIZE(moons));

If you are the programmer designing the function, and you are looping through more than one array, consider having a separate length parameter for each array, and then add runtime checks inside the function to make sure the lengths passed in are compatible. There is no guarantee that the caller has given you arrays that are as big as he says, but at least you aren’t creating a bug inside your function. Any bug caused by an out-of-range index will be caused by the caller, not your function.

void function visitUnvisitedMoons(MoonClass* moons, unsigned sizeMoons, bool**isVisited, unsigned sizeIsVisited)
{
    runtime_assert(sizeMoons <= sizeIsVisited);
    for (unsigned i = 0; i < sizeMoons; i++)
    {
        if (!*isMoonVisited[i]) 
        {
             rocket->flyTo(moons[i]);
             *isMoonVisited[i] = true;
        }
    }
}

If you are writing both the calling code, and the function, then you should wear both hats and put the checks in twice, once when calling the function, and also inside the function. If you are willing to accept the overhead of calling a function, adding an extra runtime check shouldn’t add too much extra code. The redundancy is the scaffolding, so to avoid bugs in loops, this is your best protection.

We Must Not Have Loops

We started this discussion (in Part 1) with my professor’s blanket statement that “We must have loops.” Some people say that conventional wisdom is eventually replaced with the opposite. That isn’t (always) true, but sometimes it seems that way on the surface.  New thinking in the C++ world says that we shouldn’t be writing loops, the opposite of “We Must Have Loops.” In Scott Meyers book “Effective STL – 50 Specific Ways to Improve Your Use of the Standard Template Library” item 43 is “Prefer Algorithms Calls To Hand Written Loops.” But when you replace loops with algorithms,  the algorithms use loops internally,  examining every element in a given range.

If you give an algorithm the entire range of a container, a std::vector for example, the scaffolding is already built-in to the vector.

bool IsOdd (int i) { return ((i%2)==1);} 
std::vector<int> myvector;
std::vector<int>::iterator it = std::find_if (myvector.begin(), myvector.end(), IsOdd);

If the algorithm is given a sub-range of a container, you’d better make sure your iterators are valid. According to the STL, “Note that invalid parameters cause undefined behavior.” Well, we just love undefined behavior in our programs don’t we? Our scaffolding also applies to iterators. Here’s a simple example.

int startIndex = (computed somewhere in the program)
int stopIndex = (computed somewhere in the program)
runtime_assert(startIndex >= 0);
runtime_assert(stopIndex < myvector.size());
std::vector<int>::iterator it = std::find_if (myvector.begin() + startIndex, myvector.begin() + stopIndex, IsOdd);

If you get into the habit of adding scaffolding in all the cases I have outlined in this blog, you should find that you spend a lot less time debugging your programs. You’d be surprised how many times you make a simple change and it won’t compile, or you run the program, and it crashes with an assert or error because of an indexing problem. You’ll appreciate being instantly informed of a bug the second you introduce it. You’ll also kick yourself if you spend a half-day or more debugging a problem that could have been found instantly if you had added these checks.

 

 

 

Robert Felten’s Qt Course At Silicon Valley Code Camp Now Part of C++ Track

Written by rfelten. Posted in C++, News, Qt

Robert FeltenThe course I am teaching at the Silicon Valley Code Camp – Introduction to Qt Container Classes – is now part of the C++  and C++ 11 Track. See http://www.siliconvalley-codecamp.com/Track/2013/c-and-c11.

I will cover Why I dumped STL and Boost container classes and now use Qt container classes exclusively, even when not using GUI’s.

See http://www.siliconvalley-codecamp.com/Register to register for this free class.

How to Eliminate Bugs in Loops (Leave the Scaffolding In) – Part 1

Written by rfelten. Posted in C++

C++ programmerBack in school during a beginner’s course for computer programming, when introducing the concept of loops the professor preempted students who might be planning to ask why we needed loops in our programs by declaring the axiom “Because we must have loops.” I guess he didn’t want to give too many examples or explanations to the students at that point, further immersing them in the fog that apparently hits newbies. Similarly, when I mentioned to my parents who knew nothing about computer programming about bugs I found in my own programs, they were flabbergasted as to how I, a so-called professional, could make such mistakes. Instead of trying to explain how complicated the relationships between different parts of the code are, how it is impossible to keep them all in your head at once, and how you can’t easily predict all the paths through your program – I should have just declared “Because you must have bugs.”

Coincidently, many bugs occur in loops. And since you must have loops and you must have bugs, developers are doomed to a life of searching for bugs inside loops. But does it really have to be that way?

Think of every program written by someone else that you inherited for maintenance or enhancements. It’s just a bunch of poorly written spaghetti code, right? (And if you’re honest, so is the code you wrote last year). Imagine next time you get on an airplane, that your life depends on somebody else’s code. (It does). Frightening, isn’t it? Or worse, imagine your life depends on your own code. Maybe every time you write a program, you should code it as if it was being installed in an airplane and you are the passenger (even if your boss wants it done quicker).

There is a story about architects in ancient Rome who were forced to stand under their own arches while the scaffolding was removed. They might be killed if the arch fell down the first time it had to stand on it’s own. If I was in their place, I would say “Can’t we just leave the scaffolding in permanently?” Well, that’s exactly what I do in my programs. I put scaffolding in to detect bugs, and I leave it in permanently. I have developed a simple set of techniques, and whenever I get lazy and don’t put them all in, every single time; I usually end up regretting it. There’s nothing worse than spending half a day solving a bug and then realize that the bug could have been prevented during compilation, or at least detected when the program was run, rather than by trying to figure out post-mortem why the program crashed or why the results are wrong. Or worse, thinking that the results are correct when they are not.

So what’s happening in loops that are causing bugs? Many times you are accessing elements of the arrays, and an array subscript goes out of range. It’s really that simple. I would guess that applies to 90% of the bugs dealing with loops. This would include many “buffer overrun” and “off by one” type of errors. An advanced programmer might tell you to not access arrays with subscripts, just use STL containers and built-in algorithms. I’m not going to tell you to change your coding style. If you use arrays and/or vectors, and use array subscripts, that’s fine with me.
Arrays can be fixed in size at compile time, or dynamic in length. The loop may access all the elements of an array, a fixed subset, or a varying subset of elements.

Declaring Fixed Sized Arrays

First, let’s start with how the size of fixed-size arrays are declared. Many programmers might declare the size of an array as:

const int NUM_MOONS_NEPTUNE = 13; 
bool isMoonVisited[NUM_MOONS_NEPTUNE];
Another alternative is to define an enumerated list:
Enum
{
    Naiad,
    Thalassa,
    …
    Psamathe,
    Neso,
    NUM_MOONS_NEPTUNE
};
bool isMoonVisited[NUM_MOONS_NEPTUNE];

Then the programmer, thinking that this is good coding practice, loops through the array by writing:

for (int I = 0; I < NUM_MOONS_NEPTUNE; i++)
{
    isMoonVisited[i] = false;
}

They even think they’ve prevented new bugs, because when the new moon of Neptune is discovered, all they have to do is change

const int NUM_MOONS_NEPTUNE = 13;

to

const int NUM_MOONS_NEPTUNE = 14;

or add S_2004_N1 to the enumerated list and they’re done.
Well, what happens if somebody changes

bool isMoonVisited[NUM_MOONS_NEPTUNE];

to

bool isMoonVisited[NUM_MOONS_SATURN];

But they don’t change

for (int i = 0; i < NUM_MOONS_NEPTUNE; i++)
{
    isMoonVisited[i] = false;
}

They’ve just introduced a bug. The subscript to isMoonVisited will be outside the range of its declared size because there are more moons of Saturn than Neptune.
So here’s my first rule to prevent bugs. Other than declaring the size of a single array, at no other place anywhere in your program should you ever use the NUM_MOONS_NEPTUNE constant or enumerated value again. You might just as well code

bool isMoonVisited[13];

This appears to violate the magic number rule, but it doesn’t any more than

const int NUM_MOONS_NEPTUNE = 13;

violates the magic number rule. And not creating a variable NUM_MOONS_NEPTUNE is better, because it forces you to not have a variable that you can misuse.

Avoid loop bugs using fixed sized arrays

As you will see, my loop bug avoidance technique does not depend on whether or not you define NUM_MOONS_NEPTUNE; it will work either way. But you should never use the value other than for the declaration of the arrays.
Of course in the above example, it might be better to include the information that we are dealing with Neptune moons by naming the array isNeptuneMoonVisited instead of isMoonVisited, but that’s another issue entirely.
Assuming we don’t define NUM_MOONS_NEPTUNE, how do we know the size of the array in order to loop through it? We create a global macro (which I usually put in an include file called size.h) to get the size of an array:

#define SIZE(a) (sizeof(a) / sizeof(a[0]))

I’ve seen this defined as ARRAY_SIZE() which is less likely to be used incorrectly if you don’t mind the additional clutter in your code listing.
This macro handily returns the size of an array that was defined using an initialization list without a size indicator, for example:

bool isMoonVisited[] = {false, false, false, false, false, false, false, false, false, false, false, false, false};

Looping through all elements of an array.
We get the number of elements of isMoonVisited by invoking

SIZE(isMoonVisited)

This is evaluated at compile time, so there is no hit on your program’s efficiency. Now the loop is:

for (int i = 0; i < SIZE(isMoonVisited); i++)
{
    isMoonVisited[i] = false;
}

We really don’t care what size it is, as the loop will work perfectly no matter what. Nothing will have to change if the size of the array changes.
So here’s where the scaffolding comes in. Suppose that there are a variety of arrays being accessed in the loop. For example:

for (int i = 0; i < SIZE(isMoonVisited); i++)
{     
    if (!isMoonVisited[i])
    {
       rocket->flyTo(moons[i]);
       isMoonVisited[i] = true;
    }
}

If the size of moons isn’t at least as big as the size of isMoonVisited, we have a buffer overrun. If the size of moons is bigger than the size of isMoonVisited, we probably have a bug, because we’re not calling flyToMoons for all of its elements. I would guess that even an automatic bug checker wouldn’t catch that one.
We can find both of these bugs at compile time by using a static assert call before we enter the loop. There are many implementations of static asserts, and you can find examples on the Internet. Some are simple macros (that return non-intuitive error messages), the Boost library has an implementation, and C++11 has a built-in version. I’m not going to provide a tutorial for static asserts, let it suffice that you have found one that you are happy with. Let’s call it static_assert().

static_assert (SIZE(moons) == SIZE(isMoonVisited));

Remembering the Roman architects, this is what I like to think of as scaffolding. Some programmers turn off asserts in delivered code (especially run-time asserts), but I always leave them in place, all the time, for all arrays that we are looping through, even if it seems obvious that we already made sure that we defined the arrays to be the same size. This assumption may not be true once the maintainer gets ahold of your code. (And that maintainer may be you, a year from now, after you’ve forgotten most of what you did.)
If you know for sure that you don’t care if flyTo is called for all elements of moons, use the lesser restrictive

static_assert (SIZE(moons) <= SIZE(isMoonVisited));

to at least prevent a buffer overrun.

Clearly isMoonVisited and moons are closely related. Let’s look at how we define the sizes of arrays that are related to each other.
Depending on how we defined our arrays we can do this several ways. One way is to use the same const int or enumerated value to define the size of each array to be the same.

const int NUM_MOONS_NEPTUNE = 13;
bool isMoonVisited[NUM_MOONS_NEPTUNE];
MoonClass moons[NUM_MOONS_NEPTUNE];

Each array may be defined in a different file, or far apart from each other in the same file, not one right under the other, and a maintainer could easily change the size of one without changing the other. If that happened, you’d have a bug, but if you left the scaffolding in (the assert at the beginning of each loop), at least you’d find the bug as soon as you compiled the program.
However, the above example violates my rule of reusing the size variable (NUM_MOONS_NEPTUNE) somewhere else in the program. Instead define related arrays in terms of the sizes of each other.

const int NUM_MOONS_NEPTUNE = 13;
bool isMoonVisited[NUM_MOONS_NEPTUNE];
MoonClass moons[SIZE(isMoonVisited)];

Or

bool isMoonVisited[13];
MoonClass moons[SIZE(isMoonVisited)];

Using this method, a maintainer would have to really go out of his way to change the size of one array without changing the size of the other (introducing a new bug), whereas if he repeats NUM_MOON_NEPTUNE to define the size of each array, he’d have to go out of his way to change the size of both arrays simultaneously (avoiding a new bug).

Which do you think is better? Coding your program so that a maintainer has to go out of his way to create new bugs, or so that new bugs appear whenever he is not careful. I think the former, no doubt.

Of course, if you could put both the arrays in a struct or class, you could have a single array of class objects, and you avoid the above potential problems. You don’t need the scaffolding either. However, this isn’t always possible.

struct MoonInfo
{
MoonClass moon;
Bool isMoonVisited;
};
const int NUM_MOONS_NEPTUNE = 13;
MoonInfo moonInfo[NUM_MOONS_NEPTUNE];
for (int i = 0; i < SIZE(moonInfo); i++)
{ 
    if (!moonInfo[i].isMoonVisited) 
    {
        rocket->flyTo(moonInfo[i].moons);
        moonInfo[i].isMoonVisited = true;
    }
}

In Part 2, I will discuss handling variable index ranges, variable sized arrays, and other issues.

Welcome

Written by tshafer. Posted in C++, News, Qt

Robert FeltenFor years I have taken classes, attended conferences and lectures, and read books, magazines, journals, and blogs about such topics as writing software effectively, good software design, and coding techniques.  I have also developed and taught courses in software development. Here I will be documenting some of my own ideas on software development, especially C++ coding, and tips using Qt for GUI and other code development.

I hope you find the information valuable, and I encourage comments, opinions, additions, corrections, and debate.