12 February 2019

Assaulting A C++ Class

Written by rfelten. Posted in C++

I was perusing the internet for sample C++ interview questions, and I saw one that first intrigued me – but after reading the answer I recoiled in horror. It asked the interviewee to implement a method to access the private attribute of a class (in this case – an int), with an implementation that was not platform dependent and did not depend on the size of the private attribute.

I first asked myself: why would I want to do that? Private data is private for a reason. The user of the class is not supposed to know how the class implements its functions or stores its private data, or even whether it has private attributes at all. The designer of the class is free to change the implementation at his discretion at any time, without consulting any users of the class – as long as the public interface continues to work as advertised.

The question’s solution is to create a new class identical to the original except that the private data you want to access is made public in the new class. A reinterpret_cast<> from a pointer to the original class object to a pointer to the new class object will allow the private attribute to be accessed. You can even change the private data inside the original class object, and the original object is not aware of it, or able to detect it. This interview answer comes with a caveat that the code should be avoided in a final product but is good when dealing with legacy code, and can be used to extract intermediate values from an external library.

Well, I have a few things to say about this.

First, once this code is written, how will you make sure it doesn’t get into the final product? The answer says “avoid” putting it into the final product – but hey, it works, ship it! And when dealing with legacy code, it sounds like they are recommending doing just that.

So, now we have code that is dependent on private attributes in another class, but the compiler and linker have no clue that this dependency exists. And once this is in the code base, the programmers will be unlikely to know about it. That goes both for new programmers examining the code base for the first time and for you – yes you, a year from now, when you have forgotten all about what you did. An automatic search for all references will not uncover it, either. Someone responsible for maintaining the original class can change the private attribute at any time and will not be aware that some other code was accessing the private attribute in a reinterpreted class object. In fact, the private attribute’s size or type (or name or meaning) can change, or it can be eliminated completely, and the code will still compile, link and run. It just won’t run as designed. You’ve got a potential BIG BUG on your hands, and no easy way to find it. This is more than bad style: accessing an object through a pointer to an unrelated class type is undefined behavior in C++.

You don’t even need to change the private attribute itself to cause a bug. Any change to the attributes in the public, protected, or private area of the original class will do: add a new attribute, delete one, or change the size or type of one, and the copied class will no longer match.

Examine the following example:
We have a class called Stat, that computes the average of values that have been inserted.
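    #include <iostream>

    class Stat {
    public:
        Stat() : total(0), numSamples(0) {}

        // Add one sample to the running total.
        void Insert(int i) { total += i; ++numSamples; }

        // Average of all inserted samples, or 0 if there are none.
        double GetAverage() const {
            return numSamples > 0 ? static_cast<double>(total) / numSamples : 0;
        }

    private:
        int total;
        int numSamples;
    };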
Using this class, we insert 3 values, then compute the average:
    int main() {
        Stat stat;
        stat.Insert(1);
        stat.Insert(17);
        stat.Insert(4);
        std::cout << "average = " << stat.GetAverage() << std::endl;
    }
The output of the program looks like this.
average = 7.33333

A user of the class decides that he needs more than just the average: he wants to know how many samples there are in Stat, and he knows that there is a private attribute, numSamples, that holds exactly that. So he implements a copy of the class, CopyStat, to be able to access numSamples, and the code works perfectly.
    // Copy of Stat in which numSamples has been made public.
    class CopyStat {
        CopyStat() : total(0), numSamples(0) {}
        void Insert(int i) { total += i; ++numSamples; }
        double GetAverage() const {
            return numSamples > 0 ? static_cast<double>(total) / numSamples : 0;
        }

    private:
        int total;

    public:
        int numSamples;
    };
    int main() {
        Stat stat;
        stat.Insert(1);
        stat.Insert(17);
        stat.Insert(4);
        std::cout << "average = " << stat.GetAverage() << std::endl;

        // The hack: treat the Stat object as if it were a CopyStat.
        auto copyStat = reinterpret_cast<CopyStat*>(&stat);
        std::cout << "Num samples = " << copyStat->numSamples << std::endl;
    }
The output of the program is now:
average = 7.33333
Num samples = 3

Sometime later, someone in charge of maintaining Stat decides to add a function that obtains the max value inserted. So he adds int max to the class attributes, and adds a new function that returns max.
    class Stat {
    public:
        // Initializers are listed in declaration order.
        Stat() : total(0), max(0), numSamples(0) {}

        void Insert(int i) {
            total += i;
            ++numSamples;
            max = i > max ? i : max;
        }

        double GetAverage() const {
            return numSamples > 0 ? static_cast<double>(total) / numSamples : 0;
        }

        int getMax() const { return max; }

    private:
        int total;
        int max;        // new attribute, placed before numSamples
        int numSamples;
    };
Now when the program is recompiled and run, the output is:
average = 7.33333
Num samples = 17

Num samples is wrong! It should be 3, but it is 17, because CopyStat no longer matches the Stat class: the spot where CopyStat expects to find numSamples now holds max. We have a new bug!

So just what compelling reason would anyone have to write code this way, to access the private attribute in a class?

If you can’t use the class without accessing the private attributes inside the class, then there is something wrong with your design. The solution is to change your design, or even change the design of the original class if possible – but please do not use this hack!
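As a rough sketch of what "change the design of the original class" could look like for the Stat example: the class grows a public accessor, here given the hypothetical name GetNumSamples(), so the caller never needs to know where numSamples lives. The accessor is an assumption made for illustration and is not part of the original class.

```cpp
// Sketch only: Stat with a public accessor instead of a layout hack.
// GetNumSamples() is a hypothetical addition for illustration.
class Stat {
public:
    Stat() : total(0), max(0), numSamples(0) {}

    void Insert(int i) {
        total += i;
        ++numSamples;
        max = i > max ? i : max;
    }

    double GetAverage() const {
        return numSamples > 0 ? static_cast<double>(total) / numSamples : 0;
    }

    int getMax() const { return max; }

    // The compiler resolves this by name, so reordering, adding, or
    // removing other attributes cannot silently break the caller.
    int GetNumSamples() const { return numSamples; }

private:
    int total;
    int max;
    int numSamples;
};
```

With this in place, inserting 1, 17 and 4 and then calling GetNumSamples() yields 3 regardless of where max sits in the class. If numSamples is ever renamed or removed, the accessor fails to compile instead of quietly returning the wrong value.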

Maybe you say it’s just for testing. But if you can’t test your code without accessing a private attribute in a class, then there is something wrong with your test code. You should be able to test your implementation using only the public interface of the class. Even if you decide it is more convenient to test knowing the private data, your test is fatally flawed, because, as we have seen, if anything changes in the original class, your test is no longer valid.

Maybe you only need to access the private attribute for debugging purposes, but then why can’t you inspect the private attribute in a debugger? That is better than running the risk of writing software that could enter the code base and cause a bug.

If you are testing the original class itself, then make your test a friend to the class that is being tested. The test can then access the private attribute, and if the private attribute changes, the compiler will know about the dependency, and the test will continue to run properly, or it will fail – but at least you will know about it and be able to update the test case appropriately.
Isn’t this what automated tests are for?
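A minimal sketch of the friend approach, assuming a hypothetical test class named StatTest (the name and the check it performs are invented for illustration):

```cpp
// Sketch only: a test class that is a declared friend of Stat.
// StatTest and InsertUpdatesPrivateState() are hypothetical names.
class Stat {
    friend class StatTest;  // the dependency is now visible to the compiler

public:
    Stat() : total(0), numSamples(0) {}
    void Insert(int i) { total += i; ++numSamples; }
    double GetAverage() const {
        return numSamples > 0 ? static_cast<double>(total) / numSamples : 0;
    }

private:
    int total;
    int numSamples;
};

class StatTest {
public:
    // Checks the private state directly, by name, after three inserts.
    static bool InsertUpdatesPrivateState() {
        Stat stat;
        stat.Insert(1);
        stat.Insert(17);
        stat.Insert(4);
        return stat.numSamples == 3 && stat.total == 22;
    }
};
```

Because the test refers to numSamples and total by name, adding max in front of numSamples changes nothing, and renaming or deleting either attribute turns into a compile error in the test rather than a wrong number at run time.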

Finally, if you can’t make your test a friend to the original class because you can’t access the source code or rebuild the external library, then why are you testing it? It’s not your responsibility to test and maintain the source code in an external library, is it?

One more thing: this technique depends on the compiler laying out the attributes of both classes identically, in exactly the same order, even though an attribute is private in one class and public in the other. Can you count on that happening correctly across all platforms and compilers? Is there an absolute guarantee this technique will always work?
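One way to see how little is guaranteed is to ask the compiler. The standard only defines layout compatibility between standard-layout classes, and a standard-layout class must give all of its non-static data members the same access control. The sketch below uses trimmed-down versions of Stat and CopyStat (member functions omitted for brevity, which is an assumption of this example) and checks both with std::is_standard_layout:

```cpp
#include <type_traits>

// Trimmed-down versions of the two classes; only the data members matter here.
class Stat {
public:
    Stat() : total(0), numSamples(0) {}
private:
    int total;
    int numSamples;
};

class CopyStat {
private:
    int total;
public:
    int numSamples;  // made public to "open up" the original
};

// Stat keeps all its data members private, so it is standard-layout.
static_assert(std::is_standard_layout<Stat>::value,
              "Stat is standard-layout");

// CopyStat mixes private and public data members, so it is not, and the
// standard makes no promise that its layout matches Stat's.
static_assert(!std::is_standard_layout<CopyStat>::value,
              "CopyStat is not standard-layout");
```

In other words, the very change that makes the hack possible, making one attribute public, is what takes CopyStat out of the category of classes whose layout the language says anything about. Before C++23 the standard even left the relative order of members with different access control unspecified.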

Excuse me if I write code that doesn’t exploit the internal workings of C++ compilers, doesn’t cause bugs, and instead relies only on good object-oriented programming principles.