The uninitialized variable anathema: non-deterministic C++

edA‑qa mort‑ora‑y - Dec 17 '17 - - Dev Community

A variable with an undefined value is a terrible language failure, especially when programs tend to work anyway. It's a significant fault of C++ to allow this easy-to-create and hard-to-detect situation. I was recently treated to this insanity with my Leaf project: various uninitialized structure values made my program fail on different platforms. There's no need for this. All variables should have guaranteed initial values.

The problem

Local variables, member variables and certain structure allocations result in uninitialized values. C++ has separate rules for uninitialized, default-initialized, and zero-initialized variables. It's an overly complicated mess.
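To make those categories concrete, here is a minimal sketch of the forms that do come with a guaranteed zero. The names (global_counter, Point, the from_* functions) are made up for illustration. The forms that leave a value indeterminate appear only as comments, since reading them would be undefined behavior.

```cpp
#include <cassert>
#include <memory>

// Static storage duration: zero-initialized before the program starts.
int global_counter;

struct Point {
    int x;
    int y;
};

int from_static() { return global_counter; }

int from_value_init() {
    int n{};  // value-initialization: zero for fundamentals
    return n;
}

int from_aggregate() {
    Point p{};  // empty braces zero both members
    return p.x + p.y;
}

int from_heap() {
    auto p = std::make_unique<int>();  // make_unique value-initializes, so *p == 0
    return *p;
}

// By contrast, each of these leaves the value indeterminate:
//   int n;             // default-initialized local
//   Point p;           // members are not set
//   int* q = new int;  // default-initialized heap object

// Small self-check that exercises every guaranteed-zero form.
void demo_guaranteed_zero() {
    assert(from_static() == 0);
    assert(from_value_init() == 0);
    assert(from_aggregate() == 0);
    assert(from_heap() == 0);
}
```

The difference between the safe and unsafe forms is often just a pair of braces, which is part of why the mistake is so easy to make.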

Given how often a variable is left uninitialized in C++, it seems like it'd be easy to spot. It isn't. While the language states they are uninitialized, a great number of variables nonetheless end up with zero values. Consider the program below.

#include <iostream>

int main() {
    bool a;  // never initialized: reading it below is undefined behavior
    if( a ) {
        std::cout << "True" << std::endl;
    } else {
        std::cout << "False" << std::endl;
    }

    int b;  // likewise uninitialized
    std::cout << b << std::endl;
}

For me this always prints False and 0. It prints the same on ideone. Yet according to the standard, reading an uninitialized variable is undefined behavior, so it could print True and whatever integer value it wants. Both a and b are uninitialized. A tool like valgrind can point out the problem in this case.

It's precisely because such programs tend to work that the problem is so insidious. While developing, the error may not show up, since zero happens to be the desired initial value. A test suite is incapable of picking up this error until it's run on a different platform. In some projects I've included valgrind as part of the regular testing, but I think that is rare, and even then I didn't make it part of the automated test suite (too many false positives).

Compounding the problem is that while all types are default initialized, this means nothing for fundamentals. At least a class will have its default constructor called, resulting in a usable instance. A fundamental's "default initializer" does nothing, rather than being the sensible "zero initializer". This dichotomy creates a situation where it's not possible to say whether T a;, for some type T, is an initialized or uninitialized variable. A quick glance at the code will always "look" right, even if it's sometimes wrong.
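In generic code the ambiguity of T a; can be sidestepped with empty braces, which work the same for classes and fundamentals. The sketch below uses two hypothetical helpers of my own naming. make_initialized always returns a usable value. leaves_value_unset uses a standard type trait as a rough test of whether a plain T a; would do nothing at all. The trait is only an approximation: a class with a user-written constructor that forgets its members reports false yet still leaves garbage behind.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical helper: returns a T with a guaranteed value, whatever T is.
template <typename T>
T make_initialized() {
    T value{};  // default constructor for classes, zero for fundamentals
    return value;
}

// Hypothetical helper: true when a plain `T a;` performs no initialization.
// Approximation only: it cannot see inside user-provided constructors.
template <typename T>
constexpr bool leaves_value_unset() {
    return std::is_trivially_default_constructible<T>::value;
}

// Small self-check covering one fundamental and one class type.
void demo_dichotomy() {
    assert(make_initialized<int>() == 0);
    assert(make_initialized<std::string>().empty());
    assert(leaves_value_unset<int>());           // `int a;` is garbage
    assert(!leaves_value_unset<std::string>());  // `std::string a;` is fine
}
```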

Why zero

But why does it always tend to be zero? It's a bit ironic that C++ doesn't initialize the memory, since the OS will not give a program uninitialized memory. This is a security mechanism. The underlying memory on the system is a shared, protected resource. Program A writes to a page and frees it, then program B happens to get allocated the same page. Program B should not be able to read what program A wrote to that memory. To prevent an information leak, the kernel initializes all memory. On Linux it happens to do this with zeros.
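The kernel's behavior can be observed directly by asking for fresh pages. This is a Linux/POSIX-only sketch: it assumes mmap with MAP_ANONYMOUS is available, and the function name fresh_pages_are_zero is my own. It maps new anonymous memory and checks that every byte reads as zero before anything has been written to it.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>  // POSIX only; MAP_ANONYMOUS as found on Linux

// Maps `length` bytes of fresh anonymous memory and reports whether every
// byte reads as zero. Returns false if the mapping itself fails.
bool fresh_pages_are_zero(std::size_t length) {
    void* mem = mmap(nullptr, length, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        return false;
    }
    const unsigned char* bytes = static_cast<const unsigned char*>(mem);
    bool all_zero = true;
    for (std::size_t i = 0; i < length; ++i) {
        if (bytes[i] != 0) {
            all_zero = false;
            break;
        }
    }
    munmap(mem, length);
    return all_zero;
}

// Small self-check on four pages' worth of memory.
void demo_fresh_pages() {
    assert(fresh_pages_are_zero(4 * 4096));
}
```

This only says something about memory that comes straight from the kernel. It makes no promise about what malloc or the stack hands back.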

There's no reason it has to be done this way. I believe OpenBSD uses a different initial value. And apparently, Arch Linux running inside VirtualBox does something different as well (this is where Leaf failed). It may not even be the OS: the program can also pick up memory that it had previously allocated. In this case, nothing will re-zero the memory, since it stays within the same program.

Apparently OpenBSD's basic free/malloc will reinitialize the data on each allocation. It's a security feature that mitigates the negative potential of buffer overflows. Curiously it might have prevented the Heartbleed defect, but OpenSSL sort of bypassed that mechanism anyway.
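The case of a program picking up its own previously used memory can be modeled without relying on undefined behavior. The sketch below is a toy, not a real allocator: SlotPool is a made-up class with a single slot that is zeroed once, like a fresh page, and then recycled without being cleared. The second user of the slot sees whatever the first one left behind.

```cpp
#include <cassert>
#include <cstddef>

// Toy single-slot pool (hypothetical, for illustration only).
class SlotPool {
public:
    static constexpr std::size_t slot_size = 16;

    // Hands out the slot if it is free. The contents are not cleared.
    unsigned char* acquire() {
        if (in_use_) {
            return nullptr;
        }
        in_use_ = true;
        return slot_;
    }

    // Marks the slot as free again, leaving its contents untouched.
    void release() { in_use_ = false; }

private:
    unsigned char slot_[slot_size] = {};  // zeroed once, like a fresh page
    bool in_use_ = false;
};

// Returns the first byte seen by the second user of the slot.
unsigned char stale_byte_after_reuse() {
    SlotPool pool;
    unsigned char* first = pool.acquire();
    first[0] = 0x2A;  // the first user writes something
    pool.release();
    unsigned char* second = pool.acquire();
    return second[0];  // the second user reads leftover data, not zero
}

// Small self-check: the leftover byte survives the release/acquire cycle.
void demo_recycled_memory() {
    assert(stale_byte_after_reuse() == 0x2A);
}
```

Code that assumed "new memory is zero" would work on the first acquire and break on the second, which is the same pattern as a program that works on one platform and fails on another.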

The solution

A language should simply not allow uninitialized variables. I'm not saying that an explicit value must always be given, but the language should guarantee the value. C++ should have stated that all variables are default initialized, which in turn means zero initialized for fundamentals. It should not matter how I created the variable, in which scope it resides, or whether it is a member variable.
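Within today's C++, the closest approximation I know of is to give every member a default member initializer. This is a sketch with made-up types (Settings, Service). Once the initializers are in place, the value is the same whether the object is a local, a heap allocation, or a member of something else.

```cpp
#include <cassert>
#include <memory>

// Every member carries its own default, so no construction path skips it.
struct Settings {
    int retries = 0;
    bool verbose = false;
    double timeout = 0.0;
};

// A type that merely contains Settings inherits the guarantee.
struct Service {
    Settings settings;
};

bool is_zeroed(const Settings& s) {
    return s.retries == 0 && !s.verbose && s.timeout == 0.0;
}

bool local_is_zeroed() {
    Settings s;  // plain declaration, no braces
    return is_zeroed(s);
}

bool heap_is_zeroed() {
    std::unique_ptr<Settings> s(new Settings);  // no parentheses after the type
    return is_zeroed(*s);
}

bool member_is_zeroed() {
    Service svc;
    return is_zeroed(svc.settings);
}

// Small self-check across all three ways of creating the object.
void demo_default_members() {
    assert(local_is_zeroed());
    assert(heap_is_zeroed());
    assert(member_is_zeroed());
}
```

The catch is that this is opt-in per member. Forget one initializer and the old behavior is back for that member.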

There might be some exceptions for performance or low-level data manipulation. These are, however, the outlying situations. In most cases the optimizer can spot an initial value that is never read and eliminate it. If we want a block of uninitialized memory, we can always allocate it on our own, in which case I don't expect the data to be initialized and thus don't get caught in the trap.

Just for completeness, a language might offer a special noinit keyword that indicates a variable should not be initialized.
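As a rough library-level imitation of that idea, here is a sketch of a wrapper that zero-initializes by default and requires an explicit tag to opt out. Var, noinit_t and noinit are all hypothetical names of my own. None of this is a real C++ keyword or standard facility.

```cpp
#include <cassert>

// Hypothetical tag type standing in for a `noinit` keyword.
struct noinit_t {};
constexpr noinit_t noinit{};

// Hypothetical wrapper: initialized unless the caller explicitly opts out.
template <typename T>
class Var {
public:
    Var() : value_{} {}                       // default: guaranteed value
    explicit Var(noinit_t) {}                 // opt-out: value_ is left unset
    explicit Var(const T& v) : value_(v) {}   // explicit initial value

    Var& operator=(const T& v) {
        value_ = v;
        return *this;
    }

    const T& get() const { return value_; }

private:
    T value_;
};

// Small self-check. The noinit variable is assigned before it is read.
void demo_noinit() {
    Var<int> counter;
    assert(counter.get() == 0);

    Var<int> scratch(noinit);  // deliberately uninitialized
    scratch = 7;
    assert(scratch.get() == 7);
}
```

The point of the design is that the dangerous case has to be spelled out at the declaration, so a quick glance at the code tells you which variables need care.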

I even think this should be changed in C++ now. Since the values were previously undefined, making them defined now won't change the correctness of any existing programs. It's entirely backwards compatible and would significantly improve the quality of C++.
