Josef “Jeff” Sipek

Practical and Portable x86 Recompilation

About a month ago, I stumbled across this fascinating blog post. I finally got around to sharing it here on my blahg.

Practical and Portable x86 Recompilation

Useless reinterpret_cast in C++

A few months ago (for whatever reason, I didn’t publish this post earlier), I stumbled on some C++ code that I had to modify. While trying to make things work, I ended up with code that was essentially:

uintptr_t x = ...;
uintptr_t y = reinterpret_cast<uintptr_t>(x);

Yes, the cast is useless. The actual code I had was much more complicated and it wasn’t immediately obvious that ‘x’ was already a uintptr_t. Thinking about it now, I would expect GCC to give a warning about a useless cast. What I did not expect was what I got:

foo.cpp:189:3: error: invalid cast from type "uintptr_t {aka long unsigned int}"
    to type "uintptr_t {aka long unsigned int}"

Huh? To me it seems a bit silly that the compiler does not know how to convert from one type to the same type. (For what it’s worth, this is GCC 4.6.2.)

Can anyone who knows more about GCC and/or C++ shed some light on this?
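
One possible explanation, as far as I can tell from the standard: C++03 lists the conversions reinterpret_cast may perform, and integer to integer is not among them, while C++11 added a rule that lets integral, enumeration, pointer, and pointer-to-member types be converted to their own type. The sketch below shows the portable spelling; same_type and from_pointer are hypothetical helper names used only for illustration.

```cpp
#include <cstdint>

// Integer-to-integer conversions (including the do-nothing one) are
// static_cast territory in every version of the language.
std::uintptr_t same_type(std::uintptr_t x)
{
	return static_cast<std::uintptr_t>(x);
}

// Pointer-to-integer is what reinterpret_cast is actually meant for.
std::uintptr_t from_pointer(const void *p)
{
	return reinterpret_cast<std::uintptr_t>(p);
}
```

Converting the integer back with reinterpret_cast<int *>(...) yields the original pointer, provided the integer type is large enough, which uintptr_t is by definition.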

Awful, Awful C++: Operator Overloading Insanity

Ew! This is an emergency post. I just found out about some operator overloading brain damage in C++. Take a look at the Wikipedia article Operators in C and C++. Just to make sure, I checked the C/C++ Language Reference for IBM’s XL C++ compiler (I find it easier to read than many of the other language references).

As you are no doubt aware, C++ has a ++ prefix and a ++ postfix operator (there is also the decrement operator but the same insanity applies to it). They work differently so when overloading them, there is a need for two different functions.

The syntax for operator overloading does not, however, allow for the difference in an obvious way. operator++ doesn’t seem to indicate whether you are talking about the prefix or the postfix variant. Here’s a code snippet that shows the syntax for both:

class X {
	void operator++() { };
	void operator++(int) { };
};

Do you see the difference? The postfix variant uses a dummy int argument. What?!
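
To make the two forms concrete, here is a minimal sketch using a hypothetical Counter class (the class, its value member, and counter_demo are made up for illustration). It follows the usual convention that the prefix form returns a reference to the updated object and the postfix form returns a copy of the old value; the void return types in the snippet above are legal too.

```cpp
#include <cassert>

// Hypothetical counter type, used only to illustrate the two overloads.
class Counter {
public:
	explicit Counter(int v) : value(v) {}

	// Prefix form (++c): no parameters. Increment, then return *this.
	Counter &operator++()
	{
		++value;
		return *this;
	}

	// Postfix form (c++): the unnamed int parameter is never used; it
	// exists only so the compiler can tell the two overloads apart.
	Counter operator++(int)
	{
		Counter old(*this);
		++value;
		return old;
	}

	int get() const { return value; }

private:
	int value;
};

void counter_demo()
{
	Counter c(1);
	Counter before = c++;	// calls operator++(int); c is now 2
	Counter &after = ++c;	// calls operator++(); c is now 3

	assert(before.get() == 1);
	assert(after.get() == 3);
	assert(&after == &c);
}
```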

C++ is even worse than that! I found out that there are two more operators that C++ has. Here’s an example of how they might be used:

struct X {
	int *foo;
};

void foo(struct X *p)
{
	p->*foo = 1;
}

void bar(struct X &p)
{
	p.*foo = 1;
}

Do you see it? Do you see that abomination? Here’s the equivalent version using a more mainstream syntax.

void foo(struct X *p)
{
	*p->foo = 1;
}

void bar(struct X &p)
{
	*p.foo = 1;
}

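We tried to compile a test program that uses this “fancy” notation, but gcc does not like it, presumably because the right-hand side of .* and ->* has to be a pointer to member rather than a member name. I am glad.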
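
For comparison, here is a minimal sketch of a form that compilers do accept: the right-hand operand of .* and ->* is a pointer to member (declared as int X::*), which selects a member at run time. The members a and b and the helper names set_via_ptr, set_via_ref, and member_pointer_demo are hypothetical and exist only for this illustration.

```cpp
#include <cassert>

struct X {
	int a;
	int b;
};

// 'field' points at *some* int member of X; the caller decides which one.
void set_via_ptr(X *p, int X::*field, int value)
{
	p->*field = value;
}

void set_via_ref(X &r, int X::*field, int value)
{
	r.*field = value;
}

void member_pointer_demo()
{
	X x = { 0, 0 };

	set_via_ptr(&x, &X::a, 1);	// writes x.a
	set_via_ref(x, &X::b, 2);	// writes x.b

	assert(x.a == 1);
	assert(x.b == 2);
}
```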

Awful, Awful C++: Exceptions & Memory Leaks

Let me start today’s post with a question. Can you spot the issue in the following code snippet? (Yes, I know, it’s a bit artificial, but it is inspired by a lot of real code I have seen.)

void foo(struct S *ptr)
{
	assert(ptr);

	ptr->a = new int[100];
	ptr->b = new int[100];
}

void bar()
{
	struct S s;

	try {
		foo(&s);
	} catch (std::bad_alloc &ba) {
		// foo failed to allocate memory
		return;
	}

	// continue processing
}

Do you see the issue?

It’s not a C++ problem, but C++ exceptions make it possible. The answer is: there is a potential memory leak.

Everyone has always told me how amazing exceptions are, but I always found them counterintuitive. I’ve used several languages that have you manage the memory, and so I am rather familiar with the following construct (I’m using C for this, but the same idea exists in other languages):

ptr = malloc(sizeof(xyz));
if (!ptr) {
	/* allocation failed */
}

Everyone has always told me that with exceptions, you can do a bunch of things in the try block and then figure out what exactly went wrong in one of the catch blocks. In other words, something like:

try {
	ptr = new xyz;
	// more processing
} catch (std::bad_alloc &ba) {
	// oops, failed to allocate memory
} catch (whatever &w) {
	// handle 'whatever' exception
}

That all sounds great, but what if you need to allocate two objects? When you catch a bad_alloc exception, how do you know which of the allocations failed? Sure, you could initialize the pointers to NULL, and then check them in the catch block. It becomes a bigger issue when the function using new does no exception handling and instead relies on the caller to take care of it. (Java handles this better by forcing you to either catch a checked exception or declare it in the method’s throws clause. Interestingly enough, C++ lets you annotate functions to tell the compiler whether or not they will throw an exception, but that’s a topic for another time.) Since the caller does not have access to the callee’s stack frame, it has no way of determining whether or not there was a successful allocation in addition to the failed one. Again, this is not an issue with the language, but rather with the view that exceptions are magic that makes memory management simple.
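
The initialize-to-NULL-and-check idea can be sketched roughly as follows. The names foo_checked and alloc_ints are hypothetical, and the fail flag is only an assumption made here so that the out-of-memory path can be triggered on demand.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct S {
	int *a;
	int *b;
};

// Hypothetical helper: 'fail' simulates an out-of-memory condition so that
// the error path can be exercised on demand.
static int *alloc_ints(std::size_t n, bool fail)
{
	if (fail)
		throw std::bad_alloc();
	return new int[n];
}

// Like foo() above, but cleans up after itself before passing the
// exception on to the caller.
void foo_checked(S *ptr, bool fail_second)
{
	assert(ptr);

	ptr->a = NULL;
	ptr->b = NULL;

	try {
		ptr->a = alloc_ints(100, false);
		ptr->b = alloc_ints(100, fail_second);
	} catch (const std::bad_alloc &) {
		// Whichever allocation failed left its pointer NULL, and
		// delete[] on a NULL pointer does nothing.
		delete[] ptr->a;
		delete[] ptr->b;
		ptr->a = NULL;
		ptr->b = NULL;
		throw;
	}
}
```

The price is that every function that allocates more than one thing needs its own try block, which is exactly the bookkeeping exceptions were supposed to remove.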

Recently, I mentioned this issue to someone who uses C++ on a daily basis, and he suggested that this could be fixed by having the function doing the allocation (foo in my example) use auto_ptr instead of raw pointers. That means that if execution leaves the scope, whatever was allocated gets destroyed. This complicates the implementation of foo: now we need two auto_ptrs to hold the pointers to the integer arrays, and only once both allocations have succeeded do we set the raw pointers in the structure, at which point everything is fine and dandy. If there was an allocation failure, we exit the scope because of the exception, and the auto_ptrs destroy things for us.
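
Here is a rough sketch of that suggestion, with one substitution: std::auto_ptr releases its object with delete rather than delete[] (so it does not fit arrays) and was removed in C++17, so the sketch uses std::unique_ptr<int[]> from C++11 instead. The alloc_ints helper, its fail flag, and the extra fail_second parameter are hypothetical, added only so the failure path can be exercised.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

struct S {
	int *a;
	int *b;
};

// Hypothetical helper: 'fail' simulates an out-of-memory condition.
static int *alloc_ints(std::size_t n, bool fail)
{
	if (fail)
		throw std::bad_alloc();
	return new int[n];
}

// Either both arrays end up in *ptr, or nothing is leaked and *ptr is
// left untouched.
void foo(S *ptr, bool fail_second)
{
	assert(ptr);

	std::unique_ptr<int[]> a(alloc_ints(100, false));
	// If this throws, the destructor of 'a' frees the first array.
	std::unique_ptr<int[]> b(alloc_ints(100, fail_second));

	// Nothing below can throw, so it is safe to hand over ownership.
	ptr->a = a.release();
	ptr->b = b.release();
}

// Returns true if foo() succeeded and the caller now owns both arrays.
bool bar(S *s, bool fail_second)
{
	s->a = nullptr;
	s->b = nullptr;

	try {
		foo(s, fail_second);
	} catch (const std::bad_alloc &) {
		return false;
	}
	return true;
}
```

Note that the structure still holds raw pointers, so whoever ends up owning it has to remember the matching delete[] calls.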

Finally, I would like to address those of you who are about to tell me that if memory allocation failed, then we are doomed anyway and possibly leaking some other memory is not a big deal. In some software, that is certainly true. However, there is a huge amount of code that can (and should) continue to function even in the presence of such a failure. For example, suppose you are writing a web server. A client connects, and you try to allocate some sort of structure to keep track of per-client state. The allocation fails. Does it mean that the whole web server process is doomed? No. Just this one client, as far as we know. We should clean up, and then close the network connection. That one client may end up unhappy, but there’s nothing we could do for it. Other clients that already happen to have all the memory they need can continue on their merry way. Additionally, memory may become available in the future; a memory hog may exit, freeing up a lot of memory.

Awful, Awful C++: Casting

C++ has several ways of casting. Sadly enough, the names are mostly unintuitive. I asked several people I know (some of whom happen to be, or have been, instructors for the introductory C++ class), and I managed to construct the following list. Not all of them knew what all the casts do. This is a very C++ symptom; most people don’t know significant fractions of the language. This is mostly because C++ is a very complex language and no one teaches you or tells you about everything.

Of course, Stroustrup’s The C++ Programming Language describes all the cast constructs.

static_cast
converts between related types. The types have to be convertible. For example, ints can be trivially converted into enums. Floating point values can be converted to integral types, which truncates the fractional part.
reinterpret_cast
converts between unrelated types. For example, from an integer to a pointer, or between two unrelated pointer types (e.g., float* and int*).
const_cast
is used when you want to remove const or volatile qualifiers.
dynamic_cast
attempts a type conversion but performs run-time checks to see if the conversion is valid. This is useful for converting pointer types, for example from a base class type to a subclass type. If the conversion is not valid, dynamic_cast on a pointer returns a null pointer (on a reference it throws std::bad_cast).
C-style cast
is the confusing cast in C++. It is a combination of static_cast, reinterpret_cast, and const_cast. That is, it can be used to add or remove const-ness, volatile-ness, convert between related or unrelated types (between ints and pointers, different type pointers, and between the various integral types and floats).

All in all, it really isn’t that bad once you see a list.
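
The list above can be sketched in a few lines. All of the names here (Color, Base, Derived, Other, and the helper functions) are hypothetical and exist only to give each cast something to do.

```cpp
#include <cstdint>

enum Color { RED, GREEN, BLUE };

struct Base { virtual ~Base() {} };	// polymorphic, so dynamic_cast works
struct Derived : Base {};
struct Other : Base {};

// static_cast: related types (int to enum, floating point to integer).
Color to_color(int i) { return static_cast<Color>(i); }
int whole_part(double d) { return static_cast<int>(d); }

// reinterpret_cast: unrelated types (pointer to integer and back).
bool round_trips(int *p)
{
	std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
	return reinterpret_cast<int *>(bits) == p;
}

// const_cast: strips the const qualifier. Writing through the result is
// only valid if the object itself was not declared const.
void bump(const int *p) { ++*const_cast<int *>(p); }

// dynamic_cast: checked at run time; returns a null pointer on failure.
Derived *as_derived(Base *b) { return dynamic_cast<Derived *>(b); }
```

A C-style cast such as (int *)bits or (Color)i would compile in every one of these spots except the dynamic_cast one, where it would skip the run-time check entirely.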

I started this post with the intent to show how ridiculous C++ casting is, but I think that all I can really say is just that I don’t agree with some of the ways it is done. I don’t like how verbose the cast syntax is. (I am aware that unnecessary casting is bad; in C, I get frustrated whenever I see people casting malloc’s return value instead of simply assigning it.) On the other hand, I have to agree with people who say that it’s easier to grep for C++-style casts.

The issue with C++ here is that it allows C++-style casts to coexist with C-style casts. I’ve been told by C++ fans that one cannot treat C++ as if it were C. One way to accomplish that would be to make C++ look different from C. An easy thing would be to get rid of C-style casts. (I am aware that this will never happen.) Not only do they confuse C programmers trying to use C++, and C++ programmers who haven’t been told about all the features of the language, they also make the compiler’s job more difficult and the software potentially buggier. In C++, the C-style cast is the nuke. It tells the compiler to just do the conversion. C++ gives the programmer 4 different casts; using the right one (or ones, if more than one is required) at the right time means the compiler can do more optimizations and more sanity checks, making the code faster and less likely to be buggy.

So, C++ isn’t awful because it has a bunch of different casts, but because it allows them to coexist with the C-style cast.

Awful, Awful C++: Exceptions & the new operator

Over the past few months, I have pondered many aspects of C++. I thought it would be good to write my concerns and conclusions down somewhere. That somewhere ended up being right here — my blahg.

I was thinking about C++ and how quirky it is. And then I considered the scenario where you dynamically allocate an object, but the constructor throws an exception. Would that free the allocated object? Well, section 14.4.4 in the C++ book answers that:

What happens if X’s constructor throws an exception? Is the memory allocated by the operator new() freed? For the ordinary case, the answer is yes…

When the placement syntax is used, the answer cannot be that simple. Some uses of that syntax allocate memory, which then ought to be released; however, some don’t. Furthermore, the point of using the placement syntax is to achieve nonstandard allocation, so nonstandard freeing is typically required. Consequently, the action taken depends on the allocator used. If an allocator Z::operator new() is used, Z::operator delete() is invoked if it exists; otherwise, no deallocation is attempted.

Curse you, Bjarne Stroustrup!
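
The ordinary case from the quote can be checked with a small sketch. Tracked, its counters, and try_create are hypothetical names; the class provides its own operator new and operator delete (built on malloc and free, an arbitrary choice) purely so that the calls can be counted.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>

// Hypothetical class that counts calls to its own allocator.
struct Tracked {
	static int allocs;
	static int frees;

	explicit Tracked(bool fail)
	{
		if (fail)
			throw std::runtime_error("constructor failed");
	}

	static void *operator new(std::size_t size)
	{
		void *p = std::malloc(size);
		if (!p)
			throw std::bad_alloc();
		allocs++;
		return p;
	}

	static void operator delete(void *p)
	{
		if (p)
			frees++;
		std::free(p);
	}
};

int Tracked::allocs = 0;
int Tracked::frees = 0;

// Returns true if the constructor threw.
bool try_create(bool fail)
{
	try {
		Tracked *t = new Tracked(fail);
		delete t;	// only reached if construction succeeded
	} catch (const std::runtime_error &) {
		// No pointer was ever handed to us, so there is nothing we
		// could delete here; the new expression has to clean up.
		return true;
	}
	return false;
}
```

If the two counters match after a failed construction, the new expression released the memory through Tracked::operator delete before the exception reached the catch block.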
