Typecasting with virtual functions (tags: c++, casting, virtual)

In the code below, pC == pA:

class A
{
};

class B : public A
{
public:
    int i;
};

class C : public B
{
public:
    char c;
};

int main()
{
    C* pC = new C;
    A* pA = (A*)pC;

    return 0;
}

But when I add a pure virtual function to B and implement it in C, pA != pC:

class A
{
};

class B : public A
{
public:
    int i;
    virtual void Func() = 0;
};

class C : public B
{
public:
    char c;
    void Func() {}
};

int main()
{
    C* pC = new C;
    A* pA = (A*)pC;

    return 0;
}

Why is pA not equal to pC in this case? Don't they both still point to the same "C" object in memory?

2012-04-03 22:02
by user987280

What exactly do you mean by "they are equal"? Does this mean that you get the expected values when you cast from one to the other in case A but not in case B? - David D 2012-04-03 22:05

Try to provide an example that is closer to your actual code, because your assumption is wrong (the bug is elsewhere). Those pointers point to the same location - Tamás Szelei 2012-04-03 22:07

I may be a little confused. I thought that pA and pC would have the same value if they point to the same object in memory, even if one is an A* and the other is a C*. http://img831.imageshack.us/img831/6848/screenshotsbc.jp - user987280 2012-04-03 22:15

I knew this was possible with multiple inheritance, but I haven't been able to figure out the mechanics behind this little puzzler - Mark Ransom 2012-04-03 22:28

@TamásSzelei, your test is invalid because the == operator will implicitly convert one of the pointers. I modified it to show the addresses, which were equal, but that doesn't prove anything either because different compilers are free to lay out their objects differently - Mark Ransom 2012-04-03 22:33

@TamásSzelei, the behavior he describes is correct (for his compiler). It's an implementation detail, but if you test this on VC++, you'll find that the pointer values are indeed referencing different memory locations - Derek Park 2012-04-03 22:51

Thanks, I didn't know about this behavior - Tamás Szelei 2012-04-04 07:36

You're seeing a different value for your pointer because the new virtual function is causing the injection of a vtable pointer into your object. VC++ is putting the vtable pointer at the beginning of the object (which is typical, but purely an internal detail).

Let's add a new field to A so that it's easier to explain.

class A {
public:
    int a;
};
// other classes unchanged

Now, in memory, your pA and A look something like this:

pA --> | a      |          0x0000000

Once you add B and C into the mix, you end up with this:

pC --> | vtable |          0x0000000
pA --> | a      |          0x0000004
       | i      |          0x0000008
       | c      |          0x000000C

As you can see, pA points to the data after the vtable pointer, because A doesn't know anything about the vtable or how to use it, or even that it's there. pC does know about the vtable, so it points directly at the vtable pointer at the start of the object, which simplifies its use.

2012-04-03 22:37
by Derek Park

Just out of curiosity, how does the GCC implementation work? The addresses are the same with GCC - Tamás Szelei 2012-04-04 07:38

@TamásSzelei, I don't know for certain. Presumably GCC puts the vtable for B after all of A's fields. Or alternatively, GCC might recognize that since A has no fields of its own, it doesn't really matter where pA points, so there's no need to bother changing the pointer when casting between A and B (or C) - Derek Park 2012-04-04 15:14

I asked a follow-up since then and got a nice answer. Your assumption is correct, it is the empty base class optimization. Adding a field to A will make the pointers different - Tamás Szelei 2012-04-04 15:20

@TamásSzelei, It looks like GCC puts the vtable for B after all of A's fields. So in memory you've got [afields|bvtbl|bfields|cfields]. (A has no fields in the example given, but that doesn't change the layout conceptually.) I haven't confirmed that this is exactly what GCC is doing, but it seems the most likely scenario. This means it's optimizing for casting (which now requires no pointer offset) at the expense of virtual function calling (which now requires the offset that the cast avoided) - Derek Park 2012-04-04 15:21

P.S. Sorry for the double-comment. My first one got mangled and I didn't realize it went through. After digging a bit further, it looks like the empty base class optimization is indeed the factor here - Derek Park 2012-04-04 15:44

A pointer to an object is convertible to a pointer to base object and vice versa, but the conversion doesn't have to be trivial. It's entirely possible, and often necessary, that the base pointer has a different value than the derived pointer. That's why you have a strong type system and conversions. If all pointers were the same, you wouldn't need either.

2012-04-03 22:11
by Kerrek SB

I guess that is where my confusion is coming in. I thought that the pointers would have the same value if they pointed to the same object in memory. They have the same value without the virtual function. But when the virtual function gets added in, the values are different. Is there a way to check if pA and pC actually point to the same object in memory if the values are different? - user987280 2012-04-03 22:23

@user987280: They don't point to the same object. The derived pointer points to the most-derived, complete object, and the base pointer points to the base subobject. Sometimes the base subobject has the same address as the containing most-derived object, but that's entirely by coincidence - Kerrek SB 2012-04-03 22:30

"Sometimes the base subobject has the same address as the containing most-derived object, but that's entirely by coincidence." So, scratch the virtual function and look at the first bit of code I posted. The fact that pA and pC have the same value is not something which should be relied upon. I did not know that, thanks for bringing it to my attention. I was actually relying on that before the virtual function fiasco - user987280 2012-04-03 22:45

Here are my assumptions, based on the question.

1) You have a case where you cast from a C to an A and you get the expected behaviour.
2) You added a virtual function, and that cast no longer works (in that you can no longer pull data from A directly after the cast to A, you get data that makes no sense to you).

If these assumptions are true, the problem you are experiencing comes from the insertion of the virtual table pointer in B. This means the data in the class no longer lines up exactly with the data in the base class (the class has gained extra bytes, the virtual table pointer, that are hidden from you). A fun test would be to check sizeof to observe the growth in hidden bytes.

To resolve this you should not cast directly from C to A to harvest data. You should add a getter function that is in A and inherited by B and C.

Given your update in the comments, I think you should read this, it explains virtual tables and the memory layout, and how it is compiler dependent. That link explains, in more detail, what I explained above, but gives examples of the pointers being different values. Really, I had WHY you were asking the question wrong, but it seems the information is still what you wanted. The cast from C to A takes into account the virtual table at this point (note C-8 is 4, which on a 32 bit system would be the size of the address needed for the virtual table, I believe).

2012-04-03 22:13
by David D