"C++ arrays do not support polymorphism."

by geekzhang on 2013-05-03 11:23:17

It started when I saw a Weibo post and Cloud Wind's comment on it. I replied to the original poster that they didn't understand C's memory management.

Later, it sparked a lot of discussion, and many people took the opportunity to criticize C++, for example:

//@Baidu-ThursdayWang: Isn't this exactly where C++ falls short? You have to remember too many things.

//@Programming Rogue Zhang FaCai: This has very little to do with C. But I need to verify; it shouldn't be like this. If the base class destructor can't be called in such a case, it's too weak.

//@Program Yuan: Looking back now, it was actually fortunate that I didn't delve deeply into the various obscure corners and details of C++ language features due to lack of perseverance. I feel sorry for those who are still immersed in these weird features and enjoy them endlessly.

Some misconceptions also emerged:

//@BA5BO: Arrays are based on copying, while polymorphism is based on pointers. Assigning a derived class to a base class array just copies a new base class object, so there's no need to call the derived class destructor.

//@Programming Rogue Zhang FaCai: I suddenly understand what's going on. In this case, all elements in the array are equal-length structures, and the types must be consistent, so polymorphism isn't possible. This is different from C# and Java. The latter two store object pointers for reference types.

And so on. It seems I must write a blog post to clarify things.

Since I didn't see the context, I guessed it might be one of the following two scenarios:

1) A pointer array `Base*[]` stores a bunch of derived class pointers. In this case, `delete[] pBase;` only deletes the pointer array itself, not the objects its pointers point to. This is a basic issue in C: you first need to loop through the pointer array and delete every object it points to, and only then delete the array. Clearly, this has nothing to do with C++.
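A minimal C++ sketch of the cleanup described in scenario 1. It assumes hypothetical `Base`/`Derived` types, a hypothetical counter `g_derived_dtors` that exists only to observe destructor calls, and a made-up helper name `build_and_free_pointer_array`:

```cpp
// Hypothetical counter, used only to observe how many Derived objects are destroyed.
static int g_derived_dtors = 0;

struct Base {
    virtual ~Base() {}
};

struct Derived : Base {
    ~Derived() override { ++g_derived_dtors; }
};

// Scenario 1: an array of pointers (Base*[]), each slot pointing to its own heap object.
// Hypothetical helper; returns how many Derived destructors ran during cleanup.
int build_and_free_pointer_array(int n) {
    g_derived_dtors = 0;
    Base **pBase = new Base*[n];
    for (int i = 0; i < n; ++i)
        pBase[i] = new Derived;   // each slot owns a separately allocated object

    // `delete[] pBase;` alone would free only the pointer array and leak every Derived.
    for (int i = 0; i < n; ++i)
        delete pBase[i];          // virtual ~Base() dispatches to ~Derived()
    delete[] pBase;               // then free the pointer array itself
    return g_derived_dtors;
}
```

Here `build_and_free_pointer_array(3)` should return 3: every object is destroyed through its own pointer before the pointer array is freed.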

2) The second possibility is `Base *pBase = new Derived[n];`. In this case, `delete[] pBase` clearly will not call the derived class's virtual destructor (though, as I'll explain later, this isn't necessarily what happens in practice). This is what Cloud Wind replied in that Weibo post. If this is the scenario, I think the programmer has not fully understood how pointers and arrays work in C, nor what an object is, nor what a pointer or a reference to an object is. This is simply a poor understanding of C.

Later, after reading @GeniusVczh's original article "How to Design a Language (I) What Are Pits (a)", I realized it referred to the second scenario. That is, the following example (I added a virtual destructor for easier compilation):

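```cpp
#include <iostream>
using namespace std;
class Base {
public:
    virtual ~Base() { cout << "Base::~Base()" << endl; }
};
class Derived : public Base {
public:
    virtual ~Derived() { cout << "Derived::~Derived()" << endl; }
};
int main() {
    Base *pBase = new Derived[10];
    delete[] pBase;  // deleting a Derived[] through a Base*
}
```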

C Language Basics

I won't yet discuss when the C++ program above manages to call the derived class destructor correctly. Instead, let me first talk about C, which will make that code easier to understand later.

Consider the following:

```cpp
Base *pBase = new Derived[10];
```

What’s the difference between this and the following?

```cpp
Derived d[10];
Base *pBase = d;
```

One allocates the array dynamically on the heap; the other allocates it on the stack. The only difference is where the memory lives and how it is allocated; syntactically, and in how `pBase` is used, there is no difference. (If you imagine `Base` and `Derived` as `struct`s and `new` as `malloc()`, do you still think this has anything to do with C++?)

So, do you think `pBase` points to an object, is a reference to an object, or points to an array, or is a reference to an array?

Let’s consider the following scenario:

```cpp
int *pInt;
char *pChar;
pInt = (int*)malloc(10 * sizeof(int));
pChar = (char*)pInt;
```

For the pointers `pInt` and `pChar`, do `pInt[3]` and `pChar[3]` refer to the same memory? Of course not: on a typical platform `int` is 4 bytes and `char` is 1 byte, so indexing advances by a different stride for each pointer type. `pInt[3]` starts at byte 12 of the buffer, while `pChar[3]` is byte 3.
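A small sketch that measures this, assuming a hypothetical helper `index_offsets()` (and a hypothetical `Offsets` struct) that reuses the two pointers from the snippet above and reports how far `pInt[3]` and `pChar[3]` lie from the start of the buffer:

```cpp
#include <cstddef>
#include <cstdlib>

struct Offsets {
    std::ptrdiff_t int_off;   // bytes from the buffer start to pInt[3]
    std::ptrdiff_t char_off;  // bytes from the buffer start to pChar[3]
};

// Hypothetical helper: index the same malloc'd buffer through an int* and a char*.
Offsets index_offsets() {
    int *pInt = static_cast<int*>(std::malloc(10 * sizeof(int)));
    if (pInt == nullptr) return {-1, -1};
    char *pChar = reinterpret_cast<char*>(pInt);

    // Subscripting scales by the pointee size, so the same index 3
    // lands at different byte offsets for the two pointer types.
    Offsets o{reinterpret_cast<char*>(&pInt[3]) - pChar, &pChar[3] - pChar};
    std::free(pInt);
    return o;
}
```

On a platform with a 4-byte `int`, this returns `{12, 3}`.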

Now back to the question: after the pointer to the `Derived[]` array is converted to the `Base*` pointer `pBase`, does `pBase[3]` still refer to the real fourth `Derived` element?

Let's look at a pure C example. Below are two structs that mimic inheritance (the members of `struct A` appear at the start of `struct B`), and I deliberately added a `void *vptr` member to each, standing in for a vtable pointer:

```c
struct A {
    void *vptr;
    int i;
};

struct B {
    void *vptr;
    int i;
    char c;
    int j;
} b[2] = { {(void*)0x01, 100, 'a', -1}, {(void*)0x02, 200, 'A', -2} };
```

Note: I compiled this using G++ on a 64-bit platform, where `sizeof(void*)` is 8.

Let’s examine stack memory allocation:

```c
struct A *pa1 = (struct A*)(b);
```

Using gdb, we can observe the following situation (the values of members in `pa1[1]` are completely messed up):

```
(gdb) p b
$7 = {{vptr=0x1, i=100, c=97'a', j=-1}, {vptr=0x2, i=200, c=65'A', j=-2}}
(gdb) p pa1[0]
$8 = {vptr=0x1, i=100}
(gdb) p pa1[1]
$9 = {vptr=0x7fffffffffff, i=2}
```
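To see exactly what `pa1[1]` reads, here is a sketch that reproduces the view without dereferencing the mistyped pointer: it uses `memcpy` to copy `sizeof(struct A)` bytes out of the `B` array, starting at `k * sizeof(struct A)`, which is the stride an `A*` uses. The helper name `view_as_A` is hypothetical. With the author's 64-bit layout (`sizeof(A)` is 16, `sizeof(B)` is 24), element 1 of this view starts at `b[0].j`, and its `i` overlaps the low 4 bytes of `b[1].vptr`. That is why gdb shows `i=2` on a little-endian machine.

```cpp
#include <cstddef>
#include <cstring>

// Same layouts as struct A and struct B above.
struct A { void *vptr; int i; };
struct B { void *vptr; int i; char c; int j; };

// Hypothetical helper: return what pa1[k] would "see" when an array of `count` B
// objects is walked with A's stride, using memcpy instead of the bad cast.
A view_as_A(const B *arr, std::size_t count, std::size_t k) {
    A out{};
    std::size_t start = k * sizeof(A);              // an A* advances sizeof(A) bytes per index
    if (start + sizeof(A) <= count * sizeof(B)) {   // stay inside the B array
        std::memcpy(&out, reinterpret_cast<const unsigned char*>(arr) + start, sizeof(A));
    }
    return out;
}
```

Element 0 of the view matches `b[0]` because both types start with the same members; every later element is shifted by `sizeof(B) - sizeof(A)` bytes per index.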

Next, let’s examine the heap situation: (we dynamically allocate `struct B[2]`, then convert it to `struct A*`, and operate on its members)

```c
struct A *pa = (struct A*)malloc(2 * sizeof(struct B));
struct B *pb = (struct B*)pa;
pa[0].vptr = (void*)0x01;
pa[1].vptr = (void*)0x02;
pa[0].i = 100;
pa[1].i = 200;
```

Inspecting the variables with gdb, we see the following (accessed through `pa`, everything looks fine, but viewed through `pb`, `pb[0].j` and all of `pb[1]` are garbage):

```
(gdb) p pa[0]
$1 = {vptr=0x1, i=100}
(gdb) p pa[1]
$2 = {vptr=0x2, i=200}
(gdb) p pb[0]
$3 = {vptr=0x1, i=100, c=0'\000', j=2}
(gdb) p pb[1]
$4 = {vptr=0xc8, i=0, c=0'\000', j=0}
```

Clearly, this is entirely caused by an improper type cast in C, which corrupts memory; it has nothing to do with C++. Moreover, virtually every C++ book warns that casting between base and derived class types can cause serious memory problems.

However, if we modify our `struct B` on a 64-bit platform as follows (by commenting out `int j`):

```c
struct A {
    void *vptr;
    int i;
};

struct B {
    void *vptr;
    int i;
    char c;
    // int j; --- Commented out
} b[2] = { {(void*)0x01, 100, 'a'}, {(void*)0x02, 200, 'A'} };
```

You’ll notice that the memory corruption issues disappear, because `struct A` and `struct B` have the same size:

```
(gdb) p sizeof(struct A)
$6 = 16
(gdb) p sizeof(struct B)
$7 = 16
```

Note: If you don’t comment out `int j`, then `sizeof(struct B)` would be 24.

This is the padding that C adds for memory alignment, which exists to make memory access faster (see "Deep Understanding of C").

In other words, if alignment makes the two structs the same size, and the members of `struct A` appear at the beginning of `struct B` in the same order, then in practice the cast causes no problem.
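The two conditions can be checked at compile time. A sketch, assuming a hypothetical struct name `B2` for `struct B` with `int j` commented out and a hypothetical helper `same_stride_and_prefix`:

```cpp
#include <cstddef>

struct A  { void *vptr; int i; };
struct B  { void *vptr; int i; char c; int j; };   // original struct B
struct B2 { void *vptr; int i; char c; };          // struct B with `int j` commented out

// Hypothetical helper: true when T has the same size as A (so an A* walks a T array
// with the right stride) and A's members sit at the same offsets at the start of T.
template <typename T>
constexpr bool same_stride_and_prefix() {
    return sizeof(T) == sizeof(A)
        && offsetof(T, vptr) == offsetof(A, vptr)
        && offsetof(T, i) == offsetof(A, i);
}

// The original struct B can never qualify: its extra members make it larger than A.
static_assert(!same_stride_and_prefix<B>(), "B is larger than A");
```

On the author's 64-bit G++ setup both `A` and `B2` are 16 bytes, so `same_stride_and_prefix<B2>()` is true. On a typical 32-bit target `A` is 8 bytes and `B2` is 12, so the same cast would corrupt memory again, which shows how fragile this coincidence is.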

Now let’s revisit the C++ program.

If you've read my articles from five years ago, "Analysis of C++ Virtual Function Tables" and "C++ Memory Object Layout Part 1 & 2", you'll know that C++ compilers place the virtual function table pointer at the beginning of the object. (Strictly speaking, the C++ standard says nothing about vtables; this is how mainstream implementations such as GCC and Visual C++ lay out objects.) That is why I deliberately put a `void *vptr` at the start of `struct A` and `struct B`. Placing it at the beginning ensures that, after a pointer is cast to the base type, the vtable pointer is still found at the same offset.
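A sketch that peeks at the first pointer-sized bytes of a few objects, assuming hypothetical types `Plain`, `Poly`, and `PolyD` and a hypothetical helper `leading_pointer`. It relies on the implementation behavior described above (GCC, Clang, and Visual C++ store the vptr at offset 0); the standard does not guarantee this, so treat it as an experiment rather than portable code:

```cpp
#include <cstring>

struct Plain { int b; };                          // no virtual functions, no vptr
struct Poly  { int b; virtual ~Poly() {} };       // gains a hidden vptr
struct PolyD : Poly { ~PolyD() override {} };     // same data members, different vtable

// Hypothetical helper: copy the first sizeof(void*) bytes of an object's representation.
// On mainstream ABIs these bytes are the vptr, so objects of the same dynamic type
// share them even when their data members differ.
template <typename T>
const void *leading_pointer(const T &obj) {
    const void *p = nullptr;
    std::memcpy(&p, reinterpret_cast<const unsigned char*>(&obj), sizeof p);
    return p;
}
```

Two `Poly` objects holding different `b` values should report the same leading pointer, while a `PolyD` should report a different one, because its vptr points to `PolyD`'s vtable.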

Alright, now let’s re-examine C++ with the following code:

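```cpp
#include <iostream>
using namespace std;
class B {
    int b;
public:
    virtual ~B() { cout << "B::~B()" << endl; }
};
class D : public B {
    int i;
public:
    virtual ~D() { cout << "D::~D()" << endl; }
};
int main() {
    cout << "sizeof(B):" << sizeof(B) << ", sizeof(D):" << sizeof(D) << endl;
    B *pb = new D[2];
    delete[] pb;  // D[] deleted through a B*: undefined behavior
}
```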

The code above runs correctly, including the calls into the subclass's virtual functions! This is because the sizes match after alignment: on my 64-bit CentOS system, `sizeof(B)` is 16 and `sizeof(D)` is 16.

However, if you add another `int` member to `class D`, the program crashes with a segmentation fault, because now `sizeof(B)` is 16 but `sizeof(D)` is 24. `pb[1]` reads the vtable pointer from the wrong address, and the program crashes.

Additionally, I tested this in Visual Studio 2010. For the `struct` version, its behavior matches GCC, but for the `class` code it invokes the virtual functions correctly regardless of whether the base and derived classes have the same size.
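A sketch of the size arithmetic behind this. It assumes hypothetical structs that mirror `B` and `D`, a `D2` that adds "another `int` member" (hypothetically named `k`), and a hypothetical helper `stride_gap`, which measures how many bytes `pb[1]` falls short of the real second element when a `T[]` is walked with `B`'s stride. With GCC on x86-64, `D`'s `int i` fits into the padding after `B`'s members, so `D` stays at 16 bytes and the gap is 0. The extra member pushes `D2` to 24 bytes and opens an 8-byte gap.

```cpp
#include <cstddef>

// Mirrors of class B and class D above, plus D2 with "another int member".
struct B  { int b; virtual ~B() {} };
struct D  : B { int i; ~D() override {} };
struct D2 : B { int i; int k; ~D2() override {} };

// Hypothetical helper: bytes by which pb[1] (computed with sizeof(B)) falls short
// of the real element 1 of a T array (which starts at sizeof(T)).
template <typename T>
std::ptrdiff_t stride_gap() {
    return static_cast<std::ptrdiff_t>(sizeof(T)) - static_cast<std::ptrdiff_t>(sizeof(B));
}
```

A nonzero gap means `pb[1]` lands inside element 0, so the "vptr" it reads is really one of element 0's data members.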

However, according to the C++ standard, the following usage is undefined! You can check related discussions on StackOverflow: "Why is it undefined behavior to delete[] an array of derived objects via a base pointer?" (Similarly, you can refer to Item 3 in "More Effective C++").

```cpp
Base *pBase = new Derived[10];
delete[] pBase;
```
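*More Effective C++* Item 3 sums this up as "never treat arrays polymorphically". Below is a minimal sketch of one well-defined alternative. It assumes hypothetical names (`Base`, `Derived`, `id()`, the two counters, and the helpers `sum_ids` and `run`). The array stays typed as `Derived*`, individual elements are converted to `Base&` when polymorphic behavior is needed, and `delete[]` uses the same type that `new[]` used.

```cpp
// Hypothetical counters, used only to observe destructor calls.
static int g_base_dtors = 0, g_derived_dtors = 0;

struct Base {
    virtual ~Base() { ++g_base_dtors; }
    virtual int id() const { return 0; }
};

struct Derived : Base {
    int extra = 42;                        // makes sizeof(Derived) differ from sizeof(Base)
    ~Derived() override { ++g_derived_dtors; }
    int id() const override { return 1; }
};

// Index with the real element type; convert each element to Base& individually.
int sum_ids(const Derived *arr, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) {
        const Base &b = arr[i];            // arr[i] advances by sizeof(Derived)
        total += b.id();                   // virtual dispatch on a real Derived object
    }
    return total;
}

// Hypothetical helper: allocate, use, and free a Derived[] through a Derived*.
int run(int n) {
    Derived *pd = new Derived[n];
    int total = sum_ids(pd, n);
    delete[] pd;                           // same static type as new[]: well-defined
    return total;
}
```

Because `arr[i]` is computed with `sizeof(Derived)`, each `Base&` refers to a real object. So `run(10)` returns 10, and each of the ten elements runs both `~Derived()` and `~Base()`.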

So I am quite puzzled that Microsoft's C++ compiler gives this undefined behavior a well-defined result, and it disappoints me with Microsoft's C++ compiler once again. The code seems to compile and run correctly and beautifully, but in reality it misleads many people into treating undefined behavior as defined, and even praising it. (Just like the Weibo post, which claimed VC is amazing and credited this to OO features. Really!)

Now you should finally understand that the problem with `Base *pBase = new Derived[10];` is a C type-casting problem, and you should also understand what a pointer into an array really means. This is a very peculiar piece of code! Please don't be like those people on Weibo, or in the comments here, who loudly proclaim that Microsoft's C++ compiler supports it!

Finally, I increasingly realize that many people who say C++ is hard to use actually don't understand C.
