This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH]: Fix PR tree-optimization/21407



On 11/05/2005, at 5:23 PM, Mark Mitchell wrote:


Geoff Keating wrote:

On 11/05/2005, at 11:39 AM, Mark Mitchell wrote:

If you asked someone twenty years ago whether these kinds of operations were legal, the answer would have been uniformly yes. Now, opinions differ, because people are seeing more benefits of restricting these operations. But, the standards still don't say.

  struct I {
    virtual void f(); // I is not a POD.
    int a;
    int b;
  };

  void g() {
    I i;
    int *a_p = &i.b;
    /* 8 is a magic number! */
    I* i_p = (I*)((char *) a_p - 8);
    i_p->a = 3;
  }

There's no casting between pointers and integers. There are reinterpret_casts between pointers. The mapping is implementation-defined, but, in practice, all implementations on, say, IA32, are going to leave the bit-pattern unchanged. I just don't see anything that says conclusively that this code is invalid.

In C++, this code clearly invokes unspecified behaviour, initially with the cast of a_p to 'char *'.


Many people would say, reasonably enough, that specifying the behavior is specifying (a) the type of the result, and (b) the value. After all, that's how the descriptions of expressions are written. So, as long as we specify that the result of the conversion is that you get a "char *" with the same bit-pattern as "a_p", then there's nothing more for us to say.


The question is what you can do with a pointer of that type and value.

Well... There are a few more things to say.


One implicit thing that you haven't said is "and then execution continues as specified in the rest of the standard". The problem is that in this case, that's not a complete specification, and in fact will in this example quickly lead to undefined behaviour again.

[basic.compound] para 3 says "The value representation of pointer types is implementation-defined." So just saying "the bit patterns are the same" could produce any result. So let's say that you actually define the result as having a value of the same address as the original pointer.

OK, so now we have two pointers, different types, same address. Let's assume that there is an object of type 'int' at that address, which the original pointer pointed to. Now we have a pointer to 'char' at the same address. Let's assume no object of type 'char' has been created at that address: there's an exhaustive list of the ways that objects are created in [intro.object], and none of them has happened. [basic.compound] para 3 says that "If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained." It doesn't say anything about what happens if there isn't an object of the right type there. So now you have to fill in that blank, or be back at undefined behaviour again: does this pointer point to an object? What object?

You can't find those answers by reading the standard; that's like putting a word on a blank piece of paper and expecting it to fill in the rest of the story.
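To make the "two pointers, different types, same address" situation concrete, here is a minimal sketch. The helper name `same_address_after_cast` is hypothetical, and the result reflects what mainstream implementations (and later revisions of the standard) do. It only shows that the two pointers compare equal as addresses; it does not settle what object, if any, the `char *` points to.

```cpp
#include <cassert>

// Hypothetical helper: derives a char* from an int* and reports whether
// both pointers denote the same address when compared as void*.
bool same_address_after_cast() {
    int value = 42;
    int* int_ptr = &value;

    // The cast under discussion: same address, different pointer type.
    char* char_ptr = reinterpret_cast<char*>(int_ptr);

    // Compare as void* so that only the address matters, not the type.
    return static_cast<void*>(int_ptr) == static_cast<void*>(char_ptr);
}
```

The comparison says nothing about whether `char_ptr` may be used for arithmetic that leaves the `int` object, which is the open question here.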

You can have Nick's alternative #1, and say that you can form pointer values by any means, and if they happen to point at something of the type you're dereferencing, then you're OK.

The key words here are "by any means", which I'll discuss below.


You can have Nick's alternative #3, and give dynamic history to pointers. That's what the optimizer folks would like; since a_p points to an object of sizeof(int) bytes, you can never derive from it a pointer pointing outside that region.

You don't have to do "dynamic history", and I don't think there's much support in the C++ standard for such a concept. All you need do is prevent someone from taking a step which would derive such a pointer.


But, certainly, nothing in the current standard suggests that what you can do with a pointer whose value compares equal to that of i.a depends on whether you got that pointer by writing "&i.a", or by pointer arithmetic, or by "(int *)0x<right address here>".

The C++ standard says this in almost as many words, [basic.compound] p3. (C is very different, and I don't think that statement is true in C.)


(However, if you've never taken the address of any part of "i", then there's no way for a conforming program to know what address i.a has, so we can just assume that they didn't get the right value if they try to just guess a number. And, therefore, we can avoid actually allocating memory. But, if you've taken the address of some part of "i", then the program can know the address of i.a.)

Consider:

  struct S {
   virtual void f();
   int a;
   int b;
  };

  void f(int *a_p) {
    if (*a_p != 3) abort();
  }

  void g() {
    S s;
    s.a = 3;
    f((int*) ((char *)&s.b - ((char*) &s.b - (char*) &s.a)));
  }

The only thing that can possibly make this code abort is that we're not allowed to manipulate the pointer to s.b like this. That's going to be rather surprising, given that we can do an algebraic simplification and just get:

f ((int *) (char*) &s.a);

which nobody is proposing should result in an abort.

The code is indeed invalid under Nick's model #3, but I sure don't see anything in the C++ standard to suggest that anybody expected this code to fail.
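For comparison, a minimal sketch of how a program can learn the distance between two members without pointer arithmetic across them. It assumes a hypothetical struct `PlainS` with no virtual functions, so that `offsetof` is well defined, and it assumes a flat address space in which converting pointers to `std::uintptr_t` preserves distances. Neither assumption comes from the thread, and the sketch does not decide whether the `char *` subtraction in the example above is valid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct: like S above, but without the virtual function,
// so offsetof applies to it.
struct PlainS {
    int a;
    int b;
};

// Distance between the members as reported by offsetof.
std::size_t member_distance_by_offsetof() {
    return offsetof(PlainS, b) - offsetof(PlainS, a);
}

// Distance between the members computed from integer address values.
// The pointer-to-integer mapping is implementation-defined; a flat
// address space is assumed here.
std::size_t member_distance_by_address() {
    PlainS s{};
    auto addr_a = reinterpret_cast<std::uintptr_t>(&s.a);
    auto addr_b = reinterpret_cast<std::uintptr_t>(&s.b);
    return static_cast<std::size_t>(addr_b - addr_a);
}
```

On implementations where the assumption holds, both functions return the same value, which is the sense in which a program that has taken a member's address can know where its sibling lives.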

I think that [expr.add] paragraph 5 is pretty much a statement that sometimes, when you add and subtract things from pointers, it can fail: "If [blah blah], the evaluation shall not produce an overflow; otherwise, the behavior is undefined."


Of course, it's normal that some expressions which overflow don't overflow when simplified. For instance

char a;
a = 127;
a++;
a--;

can overflow, even though it simplifies to

a = 127;

which can't.
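A minimal sketch of that point, modelling the arithmetic in `int` so that every intermediate value can be inspected safely. The helper names are hypothetical, and the sketch only tracks whether each intermediate value fits in a `signed char`; it makes no claim about what a particular compiler does with the original `char` code.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: true when `value` is representable as a signed char.
bool fits_in_signed_char(int value) {
    return value >= SCHAR_MIN && value <= SCHAR_MAX;
}

// Models "a = start; a++; a--;" and checks every intermediate value.
bool stepwise_stays_in_range(int start) {
    int after_increment = start + 1;            // value after a++
    int after_decrement = after_increment - 1;  // value after a--
    return fits_in_signed_char(start) &&
           fits_in_signed_char(after_increment) &&
           fits_in_signed_char(after_decrement);
}

// Models the simplified form "a = start;", which has no intermediate values.
bool simplified_stays_in_range(int start) {
    return fits_in_signed_char(start);
}
```

Starting from `SCHAR_MAX`, the stepwise form passes through a value that does not fit, while the simplified form never does, even though both end at the same value.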

The trick is the 'blah blah'. It talks about elements of an array object. And that takes us back to the start, which is "what object, if any, does this pointer point to?" Is it an array object? How long is it?


