This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]

Re: Bug in loop optimize (invalid postinc to preinc transformation)

To: "Tim Hollebeek" <tim at hollebeek dot com>,"Geoff Keating" <geoffk at geoffk dot org>
Subject: Re: Bug in loop optimize (invalid postinc to preinc transformation)
From: "Dave Korn" <davek-ml at ntlworld dot com>
Date: Fri, 29 Dec 2000 06:18:26 -0000
Cc: "Alexandre Oliva" <aoliva at redhat dot com>,<kenner at vlsi1 dot ultra dot nyu dot edu>,"gcc" <gcc at gcc dot gnu dot org>,"Robert Dewar" <dewar at gnat dot com>
References: <200012282342.SAA29717@cj44686-b.reston1.va.home.com>

----- Original Message -----
From: "Tim Hollebeek" <tim@hollebeek.com>
Sent: Thursday, December 28, 2000 11:42 PM

> If pointers are implementation as unsigned offsets into a flat memory
> model, one of two things is true:

 Nope. Pointers are abstract types. The mere fact that the underlying
implementation uses what are effectively 32 bit unsigned ints (which isn't
even the case on segmented architectures) isn't relevant. A pointer is an
abstract type with semantics and behaviour defined by the C language
standard, and just because a 32 bit int works one way, a pointer doesn't
have to, even though it may be implemented using them.

> Case 1:
> 0x00000000 is the null pointer.  Therefore it isn't within a valid
> object, and is not immediately following a valid array.
>
> >>>THEREFORE WRAPPING IS IRRELEVANT<<< since no valid pointer
> increment or operation can cross the boundary.
>
> Case 2:
>
> 0x00000000 is within a valid object, or immediately follows an array.
> In this case, 0x00000000 cannot be the null pointer, so some other
> value (__null) is.  Overflow isn't an issue since if all computations
> on unsigned pointer values involve first subtracting __null (using
> unsigned arithmetic; wrapping is well defined here) from each pointer,
> than adding it back in when necessary, we reduce to case 1.

  The unstated assumption here is that pointers, like ints, have a well
defined behaviour (wrapping round, in the case of ints) when adding 1.
However pointer arithmetic is not integer arithmetic. Remember that
(((char *)(arbitraryvalue)) + 1) != (((double *)(arbitraryvalue) + 1), so
I think that you first have to show that IF ptr == (T *)0xffffffff  THEN
(ptr+1) == (T *)0. And I don't think you can assume that.

  Anyway, let's look at the original source code again:

>volatile unsigned char *p;
>volatile unsigned char i;

  Ok, I accept that these variables are indeed init'ed to zero! There's
always an 'if' or 'but' in the c spec somewhere that I forget about!

>int main(void) {
>   do {
>  i++;
>  } while(p++<(unsigned char *)0xffffffff);
> /* if 0xffffffff change to 0xfffffffe then no bug */
>  return 0;
>}

  Now, what follows is from the n869 draft of the C9X standard. Let's see
in just how many ways the behaviour of this program may be undefined or
unspecified. The first thing I'm going to attack is the comparison
operation.  From section 6.5.8: Relational operators:

>    4 For the purposes of these operators, a pointer to a nonarray object
behaves the same as a pointer to the first element of an array of length
one with the type of the object as its
element type.

>    5 When two pointers are compared, the result depends on the relative
locations in the address space of the objects pointed to. If two pointers
to object or incomplete types both point to the same object, or both point
one past the last element of the same array object, they compare equal. If
the objects pointed to are members of the same aggregate object, pointers
to structure members declared later compare greater than pointers to
members declared earlier in the structure, and pointers to array elements
with larger subscript values compare greater than pointers to elements of
the same array with lower subscript values. All pointers to members of the
same union object compare equal. If the expression P points to an element
of an array object and the expression Q points to the last element of the
same array object, the pointer expression Q+1 compares greater than P. In
all other cases, the behavior is undefined.

  And from 6.3.2.3: Pointers, we have -

>    3 An integer constant expression with the value 0, or such an
expression cast to type void *, is called a null pointer constant. If a
null pointer constant is converted to a pointer type, the resulting
pointer, called a null pointer, is guaranteed to compare unequal to a
pointer to any object or function.

  Now, it seems to me that when this loop first executes, p is the NULL
pointer: I believe it is guaranteed that the zero-initialisation of static
storage class variables will provide correctly zero values for both floats
and pointers, n'est-ce pas? (6.7.8 #10, I won't quote it now.) And the
pointer derived from the cast operation behaves as a pointer to an array
of one char, mais oui? So: do they both point to the same object? Clearly
not: NULL points to no object at all. Do they both point one past the last
element of the same array? Nope, they don't: otherwise there would be a
valid pointer that compared equal to NULL. Do they point to members of the
same aggregate, array or union? Nope, to be brief. Does one point to an
array and the other to one past the last entry in that same array? Also,
nope. So there you have it: undefined behaviour. The only way in which
NULL may be used in a pointer comparison is with != or ==, which
specifically allow it.

  OK, what about that postincrement? Let's check 6.5.6: Additive
operators.

>    7 For the purposes of these operators, a pointer to a nonarray object
behaves the same as a pointer to the first element of an array of length
one with the type of the object as its element type.

>    8 When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array is
large enough, the result points to an element offset from the original
element such that the difference of the subscripts of the resulting and
original array elements equals the integer expression. In other words, if
the expression P points to the i-th element of an array object, the
expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value
n) point to, respectively, the i+n-th and i?n-th elements of the array
object, provided they exist. Moreover, if the expression P points to the
last element of an array object, the expression (P)+1 points one past the
last element of the array object, and if the expression Q points one past
the last element of an array object, the expression (Q)-1 points to the
last element of the array object. If both the pointer operand and the
result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow;
otherwise, the behavior is undefined. If the result points one past the
last element of the array object, it shall not be used as the operand of a
unary * operator that is evaluated.

  Let us consider, since after all it could happen (undefined behaviour is
a bit like that) that the comparison operator does indeed work exactly as
if it treated the two pointers as unsigned ints of the same bitsize. Then
the loop would run round and round until eventually we get to the top of
the loop with p == (char *)0xffffffff. It increments i and then tries to
evaluate the do ... while () condition. Is it legal to apply post
increment to the pointer at this stage? p is a char *, so under #7 it is
effectively pointing to the first element of an array of length one. So
the first part of #8 fails; since the array is of size 1, it's not big
enough to point at another element 1 past the first. (Hold it! I haven't
forgotten about "one past" yet, I'm just saying that pointers to
one-past-the-array-end don't point to an element of the array, even though
they are valid pointers. To support this thesis, I point out that such
pointers are valid for comparisons and arithmetic, but invalid to
dereference.) In the second part of the clause, we certainly have to
accept that the pointer P points to the last element of the array object,
since the array is of size 1 according to #7. So apparently the expression
(P)+1 must point one past the last element. If the maths is done by
integer maths rules, the expression (P+1) is evaluated to have the same
value (and representation) as NULL, and so cannot be a valid pointer.

  I think we should conclude from all this that a) the original code
exhibits undefined behaviour b) to be ANSI conforming, a compiler that
works with pointers based on 32 bit integer arithmetic must be sure never
to allocate any array or object that occupies address 0xFFFFFFFF.

> Either way, wrapping is well-defined/irrelevant.  Optimization must
> just preserve the correct semantics.  If an optimization causes values
> to be used which might overflow or cause other problems when the
> original values would not, then that's a bad optimization.  But this
> has nothing to do with "pointer wrapping" or "pointers being unsigned"
> or any other such nonsense.  It's just a bad optimization.

  Ah, but this is the whole point of undefined behaviour: it's not defined
because it's accepted that the compiler can't practically be expected to
produce correct code for all the possible dodgy constructs. There is NO
onus on a compiler to produce valid code for a program which invokes
undefined behaviour. In pracise, this means that people writing
optimization code don't need to worry about taking care of cases that
could only occur if the program has invalid behaviour. That's why the
optimization doesn't need to worry about the danger of wrapping round the
constant when it increments it, because the only time this could make a
difference is when we've done something wrong already.

           DaveK

Follow-Ups:
- Re: Bug in loop optimize (invalid postinc to preinc transformation)
  - From: Richard Henderson
- Re: Bug in loop optimize (invalid postinc to preinc transformation)
  - From: Tim Hollebeek

References:
- Re: Bug in loop optimize (invalid postinc to preinc transformation)
  - From: Tim Hollebeek

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]