This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: GCC optimizes integer overflow: bug or feature?


Paolo Bonzini wrote:

Some time ago (a year?) I was told on this mailing list that code breakage due to undefinedness of signed overflow is not too common (I had claimed, with no evidence, that it was more than one bug per 1,000 lines). My claim was countered with something like "most of the time people work with small enough values, so overflows don't happen too many times in practice".

This claim is a non sequitur.


What I would say is, people *don't understand why a compiler needs to assume undefined overflow semantics*, because people work with small values and don't care about the boundary conditions.

But that is generally true for stuff like gets/sprintf/strcpy/the unsafe operation du jour. People don't care about the overflow cases of sprintf because they work with small data. Yet then, suddenly, the size of the data grows and funny stuff starts to happen *due* to their sprintf et al.
For example, most programmers who know assembly language will appreciate it if the compiler can use the processor's support for loops with a known number of iterations (mtctr on PPC, for example). However, I'm pretty sure that, if you present these two pieces of code to some good programmers,

  unsigned int i;
  for (i = 0; i <= n; i++)
    ...

  unsigned int i;
  for (i = 0; i < n; i++)
    ...

where the compiler uses mtctr only in the second case, most of them will think that the compiler has a bug. Almost nobody will realize that the first can loop infinitely and the second cannot (which is the reason why the compiler cannot optimize them in the same way).
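To make the boundary case concrete, here is a small illustrative sketch of my own (not code from the thread):

  /* Why the "i <= n" form cannot become a simple counted loop:
     if n == UINT_MAX, then i <= n holds for every unsigned i, so the
     first loop never terminates; equivalently, its trip count n + 1
     wraps to 0 in unsigned arithmetic.  The "i < n" form runs exactly
     n times for every n, which is what mtctr needs. */
  void demo(unsigned int n)
  {
    unsigned int i;

    for (i = 0; i <= n; i++)   /* infinite when n == UINT_MAX */
      ;

    for (i = 0; i < n; i++)    /* exactly n iterations, for any n */
      ;
  }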

Unfortunately, some compilers (not gcc) would optimize both cases the same way.
Well, these programmers *are* assuming undefined overflow semantics even on unsigned types. Maybe they would like overflow semantics to be defined in some cases and undefined in others? Fine by me, that would be -fwrapv -funsafe-loop-optimizations in GCC; but a language standard cannot go into such detail!

It could add a type modifier such as "nowrap", so that loop indexes can be marked this way explicitly, signed or unsigned.
On the autoconf mailing list, Paul Eggert mentioned as a good compromise that GCC could treat signed overflow as undefined only for loops and not in general. Except that the original gnulib bug report was in a loop, so this compromise would leave that case undefined.

I'd vote for Paul's suggestion. It would have the least-surprise effect.
You may think that the analogy is far-fetched? In that case, I'll pick some gcc source file at random and look for signed operations in it:
categorize_ctor_elements_1(......) in gcc/expr.c:
nz_elts += mult * nz;
elt_count += mult * ic;
Both assume that neither the multiplication nor the addition overflows.

I see this as a *counterexample*: it shows that programmers don't care about having wrapping overflow; in fact, they don't care about overflow at all. This code is incorrect not only if overflow is undefined, but also if overflow wraps (-fwrapv); it is correct if overflow aborts (-ftrapv).
No, it is an example of a bug that is difficult to protect against with an assertion. Even more seriously, instead of a simple bug with bad behavior, we get escalated to undefined behavior. "Undefined" is the worst kind, since in theory it is allowed to clean up your disk, explode your computer, and start WW-III. Can this happen in this case? It depends on what the caller does and whether gcc can see the caller and the callee at the same time (e.g. if gcc knows that a caller causes mult = INT_MAX/2, it might assume that nz <= 2 for that caller, and perform some nasty transformations down the line).
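As a hedged illustration of the kind of "nasty transformation" I mean (my own sketch, not code from gcc or this thread): once signed overflow is undefined, a compiler may delete the very test a programmer wrote to catch it.

  /* Sketch: a + 1 < a can only be true if the signed addition overflows.
     Since overflow is undefined, a compiler is allowed to treat the
     condition as always false and remove the branch entirely. */
  int next_index(int a)
  {
    if (a + 1 < a)        /* intended overflow guard */
      return -1;          /* "cannot happen" under undefined overflow */
    return a + 1;
  }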

I am not saying that GCC should abandon all optimizations that assume that no execution path reaches undefined cases. I am saying that these things should not be taken lightly. Saying "programmers don't care about having [something leading to undefined behavior]" simply ignores the gravity of the bad effects. By taking it seriously I also mean VRP, which, due to its limited data structure (at least it used to be so), would be much less effective if gcc were to assume wrapping int.

Instead of having to choose between the bad alternatives of a VRP that gives weird results for undefined cases and one that does barely anything, its data structure should be improved: each variable should have a set of ranges instead of a single range (like what I have seen in some Constraints/CP papers).
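For concreteness, a toy sketch of what "a set of ranges per variable" could look like (the names and layout here are mine, not GCC's internals):

  /* Toy representation: a variable is described by up to N disjoint
     [low, high] intervals instead of a single range.  For example,
     "i is in [0, 9] or [100, 199]" becomes { 2, { {0, 9}, {100, 199} } }. */
  #define MAX_SUBRANGES 4

  struct subrange { long low, high; };           /* closed interval      */

  struct range_set
  {
    int n;                                       /* valid subranges      */
    struct subrange r[MAX_SUBRANGES];            /* sorted and disjoint  */
  };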

------------

The second point in this example is:

Checking for buffer overflows (undefined code) before they occur is trivial in most cases. Checking for a wrapping "signed int" before it actually wraps is relatively complex and unintuitive (e.g. it is unintuitive why a/b should ever overflow; hint: INT_MIN / -1 --> INT_MIN). I would prefer plain bugs, in which -g and -O3 act just the same, over undefined behavior when I have no choice but to debug -O3 code!!!
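To show how awkward those pre-checks are in practice, here is a hedged sketch (mine, not from the thread) of guarding a signed division and a signed addition before they can overflow:

  #include <limits.h>
  #include <stdbool.h>

  /* Safe to compute a / b?  Besides b == 0, the one surprising case is
     INT_MIN / -1, whose true result INT_MAX + 1 does not fit in an int. */
  static bool div_would_overflow(int a, int b)
  {
    return b == 0 || (a == INT_MIN && b == -1);
  }

  /* Safe to compute a + b?  The check itself must not overflow, so it is
     phrased in terms of the remaining distance to INT_MAX or INT_MIN. */
  static bool add_would_overflow(int a, int b)
  {
    if (b > 0)
      return a > INT_MAX - b;
    return a < INT_MIN - b;
  }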
I can't forget the day (several years ago) when I had to debug my MyAllocator<T> class inside


std::set<int,std::less<int>, MyAllocator<int> >::insert(int);

which would crash only with -O2 on PPC, after some sequence of inserts and erases (a dumb strict-aliasing bug, which was tough to hunt down). It was not pretty. This bug should be handed to all those who would like to expand the meaning of "undefined". They should try to debug it, and then say whether they would still want to add more "undefined" clauses to the standard. The new gcc warnings could have helped me in this case.
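That post does not show the offending allocator code, but as a hedged illustration, the classic shape of a strict-aliasing bug that only misbehaves with optimization looks like this (my sketch, not the actual bug):

  /* The same storage is written through an int * and read back through a
     float *.  Under -O2 gcc assumes the two pointer types do not alias and
     may reorder or cache the accesses, so the code "works" at -O0 and
     breaks when optimized.  -Wstrict-aliasing can flag patterns like this. */
  float reinterpret_bits(int *storage)
  {
    *storage = 0x3f800000;        /* bit pattern of 1.0f */
    return *(float *)storage;     /* read as float: undefined behavior */
  }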


Michael


