This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: GCC optimizes integer overflow: bug or feature?


This overflow attitude bears some resemblance to the attitude that led to the Y2K problems. I am not trying to troll; there is a detailed explanation below.
Some time ago (a year, perhaps?) I was told on this mailing list that code breakage due to the undefinedness of signed overflow is not too common (I claimed, admittedly with no evidence, that it amounted to more than one bug per 1,000 lines). My claim was countered by something like "most of the time people work with small enough values, so overflows don't happen too often in practice". At first this sounds reasonable. Then I realized that it sounds awfully similar to "two digits are enough to represent a year".
You may think that the analogy is far-fetched. In that case, let me pick a gcc source file at random and look for the signed operations in it:
categorize_ctor_elements_1(......) in gcc/expr.c:
_elts += mult * nz;
elt_count += mult * ic;
Both assume that neither the multiplication nor the addition overflows. Can GCC really be sure that this case is caught outside this function, before it is called? Will that always be so in the future? You cannot even add an assert to make sure that no overflow has occurred. You cannot write:
temp = mult * nz;                       /* if this overflows, the damage is already done */
a_gcc_assert (mult != 0 && temp / mult == nz);
a_gcc_assert (_elts + temp >= _elts);   /* relies on wraparound, i.e. on the very behavior that is undefined */
_elts += temp;
You'd have to add a very weird-looking assert like:
some_gcc_assert (nz == 0
                 || ((unsigned HOST_WIDE_INT) mult * nz / nz == (unsigned HOST_WIDE_INT) mult
                     && (HOST_WIDE_INT) ((unsigned HOST_WIDE_INT) mult * nz) >= 0));
This is very ugly. You see, it was quite easy to find code that works today but may easily break, without any warning, due to unrelated code shuffling in other source files!
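For illustration, here is a sketch of how such a check can be written without ever evaluating a signed overflow (this is not GCC's code; the function and variable names are made up, and it assumes all three values are non-negative): test against the limits before multiplying or adding.

#include <limits.h>

/* Sketch only: update *elts by mult * nz, checking against the limits
   first so that no undefined operation is ever evaluated.  Assumes
   *elts, mult and nz are non-negative.  Returns 1 on success, 0 if
   the update would overflow. */
static int
accumulate_checked (long *elts, long mult, long nz)
{
  long temp;

  if (nz != 0 && mult > LONG_MAX / nz)
    return 0;                   /* mult * nz would overflow */
  temp = mult * nz;
  if (temp > LONG_MAX - *elts)
    return 0;                   /* *elts + temp would overflow */
  *elts += temp;
  return 1;
}

It is correct, but it is hardly something one would want to scatter over every signed accumulation in the compiler.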


<rant on C standard>
I don't understand why the standards committee did not add a new keyword (say, nowrap) instead of going from implementation-defined overflow to undefined. If one wants fast induction variables, it would have been possible to write:
int f(int k)
{
    nowrap int i;
    int ret = k;

    for (i = 0; i <= k; ++i)
        ++ret;
    return ret;
}
and get the code transformed to the simple "return k*2+1;", which would be implementation-defined for k >= INT_MAX/2 and undefined only for k == INT_MAX.
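Lacking such a keyword, the closest approximation today (a workaround sketch of mine, not anything the committee has considered) is to give the induction variable a type wider than the bound, so the loop count is well defined even for k == INT_MAX; the overflow in ++ret remains a separate problem, exactly as in the example above:

int f(int k)
{
    long long i;            /* wider than int, so i <= k never requires i to wrap */
    int ret = k;

    for (i = 0; i <= k; ++i)
        ++ret;
    return ret;
}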


I know that adding a keyword may break existing code, but so do new optimizations that exploit "undefined" behavior. Predictable breakage at compile time is better than having code break at random at run time.
</rant>


Gabriel Dos Reis wrote:
> The problem is that what constitutes a valid program tends to differ
> from what constitutes a useful program found in the wild.  The
> question for GCC is how to accommodate for the useful uses without
> getting far away from both conformance and non-conformance.

> I don't believe this particular issue of optimization based on
> "undefined behaviour" can be resolved by just telling people "hey
> look, the C standard says it is undefined, therefore we can optimize.
Exactly. Take the strict-aliasing issue, for example. First, optimizations were introduced such that if you did not conform to the strict-aliasing rules your code would silently break. In later gcc versions, warnings were introduced to tell the user "hey, this code might break strict aliasing, are you sure you know what you are doing?". The current state of the strict-aliasing warnings is still imperfect, but it is much better and improving.
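As a generic illustration (not a case taken from this thread), the kind of code those warnings aim at looks like this; with -fstrict-aliasing in effect the compiler may assume that the int access cannot touch f, and -Wstrict-aliasing can flag the type-punned dereference:

float to_bits_broken (float f)
{
  *(int *) &f += 1;   /* type-punned access: violates the aliasing rules */
  return f;           /* with optimization, may still return the original f */
}
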
On the other hand, the state of signed overflow is quite grim. Even though the signed-overflow issue has been around for over a decade, there is almost no warning on high-risk code. Unfortunately, it is much more difficult to produce good warnings for potential overflow bugs than for strict aliasing (with negligible false positives and a reasonable false-negative rate). This is true at least for predictable warnings (ones that don't depend on "irrelevant" architecture, code-generation, or optimization flags).
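To make "high risk" concrete (a generic example of mine, not code from any particular project): a programmer who writes the following overflow test typically gets no diagnostic, yet an optimizer that assumes signed overflow cannot happen is entitled to fold the comparison to 1 and thereby silently delete the very check the programmer relied on:

int addition_would_not_overflow (int a)
{
  return a + 100 > a;   /* intended as an overflow check; may be folded to a constant 1 */
}
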
> And if you're not happy, just tell the compiler not to optimize".
> For not all undefined behaviour are equal, and undefined behaviour is
> also a latitude given to implementors to better serve their base
> users.
Undefined behavior is bad. I think that any language spec should keep the amount of "undefined" behavior to a bare minimum. The whole issue of C's (or C++'s) undefined behavior is very unintuitive to most code writers:

   * Students learning C/C++ have enough on their hands as it is.
     They won't be able to internalize these issues on top of the other
     new concepts they have to learn: recursion, the free store,
     algorithms, etc.
   * Many lecturers (my gut feeling says most) who teach C/C++
     don't understand these issues well enough to teach them in a basic
     C/C++ course.
   * Books barely cover these issues, if at all. Certainly, these
     issues are not stressed every time the terms "signed" or "type
     casting" are mentioned.
   * At verification conferences (and on other occasions) I have had
     the opportunity to talk to people from industry and academia who
     work on formal verification of C/C++. Surprisingly, most of the
     time they don't verify/prove that overflow does not happen. They
     don't seem to be aware of the undefinedness of signed overflow.
     In several cases I asked, and was told that they assume (unsigned)
     modulo arithmetic for signed integers! (A small sketch of why that
     assumption does not match what the compiler may do appears right
     after this list.)
     I am not sure (this was two years ago), but I think that this was
     so even for Microsoft's formal verification tool (named BLAST back
     then, used for driver verification).
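Here is that sketch (a hypothetical example of mine, not taken from any verifier's test suite). A tool that models int as wrapping can prove that the guard below catches every overflow of a + b, so buf[a + b] is never reached with a wrapped, negative index; a compiler that assumes signed overflow cannot happen may instead deduce that a > 0 && b > 0 implies a + b > 0, remove the guard, and the proof no longer describes the generated code:

extern char buf[];

char get_checked (int a, int b)
{
  if (a > 0 && b > 0 && a + b < 0)   /* "overflow detected" only under wrapping */
    return 0;
  return buf[a + b];
}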


Michael


