This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: GCC optimizes integer overflow: bug or feature?
This attitude toward overflow bears some resemblance to the attitude that
resulted in the Y2K issues. I am not trying to troll; a detailed
explanation follows.
Some time ago (a year?) I was told on this mailing list that code
breakage due to the undefinedness of signed overflow is not too common
(admittedly, I had claimed with no evidence that it amounted to more
than one bug per 1,000 lines). My claim was countered by something like
"most of the time people work with small enough values, so overflows
don't happen too often in practice". At first this sounds reasonable.
Then I realized that it sounds awfully similar to "two digits are
enough to represent a year".
You may think the analogy is far-fetched? In that case, I'll pick a gcc
source file at random and look for signed operations in it:
categorize_ctor_elements_1(......) in gcc/expr.c:

    _elts += mult * nz;
    elt_count += mult * ic;
Both assume that neither the multiplication nor the addition overflows.
Can GCC really be sure that this case is caught outside this function,
before it is called? Will that always be so in the future? You can't
even add an assert to make sure that no overflow has occurred, because
by the time the check runs the signed overflow has already happened and
the behavior is already undefined, so the compiler is free to optimize
the check away. You can't write:
    temp = mult * nz;
    a_gcc_assert(mult != 0 && temp / mult == nz);
    a_gcc_assert(_elts + temp >= _elts);
    _elts += temp;
You'd have to add a very weird-looking assert like:

    some_gcc_assert(
        nz == 0
        || ((unsigned HOST_WIDE_INT)mult * nz / nz == (unsigned HOST_WIDE_INT)mult
            && (HOST_WIDE_INT)((unsigned HOST_WIDE_INT)mult * nz) >= 0)
    );

(Note the guard must be nz == 0, not nz != 0, or the division by nz is
reached exactly when nz is zero.)
This is very ugly. You see, it was quite easy to find code that works
today but may easily break, without any warning, due to unrelated code
shuffling in other source files!
<rant on C standard>
I don't understand why the standards committee did not add a new keyword
(say, nowrap) instead of going from implementation-defined overflow to
undefined. If one wants fast induction variables, it would have been
possible to write:
    int f(int k)
    {
        nowrap int i;
        int ret = k;
        for (i = 0; i <= k; ++i)
            ++ret;
        return ret;
    }
and get the code transformed to the simple "return k*2+1;", which would be
implementation defined for k >= INT_MAX/2 and undefined only for k == INT_MAX.
I know adding a keyword may break existing code, but so does adding
rules for "undefined" behavior. Predictable breakage at compile time is
better than code breaking at random at runtime.
</rant>
Gabriel Dos Reis wrote:
> The problem is that what constitutes a valid program tends to differ
> from what constitutes a useful program found in the wild. The
> question for GCC is how to accommodate the useful uses without
> getting far away from both conformance and non-conformance.
>
> I don't believe this particular issue of optimization based on
> "undefined behaviour" can be resolved by just telling people "hey
> look, the C standard says it is undefined, therefore we can optimize.
Exactly. For example, take a look at the strict-aliasing issue. First,
optimizations were introduced such that if your code did not conform to
the strict-aliasing rules it would silently break. In later gcc
versions, warnings were introduced to tell the user "hey, this code
might break strict aliasing, are you sure you know what you are doing?".
The current state of the strict-aliasing warnings is still imperfect,
but it is much better and improving.
On the other hand, the state of signed overflow is quite grim. Even
though the signed-overflow issue has been with us for over a decade,
there is almost no warning on high-risk code. Unfortunately, it is much
more difficult to produce good warnings for potential overflow bugs
(with negligible false positives and a reasonable false-negative rate)
than for strict aliasing. This is true at least for predictable
warnings, i.e. ones that don't depend on "irrelevant"
architecture, code-generation, or optimization flags.
> And if you're not happy, just tell the compiler not to optimize".
>
> For not all undefined behaviour are equal, and undefined behaviour is
> also a latitude given to implementors to better serve their base
> users.
Undefined behavior is bad. Any language spec should keep the amount of
"undefined" behavior to a bare minimum. The whole issue of C's (or
C++'s) undefined behavior is very unintuitive for most programmers:
* Students learning C/C++ have their hands full as it is. They won't
be able to internalize these issues on top of the other new concepts
they have to learn: recursion, the free store, algorithms, etc.
* Many lecturers (I have a gut feeling that it's most) who teach
C/C++ don't understand these issues well enough to teach them in a
basic C/C++ course.
* Books barely cover these issues, if at all. Certainly, they are not
stressed every time the terms "signed" or "type casting" are mentioned.
* At verification conferences (and elsewhere) I have had the
opportunity to talk to people from industry and academia who work on
formal verification of C/C++. Surprisingly, most of the time they
don't verify/prove that overflow does not happen. They don't seem to
be aware of the undefinedness of signed overflow. In several cases I
asked, and was told that they assume (unsigned) modulo arithmetic for
signed integers!
I am not sure (this was two years ago), but I think this was so even
for Microsoft's formal verification tool (named BLAST back then),
used for driver verification.
Michael