Optimisations and undefined behaviour

David Brown david.brown@hesbynett.no
Fri Nov 6 12:32:00 GMT 2015

On 06/11/15 11:19, Andrew Haley wrote:
> On 05/11/15 20:52, David Brown wrote:
>> I would like to know where the gcc developers stand on this - how can 
>> the rules of the standards be interpreted?  And how /should/ they be 
>> interpreted?  Does gcc aim for the most efficient code, or for the 
>> principle of least astonishment?
>> For example, consider this requirement:
>> Write a function "int foo(int x)" that will call "bar(x)" if x is less 
>> than 50000.  Then if x is less than 1000, return x*x*x - otherwise 
>> return any integer.  (Assume 32-bit int's, no traps or other "funny 
>> stuff" on the target.)
>> It seems reasonable to write:
>> 	int foo(int x) {
>> 		if (x < 50000) bar(x);
>> 		return x*x*x;
>> 	}
>> For x greater than 1290, the cubing is going to overflow and generate a 
>> mathematically incorrect answer.  But that's okay - we don't care about 
>> the result in such cases.
>> The question is, can the compiler reason like this:
>> If x > 1290, then undefined behaviour will occur.  The user clearly 
>> doesn't care what happens when x > 1290.  Therefore, we can assume that 
>> x <= 1290.  Therefore, x < 50000.  Therefore, there is no need to check 
>> the conditional, and bar(x) can be called directly.
> That's right.
>> The point here is that if the code hits undefined behaviour, then the 
>> behaviour of the entire program is undefined.  Thus the undefined 
>> behaviour in "x*x*x" can work backwards.
>> Equally clearly, this would be against the expectations of the 
>> programmer.
> Well, with UB it's not really possible to know the expectations of
> the programmer, and I'm not sure how much sense it makes to even try.

I disagree.  It is clear that the programmer does not expect any
sensible result from x*x*x if x is too big.  But it is also clear that
the programmer only expects bar() to be called if x is less than 50000.

How about this case:

int foo(int x) {
	if (x > 1290) {
		printf("X is wrong here %d, but we don't care\n", x);
	}
	return x*x*x;
}
The compiler can eliminate the check and the printf.  Yes, the code has
undefined behaviour - the programmer has made a mistake.  But the
compiler's optimisations have made the programmer's job much harder by
removing the very code that would have helped with debugging.
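To make the concern concrete, here is a sketch of what the standard permits the function above to be reduced to (foo_transformed is my own name for the hypothetical result - this is not a claim about gcc's actual output):

```c
/* Because x*x*x is evaluated unconditionally, the compiler may assume
   x <= 1290 on every valid execution.  The x > 1290 test and the
   printf are then dead code and may be removed entirely. */
int foo_transformed(int x) {
    return x * x * x;
}
```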

Note that it is not without precedent for the compiler to "guess" what
the programmer meant, as distinct from what he wrote.  There are plenty
of situations where gcc can and does (with the right options) warn about
perfectly legal C code that is almost certainly not what the programmer
intended - such as "if (x = 1)...".
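For instance, the following is perfectly legal C, and with -Wall gcc draws a "suggest parentheses around assignment used as truth value" warning (f is a hypothetical illustration of my own):

```c
/* Legal C, but almost certainly not what was meant - the condition is
   an assignment, so it is always true regardless of the argument. */
int f(int x) {
    if (x = 1)          /* "x == 1" was surely intended */
        return 1;
    return 0;
}
```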

>> And any attempt to avoid the undefined behaviour (such as 
>> changing the last line to "return (x < 1000) ? x*x*x : 0;") would lead 
>> to bigger and slower code than the "obvious" compilation of foo() above.
>> (In tests, gcc and clang both keep the check on x < 50000. 
>> <https://gcc.godbolt.org/> is really handy!)
>> How is the programmer to be sure that the compiler does not apply this 
>> optimisation logic?
> If your code may overflow, use an unsigned type.

Just to be clear here, I am discussing more general concerns rather than
my own programming.  I have studied the standards, and read the gcc
manual, and I write code that avoids undefined behaviour.  But I am
concerned that not all other programmers are as careful or knowledgeable
- and in particular, not all code that is currently used has been
written with the same level of care needed for correct operation using
more modern compilers.  Thus we see cases of code that used to work fine
failing due to more aggressive optimisations.  I want to see how we
avoid this for old code, and how we avoid it happening again in the future.
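For completeness, here is a sketch of the "use an unsigned type" advice applied to foo (foo_u and the bar stub are my own names; the conversion of an out-of-range unsigned value back to int is implementation-defined, and gcc documents it as wrapping):

```c
/* Stand-in for the bar() of the earlier example. */
static void bar(int x) { (void)x; }

/* Unsigned arithmetic wraps modulo 2^32, which is fully defined, so
   the compiler cannot use overflow to reason backwards and drop the
   x < 50000 check. */
int foo_u(int x) {
    if (x < 50000) bar(x);
    unsigned ux = (unsigned)x;
    return (int)(ux * ux * ux);
}
```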

>> Or consider this code:
>> 	char arr[8192];
>> 	int foo2(int i) {
>> 		if (i < 0) return -1;
>> 		i *= 3;
>> 		if (i < 0) return -2;
>> 		if (i >= 8192) return -3;
>> 		return arr[i]++;
>> 	}
>> The compiler can assume that since integer overflow is undefined 
>> behaviour, and it knows that i >= 0 after the first check, then the 
>> result of "i *= 3" must be non-negative.
> Indeed it can.
>> However, if you pass an integer over (2^31 / 3) to the function, it
>> is likely that after the assembly code for i *= 3 is run, the value
>> in the cpu register will be negative
> You're thinking about this in the wrong way.  C is not defined in
> terms of registers: it is defined by an abstract machine.  In terms of
> the abstract machine, integer overflow is undefined.  Perhaps the
> processor overheats.
>> and the obvious assembly for the "i >= 8192" test will not trigger.
>> The result is access (and modification) to memory well outside the
>> bounds of arr, despite the clear safety check in the code.  The
>> compiler would be allowed to do this even if it follows the
>> Analyzability Annex L in the C11 standard - and it would be turning a
>> "bounded undefined behaviour" (integer overflow) into "unbounded
>> undefined behaviour" (modification of memory outside of bounds).
> Indeed so.  Appendix L is interesting, but ensuring that integer
> overflow does not result in wild stores will defeat some
> optimizations.

Is it at least possible to put some restrictions or guarantees on how
far the compiler will go in its optimisations, or how far it will go
without warnings?  I don't want to limit the generation of code, I want
it to be harder for there to be a mismatch between what the compiler
generates and what the programmer intended.  And when there /is/ a
mismatch, I want it to be easier to find - applying undefined behaviour
backwards in the code can make it extremely difficult for the programmer
to find the real problem.

So this is not about the compiler doing something wrong, in terms of
translating C code to efficient object code according to the C standards.
It is about avoiding unexpected and undiagnosed new problems in old
code, and helping programmers write correct code that works as intended.
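One way to write such code is to arrange the range checks so that the overflow cannot occur in the first place - a sketch of foo2 rewritten along those lines (foo2_safe is my own name for this hypothetical version):

```c
static char arr[8192];

/* Bound i before multiplying: i*3 can then never overflow and never
   exceed the array, so no check can be defeated by undefined
   behaviour.  (8192 - 1) / 3 == 2730, so i*3 <= 8190 < 8192. */
int foo2_safe(int i) {
    if (i < 0) return -1;
    if (i > (8192 - 1) / 3) return -3;
    i *= 3;
    return arr[i]++;
}
```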

>> In this second example, the skipped tests come /after/ the undefined
>> behaviour, but it could still easily be seen as unexpected behaviour
>> from the compiler.
> Unexpected by who?  Skilled C programmers know that signed types have
> this property, but unsigned types wrap.

This is merely an example - there are many other possible undefined
behaviours.  And while it may be a simple solution to say "it's the
programmer's fault - /skilled/ programmers would not do that", the fact
is that there are many programmers out there who are not that skilled.
The Principle of Least Astonishment is important here - the compiler
should avoid surprising programmers.

More information about the Gcc-help mailing list