This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


RE: GCC optimizes integer overflow: bug or feature?


On 20 December 2006 21:42, Matthew Woehlke wrote:

> Dave Korn wrote:
>>> Particularly lock-free queues whose correct
>>> operation is critically dependent on the order in which the loads and
>>> stores are performed.
>> 
>>   No, absolutely not.  Lock-free queues work by (for example) having a
>> single producer and a single consumer, storing the queue in a circular
>> buffer, and assigning ownership of the queue's head pointer to the
>> producer and the (chasing) tail pointer to the consumer.
>> [snip]
>> The ordering is critical within a single thread of execution; e.g. you must
>>  fill in all the details of the new entry you are adding to the queue
>> /before/ you increment the head pointer,
> 
> Exactly. Guess what the compiler did to us? :-) "Oh, no, I'm /sure/ it's
> OK if I re-order your code so that those assignments happen /after/ I
> handle this dinky little increment for you." Now our code may have been
> wrong in this instance 

  Exactly and unquestionably so.  You wrote wrong code and it did not work the
way you expected, but the compiler *did* do exactly what you told it to, and
there's an end of it.  You left out 'volatile'; you completely lied to and
misled the compiler!

> Let's be clear. Order matters. /You/ said so yourself. :-) And even if
> the programmer correctly tells the compiler what to do, what (unless the
> compiler inserts memory barriers) keeps the CPU from circumventing both?

  As above: the code was wrong, and the compiler *did* do exactly what you
told it to.  By leaving out 'volatile', you misled the compiler about what
/it/ could assume about the behaviour of entities in the compiled universe.
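
  For illustration, here is a minimal sketch of the single-producer,
single-consumer circular buffer described in the quoted text.  It assumes a
C11 compiler with <stdatomic.h>, which postdates this thread and is swapped in
here for a bare 'volatile' because release/acquire ordering constrains both
the compiler and the CPU.  The names (spsc_queue, spsc_push, spsc_pop,
QUEUE_SLOTS) and the entry layout are hypothetical, not taken from anyone's
real code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SLOTS 8u  /* hypothetical capacity; must be a power of two */

struct entry { int id; int payload; };

struct spsc_queue {
    struct entry slots[QUEUE_SLOTS];
    atomic_size_t head;  /* written only by the producer */
    atomic_size_t tail;  /* written only by the (chasing) consumer */
};

static void spsc_init(struct spsc_queue *q)
{
    assert((QUEUE_SLOTS & (QUEUE_SLOTS - 1u)) == 0u);
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side.  Returns false if the queue is full. */
static bool spsc_push(struct spsc_queue *q, struct entry e)
{
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);

    if (head - tail == QUEUE_SLOTS)
        return false;

    /* Fill in all the details of the new entry first... */
    q->slots[head & (QUEUE_SLOTS - 1u)] = e;

    /* ...and only then publish it.  The release store keeps the slot
       write from being moved after the head increment. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side.  Returns false if the queue is empty. */
static bool spsc_pop(struct spsc_queue *q, struct entry *out)
{
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);

    if (head == tail)
        return false;

    /* The acquire load above pairs with the producer's release store,
       so the slot contents are visible before we read them. */
    *out = q->slots[tail & (QUEUE_SLOTS - 1u)];

    /* Hand the slot back to the producer only after the copy is done. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}
```

  The indices only ever increase and are masked on use, so "full" and "empty"
are told apart by the difference head - tail rather than by sacrificing a
slot.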

> That said, I've seen even stranger things, too. For example:
> 
> foo->bar = make_a_bar();
> foo->bar->none = value;
> 
> being rendered as:
> 
> call make_a_bar
> foo->bar->none = value
> foo->bar = <result of make_a_bar()>
> 
> So what was wrong with my C code in that instance? :-)

  You'd have to show me the actual real code before I could tell you whether
there was a bug in your code or you hit a real bug in the compiler.  Either is
possible; the virtual machine definition in the standard AKA the 'as-if' rule
is what decides which is correct and which is wrong.
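
  To make the 'as-if' point concrete, here is a hedged sketch of one way the
quoted fragment could legitimately come out in that order.  It assumes the
emitted code stored through the returned pointer while it was still held in a
register, which the quoted pseudo-assembly does not confirm.  The struct
layouts, the body of make_a_bar() and both function names are hypothetical
stand-ins.

```c
#include <assert.h>
#include <stdlib.h>

struct bar { int none; };
struct foo { struct bar *bar; };

/* Hypothetical stand-in for the real allocator in the quoted code. */
static struct bar *make_a_bar(void)
{
    return calloc(1, sizeof(struct bar));
}

/* The statements in the order the source wrote them. */
static void set_source_order(struct foo *foo, int value)
{
    foo->bar = make_a_bar();
    assert(foo->bar != NULL);
    foo->bar->none = value;
}

/* One ordering the as-if rule permits: initialise through the result,
   then store the pointer.  A single thread cannot tell the difference,
   because nothing reads foo->bar between the two stores. */
static void set_reordered(struct foo *foo, int value)
{
    struct bar *result = make_a_bar();
    assert(result != NULL);
    result->none = value;
    foo->bar = result;
}
```

  Both functions leave foo in the same final state.  Only an observer outside
the abstract machine's single thread of execution, such as another thread
reading foo->bar without synchronisation, could see a difference.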

  C is no longer a glorified shorthand for assembly code.  It /used to be/,
but every development since the K&R days has taken it further from that.  At
-O0, it still /almost/ is a glorified assembler.

>>> This is a very real, entirely legitimate example
>>> where the compiler thinking it knows how to do my job better than I do
>>> is wrong.
>> 
>>   Nope, this is a very real example of a bug in your algorithmic design,
>> or of you misleading the compiler, or relying on undefined or
>> implementation-defined behaviour. [snip]
>>   This simply means you have failed to correctly declare a variable
>> volatile that in fact /is/ likely to be spontaneously changed by a
>> separate thread of execution.
> 
> /That/ is very possible. I'm talking about /OLD/ code here, i.e. code
> that was written back in K&R days, back before there /was/ a volatile
> keyword. (Although I had understood that 'volatile' was a no-op in most
> modern compilers? Does it have the semantics that loads/stores of
> volatile variables are not re-ordered with respect to each other?)

  Exactly so, that's why 'volatile' was chosen as the keyword to extend 'asm'
in order to mean "don't reorder past here because it's unpredictable".
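
  As a rough sketch of that use of 'volatile' with 'asm', assuming GCC's
extended asm syntax: an empty asm statement marked volatile and given a
"memory" clobber acts as a compiler-level barrier.  The macro and the mailbox
type below are hypothetical names for illustration.  Note that this emits no
instruction, so it constrains the compiler only and not the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* GCC extension: the compiler may not move memory accesses across this
   point.  No fence instruction is emitted, so the CPU is unconstrained. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

struct mailbox {
    int payload;
    int ready;
};

/* Writer: store the payload, then raise the flag, in that order. */
static void mailbox_post(struct mailbox *m, int value)
{
    assert(m != NULL);
    m->payload = value;
    COMPILER_BARRIER();   /* payload store stays before the flag store */
    m->ready = 1;
}

/* Reader: check the flag, then read the payload, in that order.
   Returns 1 and fills *out if a value was posted, 0 otherwise. */
static int mailbox_peek(const struct mailbox *m, int *out)
{
    assert(m != NULL && out != NULL);
    if (!m->ready)
        return 0;
    COMPILER_BARRIER();   /* payload load stays after the flag load */
    *out = m->payload;
    return 1;
}
```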

  As I said, C is no longer a glorified assembly language, and one of the
main motivations behind that progression is the non-portability of code such
as you describe above.  Making assumptions about the precise details of the
compiled version of any particular code sequence is a mistake, because they
will only turn out to be right on some platforms and with some compiler
versions and not with others.  The compiler is *not* a glorified assembler; if
you really want guarantees about what codegen you get, use a .S file and write
assembler.

  Or use a 20-year-old version of the compiler if what you *really* want is
the exact codegen it used to do 20 years ago.  It's open source, it's free,
you can get it and use it and keep on using it for ever and it will *always*
produce exactly what you expect by way of codegen.

  These days, however, you have to accept that your code is not valid; that it
contains implicit assumptions that were only ever going to be true for the
particular compiler version and on the particular platform where it was first
written.  And that code needs maintenance, as a matter of routine.  If you
want to change one part of a system, you must change all the interacting parts
to match.  If you want your source code to work exactly how it did twenty
years ago, the rest of your toolchain had better behave exactly the same as it
did twenty years ago.

  Of course, this leaves us a big problem.  How can anything ever change,
advance, or improve, if either everything has to change at once, or everything
will break together?  

  Well, it turns out there /is/ an answer.  It's a methodological answer:
define a specification and a standard, and agree on how the language and the
interfaces behave in generic and abstract terms.  That settles what behaviour
your code /can/ assume and what it can't; which aspects of the generated code
are /essential/ and which are /accidental/.

  This strategy has been adopted and used.  You can certainly argue some
points, like whether it's worth the burden of having undefined behaviour in
certain cases in order to allow a core standardised version of the language
that is portable even to odd environments like 1's-complement machines or
36-bit word sizes.  But that /is/ why the trade-offs have been adopted, and it
/is/ why there is a standard.  It /is/ why it makes sense to code to a
consensually agreed, community-derived standard: rely on the assumptions it
guarantees you, and do not rely on the ones it does not guarantee, which were
just a fortuitous coincidence of a particular OS, environment and toolchain
that once made them valid for you.

> At any rate, I don't recall now if making the variable in question
> 'volatile' helped or not. Maybe this is an exercise in why changing
> long-standing semantics has an insidious and hard to correct effect.
> (Does that conversation sound familiar?)

  But of course; that's always tricky, and as I suggest above, the use of a
standardised semantics defined in as environment-independent a way as possible
is one of the human race's most powerful analytical tools for addressing this
issue.

  However, the answer is almost certainly that it was the lack of a volatile
declaration /somewhere/ that was the root of your problem.  It's not possible
to know for sure without seeing the code, but the fact that adding a barrier
fixed it is highly suggestive.

> You're preaching to the choir.

  Glad to hear it!  I hope I'm preaching nothing more than sound engineering
practice.

> Unfortunately adding proper assembly
> modules to our build system didn't go over so well, and I am /not/ going
> to try to write inline assembler that works on six-or-more different
> compilers. :-) So I know this is just a 'cross your fingers and hope it
> works' approach on non-x86 platforms. On x86 I use the correct,
> previously-quoted inline assembly, which as mentioned acts as a barrier
> for both the compiler /and/ the CPU. As you say, all it's really
> ensuring in other cases is that the compiler remains honest, but that
> was the intent, and I know it isn't perfect.

  And of course I can't dispute the need for pragmatic compromises in reality;
in such cases, your most effective option is often just to bite the bullet and
do something less-than-ideal.

  But I guess you should at least leave some big warnings in the comments!

    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....