Bug 17884

Summary: asm 'volatile' is not honored as documented
Product: gcc Reporter: Fariborz Jahanian <fjahanian>
Component: tree-optimizationAssignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED INVALID    
Severity: normal CC: amacleod, bjoern.m.haase, dalej, gcc-bugs, obrien, pardo
Priority: P2 Keywords: documentation
Version: 4.0.0   
Target Milestone: 4.0.0   
Host: powerpc-apple-darwin7.2.0 Target: powerpc-apple-darwin7.2.0
Build: powerpc-apple-darwin7.2.0 Known to work:
Known to fail: Last reconfirmed: 2005-01-11 02:45:23
Attachments: preliminary patch to fix this bug

Description Fariborz Jahanian 2004-10-07 19:30:57 UTC
In the following test case compiled with -O1 the add instruction is found *outside* the protective
"mtfsf". 'volatile' keyword added to 'asm' was meant to prevent such cases. gcc-3.3 behaves
correctly. Is this a know problem? Is there a PR already for it?

double foo( )
{
        double sum;
        double fpenv;
        double DDD, FFF;

        sum = DDD + FFF;
        asm volatile("mtfsf 255,%0" : : "f" (fpenv));
        return sum;
}

_foo:
        mtfsf 255,f0
        fadd f1,f1,f0
        blr
Comment 1 Andrew Pinski 2004-10-07 19:35:47 UTC
Confirmed, this is caused by -ftree-ter.
Comment 2 Andrew Macleod 2004-10-07 20:02:22 UTC
ok, so how is this suppose to work again?
Entering SSA->normal, I see:

  double FFF;
  double DDD;
  double fpenv;
  double sum;
 
  # BLOCK 0
  # PRED: ENTRY [100.0%]  (fallthru,exec)
  sum_3 = DDD_1 + FFF_2;
  __asm__ __volatile__("mtfsf 255,%0"::"f" fpenv_4);
  return sum_3;
  # SUCC: EXIT [100.0%]


is the volatile in ASM suppose to block all movement across it or something?
There is nothing in the asm instruction indicating it has anything to do with
sum_3....

Comment 3 Fariborz Jahanian 2004-10-07 20:37:31 UTC
From: http://gcc.gnu.org/onlinedocs/gcc-3.4.0/gcc/Extended-Asm.html#Extended%20Asm

The volatile keyword indicates that the instruction has important side-effects. GCC will not delete a 
volatile asm if it is reachable. (The instruction can still be deleted if GCC can prove that control-flow 
will never reach the location of the instruction.) In addition, GCC will not reschedule instructions across 
a volatile asm instruction. 
Comment 4 Andrew Macleod 2004-10-07 21:45:08 UTC
Created attachment 7305 [details]
preliminary patch to fix this bug

This ought to fix the problem.	I will do a bootstrap on x86, but someone might
want to give this a try on PPC to make sure it works for them too.

The result for the test file is:
_foo:
	fadd f1,f1,f1
	mtfsf 255,f0
	blr


is that more to your liking? :-)
Comment 5 Fariborz Jahanian 2004-10-07 21:51:46 UTC
Thanks a lot. I will test the patch on apple-ppc-darwin and let you know.
Comment 6 Fariborz Jahanian 2004-10-07 22:02:03 UTC
My test cases look good with this patch. I will do the usual bootstrap, dejagnu testing on
apple-ppc-darwin. But I can't imagine this patch can break anything.
Comment 7 Andrew Macleod 2004-10-08 14:35:09 UTC
So after a discussion, I dont think the patch will be applied.  

A follow on sentence in the documentation reads:

"Note that even a volatile asm instruction can be moved in ways
that appear insignificant to the compiler, such as across jump
instructions.  You can't expect a sequence of volatile asm
instructions to remain perfectly consecutive.  If you want consecutive
output, use a single asm.  Also, GCC will perform some
optimizations across a volatile asm instruction; GCC does not
``forget everything'' when it encounters a volatile asm
instruction the way some other compilers do."


Thats what it is doing in this case, we are moving unrelated instructions across
the volatile asm. 

RTH suggested that we remove the sentence "In addition, GCC will not reschedule
instructions across a volatile asm instruction." from the documention since it
is incorrect given the above note.

It you need instructions not to be moved across a volatile, they must be exposed
in the asm.
Comment 8 Fariborz Jahanian 2004-10-08 16:23:59 UTC
But this is a regression from gcc-3.3. Also, without this patch, there is no other place which checks
for a volatility of an 'asm' statement. Then why not just say in the documentation that 'volatile'
has no effect on an 'asm'? BTW, thanks for preparing the patch for me.
Comment 9 Dale Johannesen 2004-10-08 18:27:12 UTC
The doc also says "You can prevent an asm from being deleted, moved significantly, or combined,
by writing the keyword volatile after the asm."  Which seems to contradict the later statement
that they can be moved across jumps; if that's not significant motion, what is?  Both sentences
go back to at least 2.95.

So what are the semantics of 'asm volatile'?  If it does not disable code motion, it seems to me
Fariborz is right that it's a NOP and should be deprecated.  But there is a lot of code around here
that uses it, and worked as intended with 3.3.  IMO the incorrect statement in the doc is the
last one, not the first two.

Comment 10 Andrew Macleod 2004-10-08 18:37:23 UTC
The claim is the documentation has been wrong for a long time as well. I don't
beleive it has ever acted as a scheduling barrier.   

volatile does not behave as a NOP because it *does* guarantee that the stmt will
not be removed unless it is unreachable. And therefore anything which feeds it
will stick around and not be moved past it.  That gives a volatile asm the same
semantics as a volatile construct in C.  There aren't really other references to
it in the tree optimizers because I beleive we do the same thing for all asms...
ie, I dont beleive we delete ASM of any sort unless they are unreachable.

Comment 11 Diego Novillo 2004-10-08 18:45:53 UTC
Subject: Re:  [4.0 Regression] asm 'volatile'
	is not honored as documented

On Fri, 2004-10-08 at 14:37, amacleod at redhat dot com wrote:

> ie, I dont beleive we delete ASM of any sort unless they are unreachable.
> 
Right.  DCE on trees always marks ASM expressions as inherently
necessary.


Diego.

Comment 12 Fariborz Jahanian 2004-10-08 18:48:48 UTC
If I read you correctly, first you say:
"volatile does not behave as a NOP because it *does* guarantee that the stmt will
not be removed unless it is unreachable".
And then:
" I dont beleive we delete ASM of any sort unless they are unreachable".

So, question remains. what does 'volatile' do to an asm in gcc-4.0?
Comment 13 Andrew Macleod 2004-10-08 18:53:23 UTC
It does whatever it use to do in the RTL part of the compiler. 4.0 hasnt changed
that, whatever it is.  I suspect the RTL optimizers are capable of removing
ASMs. So without volatile, the RTL backend might/could remove it.

IIn the future the tree optimizers might be smart enough to do something
differnt with ASMs, and then it could be relevent whether volatile is set or
not.  It just isn't right now.
Comment 14 Fariborz Jahanian 2004-10-08 19:25:47 UTC
Following test case shows that 'volatile' is not needed to prevent removal of an 'asm'.

% cat t.c
int main()
{
        asm ("nop");
}

% mygccm5o -S -O4 t.c
% cat t.s
.text
        .align 2
        .globl _main
_main:
        nop
        blr
        .subsections_via_symbols
Comment 15 Andrew Pinski 2004-10-08 19:28:53 UTC
asm with no operands always acts as volatile (this is documented too).
Comment 16 Fariborz Jahanian 2004-10-08 19:41:21 UTC
change nop->isync , or something else.
Comment 17 Dale Johannesen 2004-10-08 21:04:12 UTC
The question was discussed extensively in 2001.  The phrase
"In addition, GCC will not reschedule
instructions across a volatile asm instruction." 
was added here, by Mark Mitchell based on an earlier comment by rth.
http://gcc.gnu.org/ml/gcc-patches/2001-04/msg01477.html
If you read the whole thread, rth is quite firm that 'asm volatile' is
a scheduling barrier.
Comment 18 Richard Henderson 2004-10-12 17:39:38 UTC
Ah, I see some amount of confusion here.

Yes, in the referenced message I do argue for "asm volatile" to be a scheduling
barrier.  You'll note that what I'm most concerned about there is that the asm
remain in the body of the function and not get scheduled into the epilogue.
Which does indeed sound like it might could cause problems.  Too bad I didn't
bother to write a test case demonstrating what I had in mind, but there ya go.

As for transformations that are not *scheduling*, the documentation is also quite
clear: 'GCC will perform some optimizations across a volatile asm instruction;
GCC does not “forget everything” when it encounters a volatile asm instruction'.

Which means that if you try to use volatile asms for fp control word
manipulation, you will be sorely disappointed, because *every* existing pass
of the compiler will thwart you.  If you thought this worked before gcc 4.0,
you were mistaken -- tweak the test case a bit and it'll fail on any arbitrary
gcc release.

I have no particular interest in changing this, because gcc doesn't understand
#pragma FENV_ACCESS, and nothing you do until that support is added is going to
help.  I'll state for the record that FENV_ACCESS will be a *huge* amount of work.

We can clarify the documentation, if you can be specific about what you thought
was confusing, but that's the most I'm willing to accept for this PR.
Comment 19 Dale Johannesen 2004-10-12 18:30:16 UTC
OK, thanks.  From this it appears that the only effect of 'asm volatile' that users
can safely rely on is that such an instruction will not be deleted.  If this is
agreeable to everybody, I will revise the documentation to say that.
Comment 20 David Schultz 2005-01-14 21:34:03 UTC
Here's another test case that shows that gcc 3.4.2 on i386 will reschedule other
instructions (in particular, other asms that are not volatile) across a volatile
asm:

#include <sys/types.h>
#define __fldenv(env)           __asm __volatile("fldenv %0" : : "m" (env))
#define __fnstsw(__sw)          __asm("fnstsw %0" : "=am" (*(__sw)))
#define __ldmxcsr(csr)          __asm __volatile("ldmxcsr %0" : : "m" (csr))
#define __stmxcsr(csr)          __asm("stmxcsr %0" : "=m" (*(csr)))

typedef struct {
        struct {
                uint32_t        control;
                uint32_t        status;
                uint32_t        tag;
                char            other[16];
        } x87;
        uint32_t                mxcsr;
} fenv_t;

int feraiseexcept(int __excepts);

int
feupdateenv(const fenv_t *envp)
{
        int mxcsr, status;

        __fnstsw(&status);
        __stmxcsr(&mxcsr);
        __fldenv(envp->x87);            /* volatile */
        __ldmxcsr(envp->mxcsr);         /* volatile */
        feraiseexcept((mxcsr | status) & 0x3f);
        return (0);
}

The code generated at -O2 is as follows:

feupdateenv:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $20, %esp
        movl    8(%ebp), %eax
#APP
        fnstsw -4(%ebp)
        stmxcsr -8(%ebp)
        fldenv (%eax)
        ldmxcsr 28(%eax)
        fnstsw %eax              <--- should be movl -4(%ebp), %eax
#NO_APP
        orl     -8(%ebp), %eax
        andl    $63, %eax
        pushl   %eax
        call    feraiseexcept
        xorl    %eax, %eax
        leave
        ret

The fnstsw is moved (duplicated, actually) across the fldenv and ldmxcsr, which
are volatile.  At -O1, gcc gets this "right".

My vote is to fix the code rather than the documentation; if volatile asms don't
act as sequence points, then in code such as the above, it becomes necessary to
apply the sledgehammer of marking *all* asms volatile.  This is because there's
no way to express to gcc that an asm reads from a hard register such as the
FPSR.  If the instructions that write the FPSR (already marked volatile in the
example) acted as sequence points, we would be guaranteed that instructions that
read the FPSR could not be moved past them, without having to make the latter
volatile.

Note that even if you don't go with my idea, the following proposal isn't quite
accurate:

> From this it appears that the only effect of 'asm volatile' that users
> can safely rely on is that such an instruction will not be deleted.

It is also true (hopefully) that volatile asms cannot be reordered with respect
to each other, w.r.t. function calls, etc.  This is not true of ordinary asms,
as seen above.
Comment 21 Richard Henderson 2005-01-20 01:33:52 UTC
In reply to comment #20:

Again, this is not scheduling, per se.  This is register
rematerialization.  We have a value at some point, and we
decide that it's cheaper to move the computation rather
than store and reload it.

This is really no different than if we decided to CSE the
computation as in

    __fnstsw(&s1);
    __fldenv(envp->x87);            /* volatile */
    __fnstsw(&s2);
->
    __fnstsw(&s1);
    __fldenv(envp->x87);            /* volatile */
    s2 = s1;

I must repeat myself that the original source code is 
buggy.  You've got asms that affect, or are affected by,
architectural state that is not visible to the compiler.
As such you REALLY REALLY MUST mark the asm as volatile.
Comment 22 Richard Henderson 2005-02-02 07:18:25 UTC
No one suggested anything for clarification, but I don't see a bug here.
Comment 23 Dale Johannesen 2005-02-02 18:19:27 UTC
Actually I adjusted the doc to my satisfaction in this thread:
http://gcc.gnu.org/ml/gcc-patches/2004-10/msg01048.html
I suppose it's OK to close now.
Comment 24 Paul Schlie 2005-02-02 22:16:32 UTC
(In reply to comment #23)

Just to double check, however as may be required,  instructions which
may be affected by a modifyable control register state may denote that
dependance in the instruction's rtl description to prevent the scheduling
of any instruction which may modify that state past past it, without needing
to denote the control state modifying asm instruction volatile, as long as
it's following depandant instructions are themselfs reachable and nessisary?

Comment 25 Andrew Pinski 2005-10-01 19:48:15 UTC
*** Bug 24165 has been marked as a duplicate of this bug. ***
Comment 26 Andrew Pinski 2008-09-23 23:50:51 UTC
*** Bug 37631 has been marked as a duplicate of this bug. ***