In the following test case compiled with -O1 the add instruction is found *outside* the protective "mtfsf". 'volatile' keyword added to 'asm' was meant to prevent such cases. gcc-3.3 behaves correctly. Is this a know problem? Is there a PR already for it? double foo( ) { double sum; double fpenv; double DDD, FFF; sum = DDD + FFF; asm volatile("mtfsf 255,%0" : : "f" (fpenv)); return sum; } _foo: mtfsf 255,f0 fadd f1,f1,f0 blr
Confirmed, this is caused by -ftree-ter.
ok, so how is this suppose to work again? Entering SSA->normal, I see: double FFF; double DDD; double fpenv; double sum; # BLOCK 0 # PRED: ENTRY [100.0%] (fallthru,exec) sum_3 = DDD_1 + FFF_2; __asm__ __volatile__("mtfsf 255,%0"::"f" fpenv_4); return sum_3; # SUCC: EXIT [100.0%] is the volatile in ASM suppose to block all movement across it or something? There is nothing in the asm instruction indicating it has anything to do with sum_3....
From: http://gcc.gnu.org/onlinedocs/gcc-3.4.0/gcc/Extended-Asm.html#Extended%20Asm The volatile keyword indicates that the instruction has important side-effects. GCC will not delete a volatile asm if it is reachable. (The instruction can still be deleted if GCC can prove that control-flow will never reach the location of the instruction.) In addition, GCC will not reschedule instructions across a volatile asm instruction.
Created attachment 7305 [details] preliminary patch to fix this bug This ought to fix the problem. I will do a bootstrap on x86, but someone might want to give this a try on PPC to make sure it works for them too. The result for the test file is: _foo: fadd f1,f1,f1 mtfsf 255,f0 blr is that more to your liking? :-)
Thanks a lot. I will test the patch on apple-ppc-darwin and let you know.
My test cases look good with this patch. I will do the usual bootstrap, dejagnu testing on apple-ppc-darwin. But I can't imagine this patch can break anything.
So after a discussion, I dont think the patch will be applied. A follow on sentence in the documentation reads: "Note that even a volatile asm instruction can be moved in ways that appear insignificant to the compiler, such as across jump instructions. You can't expect a sequence of volatile asm instructions to remain perfectly consecutive. If you want consecutive output, use a single asm. Also, GCC will perform some optimizations across a volatile asm instruction; GCC does not ``forget everything'' when it encounters a volatile asm instruction the way some other compilers do." Thats what it is doing in this case, we are moving unrelated instructions across the volatile asm. RTH suggested that we remove the sentence "In addition, GCC will not reschedule instructions across a volatile asm instruction." from the documention since it is incorrect given the above note. It you need instructions not to be moved across a volatile, they must be exposed in the asm.
But this is a regression from gcc-3.3. Also, without this patch, there is no other place which checks for a volatility of an 'asm' statement. Then why not just say in the documentation that 'volatile' has no effect on an 'asm'? BTW, thanks for preparing the patch for me.
The doc also says "You can prevent an asm from being deleted, moved significantly, or combined, by writing the keyword volatile after the asm." Which seems to contradict the later statement that they can be moved across jumps; if that's not significant motion, what is? Both sentences go back to at least 2.95. So what are the semantics of 'asm volatile'? If it does not disable code motion, it seems to me Fariborz is right that it's a NOP and should be deprecated. But there is a lot of code around here that uses it, and worked as intended with 3.3. IMO the incorrect statement in the doc is the last one, not the first two.
The claim is the documentation has been wrong for a long time as well. I don't beleive it has ever acted as a scheduling barrier. volatile does not behave as a NOP because it *does* guarantee that the stmt will not be removed unless it is unreachable. And therefore anything which feeds it will stick around and not be moved past it. That gives a volatile asm the same semantics as a volatile construct in C. There aren't really other references to it in the tree optimizers because I beleive we do the same thing for all asms... ie, I dont beleive we delete ASM of any sort unless they are unreachable.
Subject: Re: [4.0 Regression] asm 'volatile' is not honored as documented On Fri, 2004-10-08 at 14:37, amacleod at redhat dot com wrote: > ie, I dont beleive we delete ASM of any sort unless they are unreachable. > Right. DCE on trees always marks ASM expressions as inherently necessary. Diego.
If I read you correctly, first you say: "volatile does not behave as a NOP because it *does* guarantee that the stmt will not be removed unless it is unreachable". And then: " I dont beleive we delete ASM of any sort unless they are unreachable". So, question remains. what does 'volatile' do to an asm in gcc-4.0?
It does whatever it use to do in the RTL part of the compiler. 4.0 hasnt changed that, whatever it is. I suspect the RTL optimizers are capable of removing ASMs. So without volatile, the RTL backend might/could remove it. IIn the future the tree optimizers might be smart enough to do something differnt with ASMs, and then it could be relevent whether volatile is set or not. It just isn't right now.
Following test case shows that 'volatile' is not needed to prevent removal of an 'asm'. % cat t.c int main() { asm ("nop"); } % mygccm5o -S -O4 t.c % cat t.s .text .align 2 .globl _main _main: nop blr .subsections_via_symbols
asm with no operands always acts as volatile (this is documented too).
change nop->isync , or something else.
The question was discussed extensively in 2001. The phrase "In addition, GCC will not reschedule instructions across a volatile asm instruction." was added here, by Mark Mitchell based on an earlier comment by rth. http://gcc.gnu.org/ml/gcc-patches/2001-04/msg01477.html If you read the whole thread, rth is quite firm that 'asm volatile' is a scheduling barrier.
Ah, I see some amount of confusion here. Yes, in the referenced message I do argue for "asm volatile" to be a scheduling barrier. You'll note that what I'm most concerned about there is that the asm remain in the body of the function and not get scheduled into the epilogue. Which does indeed sound like it might could cause problems. Too bad I didn't bother to write a test case demonstrating what I had in mind, but there ya go. As for transformations that are not *scheduling*, the documentation is also quite clear: 'GCC will perform some optimizations across a volatile asm instruction; GCC does not “forget everything” when it encounters a volatile asm instruction'. Which means that if you try to use volatile asms for fp control word manipulation, you will be sorely disappointed, because *every* existing pass of the compiler will thwart you. If you thought this worked before gcc 4.0, you were mistaken -- tweak the test case a bit and it'll fail on any arbitrary gcc release. I have no particular interest in changing this, because gcc doesn't understand #pragma FENV_ACCESS, and nothing you do until that support is added is going to help. I'll state for the record that FENV_ACCESS will be a *huge* amount of work. We can clarify the documentation, if you can be specific about what you thought was confusing, but that's the most I'm willing to accept for this PR.
OK, thanks. From this it appears that the only effect of 'asm volatile' that users can safely rely on is that such an instruction will not be deleted. If this is agreeable to everybody, I will revise the documentation to say that.
Here's another test case that shows that gcc 3.4.2 on i386 will reschedule other instructions (in particular, other asms that are not volatile) across a volatile asm: #include <sys/types.h> #define __fldenv(env) __asm __volatile("fldenv %0" : : "m" (env)) #define __fnstsw(__sw) __asm("fnstsw %0" : "=am" (*(__sw))) #define __ldmxcsr(csr) __asm __volatile("ldmxcsr %0" : : "m" (csr)) #define __stmxcsr(csr) __asm("stmxcsr %0" : "=m" (*(csr))) typedef struct { struct { uint32_t control; uint32_t status; uint32_t tag; char other[16]; } x87; uint32_t mxcsr; } fenv_t; int feraiseexcept(int __excepts); int feupdateenv(const fenv_t *envp) { int mxcsr, status; __fnstsw(&status); __stmxcsr(&mxcsr); __fldenv(envp->x87); /* volatile */ __ldmxcsr(envp->mxcsr); /* volatile */ feraiseexcept((mxcsr | status) & 0x3f); return (0); } The code generated at -O2 is as follows: feupdateenv: pushl %ebp movl %esp, %ebp subl $20, %esp movl 8(%ebp), %eax #APP fnstsw -4(%ebp) stmxcsr -8(%ebp) fldenv (%eax) ldmxcsr 28(%eax) fnstsw %eax <--- should be movl -4(%ebp), %eax #NO_APP orl -8(%ebp), %eax andl $63, %eax pushl %eax call feraiseexcept xorl %eax, %eax leave ret The fnstsw is moved (duplicated, actually) across the fldenv and ldmxcsr, which are volatile. At -O1, gcc gets this "right". My vote is to fix the code rather than the documentation; if volatile asms don't act as sequence points, then in code such as the above, it becomes necessary to apply the sledgehammer of marking *all* asms volatile. This is because there's no way to express to gcc that an asm reads from a hard register such as the FPSR. If the instructions that write the FPSR (already marked volatile in the example) acted as sequence points, we would be guaranteed that instructions that read the FPSR could not be moved past them, without having to make the latter volatile. Note that even if you don't go with my idea, the following proposal isn't quite accurate: > From this it appears that the only effect of 'asm volatile' that users > can safely rely on is that such an instruction will not be deleted. It is also true (hopefully) that volatile asms cannot be reordered with respect to each other, w.r.t. function calls, etc. This is not true of ordinary asms, as seen above.
In reply to comment #20: Again, this is not scheduling, per se. This is register rematerialization. We have a value at some point, and we decide that it's cheaper to move the computation rather than store and reload it. This is really no different than if we decided to CSE the computation as in __fnstsw(&s1); __fldenv(envp->x87); /* volatile */ __fnstsw(&s2); -> __fnstsw(&s1); __fldenv(envp->x87); /* volatile */ s2 = s1; I must repeat myself that the original source code is buggy. You've got asms that affect, or are affected by, architectural state that is not visible to the compiler. As such you REALLY REALLY MUST mark the asm as volatile.
No one suggested anything for clarification, but I don't see a bug here.
Actually I adjusted the doc to my satisfaction in this thread: http://gcc.gnu.org/ml/gcc-patches/2004-10/msg01048.html I suppose it's OK to close now.
(In reply to comment #23) Just to double check, however as may be required, instructions which may be affected by a modifyable control register state may denote that dependance in the instruction's rtl description to prevent the scheduling of any instruction which may modify that state past past it, without needing to denote the control state modifying asm instruction volatile, as long as it's following depandant instructions are themselfs reachable and nessisary?
*** Bug 24165 has been marked as a duplicate of this bug. ***
*** Bug 37631 has been marked as a duplicate of this bug. ***