I'm not 100% sure this is a bug, but it seems like one. A non-volatile asm is moved past a volatile asm. Intuitively, a volatile asm should be a fairly "heavy" barrier. As a workaround, declaring both asms volatile does limit code motion, but it also restricts the optimizer more than is needed just to keep the two asms in relative order.

Happens with this compiler:

$ g++ -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,java,f95,objc,ada,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.0 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-java-awt=gtk-default --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-4.0-1.4.2.0/jre --enable-mpfr --disable-werror --with-tune=pentium4 --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.0.3 (Ubuntu 4.0.3-1ubuntu5)

Also happens with (replacing "add" with "addq" may be needed):

$ g++ -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /build/configure --prefix=/usr/local/build --target=x86_64-unknown-linux-gnu --disable-nls --enable-threads=posix --enable-symvers=gnu --enable-__cxa_atexit --enable-c99 --enable-long-long --build=i686-host_pc-linux-gnu --host=i686-host_pc-linux-gnu --disable-multilib --enable-shared=libgcc,libmudflap,libssp,libstdc++ --enable-languages=c,c++,fortran --with-sysroot=/usr/grte/v1 --with-root-prefix=/usr/grte/v1 --with-native-system-header-dir=/include --with-local-prefix=/
Thread model: posix
gcc version 4.2.2

Build using Makefile:

CXX = g++
CXXFLAGS = -Wall -O3

all: bug.dis bug.E

bug: bug.cc Makefile
	$(CXX) $(CXXFLAGS) -o bug bug.cc

bug.E: bug.cc Makefile
	$(CXX) $(CXXFLAGS) -E -o bug.E bug.cc

bug.dis: bug
	objdump --disassemble bug > bug.dis

Input program (this is not the CPP output, but the input program does not use CPP and this has a potentially useful comment):

//
// The following program compiled with g++ -O3 produces the code shown below.
// Note the add of %fs appears in the source between calls to Now() but appears
// in the object code before the first call to Now(). The asm with %fs is not
// itself volatile so it may be moved with respect to other code (the
// original code is more complicated), but it seems surprising it passes the
// volatile asm in Now().
//
//  8048397:  64 03 05 00 00 00 00   add    %fs:0x0,%eax
//  804839e:  89 45 f0               mov    %eax,-0x10(%ebp)
//  80483a1:  e8 aa ff ff ff         call   8048350 <_Z3Nowv>
//  80483a6:  89 c3                  mov    %eax,%ebx
//  80483a8:  89 d6                  mov    %edx,%esi
//  80483aa:  90                     nop
//  80483ab:  8b 45 f0               mov    -0x10(%ebp),%eax
//  80483ae:  8b 38                  mov    (%eax),%edi
//  80483b0:  90                     nop
//  80483b1:  e8 9a ff ff ff         call   8048350 <_Z3Nowv>
//

static inline int *XX() {
  long long int offset = 64;
  int *val;
  asm /*not volatile*/ ("add %%fs:0, %0" : "=r"(val) : "0"(offset));
  return val;
}

const int kCallsPerTrial = 30;
typedef long long Tsc;

__attribute__((__noinline__)) Tsc Now() {
  unsigned int eax_lo, edx_hi;
  asm volatile("rdtsc" : "=a" (eax_lo), "=d" (edx_hi));
  Tsc now = ((Tsc)eax_lo) | ((Tsc)(edx_hi) << 32);
  return now;
}

int g_sink;

bool RunTest(Tsc *tsc, int n) {
  int val;
  for (int i = 0; i < n; ++i) {
    Tsc start = Now();
    asm volatile("nop" ::: "memory");
    val = *XX();
    asm volatile("nop" ::: "memory");
    Tsc stop = Now();
    g_sink = val;
    *tsc++ = start;
    *tsc++ = stop;
  }
  return true;
}

int main(int argc, char **argv) {
  Tsc tsc[2 * kCallsPerTrial];
  RunTest(tsc, kCallsPerTrial);
}
A volatile asm is not a full barrier. Please read http://gcc.gnu.org/onlinedocs/gcc-4.3.2/gcc/Extended-Asm.html:

"Note that even a volatile asm instruction can be moved relative to other code, including across jump instructions. For example, on many targets there is a system register which can be set to control the rounding mode of floating point operations."

This came about with the fix for PR 17884.

*** This bug has been marked as a duplicate of 17884 ***
How can I prevent relative motion? I tried adding a "memory" clobber to all of the asms, but they are still moved past each other. I expected that any clobber they have in common would keep them from crossing. (Adding "volatile" to all of the asms does prevent relative motion, but it inhibits other optimizations, so it is undesirable.)
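One approach that may help here, offered as a sketch and not as an official recommendation, is to express the required order as a data dependency between the asms instead of relying on "volatile" or a "memory" clobber. The helper names AddAfter and NowAfter below are hypothetical and do not appear in the report, and the code assumes an x86 target with GCC-style extended asm. The non-volatile asm takes the first timestamp as an otherwise unused input operand, and the second rdtsc takes the asm's result as an unused input, so each asm has to wait for the value it consumes.

```cpp
typedef unsigned long long Tsc;

// Volatile asm: reads the x86 time-stamp counter.
static inline Tsc Now() {
  unsigned int lo, hi;
  asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
  return (Tsc)lo | ((Tsc)hi << 32);
}

// Hypothetical helper: a NON-volatile asm computing x + y. The extra input
// `after` is never used by the template; it only tells the compiler that
// this asm consumes that value, so it cannot be scheduled before the code
// that produces it.
static inline long AddAfter(long x, long y, unsigned long after) {
  asm("add %1, %0" : "+r"(x) : "r"(y), "r"(after) : "cc");
  return x;
}

// Hypothetical helper: rdtsc with an unused input `dep`, so the read of the
// counter cannot be scheduled before `dep` has been computed.
static inline Tsc NowAfter(long dep) {
  unsigned int lo, hi;
  asm volatile("rdtsc" : "=a"(lo), "=d"(hi) : "r"(dep));
  return (Tsc)lo | ((Tsc)hi << 32);
}

// Times one AddAfter call; the three asms are chained start -> sum -> stop.
static inline long TimedAdd(long x, long y, Tsc *elapsed) {
  Tsc start = Now();
  long sum = AddAfter(x, y, (unsigned long)start);
  Tsc stop = NowAfter(sum);
  *elapsed = stop - start;
  return sum;
}
```

Only the asms that are chained through operands are ordered against each other; unrelated code can still be moved freely, which is the property the all-volatile workaround gives up. Whether this fits the original, more complicated code depends on whether such a value can be threaded through it.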