[PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)

Jakub Jelinek jakub@redhat.com
Thu Jan 15 08:40:00 GMT 2015


On Thu, Jan 15, 2015 at 07:46:18AM +0100, Richard Biener wrote:
> >> That last line means the compiler is free to delete a non-volatile
> >> asm with a memory clobber if that asm is not needed for dataflow.  Or
> >> that is how I read it; it is trying to indicate that if you want to
> >> prevent the memory clobber from being deleted (together with the rest
> >> of the asm), you need to make the asm volatile.
> >>
> >> So as far as I can see the compiler can CSE two identical
> >non-volatile
> >> asms with memory clobber just fine.  Older GCC (I tried 4.7.2) does
> >do
> >> this; current mainline doesn't.  I think it should.
> >No, it should not CSE those two cases.  That's simply wrong and if an 
> >older version did that optimization, that's a bug.
> 
> I think segher has a point here.  If the asm with memory clobber would store to random memory and the point would be to preserve that then the whole distinction with volatile doesn't make much sense (after all without volatile we happily DCE such asm if the regular outputs are not needed).
> 
> This doesn't mean 'memory' is a well-designed thing, of course. Just its
> effects are effectively limited to reads without volatile(?)

Segher's mails talk about "memory" being a write but not read.
If we even can't agree on what non-volatile "memory" means, I think
we should treat it more conservatively, because every user (and there are
lots of people using non-volatile asm with "memory" in the wild) treats it
differently.  Just trying to grep for a few:
glibc:
./sysdeps/alpha/bits/atomic.h:# define atomic_full_barrier()	__asm ("mb" : : : "memory");
./sysdeps/alpha/bits/atomic.h:# define atomic_read_barrier()	__asm ("mb" : : : "memory");
./sysdeps/alpha/bits/atomic.h:# define atomic_write_barrier()	__asm ("wmb" : : : "memory");
./sysdeps/sparc/sparc32/bits/atomic.h:# define atomic_full_barrier() __asm ("" ::: "memory")
./sysdeps/powerpc/powerpc32/bits/atomic.h:# define atomic_read_barrier()	__asm ("lwsync" ::: "memory")
./sysdeps/powerpc/powerpc32/bits/atomic.h:# define atomic_read_barrier()	__asm ("sync" ::: "memory")
./sysdeps/powerpc/powerpc64/bits/atomic.h:#define atomic_read_barrier()	__asm ("lwsync" ::: "memory")
./sysdeps/powerpc/bits/atomic.h:#define atomic_full_barrier()	__asm ("sync" ::: "memory")
./sysdeps/powerpc/bits/atomic.h:#define atomic_write_barrier()	__asm ("eieio" ::: "memory")
./sysdeps/generic/malloc-machine.h:# define atomic_full_barrier() __asm ("" ::: "memory")
./elf/tst-tlsmod3.c:  asm ("" ::: "memory");
./elf/tst-tlsmod4.c:  asm ("" ::: "memory");
./elf/tst-tlsmod1.c:  asm ("" ::: "memory");
./elf/tst-tlsmod2.c:  asm ("" ::: "memory");
./include/atomic.h:# define atomic_full_barrier() __asm ("" ::: "memory")
linux kernel:
./arch/arm/mach-omap2/pm24xx.c:	asm("mcr p15, 0, %0, c7, c0, 4" : : "r" (zero) : "memory", "cc");
./arch/arm/include/asm/irqflags.h:#define local_fiq_enable()  __asm__("cpsie f	@ __stf" : : : "memory", "cc")
./arch/arm/include/asm/irqflags.h:#define local_fiq_disable() __asm__("cpsid f	@ __clf" : : : "memory", "cc")
./arch/x86/include/asm/uaccess_64.h:		asm("":::"memory");
./arch/x86/include/asm/uaccess_64.h:		asm("":::"memory");
./arch/x86/include/asm/segment.h:	asm("mov %%" #seg ",%0":"=r" (value) : : "memory")
./arch/x86/include/asm/stackprotector.h:	asm("mov %0, %%gs" : : "r" (__KERNEL_STACK_CANARY) : "memory");
./arch/arm64/include/asm/irqflags.h:#define local_fiq_enable()	asm("msr	daifclr, #1" : : : "memory")
./arch/arm64/include/asm/irqflags.h:#define local_fiq_disable()	asm("msr	daifset, #1" : : : "memory")
./arch/arm64/include/asm/irqflags.h:#define local_async_enable()	asm("msr	daifclr, #4" : : : "memory")
./arch/arm64/include/asm/irqflags.h:#define local_async_disable()	asm("msr	daifset, #4" : : : "memory")
./arch/tile/lib/memcpy_64.c:				__asm__ ("" : : : "memory");
./arch/tile/lib/memcpy_64.c:				__asm__ ("" : : : "memory");
./arch/tile/include/hv/netio_intf.h:  __asm__("" : : : "memory");
./drivers/net/ethernet/sgi/ioc3-eth.c:	__asm__("sync" ::: "memory")
./lib/sha1.c:  #define setW(x, val) do { W(x) = (val); __asm__("":::"memory"); } while (0)

The glibc barriers are supposedly something that can be CSEd (one barrier instead of
two consecutive barriers is enough), but certainly not moved across any loads/stores
in between.  In the kernel case, the enable/disable probably wouldn't allow even CSE.

So I'm with Jeff that we should treat "memory" at least as unspecified read and write,
and whether we can CSE them if there are no memory loads/stores in between them can
be discussed (most likely the kernel would be safe even in that case, because those
usually don't nest and appear in pairs, or act as barriers (like the glibc case),
or read from segment registers (guess again ok to be CSEd with no intervening loads/stores).

In 4.9 backport I'd prefer not to CSE them at all though, stay conservative.

	Jakub



More information about the Gcc-patches mailing list