This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: Jeff Law <law at redhat dot com>, Segher Boessenkool <segher at kernel dot crashing dot org>, Richard Biener <rguenther at suse dot de>, Eric Botcazou <ebotcazou at adacore dot com>, gcc-patches at gcc dot gnu dot org
- Date: Thu, 15 Jan 2015 09:13:30 +0100
- Subject: Re: [PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)
- Authentication-results: sourceware.org; auth=none
- References: <20150113161819 dot GD1405 at tucnak dot redhat dot com> <20150113163840 dot GA4183 at gate dot crashing dot org> <54B575D7 dot 8030107 at redhat dot com> <20150113201322 dot GJ1405 at tucnak dot redhat dot com> <54B59964 dot 7070707 at redhat dot com> <20150114000315 dot GA32710 at gate dot crashing dot org> <54B609A9 dot 9090800 at redhat dot com> <20150114151906 dot GA21784 at gate dot crashing dot org> <54B74AD9 dot 4010905 at redhat dot com> <DE331E6C-39D7-4F70-9E25-A3A4F7F49640 at gmail dot com>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Thu, Jan 15, 2015 at 07:46:18AM +0100, Richard Biener wrote:
> >> That last line means the compiler is free to delete a non-volatile
> >> asm with a memory clobber if that asm is not needed for dataflow. Or
> >> that is how I read it; it is trying to indicate that if you want to
> >> prevent the memory clobber from being deleted (together with the rest
> >> of the asm), you need to make the asm volatile.
> >>
> >> So as far as I can see the compiler can CSE two identical
> >non-volatile
> >> asms with memory clobber just fine. Older GCC (I tried 4.7.2) does
> >do
> >> this; current mainline doesn't. I think it should.
> >No, it should not CSE those two cases. That's simply wrong and if an
> >older version did that optimization, that's a bug.
>
> I think segher has a point here. If the asm with memory clobber would store to random memory and the point would be to preserve that then the whole distinction with volatile doesn't make much sense (after all without volatile we happily DCE such asm if the regular outputs are not needed).
>
> This doesn't mean 'memory' is a well-designed thing, of course. Just its
> effects are effectively limited to reads without volatile(?)
Segher's mails talk about "memory" being a write but not read.
If we even can't agree on what non-volatile "memory" means, I think
we should treat it more conservatively, because every user (and there are
lots of people using non-volatile asm with "memory" in the wild) treats it
differently. Just trying to grep for a few:
glibc:
./sysdeps/alpha/bits/atomic.h:# define atomic_full_barrier() __asm ("mb" : : : "memory");
./sysdeps/alpha/bits/atomic.h:# define atomic_read_barrier() __asm ("mb" : : : "memory");
./sysdeps/alpha/bits/atomic.h:# define atomic_write_barrier() __asm ("wmb" : : : "memory");
./sysdeps/sparc/sparc32/bits/atomic.h:# define atomic_full_barrier() __asm ("" ::: "memory")
./sysdeps/powerpc/powerpc32/bits/atomic.h:# define atomic_read_barrier() __asm ("lwsync" ::: "memory")
./sysdeps/powerpc/powerpc32/bits/atomic.h:# define atomic_read_barrier() __asm ("sync" ::: "memory")
./sysdeps/powerpc/powerpc64/bits/atomic.h:#define atomic_read_barrier() __asm ("lwsync" ::: "memory")
./sysdeps/powerpc/bits/atomic.h:#define atomic_full_barrier() __asm ("sync" ::: "memory")
./sysdeps/powerpc/bits/atomic.h:#define atomic_write_barrier() __asm ("eieio" ::: "memory")
./sysdeps/generic/malloc-machine.h:# define atomic_full_barrier() __asm ("" ::: "memory")
./elf/tst-tlsmod3.c: asm ("" ::: "memory");
./elf/tst-tlsmod4.c: asm ("" ::: "memory");
./elf/tst-tlsmod1.c: asm ("" ::: "memory");
./elf/tst-tlsmod2.c: asm ("" ::: "memory");
./include/atomic.h:# define atomic_full_barrier() __asm ("" ::: "memory")
linux kernel:
./arch/arm/mach-omap2/pm24xx.c: asm("mcr p15, 0, %0, c7, c0, 4" : : "r" (zero) : "memory", "cc");
./arch/arm/include/asm/irqflags.h:#define local_fiq_enable() __asm__("cpsie f @ __stf" : : : "memory", "cc")
./arch/arm/include/asm/irqflags.h:#define local_fiq_disable() __asm__("cpsid f @ __clf" : : : "memory", "cc")
./arch/x86/include/asm/uaccess_64.h: asm("":::"memory");
./arch/x86/include/asm/uaccess_64.h: asm("":::"memory");
./arch/x86/include/asm/segment.h: asm("mov %%" #seg ",%0":"=r" (value) : : "memory")
./arch/x86/include/asm/stackprotector.h: asm("mov %0, %%gs" : : "r" (__KERNEL_STACK_CANARY) : "memory");
./arch/arm64/include/asm/irqflags.h:#define local_fiq_enable() asm("msr daifclr, #1" : : : "memory")
./arch/arm64/include/asm/irqflags.h:#define local_fiq_disable() asm("msr daifset, #1" : : : "memory")
./arch/arm64/include/asm/irqflags.h:#define local_async_enable() asm("msr daifclr, #4" : : : "memory")
./arch/arm64/include/asm/irqflags.h:#define local_async_disable() asm("msr daifset, #4" : : : "memory")
./arch/tile/lib/memcpy_64.c: __asm__ ("" : : : "memory");
./arch/tile/lib/memcpy_64.c: __asm__ ("" : : : "memory");
./arch/tile/include/hv/netio_intf.h: __asm__("" : : : "memory");
./drivers/net/ethernet/sgi/ioc3-eth.c: __asm__("sync" ::: "memory")
./lib/sha1.c: #define setW(x, val) do { W(x) = (val); __asm__("":::"memory"); } while (0)
The glibc barriers are supposedly something that can be CSEd (one barrier instead of
two consecutive barriers is enough), but certainly not moved across any loads/stores
in between. In the kernel case, the enable/disable probably wouldn't allow even CSE.
So I'm with Jeff that we should treat "memory" at least as unspecified read and write,
and whether we can CSE them if there are no memory loads/stores in between them can
be discussed (most likely the kernel would be safe even in that case, because those
usually don't nest and appear in pairs, or act as barriers (like the glibc case),
or read from segment registers (guess again ok to be CSEd with no intervening loads/stores).
In 4.9 backport I'd prefer not to CSE them at all though, stay conservative.
Jakub