Created attachment 36812 [details]
A reproducer

For the attached reproducer compiled with g++ -mavx -Ofast, we have not used the IA sqrt builtin since r230492, and so emit more insns.

r230491
.L8:
	vmovaps	(%r14,%rax), %ymm0
	addl	$1, %r12d
	vmovups	0(%r13,%rax), %xmm1
	vinsertf128	$0x1, 16(%r13,%rax), %ymm1, %ymm1
	vmulps	%ymm1, %ymm1, %ymm1
	vmulps	%ymm0, %ymm0, %ymm0
	vaddps	%ymm1, %ymm0, %ymm1
	vrsqrtps	%ymm1, %ymm2
	vmulps	%ymm1, %ymm2, %ymm0
	vmulps	%ymm2, %ymm0, %ymm0
	vaddps	%ymm4, %ymm0, %ymm0
	vmulps	%ymm3, %ymm2, %ymm2
	vmulps	%ymm2, %ymm0, %ymm0
	vmovups	%xmm0, (%r10,%rax)
	vextractf128	$0x1, %ymm0, 16(%r10,%rax)
	addq	$32, %rax
	cmpl	%r12d, %r9d
	ja	.L8

r230492
.L8:
	vmovaps	(%r14,%rax), %ymm0
	addl	$1, %r12d
	vmovups	0(%r13,%rax), %xmm1
	vinsertf128	$0x1, 16(%r13,%rax), %ymm1, %ymm1
	vmulps	%ymm1, %ymm1, %ymm1
	vmulps	%ymm0, %ymm0, %ymm0
	vaddps	%ymm1, %ymm0, %ymm1
	vcmpneqps	%ymm1, %ymm2, %ymm5
	vrsqrtps	%ymm1, %ymm0
	vandps	%ymm5, %ymm0, %ymm0
	vmulps	%ymm1, %ymm0, %ymm1
	vmulps	%ymm0, %ymm1, %ymm0
	vaddps	%ymm4, %ymm0, %ymm0
	vmulps	%ymm3, %ymm1, %ymm1
	vmulps	%ymm1, %ymm0, %ymm0
	vrcpps	%ymm0, %ymm1
	vmulps	%ymm0, %ymm1, %ymm0
	vmulps	%ymm0, %ymm1, %ymm0
	vaddps	%ymm1, %ymm1, %ymm1
	vsubps	%ymm0, %ymm1, %ymm0
	vmovups	%xmm0, (%r10,%rax)
	vextractf128	$0x1, %ymm0, 16(%r10,%rax)
	addq	$32, %rax
	cmpl	%r12d, %r9d
	ja	.L8
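For readers without the attachment, here is a minimal sketch of the kind of kernel that could produce loops like these. It is not the attached reproducer: the function names, signatures, and the constants -0.5 and -3.0 are assumptions. They are inferred from the r230491 sequence, which appears to take a vrsqrtps estimate and refine it with one Newton-Raphson step; the contents of %ymm3 and %ymm4 are not shown in the dump.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical kernel (NOT the attached reproducer):
// r[i] = 1 / sqrt(a[i]^2 + b[i]^2).
// With g++ -mavx -Ofast the compiler may vectorize this loop and replace
// the division and square root with vrsqrtps plus a refinement step.
void inv_norm(const float *a, const float *b, float *r, std::size_t n)
{
    assert(a && b && r);
    for (std::size_t i = 0; i < n; ++i)
        r[i] = 1.0f / std::sqrt(a[i] * a[i] + b[i] * b[i]);
}

// Scalar model of one Newton-Raphson step for 1/sqrt(x), written in the
// same shape as the r230491 loop: (-0.5 * e) * (x * e * e - 3.0),
// where e is the initial estimate. The constants are assumed, not read
// from the dump.
float refine_rsqrt(float x, float e)
{
    return (-0.5f * e) * (x * e * e - 3.0f);
}
```

In the r230492 loop the same refinement seems to be used to compute sqrt rather than rsqrt (hence the vcmpneqps/vandps masking of zero inputs), and the division is then done separately through vrcpps with its own refinement step, which would account for the seven extra insns.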
Yep, I also saw this. IIRC the recip pass is responsible for this.
Created attachment 36858 [details]
gcc6-pr68501.patch

Untested fix. The problem is that the vector SQRT is now an internal call, and in that case targetm.builtin_reciprocal is not called at all.
Author: jakub
Date: Mon Nov 30 14:56:08 2015
New Revision: 231075

URL: https://gcc.gnu.org/viewcvs?rev=231075&root=gcc&view=rev
Log:
	PR tree-optimization/68501
	* target.def (builtin_reciprocal): Replace the 3 arguments with
	a gcall * one, adjust description.
	* targhooks.h (default_builtin_reciprocal): Replace the 3 arguments
	with a gcall * one.
	* targhooks.c (default_builtin_reciprocal): Likewise.
	* tree-ssa-math-opts.c (pass_cse_reciprocals::execute): Use
	targetm.builtin_reciprocal even on internal functions, adjust
	the arguments and allow replacing an internal function with normal
	built-in.
	* config/i386/i386.c (ix86_builtin_reciprocal): Replace the 3 arguments
	with a gcall * one.  Handle internal fns too.
	* config/rs6000/rs6000.c (rs6000_builtin_reciprocal): Likewise.
	* config/aarch64/aarch64.c (aarch64_builtin_reciprocal): Likewise.
	* doc/tm.texi (builtin_reciprocal): Document.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/aarch64/aarch64.c
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/rs6000/rs6000.c
    trunk/gcc/doc/tm.texi
    trunk/gcc/target.def
    trunk/gcc/targhooks.c
    trunk/gcc/targhooks.h
    trunk/gcc/tree-ssa-math-opts.c
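The shape of the change can be illustrated with a toy model. This is only a sketch: none of the types or function names below are GCC's, and the real hook takes a gcall * and returns a decl rather than a string. It models the one behaviour described in this report, namely that an internal-function sqrt call used to bypass the target hook and now reaches it.

```cpp
#include <string>

// Toy model only: every name here is hypothetical and is not part of GCC.
enum class CallKind { Builtin, Internal };

struct Call {
    CallKind kind;
    std::string fn;  // e.g. "sqrt"
};

// Stand-in for a target hook: returns the name of a reciprocal
// replacement for the call, or nullptr if the target has none.
const char *target_reciprocal(const Call &c)
{
    return c.fn == "sqrt" ? "rsqrt" : nullptr;
}

// Behaviour described in the report: internal calls never reach the hook,
// so a vectorized sqrt gets no rsqrt replacement.
const char *lookup_before_fix(const Call &c)
{
    if (c.kind == CallKind::Internal)
        return nullptr;
    return target_reciprocal(c);
}

// Behaviour after the fix: the hook is consulted for both kinds of call.
const char *lookup_after_fix(const Call &c)
{
    return target_reciprocal(c);
}
```

Passing the whole call to the hook, instead of a few fields extracted from a built-in decl, is what lets a target recognise both kinds of call in one place.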
Hopefully fixed for i?86/x86_64/rs6000. On aarch64 I haven't wired this into the builtin_reciprocal function; I'm leaving it to the aarch64 maintainers to decide how they want to handle it.
*** Bug 68526 has been marked as a duplicate of this bug. ***