Bug 68501 - [6 Regression] sqrt builtin is not used anymore
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 6.0
Importance: P3 normal
Target Milestone: 6.0
Assignee: Not yet assigned to anyone
URL:
Keywords:
Duplicates: 68526
Depends on:
Blocks:
 
Reported: 2015-11-23 14:43 UTC by Alexander Fomin
Modified: 2015-12-01 15:24 UTC
CC List: 8 users

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2015-11-23 00:00:00


Attachments
A reproducer (138 bytes, text/x-csrc)
2015-11-23 14:43 UTC, Alexander Fomin
gcc6-pr68501.patch (2.74 KB, patch)
2015-11-27 16:37 UTC, Jakub Jelinek

Description Alexander Fomin 2015-11-23 14:43:11 UTC
Created attachment 36812
A reproducer

For the attached reproducer compiled with g++ -mavx -Ofast, we no longer use the IA sqrt builtin since r230492, and therefore emit more insns.

r230491
.L8:
    vmovaps (%r14,%rax), %ymm0
    addl    $1, %r12d
    vmovups 0(%r13,%rax), %xmm1
    vinsertf128 $0x1, 16(%r13,%rax), %ymm1, %ymm1
    vmulps  %ymm1, %ymm1, %ymm1
    vmulps  %ymm0, %ymm0, %ymm0
    vaddps  %ymm1, %ymm0, %ymm1
    vrsqrtps    %ymm1, %ymm2
    vmulps  %ymm1, %ymm2, %ymm0
    vmulps  %ymm2, %ymm0, %ymm0
    vaddps  %ymm4, %ymm0, %ymm0
    vmulps  %ymm3, %ymm2, %ymm2
    vmulps  %ymm2, %ymm0, %ymm0
    vmovups %xmm0, (%r10,%rax)
    vextractf128    $0x1, %ymm0, 16(%r10,%rax)
    addq    $32, %rax
    cmpl    %r12d, %r9d
    ja  .L8 

r230492
.L8:
    vmovaps (%r14,%rax), %ymm0
    addl    $1, %r12d
    vmovups 0(%r13,%rax), %xmm1
    vinsertf128 $0x1, 16(%r13,%rax), %ymm1, %ymm1
    vmulps  %ymm1, %ymm1, %ymm1
    vmulps  %ymm0, %ymm0, %ymm0
    vaddps  %ymm1, %ymm0, %ymm1
    vcmpneqps   %ymm1, %ymm2, %ymm5
    vrsqrtps    %ymm1, %ymm0
    vandps  %ymm5, %ymm0, %ymm0
    vmulps  %ymm1, %ymm0, %ymm1
    vmulps  %ymm0, %ymm1, %ymm0
    vaddps  %ymm4, %ymm0, %ymm0
    vmulps  %ymm3, %ymm1, %ymm1
    vmulps  %ymm1, %ymm0, %ymm0
    vrcpps  %ymm0, %ymm1
    vmulps  %ymm0, %ymm1, %ymm0
    vmulps  %ymm0, %ymm1, %ymm0
    vaddps  %ymm1, %ymm1, %ymm1
    vsubps  %ymm0, %ymm1, %ymm0
    vmovups %xmm0, (%r10,%rax)
    vextractf128    $0x1, %ymm0, 16(%r10,%rax)
    addq    $32, %rax
    cmpl    %r12d, %r9d
    ja  .L8
Comment 1 Richard Biener 2015-11-23 15:26:57 UTC
Yep, I also saw this.  IIRC the recip pass is responsible for this.
Comment 2 Jakub Jelinek 2015-11-27 16:37:44 UTC
Created attachment 36858
gcc6-pr68501.patch

Untested fix.  The problem is that the vector SQRT is now an internal call, and in that case targetm.builtin_reciprocal is not called at all.
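
The failure mode described above can be illustrated with a toy model. This is only a sketch and not GCC code: every type and function name below is hypothetical, and the real hook and pass operate on GCC's own call representation. It shows a lookup that consults the target hook only for calls to built-in declarations, next to one that passes the whole call to the hook so internal-function calls are covered too.

```c
#include <stddef.h>

/* Toy model only: hypothetical types, not GCC's data structures.  */
enum callee_kind { CALLEE_BUILTIN_DECL, CALLEE_INTERNAL_FN };
enum toy_op { OP_SQRT, OP_OTHER };

struct toy_call {
    enum callee_kind kind;   /* how the callee is represented */
    enum toy_op op;          /* what the call computes */
};

/* Hypothetical target hook: receives the whole call, so it can handle
   either kind of callee.  Returns a replacement name or NULL.  */
static const char *toy_target_reciprocal(const struct toy_call *call)
{
    return call->op == OP_SQRT ? "rsqrt" : NULL;
}

/* Behaviour as described for the regression: calls that are not
   built-in declarations never reach the target hook.  */
const char *lookup_before(const struct toy_call *call)
{
    if (call->kind != CALLEE_BUILTIN_DECL)
        return NULL;
    return toy_target_reciprocal(call);
}

/* Behaviour the fix aims for: the hook is consulted for both kinds.  */
const char *lookup_after(const struct toy_call *call)
{
    return toy_target_reciprocal(call);
}
```

In this model, a vector sqrt represented as CALLEE_INTERNAL_FN gets no replacement from lookup_before but does get one from lookup_after.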
Comment 3 Jakub Jelinek 2015-11-30 14:56:40 UTC
Author: jakub
Date: Mon Nov 30 14:56:08 2015
New Revision: 231075

URL: https://gcc.gnu.org/viewcvs?rev=231075&root=gcc&view=rev
Log:
	PR tree-optimization/68501
	* target.def (builtin_reciprocal): Replace the 3 arguments with
	a gcall * one, adjust description.
	* targhooks.h (default_builtin_reciprocal): Replace the 3 arguments
	with a gcall * one.
	* targhooks.c (default_builtin_reciprocal): Likewise.
	* tree-ssa-math-opts.c (pass_cse_reciprocals::execute): Use
	targetm.builtin_reciprocal even on internal functions, adjust
	the arguments and allow replacing an internal function with normal
	built-in.
	* config/i386/i386.c (ix86_builtin_reciprocal): Replace the 3 arguments
	with a gcall * one.  Handle internal fns too.
	* config/rs6000/rs6000.c (rs6000_builtin_reciprocal): Likewise.
	* config/aarch64/aarch64.c (aarch64_builtin_reciprocal): Likewise.
	* doc/tm.texi (builtin_reciprocal): Document.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/aarch64/aarch64.c
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/rs6000/rs6000.c
    trunk/gcc/doc/tm.texi
    trunk/gcc/target.def
    trunk/gcc/targhooks.c
    trunk/gcc/targhooks.h
    trunk/gcc/tree-ssa-math-opts.c
Comment 4 Jakub Jelinek 2015-11-30 15:02:04 UTC
Hopefully fixed for i?86/x86_64/rs6000.  On aarch64 I haven't wired this into the builtin_reciprocal function, leaving it to the aarch64 maintainers to decide how they want to handle it.
Comment 5 Pat Haugen 2015-12-01 15:24:06 UTC
*** Bug 68526 has been marked as a duplicate of this bug. ***