This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

[Bug rtl-optimization/79357] Doubling a single complex float gives inefficient code

From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Mon, 06 Feb 2017 11:52:52 +0000
Subject: [Bug rtl-optimization/79357] Doubling a single complex float gives inefficient code
Auto-submitted: auto-generated
References: <bug-79357-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79357

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-02-06
          Component|c                           |rtl-optimization
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  We expand from

f (complex float x)
{
  float _2;
  float _3;
  float _4;
  float _5;
  complex float _6;

  <bb 2>:
  _2 = REALPART_EXPR <x_1(D)>;
  _3 = _2 * 2.0e+0;
  _4 = IMAGPART_EXPR <x_1(D)>;
  _5 = _4 * 2.0e+0;
  _6 = COMPLEX_EXPR <_3, _5>;
  return _6;

}

This is another case where the ABI is not exposed at the GIMPLE level, so we
can't really do better on trees.
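For reference, a C function of roughly the following shape yields the GIMPLE above. This is an assumed reconstruction; the reporter's exact test case is not quoted in this comment.

```c
#include <complex.h>

/* Assumed reconstruction of the test case: double a single complex float.
   After complex lowering, each part is multiplied by 2.0 separately,
   matching the REALPART_EXPR/IMAGPART_EXPR statements in the dump. */
complex float f(complex float x)
{
  return 2 * x;
}
```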

With _Complex double we manage to optimize it to

f:
.LFB0:
        .cfi_startproc
        vaddsd  %xmm1, %xmm1, %xmm1
        vaddsd  %xmm0, %xmm0, %xmm0
        ret

because there the ABI passes the real and imaginary parts in separate registers
(%xmm0 and %xmm1).
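The difference follows from the SysV x86-64 classification: `complex float` and `complex double` are treated like a struct of two members, and each eightbyte of an all-floating-point aggregate of at most 16 bytes gets its own SSE register. The sketch below encodes only that simplified rule; the helper name is hypothetical and every other classification case is ignored.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper: number of SSE registers used to pass an aggregate of
   `size` bytes whose members are all float or double, under the SysV x86-64
   rule that each eightbyte of a <= 16-byte aggregate is classified separately.
   Larger aggregates are passed in memory, reported here as 0 registers. */
static int sse_regs_for_fp_aggregate(size_t size)
{
  if (size > 16)
    return 0;
  return (int)((size + 7) / 8);
}
```

So both parts of a `complex float` share %xmm0 (real part in the low 32 bits, imaginary part in the next 32), while a `complex double` gets %xmm0 and %xmm1.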

Maybe we can teach RTL forwprop about these cases (I taught it somewhat
similar cases on Power):

(note 9 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 9 3 2 (set (reg:DI 97 [ x ])
        (reg:DI 21 xmm0 [ x ])) "t.c":2 81 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 21 xmm0 [ x ])
        (nil)))
(insn 3 2 4 2 (set (mem/c:DI (plus:DI (reg/f:DI 20 frame)
                (const_int -8 [0xfffffffffffffff8])) [0  S8 A64])
        (reg:DI 97 [ x ])) "t.c":2 81 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 97 [ x ])
        (nil)))
(insn 4 3 5 2 (set (reg:SF 95)
        (mem/c:SF (plus:DI (reg/f:DI 20 frame)
                (const_int -8 [0xfffffffffffffff8])) [0  S4 A64])) "t.c":2 125 {*movsf_internal}
     (nil))
(insn 5 4 8 2 (set (reg:SF 96)
        (mem/c:SF (plus:DI (reg/f:DI 20 frame)
                (const_int -4 [0xfffffffffffffffc])) [0  S4 A32])) "t.c":2 125 {*movsf_internal}
     (nil))
(note 8 5 11 2 NOTE_INSN_FUNCTION_BEG)
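What forwprop would have to prove here is ordinary store-to-load forwarding: insn 4 reads the 4 bytes at the start of the slot (offset -8) that insn 3 just filled from (reg:DI 97), and insn 5 reads the 4 bytes at offset -4, so on a little-endian target both loads are just the low and high 32 bits of the same register, reinterpreted as float. The C sketch below checks that equivalence; it is an analogy with hypothetical function names, not GCC's RTL forwprop code.

```c
#include <stdint.h>
#include <string.h>

/* What the current RTL does: spill the 64-bit value to a stack slot
   (insn 3) and reload 4 bytes at byte offset 0 or 4 (insns 4 and 5). */
static uint32_t load_after_spill(uint64_t reg, unsigned offset)
{
  unsigned char slot[8];
  uint32_t part;
  memcpy(slot, &reg, sizeof slot);
  memcpy(&part, slot + offset, sizeof part);
  return part;
}

/* The forwarded form: take the bits straight from the register.
   Assumes a little-endian target such as x86-64, where byte offset
   `offset` corresponds to bits [8*offset, 8*offset + 32). */
static uint32_t load_forwarded(uint64_t reg, unsigned offset)
{
  return (uint32_t)(reg >> (8 * offset));
}
```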

OTOH, this is already a blocker: we expand the initial real/imagpart extracts
to loads from the stack after spilling the complex argument.  Initial RTL:

(note 9 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 9 3 2 (set (reg:DI 97)
        (reg:DI 21 xmm0 [ x ])) "t.c":2 -1
     (nil))
(insn 3 2 4 2 (set (mem/c:DI (plus:DI (reg/f:DI 82 virtual-stack-vars)
                (const_int -8 [0xfffffffffffffff8])) [0  S8 A64])
        (reg:DI 97)) "t.c":2 -1
     (nil))
(insn 4 3 5 2 (set (reg:SF 95)
        (mem/c:SF (plus:DI (reg/f:DI 82 virtual-stack-vars)
                (const_int -8 [0xfffffffffffffff8])) [0  S4 A64])) "t.c":2 -1
     (nil))
(insn 5 4 6 2 (set (reg:SF 96)
        (mem/c:SF (plus:DI (reg/f:DI 82 virtual-stack-vars)
                (const_int -4 [0xfffffffffffffffc])) [0  S4 A32])) "t.c":2 -1
     (nil))
(insn 6 5 7 2 (set (reg/v:SF 93 [ x ])
        (reg:SF 95)) "t.c":2 -1
     (nil))
(insn 7 6 8 2 (set (reg/v:SF 94 [ x+4 ])
        (reg:SF 96)) "t.c":2 -1
     (nil))
(note 8 7 11 2 NOTE_INSN_FUNCTION_BEG)

Handling the initial RTL for the return value will be equally "interesting":

(insn 18 14 19 2 (set (reg:SF 100)
        (reg:SF 91 [ <retval> ])) "t.c":4 -1
     (nil))
(insn 19 18 20 2 (set (reg:SF 101)
        (reg:SF 92 [ <retval>+4 ])) "t.c":4 -1
     (nil))
(insn 20 19 21 2 (set (mem/c:SF (plus:DI (reg/f:DI 82 virtual-stack-vars)
                (const_int -16 [0xfffffffffffffff0])) [0  S4 A32])
        (reg:SF 100)) "t.c":4 -1
     (nil))
(insn 21 20 22 2 (set (mem/c:SF (plus:DI (reg/f:DI 82 virtual-stack-vars)
                (const_int -12 [0xfffffffffffffff4])) [0  S4 A32])
        (reg:SF 101)) "t.c":4 -1
     (nil))
(insn 22 21 23 2 (set (reg:DI 21 xmm0)
        (mem/c:DI (plus:DI (reg/f:DI 82 virtual-stack-vars)
                (const_int -16 [0xfffffffffffffff0])) [0  S8 A32])) "t.c":4 -1
     (nil))
(insn 23 22 0 2 (use (reg:DI 21 xmm0)) "t.c":4 -1
     (nil))
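The return path is the mirror image: insns 20 and 21 store the two SFmode parts at offsets -16 and -12, and insn 22 reloads them as one DImode value for %xmm0. Without the stack round trip this is just a concatenation of the two 32-bit patterns, real part in the low half. The sketch below compares both forms in C; the function names are hypothetical and a little-endian target is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Current expansion: store real and imaginary parts into adjacent
   4-byte stack slots (insns 20, 21), then reload 8 bytes (insn 22). */
static uint64_t pack_via_stack(float re, float im)
{
  unsigned char slot[8];
  uint64_t packed;
  memcpy(slot, &re, sizeof re);
  memcpy(slot + 4, &im, sizeof im);
  memcpy(&packed, slot, sizeof packed);
  return packed;
}

/* Register-only form: combine the bit patterns directly.
   Assumes little-endian layout, so the real part is the low 32 bits. */
static uint64_t pack_in_regs(float re, float im)
{
  uint32_t re_bits, im_bits;
  memcpy(&re_bits, &re, sizeof re_bits);
  memcpy(&im_bits, &im, sizeof im_bits);
  return ((uint64_t)im_bits << 32) | re_bits;
}
```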

References:
- [Bug c/79357] New: Doubling a single complex float is not vectorised
  - From: drraph at gmail dot com
