This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug rtl-optimization/79357] Doubling a single complex float gives inefficient code
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Mon, 06 Feb 2017 11:52:52 +0000
- Subject: [Bug rtl-optimization/79357] Doubling a single complex float gives inefficient code
- Auto-submitted: auto-generated
- References: <bug-79357-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79357
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-02-06
          Component|c                           |rtl-optimization
     Ever confirmed|0                           |1
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed. We expand from
f (complex float x)
{
  float _2;
  float _3;
  float _4;
  float _5;
  complex float _6;

  <bb 2>:
  _2 = REALPART_EXPR <x_1(D)>;
  _3 = _2 * 2.0e+0;
  _4 = IMAGPART_EXPR <x_1(D)>;
  _5 = _4 * 2.0e+0;
  _6 = COMPLEX_EXPR <_3, _5>;
  return _6;
}
This is another case where the ABI is not exposed at the GIMPLE level, so we
can't really do better on trees.
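For reference, a minimal C sketch of source that would plausibly lower to GIMPLE of this shape. The function names `twice`, `twice_by_parts` and `check_twice` are hypothetical, and the exact source form of the reported test case is an assumption; only the GIMPLE dump above comes from the report.

```c
#include <assert.h>
#include <complex.h>

/* Assumed reproducer: doubling a single-precision complex value. */
float _Complex twice(float _Complex x)
{
    return x * 2.0f;
}

/* Hand-written equivalent of the GIMPLE: split, scale each part, rebuild. */
float _Complex twice_by_parts(float _Complex x)
{
    float re = crealf(x) * 2.0f;   /* _2 and _3 */
    float im = cimagf(x) * 2.0f;   /* _4 and _5 */
    float _Complex z;
    __real__ z = re;               /* GNU extension, accepted by GCC and Clang */
    __imag__ z = im;               /* together: _6 = COMPLEX_EXPR <_3, _5> */
    return z;
}

/* Small self-check: both forms agree on 1 + 2i. */
void check_twice(void)
{
    float _Complex x = 1.0f + 2.0f * I;
    assert(crealf(twice(x)) == crealf(twice_by_parts(x)));
    assert(cimagf(twice(x)) == cimagf(twice_by_parts(x)));
}
```

Compiling either function with something like `gcc -O2 -S` on x86_64 should make it possible to compare the generated assembly against the dumps in this comment.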
With _Complex double we manage to optimize it to
f:
.LFB0:
        .cfi_startproc
        vaddsd  %xmm1, %xmm1, %xmm1
        vaddsd  %xmm0, %xmm0, %xmm0
        ret
because there the ABI says the real and imaginary parts are passed in
different registers.
Maybe we can teach RTL forwprop about case(s) like this... (I taught it
somewhat similar cases on power):
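The size difference behind this can be checked with a small C sketch. It assumes the x86-64 psABI rule that arguments are classified in 8-byte chunks ("eightbytes"): a complex float fits in one chunk and so travels in a single SSE register, while a complex double needs two. The helper names are hypothetical.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Number of 8-byte chunks a value of the given size occupies. */
size_t eightbytes(size_t size)
{
    return (size + 7) / 8;
}

size_t complex_float_eightbytes(void)
{
    return eightbytes(sizeof(float _Complex));
}

size_t complex_double_eightbytes(void)
{
    return eightbytes(sizeof(double _Complex));
}

/* Self-check: one chunk for complex float, two for complex double. */
void check_eightbytes(void)
{
    assert(complex_float_eightbytes() == 1);
    assert(complex_double_eightbytes() == 2);
}
```

This only illustrates the sizes involved; the actual register assignment is decided by the ABI classification, not by this arithmetic.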
(note 9 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 9 3 2 (set (reg:DI 97 [ x ])
(reg:DI 21 xmm0 [ x ])) "t.c":2 81 {*movdi_internal}
(expr_list:REG_DEAD (reg:DI 21 xmm0 [ x ])
(nil)))
(insn 3 2 4 2 (set (mem/c:DI (plus:DI (reg/f:DI 20 frame)
(const_int -8 [0xfffffffffffffff8])) [0 S8 A64])
(reg:DI 97 [ x ])) "t.c":2 81 {*movdi_internal}
(expr_list:REG_DEAD (reg:DI 97 [ x ])
(nil)))
(insn 4 3 5 2 (set (reg:SF 95)
(mem/c:SF (plus:DI (reg/f:DI 20 frame)
(const_int -8 [0xfffffffffffffff8])) [0 S4 A64])) "t.c":2 125 {*movsf_internal}
(nil))
(insn 5 4 8 2 (set (reg:SF 96)
(mem/c:SF (plus:DI (reg/f:DI 20 frame)
(const_int -4 [0xfffffffffffffffc])) [0 S4 A32])) "t.c":2 125 {*movsf_internal}
(nil))
(note 8 5 11 2 NOTE_INSN_FUNCTION_BEG)
OTOH this is already a blocker: we expand the initial real/imagpart extracts
to loads from the stack after spilling the complex argument. Initial RTL:
(note 9 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 9 3 2 (set (reg:DI 97)
(reg:DI 21 xmm0 [ x ])) "t.c":2 -1
(nil))
(insn 3 2 4 2 (set (mem/c:DI (plus:DI (reg/f:DI 82 virtual-stack-vars)
(const_int -8 [0xfffffffffffffff8])) [0 S8 A64])
(reg:DI 97)) "t.c":2 -1
(nil))
(insn 4 3 5 2 (set (reg:SF 95)
(mem/c:SF (plus:DI (reg/f:DI 82 virtual-stack-vars)
(const_int -8 [0xfffffffffffffff8])) [0 S4 A64])) "t.c":2 -1
(nil))
(insn 5 4 6 2 (set (reg:SF 96)
(mem/c:SF (plus:DI (reg/f:DI 82 virtual-stack-vars)
(const_int -4 [0xfffffffffffffffc])) [0 S4 A32])) "t.c":2 -1
(nil))
(insn 6 5 7 2 (set (reg/v:SF 93 [ x ])
(reg:SF 95)) "t.c":2 -1
(nil))
(insn 7 6 8 2 (set (reg/v:SF 94 [ x+4 ])
(reg:SF 96)) "t.c":2 -1
(nil))
(note 8 7 11 2 NOTE_INSN_FUNCTION_BEG)
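As a rough C analogue of insns 2-5 above, the sketch below stores a 64-bit value to an 8-byte slot and then loads each float back from it. The helper names `split_via_stack` and `check_split` are hypothetical, and the mapping of statements to insns is an illustration, not what expand literally does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Split a packed complex float (held as a 64-bit value) through memory. */
void split_via_stack(uint64_t packed, float *re, float *im)
{
    unsigned char slot[8];
    memcpy(slot, &packed, sizeof packed);   /* like insn 3: DImode store at -8 */
    memcpy(re, slot, sizeof *re);           /* like insn 4: SFmode load at -8  */
    memcpy(im, slot + 4, sizeof *im);       /* like insn 5: SFmode load at -4  */
}

/* Self-check: pack {1.0f, 2.0f} and split it again. */
void check_split(void)
{
    float parts[2] = { 1.0f, 2.0f };
    uint64_t packed;
    float re, im;

    memcpy(&packed, parts, sizeof packed);
    split_via_stack(packed, &re, &im);
    assert(re == 1.0f);
    assert(im == 2.0f);
}
```

The round trip through `slot` is what an RTL-level fix would need to replace with direct extracts from the incoming register.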
Handling the initial RTL for the return value will be equally "interesting":
(insn 18 14 19 2 (set (reg:SF 100)
(reg:SF 91 [ <retval> ])) "t.c":4 -1
(nil))
(insn 19 18 20 2 (set (reg:SF 101)
(reg:SF 92 [ <retval>+4 ])) "t.c":4 -1
(nil))
(insn 20 19 21 2 (set (mem/c:SF (plus:DI (reg/f:DI 82 virtual-stack-vars)
(const_int -16 [0xfffffffffffffff0])) [0 S4 A32])
(reg:SF 100)) "t.c":4 -1
(nil))
(insn 21 20 22 2 (set (mem/c:SF (plus:DI (reg/f:DI 82 virtual-stack-vars)
(const_int -12 [0xfffffffffffffff4])) [0 S4 A32])
(reg:SF 101)) "t.c":4 -1
(nil))
(insn 22 21 23 2 (set (reg:DI 21 xmm0)
(mem/c:DI (plus:DI (reg/f:DI 82 virtual-stack-vars)
(const_int -16 [0xfffffffffffffff0])) [0 S8 A32])) "t.c":4 -1
(nil))
(insn 23 22 0 2 (use (reg:DI 21 xmm0)) "t.c":4 -1
(nil))
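The return path is the mirror image. A rough C analogue of insns 20-22 follows: two float stores into adjacent halves of an 8-byte slot, then one 64-bit load of the whole slot. The helper names `join_via_stack` and `check_join` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build the packed 64-bit return value from its two float halves. */
uint64_t join_via_stack(float re, float im)
{
    unsigned char slot[8];
    uint64_t packed;

    memcpy(slot, &re, sizeof re);          /* like insn 20: SFmode store at -16 */
    memcpy(slot + 4, &im, sizeof im);      /* like insn 21: SFmode store at -12 */
    memcpy(&packed, slot, sizeof packed);  /* like insn 22: DImode load to xmm0 */
    return packed;
}

/* Self-check: the result matches the in-memory image of {re, im}. */
void check_join(void)
{
    float parts[2] = { 2.0f, 4.0f };
    uint64_t expected;

    memcpy(&expected, parts, sizeof expected);
    assert(join_via_stack(2.0f, 4.0f) == expected);
}
```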