Consider the following code: #include <complex.h> complex float f(complex float x) { return 2*x; } gcc with -O3 -march=core-avx2 gives: f: vmovq QWORD PTR [rsp-8], xmm0 vmovss xmm1, DWORD PTR [rsp-8] vmovss xmm0, DWORD PTR [rsp-4] vaddss xmm1, xmm1, xmm1 vaddss xmm0, xmm0, xmm0 vmovss DWORD PTR [rsp-16], xmm1 vmovss DWORD PTR [rsp-12], xmm0 vmovq xmm0, QWORD PTR [rsp-16] ret Better and still suitable without -ffast-math would be: f: vmulps xmm0, xmm0, XMMWORD PTR .L_2il0floatpacket.1[rip] #3.12 ret
I omitted .L_2il0floatpacket.1: .long 0x40000000,0x40000000,0x00000000,0x00000000 .long 0x40000000,0x00000000 from the assembly. In other words it just multiplies by 2.0f.
Confirmed. We expand from f (complex float x) { float _2; float _3; float _4; float _5; complex float _6; <bb 2>: _2 = REALPART_EXPR <x_1(D)>; _3 = _2 * 2.0e+0; _4 = IMAGPART_EXPR <x_1(D)>; _5 = _4 * 2.0e+0; _6 = COMPLEX_EXPR <_3, _5>; return _6; and this is another case where the ABI is not exposed and thus we can't really do better on trees. With _Complex double we manage to optimize it to f: .LFB0: .cfi_startproc vaddsd %xmm1, %xmm1, %xmm1 vaddsd %xmm0, %xmm0, %xmm0 ret because there the ABI says real/imagpart are passed in different registers. Maybe we can teach RTL forwprop about this case(s)... (I teached it somewhat similar cases on power): (note 9 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (insn 2 9 3 2 (set (reg:DI 97 [ x ]) (reg:DI 21 xmm0 [ x ])) "t.c":2 81 {*movdi_internal} (expr_list:REG_DEAD (reg:DI 21 xmm0 [ x ]) (nil))) (insn 3 2 4 2 (set (mem/c:DI (plus:DI (reg/f:DI 20 frame) (const_int -8 [0xfffffffffffffff8])) [0 S8 A64]) (reg:DI 97 [ x ])) "t.c":2 81 {*movdi_internal} (expr_list:REG_DEAD (reg:DI 97 [ x ]) (nil))) (insn 4 3 5 2 (set (reg:SF 95) (mem/c:SF (plus:DI (reg/f:DI 20 frame) (const_int -8 [0xfffffffffffffff8])) [0 S4 A64])) "t.c":2 125 {*movsf_internal} (nil)) (insn 5 4 8 2 (set (reg:SF 96) (mem/c:SF (plus:DI (reg/f:DI 20 frame) (const_int -4 [0xfffffffffffffffc])) [0 S4 A32])) "t.c":2 125 {*movsf_internal} (nil)) (note 8 5 11 2 NOTE_INSN_FUNCTION_BEG) OTOH this is already a blocker -- we expand the initial real/imagpart extracts to loads from the stack after spilling the complex arg... initial RTL: (note 9 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (insn 2 9 3 2 (set (reg:DI 97) (reg:DI 21 xmm0 [ x ])) "t.c":2 -1 (nil)) (insn 3 2 4 2 (set (mem/c:DI (plus:DI (reg/f:DI 82 virtual-stack-vars) (const_int -8 [0xfffffffffffffff8])) [0 S8 A64]) (reg:DI 97)) "t.c":2 -1 (nil)) (insn 4 3 5 2 (set (reg:SF 95) (mem/c:SF (plus:DI (reg/f:DI 82 virtual-stack-vars) (const_int -8 [0xfffffffffffffff8])) [0 S4 A64])) "t.c":2 -1 (nil)) (insn 5 4 6 2 (set (reg:SF 96) (mem/c:SF (plus:DI (reg/f:DI 82 virtual-stack-vars) (const_int -4 [0xfffffffffffffffc])) [0 S4 A32])) "t.c":2 -1 (nil)) (insn 6 5 7 2 (set (reg/v:SF 93 [ x ]) (reg:SF 95)) "t.c":2 -1 (nil)) (insn 7 6 8 2 (set (reg/v:SF 94 [ x+4 ]) (reg:SF 96)) "t.c":2 -1 (nil)) (note 8 7 11 2 NOTE_INSN_FUNCTION_BEG) handling the initial RTL for the return value will be equally "interesting": (insn 18 14 19 2 (set (reg:SF 100) (reg:SF 91 [ <retval> ])) "t.c":4 -1 (nil)) (insn 19 18 20 2 (set (reg:SF 101) (reg:SF 92 [ <retval>+4 ])) "t.c":4 -1 (nil)) (insn 20 19 21 2 (set (mem/c:SF (plus:DI (reg/f:DI 82 virtual-stack-vars) (const_int -16 [0xfffffffffffffff0])) [0 S4 A32]) (reg:SF 100)) "t.c":4 -1 (nil)) (insn 21 20 22 2 (set (mem/c:SF (plus:DI (reg/f:DI 82 virtual-stack-vars) (const_int -12 [0xfffffffffffffff4])) [0 S4 A32]) (reg:SF 101)) "t.c":4 -1 (nil)) (insn 22 21 23 2 (set (reg:DI 21 xmm0) (mem/c:DI (plus:DI (reg/f:DI 82 virtual-stack-vars) (const_int -16 [0xfffffffffffffff0])) [0 S8 A32])) "t.c":4 -1 (nil)) (insn 23 22 0 2 (use (reg:DI 21 xmm0)) "t.c":4 -1 (nil))
Note that we do not generate better code for typedef float vec __attribute__((vector_size(8))); vec g(vec x){return 2*x;} (we don't consider larger vector modes when lowering/expanding vector operations, there is already a PR about that)
(In reply to Marc Glisse from comment #3) > Note that we do not generate better code for > typedef float vec __attribute__((vector_size(8))); > vec g(vec x){return 2*x;} > (we don't consider larger vector modes when lowering/expanding vector > operations, there is already a PR about that) The testcase in comment #3 was fixed for GCC 11+. The original testcase still has issues has SLP does not happen but then we are to another problem. See: #define complex _Complex typedef float vec __attribute__((vector_size(8))); complex float f(complex float x) { vec t = *(vec*)&x; t= (2*t); return *(complex float*)&t; } Note I think there is a dup for the above issue too.