[Bug tree-optimization/29756] SSE intrinsics hard to use without redundant temporaries appearing
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Tue Dec 7 08:22:56 GMT 2021
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756
--- Comment #18 from Richard Biener <rguenth at gcc dot gnu.org> ---
Good:
<bb 2> [local count: 1073741824]:
_1 = *m_12(D);
_14 = VEC_PERM_EXPR <v_13(D), v_13(D), { 0, 0, 0, 0 }>;
_2 = _1 * _14;
_3 = MEM[(__v4sf *)m_12(D) + 16B];
_15 = VEC_PERM_EXPR <v_13(D), v_13(D), { 1, 1, 1, 1 }>;
_4 = _3 * _15;
_5 = _2 + _4;
_6 = MEM[(__v4sf *)m_12(D) + 32B];
_16 = VEC_PERM_EXPR <v_13(D), v_13(D), { 2, 2, 2, 2 }>;
_7 = _6 * _16;
_8 = _5 + _7;
_9 = MEM[(__v4sf *)m_12(D) + 48B];
_17 = VEC_PERM_EXPR <v_13(D), v_13(D), { 3, 3, 3, 3 }>;
_10 = _9 * _17;
_18 = _8 + _10;
return _18;
Bad:
<bb 2> [local count: 1073741824]:
_1 = *m_12(D);
_30 = BIT_FIELD_REF <v_13(D), 32, 0>;
v_28 = BIT_INSERT_EXPR <v_27(D), _30, 0>;
_29 = VEC_PERM_EXPR <v_28, v_28, { 0, 0, 0, 0 }>;
_2 = _1 * _29;
_3 = MEM[(__v4sf *)m_12(D) + 16B];
_26 = BIT_FIELD_REF <v_13(D), 32, 32>;
v_24 = BIT_INSERT_EXPR <v_23(D), _26, 0>;
_25 = VEC_PERM_EXPR <v_24, v_24, { 0, 0, 0, 0 }>;
_4 = _3 * _25;
_5 = _2 + _4;
_6 = MEM[(__v4sf *)m_12(D) + 32B];
_14 = BIT_FIELD_REF <v_13(D), 32, 64>;
v_16 = BIT_INSERT_EXPR <v_17(D), _14, 0>;
_15 = VEC_PERM_EXPR <v_16, v_16, { 0, 0, 0, 0 }>;
_7 = _6 * _15;
_8 = _5 + _7;
_9 = MEM[(__v4sf *)m_12(D) + 48B];
_18 = BIT_FIELD_REF <v_13(D), 32, 96>;
v_20 = BIT_INSERT_EXPR <v_21(D), _18, 0>;
_19 = VEC_PERM_EXPR <v_20, v_20, { 0, 0, 0, 0 }>;
_10 = _9 * _19;
_22 = _8 + _10;
return _22;
So what's missing is folding the sequence "extract element N, insert at
position 0, splat element 0" into a single "splat element N".
_30 = BIT_FIELD_REF <v_13(D), 32, 0>;
v_28 = BIT_INSERT_EXPR <v_27(D), _30, 0>;
_29 = VEC_PERM_EXPR <v_28, v_28, { 0, 0, 0, 0 }>;
Shows a missing no-op simplification (inserting into a default-def at position 0
a value extracted from that same position can simply return the vector we
extract from).
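As a rough illustration of why that simplification is safe, the C sketch below models the three statements with GCC vector extensions. The helper names and the `junk` argument (standing in for the uninitialized v_27(D)) are assumptions made for this example, not GCC internals.

```c
/* Sketch only: models the GIMPLE statements above, not GCC's own code. */
typedef float v4sf __attribute__((vector_size(16)));

/* BIT_FIELD_REF <v, 32, 32 * lane>: read one 32-bit lane. */
static float extract_lane(v4sf v, int lane)
{
    return v[lane];
}

/* BIT_INSERT_EXPR <dst, x, 0>: overwrite lane 0, keep the other lanes. */
static v4sf insert_lane0(v4sf dst, float x)
{
    dst[0] = x;
    return dst;
}

/* VEC_PERM_EXPR <v, v, { lane, lane, lane, lane }>: broadcast one lane. */
static v4sf splat_lane(v4sf v, int lane)
{
    float x = v[lane];
    return (v4sf){ x, x, x, x };
}

/* The "Bad" sequence for lane 0. `junk` plays the role of the
   default-def vector, whose contents are arbitrary. */
static v4sf bad_splat0(v4sf v, v4sf junk)
{
    return splat_lane(insert_lane0(junk, extract_lane(v, 0)), 0);
}

/* The "Good" form: the insert is treated as a no-op returning v. */
static v4sf good_splat0(v4sf v)
{
    return splat_lane(v, 0);
}

/* Lane-wise equality check used to compare the two forms. */
static int same(v4sf a, v4sf b)
{
    for (int i = 0; i < 4; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Whatever `junk` holds, only lane 0 of the inserted vector is ever read by the splat, so both functions return the same result.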
_26 = BIT_FIELD_REF <v_13(D), 32, 32>;
v_24 = BIT_INSERT_EXPR <v_23(D), _26, 0>;
_25 = VEC_PERM_EXPR <v_24, v_24, { 0, 0, 0, 0 }>;
is a bit more complicated: the VEC_PERM_EXPR indices need to be rewritten,
based on the fact that we only pick the just-inserted element and that it
was extracted from another (compatible) vector.
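The index rewrite described here could look roughly like the following C sketch, again using GCC vector extensions. `remap_selector`, `permute` and `bad_sequence` are hypothetical helpers written for this illustration; the sketch assumes both VEC_PERM_EXPR operands are the same vector and that selector values are in the range 0..7.

```c
/* Sketch only: a model of the proposed index rewrite, not GCC's code. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper. If every selector index refers to the lane that
   was just inserted (insert_pos), write a selector that reads src_lane
   from the original vector instead and return 1. Otherwise return 0,
   because some other lane of the partially undefined vector is used. */
static int remap_selector(const int sel[4], int insert_pos,
                          int src_lane, int out[4])
{
    for (int i = 0; i < 4; i++) {
        /* Indices 0..3 name the first operand and 4..7 the second.
           Both operands are the same vector here, so only the lane
           within the vector matters. */
        if (sel[i] % 4 != insert_pos)
            return 0;
        out[i] = src_lane;
    }
    return 1;
}

/* VEC_PERM_EXPR <v, v, sel> for identical operands. */
static v4sf permute(v4sf v, const int sel[4])
{
    v4sf r = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (int i = 0; i < 4; i++)
        r[i] = v[sel[i] % 4];
    return r;
}

/* The "Bad" sequence: extract `lane`, insert at 0, splat lane 0.
   `junk` stands in for the default-def vector. */
static v4sf bad_sequence(v4sf v, v4sf junk, int lane)
{
    static const int splat0[4] = { 0, 0, 0, 0 };
    junk[0] = v[lane];
    return permute(junk, splat0);
}

/* Lane-wise equality check used to compare the two forms. */
static int same(v4sf a, v4sf b)
{
    for (int i = 0; i < 4; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

For the second statement group above (extract at bit offset 32, which is lane 1), the rewritten selector would be { 1, 1, 1, 1 } applied directly to v_13(D), which matches the "Good" dump.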