[Bug target/39821] 120% slowdown with vectorizer
pinskia at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Mon Jul 26 06:19:30 GMT 2021
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39821
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|tree-optimization |target
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The code generation for aarch64 looks fine:
dotproduct_order4:
.LFB1:
.cfi_startproc
ldr q1, [x0]
ldr q2, [x1]
smull v0.2d, v2.2s, v1.2s
smlal2 v0.2d, v2.4s, v1.4s
addp d0, v0.2d
fmov x0, d0
ret
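As a reading aid, here is a lane-by-lane C model of what the three vector instructions above compute. It is a sketch, not GCC output, and the function name model_aarch64_dot4 is hypothetical. smull widen-multiplies the low two 32-bit lanes into two 64-bit lanes, smlal2 does the same for the high two lanes and accumulates, and addp adds the two 64-bit lanes together.

```c
#include <stdint.h>

/* Hypothetical lane model of:
 *   ldr    q1, [x0]             -> v1 = first argument (4 x int32)
 *   ldr    q2, [x1]             -> v2 = second argument (4 x int32)
 *   smull  v0.2d, v2.2s, v1.2s  -> v0[j]  = v2[j] * v1[j],         j = 0..1
 *   smlal2 v0.2d, v2.4s, v1.4s  -> v0[j] += v2[j + 2] * v1[j + 2], j = 0..1
 *   addp   d0, v0.2d            -> d0 = v0[0] + v0[1]
 * The hardware wraps on overflow; in C signed overflow is undefined, so
 * inputs are assumed small enough that the 64-bit sum does not overflow. */
int64_t model_aarch64_dot4(const int32_t v1[4], const int32_t v2[4])
{
    int64_t v0[2];

    /* smull: signed 32x32->64 widening multiply of the low two lanes. */
    for (int j = 0; j < 2; j++)
        v0[j] = (int64_t)v2[j] * (int64_t)v1[j];

    /* smlal2: widening multiply of the high two lanes, accumulated. */
    for (int j = 0; j < 2; j++)
        v0[j] += (int64_t)v2[j + 2] * (int64_t)v1[j + 2];

    /* addp d0, v0.2d: pairwise add of the two 64-bit lanes. */
    return v0[0] + v0[1];
}
```

The whole reduction costs one widening multiply, one widening multiply-accumulate and one pairwise add, which is why this output is described as fine.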
vect__6.41_18 = MEM <vector(4) int> [(int32_t *)v1_2(D)];
vect__10.44_13 = MEM <vector(4) int> [(int32_t *)v2_3(D)];
vect_patt_25.45_8 = WIDEN_MULT_LO_EXPR <vect__10.44_13, vect__6.41_18>;
vect_patt_25.45_4 = WIDEN_MULT_HI_EXPR <vect__10.44_13, vect__6.41_18>;
vect_accum_14.46_31 = vect_patt_25.45_4 + vect_patt_25.45_8;
_33 = .REDUC_PLUS (vect_accum_14.46_31); [tail call]
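The testcase itself is not quoted in this comment. The following is a hypothetical reconstruction of a source shape that would give a GIMPLE pattern like the one above. The signature and loop are assumptions inferred from the dumps (two int32_t * parameters v1 and v2, a 64-bit accumulator accum, and a 64-bit return value in x0). This is not the code attached to the bug.

```c
#include <stdint.h>

/* Hypothetical reconstruction: only the name dotproduct_order4 comes from
 * the assembly label; the signature and loop shape are assumptions.
 * Each int32_t product is widened to 64 bits before it is accumulated,
 * which is what the vectorizer turns into WIDEN_MULT_LO_EXPR and
 * WIDEN_MULT_HI_EXPR followed by a reduction of "accum". */
int64_t dotproduct_order4(int32_t *v1, int32_t *v2)
{
    int64_t accum = 0;
    for (int i = 0; i < 4; i++)
        accum += (int64_t)v1[i] * v2[i];
    return accum;
}
```

Because the trip count is a constant 4 and each vector holds 4 ints, the loop is fully consumed by one pair of vector loads, which matches the straight-line GIMPLE with no loop left in it.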
---- CUT ----
Even at the GIMPLE level, the x86_64 code looks OK:
vect__6.41_18 = MEM <vector(4) int> [(int32_t *)v1_2(D)];
vect__10.44_13 = MEM <vector(4) int> [(int32_t *)v2_3(D)];
vect_patt_25.45_8 = WIDEN_MULT_LO_EXPR <vect__10.44_13, vect__6.41_18>;
vect_patt_25.45_4 = WIDEN_MULT_HI_EXPR <vect__10.44_13, vect__6.41_18>;
vect_accum_14.46_31 = vect_patt_25.45_4 + vect_patt_25.45_8;
_33 = VEC_PERM_EXPR <vect_accum_14.46_31, { 0, 0 }, { 1, 2 }>;
_34 = vect_accum_14.46_31 + _33;
stmp_accum_14.47_35 = BIT_FIELD_REF <_34, 64, 0>;
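Unlike aarch64, x86_64 gets no .REDUC_PLUS here. Instead, the final reduction is spelled out as a permute, an add and a lane extract. The sketch below models each two-lane 64-bit vector as a two-element array to show what those three statements compute. The helper names vec_perm2 and model_x86_reduction are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of VEC_PERM_EXPR <a, b, sel> on two-lane vectors:
 * the selector indexes the concatenation { a[0], a[1], b[0], b[1] },
 * so each sel[j] is expected to be in the range 0..3. */
static void vec_perm2(const int64_t a[2], const int64_t b[2],
                      const int sel[2], int64_t out[2])
{
    const int64_t concat[4] = { a[0], a[1], b[0], b[1] };
    for (int j = 0; j < 2; j++)
        out[j] = concat[sel[j]];
}

/* Model of the x86_64 reduction of vect_accum (two 64-bit lanes). */
int64_t model_x86_reduction(const int64_t accum[2])
{
    const int64_t zero[2] = { 0, 0 };
    const int sel[2] = { 1, 2 };
    int64_t shifted[2], sum[2];

    /* _33 = VEC_PERM_EXPR <accum, { 0, 0 }, { 1, 2 }>  ->  { accum[1], 0 } */
    vec_perm2(accum, zero, sel, shifted);

    /* _34 = accum + _33  ->  { accum[0] + accum[1], accum[1] } */
    for (int j = 0; j < 2; j++)
        sum[j] = accum[j] + shifted[j];

    /* BIT_FIELD_REF <_34, 64, 0>: the 64 bits at bit offset 0, i.e. lane 0
     * on little-endian x86_64. */
    return sum[0];
}
```

The permute is effectively a whole-vector shift by one lane, so the GIMPLE is still a cheap two-lane horizontal add. Any slowdown has to come from how these statements are expanded, not from the vectorizer's choice.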
But the expansion from GIMPLE to RTL looks bad.