This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/68956] [6 regression] Vectorizer miscompilation of 416.gamess
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 17 Dec 2015 13:20:30 +0000
- Subject: [Bug tree-optimization/68956] [6 regression] Vectorizer miscompilation of 416.gamess
- Auto-submitted: auto-generated
- References: <bug-68956-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68956
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
                 CC|                            |rguenth at gcc dot gnu.org
           Assignee|rguenth at gcc dot gnu.org  |unassigned at gcc dot gnu.org
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
t.f:13:0: note: loop vectorized
is the offending vectorization. We if-convert using masked loads:
<bb 7>:
# i_1 = PHI <1(6), i_37(8)>
# ij_3 = PHI <ij_2(6), ij_25(8)>
ij_25 = ij_3 + 1;
ic_26 = i_1 <= _39;
_27 = jc_24 & ic_26;
_54 = &*in1_28(D)[ij_3];
_ifc__55 = _27;
_29 = MASK_LOAD (_54, 64B, _ifc__55);
_56 = &*in2_30(D)[ij_3];
_31 = MASK_LOAD (_56, 64B, _ifc__55);
_32 = _29 + _31;
sum_33 = (real(kind=4)) _32;
_43 = (real(kind=8)) sum_33;
prephitmp_41 = _27 ? _43 : 0.0;
*out_35(D)[ij_3] = prephitmp_41;
i_37 = i_1 + 1;
if (i_1 == j_5)
goto <bb 14>;
but I see nothing obviously wrong in the .optimized dump:
<bb 7>:
# vect_vec_iv_.20_99 = PHI <{ 1, 2, 3, 4, 5, 6, 7, 8 }(6), vect_vec_iv_.20_100(7)>
# ivtmp.51_57 = PHI <0(6), ivtmp.51_15(7)>
# ivtmp.52_18 = PHI <ivtmp.52_13(6), ivtmp.52_42(7)>
# ivtmp.55_45 = PHI <ivtmp.55_50(6), ivtmp.55_16(7)>
# ivtmp.57_51 = PHI <ivtmp.57_34(6), ivtmp.57_52(7)>
vectp.30_122 = (vector(4) real(kind=8) *) ivtmp.55_45;
vectp.26_112 = (vector(4) real(kind=8) *) ivtmp.52_18;
vect_vec_iv_.20_100 = vect_vec_iv_.20_99 + { 8, 8, 8, 8, 8, 8, 8, 8 };
mask_ic_26.21_102 = vect_vec_iv_.20_99 <= vect_cst__101;
mask__27.22_105 = mask_ic_26.21_102 & vect_cst__104;
mask_patt_58.24_107 = [vec_unpack_lo_expr] mask__27.22_105;
mask_patt_58.24_108 = [vec_unpack_hi_expr] mask__27.22_105;
vect_patt_59.25_114 = MASK_LOAD (vectp.26_112, 8B, mask_patt_58.24_107);
_47 = ivtmp.52_18 + 32;
_46 = (vector(4) real(kind=8) *) _47;
vect_patt_59.25_116 = MASK_LOAD (_46, 8B, mask_patt_58.24_108);
vect_patt_61.29_124 = MASK_LOAD (vectp.30_122, 8B, mask_patt_58.24_107);
_48 = ivtmp.55_45 + 32;
_49 = (vector(4) real(kind=8) *) _48;
vect_patt_61.29_126 = MASK_LOAD (_49, 8B, mask_patt_58.24_108);
vect__32.32_127 = vect_patt_59.25_114 + vect_patt_61.29_124;
vect__32.32_128 = vect_patt_59.25_116 + vect_patt_61.29_126;
vect_sum_33.33_129 = VEC_PACK_TRUNC_EXPR <vect__32.32_127, vect__32.32_128>;
vect__43.34_130 = [vec_unpack_lo_expr] vect_sum_33.33_129;
vect__43.34_131 = [vec_unpack_hi_expr] vect_sum_33.33_129;
vect_patt_63.36_135 = VEC_COND_EXPR <mask_patt_58.24_107, vect__43.34_130, { 0.0, 0.0, 0.0, 0.0 }>;
vect_patt_63.36_136 = VEC_COND_EXPR <mask_patt_58.24_108, vect__43.34_131, { 0.0, 0.0, 0.0, 0.0 }>;
_62 = (void *) ivtmp.57_51;
MEM[base: _62, offset: 0B] = vect_patt_63.36_135;
MEM[base: _62, offset: 32B] = vect_patt_63.36_136;
ivtmp.51_15 = ivtmp.51_57 + 1;
ivtmp.52_42 = ivtmp.52_18 + 64;
ivtmp.55_16 = ivtmp.55_45 + 64;
ivtmp.57_52 = ivtmp.57_51 + 64;
if (ivtmp.51_15 >= bnd.16_65)
goto <bb 11>;
so I suspect a backend or RTL optimization issue.
Confirmed, at least. A bisection would be nice.