This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/67530] New: Failure to eliminate dead code produced by vector lowering
- From: "wschmidt at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 09 Sep 2015 21:25:38 +0000
- Subject: [Bug tree-optimization/67530] New: Failure to eliminate dead code produced by vector lowering
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67530
Bug ID: 67530
Summary: Failure to eliminate dead code produced by vector lowering
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: wschmidt at gcc dot gnu.org
CC: bergner at gcc dot gnu.org, rguenth at gcc dot gnu.org
Target Milestone: ---
Host: powerpc64le-unknown-linux-gnu
Target: powerpc64le-unknown-linux-gnu
Build: powerpc64le-unknown-linux-gnu
Test case gcc.dg/fold-compare-7.c is as follows:
/* { dg-do compile } */
/* { dg-options "-O2" } */
typedef float vecf __attribute__((vector_size(8*sizeof(float))));
long f(vecf *f1, vecf *f2){
  return ((*f1 == *f2) < 0)[2];
}
The initial GIMPLE code is simple (compiled with -O2):
f (vecf * f1, vecf * f2)
{
long int D.2301;
vector(8) int D.2299;
vector(8) float D.2302;
vector(8) float D.2303;
vector(8) int D.2304;
int D.2305;
D.2302 = *f1;
D.2303 = *f2;
D.2304 = D.2302 == D.2303;
D.2299 = D.2304;
D.2305 = BIT_FIELD_REF <D.2299, 32, 64>;
D.2301 = (long int) D.2305;
return D.2301;
}
However, the vector lowering code expands the vector comparison into eight
element-wise comparisons, each with its own pair of BIT_FIELD_REF expressions,
and all but one of them turn out not to be needed. The optimized tree dump
shows:
<bb 2>:
  _3 = *f1_2(D);
  _5 = *f2_4(D);
  _9 = BIT_FIELD_REF <_3, 32, 0>;
  _10 = BIT_FIELD_REF <_5, 32, 0>;
  _11 = _9 == _10 ? -1 : 0;
  _12 = BIT_FIELD_REF <_3, 32, 32>;
  _13 = BIT_FIELD_REF <_5, 32, 32>;
  _14 = _12 == _13 ? -1 : 0;
  _15 = BIT_FIELD_REF <_3, 32, 64>;
  _16 = BIT_FIELD_REF <_5, 32, 64>;
  _17 = _15 == _16 ? -1 : 0;
  _18 = BIT_FIELD_REF <_3, 32, 96>;
  _19 = BIT_FIELD_REF <_5, 32, 96>;
  _20 = _18 == _19 ? -1 : 0;
  _21 = BIT_FIELD_REF <_3, 32, 128>;
  _22 = BIT_FIELD_REF <_5, 32, 128>;
  _23 = _21 == _22 ? -1 : 0;
  _24 = BIT_FIELD_REF <_3, 32, 160>;
  _25 = BIT_FIELD_REF <_5, 32, 160>;
  _26 = _24 == _25 ? -1 : 0;
  _27 = BIT_FIELD_REF <_3, 32, 192>;
  _28 = BIT_FIELD_REF <_5, 32, 192>;
  _29 = _27 == _28 ? -1 : 0;
  _30 = BIT_FIELD_REF <_3, 32, 224>;
  _31 = BIT_FIELD_REF <_5, 32, 224>;
  _32 = _30 == _31 ? -1 : 0;
  _6 = {_11, _14, _17, _20, _23, _26, _29, _32};
  _7 = _17;
  _8 = (long int) _17;
  return _8;
Note that the only instructions in here that matter are:
  _3 = *f1_2(D);
  _5 = *f2_4(D);
  _15 = BIT_FIELD_REF <_3, 32, 64>;
  _16 = BIT_FIELD_REF <_5, 32, 64>;
  _17 = _15 == _16 ? -1 : 0;
  _8 = (long int) _17;
  return _8;
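Taken together, these statements compare a single pair of floats. A minimal C
sketch of the same computation, assuming the GCC vector subscripting extension
(f_scalar and check_f_scalar are hypothetical names used only for illustration,
they are not part of the test case):

```c
#include <assert.h>

typedef float vecf __attribute__((vector_size(8 * sizeof(float))));

/* Hypothetical scalar equivalent of f(): only element 2 of each
   input vector contributes to the result. */
long f_scalar(const vecf *f1, const vecf *f2)
{
    float a = (*f1)[2];            /* _15 = BIT_FIELD_REF <_3, 32, 64> */
    float b = (*f2)[2];            /* _16 = BIT_FIELD_REF <_5, 32, 64> */
    int mask = (a == b) ? -1 : 0;  /* _17 = _15 == _16 ? -1 : 0        */
    return (long) mask;            /* _8 = (long int) _17              */
}

/* Small self-check: equal third elements give -1, unequal give 0. */
void check_f_scalar(void)
{
    vecf x = {0, 1, 2, 3, 4, 5, 6, 7};
    vecf y = {7, 6, 2, 4, 3, 2, 1, 0};
    assert(f_scalar(&x, &y) == -1);
    y[2] = 9.0f;
    assert(f_scalar(&x, &y) == 0);
}
```

Ideally the code generated for f() would be no worse than what the compiler
produces for a function of this shape.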
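We end up generating really horrible code for this: all eight element
comparisons survive into the optimized dump although only one of them feeds the
return value. The middle end should be able to detect that the vector
constructor _6 is dead, which in turn makes every comparison and BIT_FIELD_REF
other than _15, _16 and _17 dead. We don't even get rid of the dead copy
_7 = _17.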
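The cleanup expected here amounts to an ordinary liveness sweep. As a rough
illustration only (this is not GCC's DCE pass; struct stmt, build_example and
mark_live are hypothetical names made up for this sketch), the optimized dump
can be modeled as a straight-line statement list and walked backward from the
return. A single backward pass is enough under the assumption of one basic
block in SSA form, where every definition precedes its uses:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_USES  8
#define MAX_NAMES 64
#define MAX_STMTS 32
#define NO_DEF    (-1)

/* Toy model of one statement: the SSA name it defines and the names it uses. */
struct stmt {
    int  def;             /* SSA number defined, or NO_DEF for the return */
    int  uses[MAX_USES];  /* SSA numbers read by this statement           */
    int  nuses;
    bool is_root;         /* true for statements that must be kept        */
};

/* Build a model of the optimized dump; returns the statement count. */
size_t build_example(struct stmt *s)
{
    size_t n = 0;
    struct stmt ctor = { .def = 6 };                /* _6 = {_11, ..., _32} */

    s[n++] = (struct stmt){ .def = 3 };             /* _3 = *f1 */
    s[n++] = (struct stmt){ .def = 5 };             /* _5 = *f2 */
    for (int i = 0; i < 8; i++) {
        int a = 9 + 3 * i, b = a + 1, c = a + 2;    /* _9/_10/_11, ... */
        s[n++] = (struct stmt){ .def = a, .uses = {3}, .nuses = 1 };
        s[n++] = (struct stmt){ .def = b, .uses = {5}, .nuses = 1 };
        s[n++] = (struct stmt){ .def = c, .uses = {a, b}, .nuses = 2 };
        ctor.uses[ctor.nuses++] = c;
    }
    s[n++] = ctor;
    s[n++] = (struct stmt){ .def = 7, .uses = {17}, .nuses = 1 };  /* _7 = _17 */
    s[n++] = (struct stmt){ .def = 8, .uses = {17}, .nuses = 1 };  /* _8 = (long int) _17 */
    s[n++] = (struct stmt){ .def = NO_DEF, .uses = {8}, .nuses = 1,
                            .is_root = true };                     /* return _8 */
    return n;
}

/* Backward liveness sweep: a statement is live if it is a root or if it
   defines a name that a later live statement uses. Returns the live count. */
size_t mark_live(const struct stmt *s, size_t n, bool *live_stmt)
{
    bool live_name[MAX_NAMES] = { false };
    size_t count = 0;

    for (size_t i = n; i-- > 0; ) {
        bool live = s[i].is_root ||
                    (s[i].def != NO_DEF && live_name[s[i].def]);
        live_stmt[i] = live;
        if (!live)
            continue;
        count++;
        for (int u = 0; u < s[i].nuses; u++) {
            assert(s[i].uses[u] >= 0 && s[i].uses[u] < MAX_NAMES);
            live_name[s[i].uses[u]] = true;
        }
    }
    return count;
}
```

On this model, mark_live keeps 7 of the 30 statements, which are exactly the
ones listed above as mattering; the constructor _6, the copy _7 and the other
seven element comparisons are all left unmarked.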
I haven't looked at this carefully, but presumably this is due to the late
running of pass_lower_vector. Perhaps running DCE again would be appropriate
if pass_lower_vector makes any changes?