[Bug tree-optimization/102054] New: slightly worse code as PRE on some code got disabled for loop vectorization
linkw at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Wed Aug 25 07:14:27 GMT 2021
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102054
Bug ID: 102054
Summary: slightly worse code as PRE on some code got disabled
for loop vectorization
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: linkw at gcc dot gnu.org
Target Milestone: ---
This test case is reduced from FastBoard.cpp in the SPEC2017 benchmark
541.leela_r; I found it while investigating the O2 vectorization degradation
on a SPEC2017 run. The issue is similar to PR100794, but that one applies only
at O2 and is fixed by re-running pcom at O2, whereas this one applies to O3
vectorization as well.
TEST CASE:
class FastBoard {
public:
  static const int NBR_SHIFT = 4;
  static const int MAXBOARDSIZE = 19;
  static const int MAXSQ = ((MAXBOARDSIZE + 2) * (MAXBOARDSIZE + 2));
  enum square_t {
    BLACK = 0, WHITE = 1, EMPTY = 2, INVAL = 3
  };
  bool self_atari(int color, int vertex);
protected:
  int m_dirs[4];
  square_t m_square[MAXSQ];
  int nbr_libs[20];
};

bool FastBoard::self_atari(int color, int vertex) {
  int nbr_libs_cnt = 0;
  nbr_libs[nbr_libs_cnt++] = vertex;
  for (int k = 0; k < 20; k++) {
    int ai = vertex + m_dirs[k];
    if (m_square[ai] == FastBoard::EMPTY) {
      bool found = false;
      for (int i = 0; i < nbr_libs_cnt; i++) {
        if (nbr_libs[i] == ai) {
          found = true;
          break;
        }
      }
      if (!found) {
        if (nbr_libs_cnt > 1)
          return false;
        nbr_libs[nbr_libs_cnt++] = ai;
      }
    }
  }
  return true;
}
Options: -mcpu=power9 -Ofast (or -O2 -ftree-vectorize) etc.
With -fno-tree-loop-vectorize, vertex_11 is passed down as the value of
nbr_libs[0]:
  <bb 3> [local count: 1014686026]:
  # prephitmp_26 = PHI <pretmp_28(5), vertex_11(D)(10)>
  # ivtmp.17_27 = PHI <ivtmp.17_3(5), ivtmp.17_8(10)>
  if (ai_15 == prephitmp_26)
    goto <bb 8>; [5.50%]
  else
    goto <bb 4>; [94.50%]

  <bb 4> [local count: 958878295]:
  if (ivtmp.17_27 != _31)
    goto <bb 5>; [93.84%]
  else
    goto <bb 11>; [6.16%]

  <bb 5> [local count: 899822494]:
  ivtmp.17_3 = ivtmp.17_27 + 4;
  _21 = (void *) ivtmp.17_3;
  pretmp_28 = MEM[(int *)_21];
  goto <bb 3>; [100.00%]
Without -fno-tree-loop-vectorize, the IR below is generated instead; it always
does the load before the comparison with ai:
  <bb 4> [local count: 1014686026]:
  # ivtmp.12_27 = PHI <ivtmp.12_28(5), ivtmp.12_26(3)>
  ivtmp.12_28 = ivtmp.12_27 + 4;
  _22 = (void *) ivtmp.12_28;
  _3 = MEM[(int *)_22];
  if (_3 == ai_15)
    goto <bb 8>; [5.50%]
  else
    goto <bb 5>; [94.50%]

  <bb 5> [local count: 958878295]:
  if (ivtmp.12_28 != _30)
    goto <bb 4>; [93.84%]
  else
    goto <bb 10>; [6.16%]
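
For readers less used to GIMPLE, the difference between the two dumps above
can be approximated at source level. This is only a rough sketch, not compiler
output: the helper names found_reload and found_forwarded and the parameter
first are hypothetical and do not appear in the test case or in GCC.

```cpp
#include <cassert>

// Hypothetical helper: mirrors the dump produced with vectorization enabled,
// where every iteration loads libs[i] before comparing it with ai.
static bool found_reload(const int *libs, int cnt, int ai) {
  for (int i = 0; i < cnt; i++) {
    if (libs[i] == ai)
      return true;
  }
  return false;
}

// Hypothetical helper: mirrors the -fno-tree-loop-vectorize dump. The caller
// already knows the value of libs[0] (it has just stored vertex there), so it
// is passed in as `first`, and each later element is loaded only after the
// previous comparison has failed.
static bool found_forwarded(const int *libs, int cnt, int first, int ai) {
  if (cnt <= 0)
    return false;
  int cur = first;            // plays the role of prephitmp_26
  for (int i = 0;;) {
    if (cur == ai)
      return true;
    if (++i == cnt)
      return false;
    cur = libs[i];            // plays the role of pretmp_28
  }
}
```

Both helpers return the same result for any cnt >= 1 when first equals
libs[0]; the second form just avoids the load on the first iteration.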