[Bug tree-optimization/88767] New: 'unroll and jam' not optimizing some loops
helijia at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Wed Jan 9 10:56:00 GMT 2019
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767
Bug ID: 88767
Summary: 'unroll and jam' not optimizing some loops
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: helijia at gcc dot gnu.org
Target Milestone: ---
The test source is as follows:
__attribute__((noinline)) void calculate(const double* __restrict__ A,
                                         const double* __restrict__ B,
                                         double* __restrict__ C) {
  unsigned int l_m = 0;
  unsigned int l_n = 0;
  unsigned int l_k = 0;
  A = (const double*)__builtin_assume_aligned(A,16);
  B = (const double*)__builtin_assume_aligned(B,16);
  C = (double*)__builtin_assume_aligned(C,16);
  for ( l_n = 0; l_n < 9; l_n++ ) { // loop 1
    for ( l_m = 0; l_m < 10; l_m++ ) { C[(l_n*10)+l_m] = 0.0; } // loop 2
    for ( l_k = 0; l_k < 17; l_k++ ) { // loop 3
      for ( l_m = 0; l_m < 10; l_m++ ) { // loop 4
        C[(l_n*10)+l_m] += A[(l_k*20)+l_m] * B[(l_n*20)+l_k];
      }
    }
  }
}
#define SIZE 36
double A[SIZE][SIZE] __attribute__((aligned(16)));
double B[SIZE][SIZE] __attribute__((aligned(16)));
double C[SIZE][SIZE] __attribute__((aligned(16)));
int main()
{
  long r, i, j;
  for (i=0; i < SIZE; i++) {
    for (j=0; j < SIZE; j++) {
      A[i][j] = 1.0;
      B[i][j] = 2.0;
      C[i][j] = 3.0;
    }
  }
  for (r=0; r < 1000000; r++) {
    calculate(&A[0][0],&B[0][0], &C[0][0]);
  }
  return 0;
}
First, I compiled the test case with the following command:
g++ unroll_jam_bug.cpp -O3 -funroll-loops -floop-unroll-and-jam -o unroll_jam_bug -fdump-tree-unrolljam-details
The generated dump file, unroll_jam_bug.cpp.143t.unrolljam, shows that no
unroll and jam optimization is applied to the loops in the calculate function.
Second, I added the -fdump-tree-all parameter to the command line. The dumps
show that the inner loops (loops 3 and 4) are completely unrolled, because the
pass_data_complete_unrolli pass considers the innermost loop small. Once the
inner loops are fully unrolled, the remaining loop becomes large. Before
unrolling, the pass_loop_jam pass checks whether unroll_factor * (number of
loop instructions) > 200. If so, the optimization is abandoned; otherwise it
proceeds.
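
The size check described above can be modelled as a small predicate. This is only a sketch of the rule as stated in this report, not GCC's actual code: the function name, the constant name, and the parameter names are hypothetical, and the limit of 200 is taken from the analysis above.

```cpp
// Hypothetical model of the size check described in this report.
// None of these names come from the GCC sources.
constexpr unsigned long long kInsnLimit = 200;  // limit quoted in the report

// Returns true when unroll and jam would still be attempted, i.e. when
// unroll_factor * loop_insns does not exceed the limit.
bool unroll_and_jam_allowed(unsigned unroll_factor, unsigned loop_insns) {
    unsigned long long unrolled_size =
        static_cast<unsigned long long>(unroll_factor) * loop_insns;
    return unrolled_size <= kInsnLimit;
}
```

Under this model, a loop body that has already grown through complete unrolling of its inner loops fails the check even for an unroll factor of 2, which matches the behaviour observed in the dump.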
Based on this second analysis, I tried disabling the unrolli optimization with
the following command line:
g++ unroll_jam_bug.cpp -O3 -mcpu=power8 -fdisable-tree-cunrolli -floop-unroll-and-jam -o unroll_jam_bug -fdump-tree-unrolljam-details
With this command, the unroll and jam optimization is performed, but there
still seems to be room for optimization.
Original code:
for ( l_n = 0; l_n < 9; l_n++ ) {
  for ( l_m = 0; l_m < 10; l_m++ ) { C[(l_n*10)+l_m] = 0.0; }
  for ( l_k = 0; l_k < 17; l_k++ ) {
    for ( l_m = 0; l_m < 10; l_m++ ) {
      C[(l_n*10)+l_m] += A[(l_k*20)+l_m] * B[(l_n*20)+l_k];
    }
  }
}
After unroll and jam pass:
for ( l_n = 0; l_n < 9; l_n++ ) {
  for ( l_m = 0; l_m < 10; l_m++ ) { C[(l_n*10)+l_m] = 0.0; }
  for ( l_k = 0; l_k < 17; l_k += 2 ) {
    for ( l_m = 0; l_m < 10; l_m++ ) {
      C[(l_n*10)+l_m] += A[(l_k*20)+l_m] * B[(l_n*20)+l_k];
      C[(l_n*10)+l_m] += A[(l_k*20 + 20)+l_m] * B[(l_n*20)+l_k + 1];
    }
  }
}
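
For reference, the transformation can be written out by hand and compared against the original loops. The sketch below is an illustration, not the compiler's output: the function names are hypothetical, and it assumes a separate remainder loop for the last l_k iteration, because 17 is odd and the listing above shows only the jammed body.

```cpp
// Loops 3 and 4 from the test case, for one fixed l_n.
void kernel_reference(const double* A, const double* B, double* C,
                      unsigned l_n) {
    for (unsigned l_k = 0; l_k < 17; l_k++) {
        for (unsigned l_m = 0; l_m < 10; l_m++) {
            C[(l_n*10)+l_m] += A[(l_k*20)+l_m] * B[(l_n*20)+l_k];
        }
    }
}

// Hand-written unroll and jam of loop 3 by a factor of 2 (hypothetical
// helper). Each C element still receives its l_k contributions in the
// same order as in kernel_reference.
void kernel_unroll_jam(const double* A, const double* B, double* C,
                       unsigned l_n) {
    unsigned l_k = 0;
    // Main part: two l_k iterations fused into one pass over l_m.
    for (; l_k + 1 < 17; l_k += 2) {
        for (unsigned l_m = 0; l_m < 10; l_m++) {
            C[(l_n*10)+l_m] += A[(l_k*20)+l_m] * B[(l_n*20)+l_k];
            C[(l_n*10)+l_m] += A[(l_k*20 + 20)+l_m] * B[(l_n*20)+l_k + 1];
        }
    }
    // Remainder: the single leftover iteration (l_k == 16).
    for (; l_k < 17; l_k++) {
        for (unsigned l_m = 0; l_m < 10; l_m++) {
            C[(l_n*10)+l_m] += A[(l_k*20)+l_m] * B[(l_n*20)+l_k];
        }
    }
}
```

Calling both functions for l_n = 0..8 on zero-initialized C arrays should give identical results, which makes it a convenient baseline when inspecting what the unrolljam dump actually produces.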