GCC loop optimization should unroll and transform loops using partial sums where beneficial for expensive, independent computations where the target has additional function units available.

Before

  double fValue = 0;
  int j;
  for (j = 0; j < NZ; j++)
    fValue += Q[j] / r[j];

After

  double fValue = 0;
  double fValue1 = 0;
  int j;
  for (j = 0; j < NZ; j=j+2){
    fValue += Q[j] / r[j];
    fValue1 += Q[j+1] / r[j+1];
  }
  for (j = (NZ/2)*2; j < NZ; j++){
    fValue += Q[j] / r[j];
  }
  fValue = fValue + fValue1;
Currently not implemented in GCC.
Well, actually this is related to PR25621, I think, and is partially implemented via -fvariable-expansion-in-unroller -ffast-math. It would be nice to have this enabled (and working well) at -O3 or so.
I tried -funroll-loops -fvariable-expansion-in-unroller. I did not see any additional benefit from -fvariable-expansion-in-unroller. Unrolling helped somewhat, but the intermediate sum was still computed in each loop iteration instead of being sunk after the loop.
Interesting idea, I'll have a look.
Segher pointed out that the transformed code example has a bug. The first revised loop should test j+1 < NZ, otherwise Q[j+1] and r[j+1] are read past the end when NZ is odd:

  for (j = 0; j+1 < NZ; j += 2){
    fValue += Q[j] / r[j];
    fValue1 += Q[j+1] / r[j+1];
  }
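
For reference, here is a minimal sketch of the complete hand-transformed reduction with the corrected loop bound, next to the original form. The function names sum_ratio_simple and sum_ratio_partial are hypothetical and only wrap the loops from this report so they can be compiled and compared. The remainder loop continues from the value of j left by the unrolled loop, which for non-negative NZ is the same as starting at (NZ/2)*2.

```c
/* Original form: a single accumulator, so each add depends on the previous one. */
double sum_ratio_simple(const double *Q, const double *r, int NZ)
{
    double fValue = 0;
    int j;
    for (j = 0; j < NZ; j++)
        fValue += Q[j] / r[j];
    return fValue;
}

/* Hand-transformed form: two independent partial sums, combined after the loop. */
double sum_ratio_partial(const double *Q, const double *r, int NZ)
{
    double fValue = 0;
    double fValue1 = 0;
    int j;
    /* Corrected bound: j+1 < NZ keeps Q[j+1] and r[j+1] in range. */
    for (j = 0; j + 1 < NZ; j += 2) {
        fValue  += Q[j] / r[j];
        fValue1 += Q[j + 1] / r[j + 1];
    }
    /* Remainder: at most one leftover element when NZ is odd. */
    for (; j < NZ; j++)
        fValue += Q[j] / r[j];
    return fValue + fValue1;
}
```

Note that the two functions add the terms in a different order, so with arbitrary floating-point inputs the results can differ in the last bits. That reassociation is why the compiler can only do this transformation itself under -ffast-math (or an equivalent option that allows reassociation).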