Bug 56770 - Partial sums loop optimization
Summary: Partial sums loop optimization
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.9.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2013-03-28 19:45 UTC by David Edelsohn
Modified: 2021-08-29 01:58 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-08-28 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description David Edelsohn 2013-03-28 19:45:27 UTC
GCC loop optimization should unroll and transform loops using partial sums where beneficial for expensive, independent computations where the target has additional function units available.

Before

    double fValue = 0;
    int j;
    for (j = 0; j < NZ; j++)
        fValue += Q[j] / r[j];

After

    double fValue = 0;
    double fValue1 = 0;
    int j;
    for (j = 0; j < NZ; j=j+2){
        fValue += Q[j] / r[j];
        fValue1 += Q[j+1] / r[j+1];
    }

    for (j = (NZ/2)*2; j < NZ; j++){
        fValue += Q[j] / r[j];
    }

    fValue = fValue + fValue1;
Comment 1 David Edelsohn 2013-03-29 15:29:38 UTC
Currently not implemented in GCC.
Comment 2 Joost VandeVondele 2013-03-29 15:55:38 UTC
well actually related to PR25621, I think, and partially implemented via 

-fvariable-expansion-in-unroller -ffast-math

would be nice to have this enabled (and well working) at -O3 or so.
Comment 3 David Edelsohn 2013-03-29 18:11:54 UTC
I tried -funroll-loops -fvariable-expansion-in-unroller.  I did not see any additional benefit from -fvariable-expansion-in-unroller.

Unrolling helped somewhat, the intermediate sum was computed in each loop iteration instead of sunk after the loop.
Comment 4 Steven Bosscher 2013-03-29 19:38:21 UTC
Interesting idea, I'll have a look.
Comment 5 David Edelsohn 2013-03-29 19:53:44 UTC
Segher pointed out that the transformed code example is has a bug.  The first revised loop should test j+1 < NZ.

    for (j = 0; j+1 < NZ; j += 2){
        fValue += Q[j] / r[j];
        fValue1 += Q[j+1] / r[j+1];
    }