Bug 71992 - Missed BB SLP vectorization in GCC
Summary: Missed BB SLP vectorization in GCC
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 7.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2016-07-25 11:46 UTC by vekumar
Modified: 2016-07-25 12:08 UTC
0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-07-25 00:00:00


Description vekumar 2016-07-25 11:46:53 UTC
The test case below fails to SLP vectorize.
gcc version 7.0.0 20160724 (experimental) (GCC)

gcc -Ofast -mavx -fvect-cost-model=unlimited slp.c -S -fdump-tree-slp-all

struct st
{
        double x;
        double y;
        double z;
        double p;
        double q;
}*obj;

double a,b,c;

void slp_test()
{

        obj->x = a*a+3.0;
        obj->y= b*b+c;
        obj->z= a+b*3.0;
        obj->p= a+b*3.0;
        obj->q =a+b+c;

}

LLVM is able to SLP vectorize this. It looks like it creates the vectors [a,c] and [b*3.0,b*b] and does a vector add.

GCC is not SLP vectorizing. Group splitting is also not working. I expected the group to be split so that these two statements get vectorized:

  obj->z= a+b*3.0;
  obj->p= a+b*3.0;
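
For illustration, a hand-written sketch of that split, using the GCC/Clang vector extension. The function name slp_test_split and the v2df typedef are hypothetical, and a, b, c are passed as parameters instead of being read from globals so that the snippet stands alone. It only shows the shape of the expected result, not what any compiler actually emits.

```c
/* Two-lane double vector via the GCC/Clang vector_size extension. */
typedef double v2df __attribute__((vector_size(16)));

struct st { double x, y, z, p, q; };

/* Hypothetical hand-split version of slp_test(): x, y and q stay scalar,
   while the two identical adjacent stores to z and p are computed as a
   single two-lane vector operation. */
static void slp_test_split(struct st *o, double a, double b, double c)
{
    v2df va = { a, a };
    v2df vb = { b, b };
    v2df k3 = { 3.0, 3.0 };
    v2df zp = va + vb * k3;      /* { a + b*3.0, a + b*3.0 } */

    o->x = a * a + 3.0;
    o->y = b * b + c;
    o->z = zp[0];                /* lane 0 -> z */
    o->p = zp[1];                /* lane 1 -> p */
    o->q = a + b + c;
}
```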

Another case:

struct st
{
        double x;
        double y;
        double z;
        double p;
        double q;
}*obj;

double a,b,c;

void slp_test()
{

        obj->x = a*b;
        obj->y= b+c;
        obj->z= a+b*3.0;
        obj->p= a+b*3.0;
        obj->q =a+b+c;

}


LLVM forms the vectors [b*3.0,a+b] and [a,c] and does a vector addition.
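
A sketch of what that two-lane addition computes, written with the GCC/Clang vector extension. It assumes the two lanes feed the adjacent p and q stores, which is not stated above and is only inferred from the operands. The name slp_test2_pq and the v2df typedef are hypothetical, and a, b, c are parameters here rather than globals.

```c
/* Two-lane double vector via the GCC/Clang vector_size extension. */
typedef double v2df __attribute__((vector_size(16)));

struct st { double x, y, z, p, q; };

/* Hypothetical hand-written form of the second test case.
   Assumption: the vector add produces the values stored to p and q. */
static void slp_test2_pq(struct st *o, double a, double b, double c)
{
    v2df lhs = { b * 3.0, a + b };
    v2df rhs = { a, c };
    v2df pq = lhs + rhs;         /* { a + b*3.0, a + b + c } */

    o->x = a * b;
    o->y = b + c;
    o->z = a + b * 3.0;
    o->p = pq[0];                /* lane 0 -> p */
    o->q = pq[1];                /* lane 1 -> q */
}
```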
Comment 1 Richard Biener 2016-07-25 12:08:48 UTC
Confirmed.  I think doing it as

 [a, b, b, b] * [a, b, 3., 3.] + [3., c, a, a]

would be "optimal" (not factoring in vector construction cost of course).
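
Spelled out for the first test case, that form could look like the sketch below, using the GCC/Clang vector extension. The name slp_test_v4 and the v4df typedef are hypothetical, a, b, c are parameters rather than globals, and the vector construction from scalars is exactly the cost left out of the "optimal" estimate.

```c
/* Four-lane double vector via the GCC/Clang vector_size extension. */
typedef double v4df __attribute__((vector_size(32)));

struct st { double x, y, z, p, q; };

/* Hypothetical hand-written form of
   [a, b, b, b] * [a, b, 3., 3.] + [3., c, a, a]
   covering the stores to x, y, z and p. */
static void slp_test_v4(struct st *o, double a, double b, double c)
{
    v4df lhs = { a, b, b, b };
    v4df rhs = { a, b, 3.0, 3.0 };
    v4df add = { 3.0, c, a, a };
    v4df r = lhs * rhs + add;    /* lanes: x, y, z, p */

    o->x = r[0];
    o->y = r[1];
    o->z = r[2];
    o->p = r[3];
    o->q = a + b + c;            /* fifth store stays scalar */
}
```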

The issue is how SLP construction works and how many swaps / builds
from scalars we do.

One issue is that we even try with a group-size of 5.  Fixing that
doesn't fix it though, as we do not consider building a vector from scalars
until we have tried to swap the parent op (and if that fails we don't go back
to building the children from scalars).  Only trying with a group size of 4
would also regress the case where we'd have split after the first element.

That said, the whole SLP discovery needs a different algorithmic approach
to fix cases like this.