[Bug tree-optimization/91573] New: Vectorization failure for a loop to do multiply-add

hliu at amperecomputing dot com gcc-bugzilla@gcc.gnu.org
Wed Aug 28 07:41:00 GMT 2019


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91573

            Bug ID: 91573
           Summary: Vectorization failure for a loop to do multiply-add
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hliu at amperecomputing dot com
  Target Milestone: ---

The following code is not vectorized when compiled with gcc -O3:

=== begin code ===

char src[512];
char dst[512];

#define WIDTH 8

void foo(int height, int a, int b, int c, int d, int dst_stride) {
    char * ptr_src = src;
    char * ptr_dst = dst;

    for( int y = 0; y < height; y++ )
    {
        for( int x = 0; x < WIDTH; x++ )
            ptr_dst[x] = ( a*ptr_src[x] + b*ptr_src[x+1] + c*ptr_src[x] +
                           d*ptr_src[x+1] ) >> 6;
        ptr_dst += dst_stride;
        ptr_src += 32;
    }
}

=== end code ===

However, the loop is vectorized with either of the following modifications:
1) If the calculation is simpler, e.g.
     ptr_dst[x] = ( a*ptr_src[x] + c*ptr_src[x] ) >> 6;

2) If WIDTH is larger, e.g.
     #define WIDTH 16
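
To compare the original loop with the two modifications from a single source
file, something like the sketch below may help. The VARIANT_SIMPLE macro and
the file name repro.c are made up for illustration and are not part of the
original test case. GCC's -fopt-info-vec-missed and -fopt-info-vec-optimized
options print which loops were or were not vectorized, so each build can be
checked directly.

```c
/* Sketch only: VARIANT_SIMPLE and the file name repro.c are hypothetical.
 *
 *   gcc -O3 -c -fopt-info-vec-missed repro.c                      original
 *   gcc -O3 -c -fopt-info-vec-optimized -DVARIANT_SIMPLE repro.c  modification 1
 *   gcc -O3 -c -fopt-info-vec-optimized -DWIDTH=16 repro.c        modification 2
 */
#ifndef WIDTH
#define WIDTH 8   /* value from the report; override with -DWIDTH=16 */
#endif

char src[512];
char dst[512];

void foo(int height, int a, int b, int c, int d, int dst_stride)
{
    char *ptr_src = src;
    char *ptr_dst = dst;

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < WIDTH; x++) {
#ifdef VARIANT_SIMPLE
            /* Modification 1: two-term expression (b and d are unused). */
            ptr_dst[x] = (a * ptr_src[x] + c * ptr_src[x]) >> 6;
#else
            /* Original four-term expression from the report. */
            ptr_dst[x] = (a * ptr_src[x] + b * ptr_src[x + 1]
                          + c * ptr_src[x] + d * ptr_src[x + 1]) >> 6;
#endif
        }
        ptr_dst += dst_stride;
        ptr_src += 32;
    }
}
```

The loop bodies are unchanged from the report; only the preprocessor
switches are new, so the diagnostics for each build should reflect the same
code that was described above.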

This case is a hot loop from a real application. The problem reproduces on
both AArch64 and x86-64.

