Bug 70849 - Loop can be vectorized through gathers on AVX2 platforms.
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 7.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2016-04-28 13:35 UTC by Yuri Rumyantsev
Modified: 2021-12-26 21:08 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-12-26 00:00:00


Attachments
test-case to reproduce (220 bytes, text/plain)
2016-04-28 13:37 UTC, Yuri Rumyantsev

Description Yuri Rumyantsev 2016-04-28 13:35:59 UTC
The simple test case (attached below) is not vectorized because the cost model considers it unprofitable:
test.c:11:5: note: cost model: the vector iteration cost = 2061 divided by the scalar iteration cost = 9 is greater or equal to the vectorization factor = 8.
test.c:11:5: note: not vectorized: vectorization not profitable.
test.c:11:5: note: not vectorized: vector version will never be profitable.

but it can be vectorized using gathers, as icc does:
   LOOP BEGIN at test.c(11,5)
      remark #15388: vectorization support: reference c1[j] has aligned access   [ test.c(12,7) ]
      remark #15388: vectorization support: reference c2[j] has aligned access   [ test.c(13,7) ]
      remark #15388: vectorization support: reference c1[j] has aligned access   [ test.c(12,7) ]
      remark #15388: vectorization support: reference c2[j] has aligned access   [ test.c(13,7) ]
      remark #15415: vectorization support: gather was generated for the variable <f[j+base]>, strided by 256   [ test.c(12,16) ]
      remark #15415: vectorization support: gather was generated for the variable <f[j+base+1]>, strided by 256   [ test.c(13,16) ]
      remark #15415: vectorization support: gather was generated for the variable <f[j+base]>, strided by 256   [ test.c(12,16) ]
      remark #15415: vectorization support: gather was generated for the variable <f[j+base+1]>, strided by 256   [ test.c(13,16) ]
      remark #15305: vectorization support: vector length 8
      remark #15300: LOOP WAS VECTORIZED
      remark #15449: unmasked aligned unit stride stores: 4 
      remark #15460: masked strided loads: 4 
      remark #15475: --- begin vector loop cost summary ---
      remark #15476: scalar loop cost: 18 
      remark #15477: vector loop cost: 12.000 
      remark #15478: estimated potential speedup: 1.500 
      remark #15488: --- end vector loop cost summary ---
   LOOP END
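
The two cost reports above can be compared with a little arithmetic. The sketch below only restates the check as it is worded in the GCC note and the ratio that matches the icc summary; the function names are hypothetical, and the real cost models of both compilers take many more inputs.

```c
#include <stdbool.h>

/* Check as worded in the GCC note: the loop is rejected when
   (vector iteration cost / scalar iteration cost) >= vectorization factor.
   Hypothetical helper, not GCC's internal code. */
bool never_profitable(int vec_iter_cost, int scalar_iter_cost, int vf)
{
    return vec_iter_cost / scalar_iter_cost >= vf;
}

/* Ratio consistent with the icc summary: scalar loop cost / vector loop cost.
   Hypothetical helper, not icc's internal code. */
double estimated_speedup(double scalar_loop_cost, double vector_loop_cost)
{
    return scalar_loop_cost / vector_loop_cost;
}
```

With the GCC figures, 2061 / 9 = 229, which is well above the vectorization factor of 8, so the interleaving-based version is rejected. With the icc figures, 18 / 12 = 1.5, which is the reported estimated speedup for the gather-based version.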
Comment 1 Yuri Rumyantsev 2016-04-28 13:37:30 UTC
Created attachment 38365
test-case to reproduce

Must be compiled with the -O3 -mavx2 options.
Comment 2 Richard Biener 2016-04-28 14:07:37 UTC
Confirmed.  The vectorizer uses interleaving for this.  We don't consider other
options (like using scalar loads or gather loads) if interleaving turns out not
to be profitable.
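
For readers without access to the attachment, here is a minimal sketch of the gather alternative mentioned above. The loop shape, the function names, and the `stride` parameter are all assumptions made for illustration; this is not the attached test case. `gather8` is a portable model of what one 8-lane gather load does, not an intrinsic.

```c
#include <stddef.h>

enum { VF = 8 };  /* vectorization factor seen in both compilers' reports */

/* Hypothetical strided loop: reads every `stride`-th float of f. */
void strided_scalar(float *c1, const float *f, size_t n, size_t stride)
{
    for (size_t j = 0; j < n; j++)
        c1[j] = f[j * stride];
}

/* Portable model of one gather: out[k] = base[idx[k]] for each of VF lanes. */
static void gather8(float out[VF], const float *base, const int idx[VF])
{
    for (int k = 0; k < VF; k++)
        out[k] = base[idx[k]];
}

/* Same loop, restructured the way a gather-based vectorization would be:
   build a vector of indices, load all lanes at once, store contiguously. */
void strided_gather(float *c1, const float *f, size_t n, size_t stride)
{
    size_t j = 0;
    for (; j + VF <= n; j += VF) {
        int idx[VF];
        for (int k = 0; k < VF; k++)
            idx[k] = (int)((j + k) * stride);
        gather8(&c1[j], f, idx);   /* one gather, one unit-stride store */
    }
    for (; j < n; j++)             /* scalar epilogue for leftover iterations */
        c1[j] = f[j * stride];
}
```

On AVX2 hardware the body of `gather8` corresponds to a single `vgatherdps` instruction (exposed as the `_mm256_i32gather_ps` intrinsic), whereas interleaving loads whole contiguous vectors and permutes the wanted elements out of them, which becomes expensive when the stride is large.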