This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/82136] x86: -mavx256-split-unaligned-load should try to fold other shuffles into the load/vinsertf128


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82136

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-09-12
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC just applies the general interleaving strategy here, which for existing
groups can indeed be quite bad.  And it gets worse because of the splitting,
which isn't exposed to the vectorizer.

In the end the GIMPLE IL explains more clearly what the vectorizer tries to
do -- extract even/odd, multiply/add, and then interleave high/low:

  vect_x_13.2_26 = MEM[base: _2, offset: 0B];
  vect_x_13.3_22 = MEM[base: _2, offset: 32B];
  vect_perm_even_21 = VEC_PERM_EXPR <vect_x_13.2_26, vect_x_13.3_22, { 0, 2, 4, 6 }>;
  vect_perm_odd_20 = VEC_PERM_EXPR <vect_x_13.2_26, vect_x_13.3_22, { 1, 3, 5, 7 }>;
  vect__7.4_19 = vect_perm_odd_20 * vect_perm_even_21;
  vect__8.5_18 = vect_perm_odd_20 + vect_perm_even_21;
  vect_inter_high_34 = VEC_PERM_EXPR <vect__7.4_19, vect__8.5_18, { 0, 4, 1, 5 }>;
  vect_inter_low_29 = VEC_PERM_EXPR <vect__7.4_19, vect__8.5_18, { 2, 6, 3, 7 }>;
  MEM[base: _2, offset: 0B] = vect_inter_high_34;
  MEM[base: _2, offset: 32B] = vect_inter_low_29;

Not sure what ends up messing things up here (I guess AVX256 doesn't have
full-width extract even/odd or interleave high/low ...).

Looks like with -mprefer-avx128 we never try the larger vector size (oops?).
At least we figure out that vectorization isn't profitable there.

So all this probably boils down to the cost of permutes not being modeled.
