[Bug target/66369] gcc 4.8.3/5.1.0 miss optimisation with vpmovmskb

ubizjak at gmail dot com gcc-bugzilla@gcc.gnu.org
Tue Jun 2 10:21:00 GMT 2015


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66369

--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
I have looked briefly at this. The compiler actually generates the following:

        vpmovmskb       %ymm0, %edx     # 16    avx2_pmovmskb   [length = 4]
        testl   %edx, %edx      # 18    *cmpsi_ccno_1/1 [length = 2]
        je      .L5     # 19    *jcc_1  [length = 2]
        movslq  %edx, %rdx      # 21    *extendsidi2_rex64/2    [length = 3]
        tzcntq  %rdx, %rdx      # 52    *ctzdi2_falsedep        [length = 5]

from:

  int _14;
  long unsigned int v.1_15;
  int _16;
  ...
  _14 = __builtin_ia32_pmovmskb256 (_13);
  if (_14 != 0)
    goto <bb 5>;
  else
    goto <bb 6>;

  <bb 5>:
  v.1_15 = (long unsigned int) _14;
  _16 = __builtin_ctzl (v.1_15);
  _17 = (long int) _16;

The intrinsic returns "int", so the conversion to "long unsigned int" in <bb 5>
requires a sign-extension (the movslq above). As the tree dump shows, the
compiler does not even consider combining that sign-extension with vpmovmskb.

So, why not keep the mask as a 32-bit unsigned value and use __builtin_ctz:

   unsigned int v;

   v = (unsigned int) _mm256_movemask_epi8( ... );
   if (v != 0)
      return (long) __builtin_ctz( v );
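
The same idea as a compilable sketch. The helper name is hypothetical, and the
"mask" parameter only stands in for the int returned by
_mm256_movemask_epi8(), so that the example builds without AVX2. The -1 return
for an empty mask is likewise an assumption, not something taken from the
original test case:

```c
/* Hypothetical helper: returns the index of the lowest set bit in `mask`,
   or -1 if no bit is set. */
long first_set_index(int mask)
{
    unsigned int v = (unsigned int) mask;   /* stays 32 bits wide */

    if (v != 0)
        return (long) __builtin_ctz(v);     /* 32-bit count, result is 0..31 */
    return -1;
}
```

Since the count is taken on the 32-bit value, the only widening left is of the
small non-negative result of __builtin_ctz, not of the mask itself.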

