[Bug target/66369] gcc 4.8.3/5.1.0 miss optimisation with vpmovmskb
ubizjak at gmail dot com
gcc-bugzilla@gcc.gnu.org
Tue Jun 2 10:21:00 GMT 2015
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66369
--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
I have looked briefly at this. The compiler actually generates the following:
vpmovmskb %ymm0, %edx # 16 avx2_pmovmskb [length = 4]
testl %edx, %edx # 18 *cmpsi_ccno_1/1 [length = 2]
je .L5 # 19 *jcc_1 [length = 2]
movslq %edx, %rdx # 21 *extendsidi2_rex64/2 [length = 3]
tzcntq %rdx, %rdx # 52 *ctzdi2_falsedep [length = 5]
from:
int _14;
long unsigned int v.1_15;
int _16;
...
_14 = __builtin_ia32_pmovmskb256 (_13);
if (_14 != 0)
goto <bb 5>;
else
goto <bb 6>;
<bb 5>:
v.1_15 = (long unsigned int) _14;
_16 = __builtin_ctzl (v.1_15);
_17 = (long int) _16;
The intrinsic returns "int", and the source then widens that value to "long"
before calling __builtin_ctzl, so the sign-extension is explicit in the above
tree dump. The compiler won't even consider combining it with vpmovmskb.
So, why not:
unsigned int v;
v = (unsigned int) _mm256_movemask_epi8( ... );
if (v != 0)
return (long) __builtin_ctz( v );
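
As a rough sanity check on the suggestion above, the following portable sketch compares the two patterns on a plain "int" mask, without any AVX2 code. The helper names first_set_signext and first_set_unsigned are hypothetical and not from the bug report, and the -1 return for an empty mask is an assumption made only so both helpers are total. It also assumes a GCC-compatible compiler and an LP64 target where "long" is 64 bits.

```c
/* Original pattern: the int mask is widened to 64 bits (sign-extended,
 * the movslq in the assembly above) and then passed to __builtin_ctzl. */
static long first_set_signext(int mask)
{
    if (mask == 0)
        return -1;                      /* assumed sentinel for "no bit set" */
    unsigned long v = (unsigned long) mask;
    return (long) __builtin_ctzl(v);
}

/* Suggested pattern: keep the mask 32 bits wide and unsigned, so no
 * widening is needed before the trailing-zero count. */
static long first_set_unsigned(int mask)
{
    unsigned int v = (unsigned int) mask;
    if (v == 0)
        return -1;                      /* assumed sentinel for "no bit set" */
    return (long) __builtin_ctz(v);
}
```

For a nonzero mask the lowest set bit is always within the low 32 bits, so sign-extension only changes bits above it and both helpers should return the same index, including when only bit 31 is set.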