[Bug target/58790] [missed optimization] reduction of masks of builtin vectors not transformed to ptest or movemask instructions

kretz at kde dot org gcc-bugzilla@gcc.gnu.org
Thu May 16 15:14:00 GMT 2019


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58790

Matthias Kretz <kretz at kde dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|4.9.0                       |10.0

--- Comment #2 from Matthias Kretz <kretz at kde dot org> ---
A completely different idea for handling mask reduction, which would also
create more potential for optimization:

Add a new builtin "__builtin_is_zero(x)" which takes any __vector(N) type and
returns true if all bits of x are 0.

none_equal(a, b) { return __builtin_is_zero(a == b); }
all_equal(a, b) { return __builtin_is_zero(~(a == b)); }
any_equal(a, b) { return !__builtin_is_zero(a == b); }
some_equal(a, b) { return !__builtin_is_zero(a == b) &&
                          !__builtin_is_zero(~(a == b)); }
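To make the four reductions concrete, here is a minimal sketch in C using the
GCC/Clang vector extensions. The helper is_zero() is only a hypothetical
stand-in for the proposed __builtin_is_zero (which does not exist), and the
v4si typedef is an assumption chosen for illustration:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Four 32-bit lanes, declared with the GCC/Clang vector extension. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Hypothetical stand-in for the proposed __builtin_is_zero:
   returns true if every bit of x is 0. */
static bool is_zero(v4si x)
{
    uint64_t halves[2];
    memcpy(halves, &x, sizeof halves);
    return (halves[0] | halves[1]) == 0;
}

/* a == b yields -1 in lanes that compare equal and 0 in all others. */
static bool none_equal(v4si a, v4si b) { return is_zero(a == b); }
static bool all_equal(v4si a, v4si b)  { return is_zero(~(a == b)); }
static bool any_equal(v4si a, v4si b)  { return !is_zero(a == b); }
static bool some_equal(v4si a, v4si b)
{
    /* At least one lane equal and at least one lane different. */
    return !is_zero(a == b) && !is_zero(~(a == b));
}
```

With a real builtin the compiler would see the whole expression inside
is_zero() and could pick the best instruction sequence for it, which is the
point of the proposal.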

The x86 backend could then translate those to movmsk or ptest/vtestp[sd].
Examples:
with SSE4:
__builtin_is_zero(x) -> ptest(x, x); return ZF
__builtin_is_zero(~x) -> ptest(x, -1); return CF
__builtin_is_zero(integer < 0) -> ptest(integer, signmask); return ZF
__builtin_is_zero(x & k) -> ptest(x, k); return ZF
__builtin_is_zero(~x & k) -> ptest(x, k); return CF
__builtin_is_zero((integer < 0) & k) -> ptest(integer, signmask & k); return ZF
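The ptest mappings above can be checked by hand with the SSE4.1 intrinsics
_mm_testz_si128 (returns ZF) and _mm_testc_si128 (returns CF). The following
is only a sketch of what the backend would have to emit, not a proposed
implementation: the is_zero_* helper names and the vec128 typedef are made up
for illustration, lanes are treated as 64-bit integers, and a portable
fallback is used when SSE4.1 is not enabled:

```c
#include <limits.h>
#include <stdbool.h>

#if defined(__SSE4_1__)
#include <smmintrin.h>
typedef __m128i vec128;
/* ZF of ptest: set if (a & b) == 0. */
static int testz(vec128 a, vec128 b) { return _mm_testz_si128(a, b); }
/* CF of ptest: set if (~a & b) == 0. */
static int testc(vec128 a, vec128 b) { return _mm_testc_si128(a, b); }
#else
/* Portable fallback with the same semantics as the two ptest flags. */
typedef long long vec128 __attribute__((vector_size(16)));
static int testz(vec128 a, vec128 b) { vec128 t = a & b;  return (t[0] | t[1]) == 0; }
static int testc(vec128 a, vec128 b) { vec128 t = ~a & b; return (t[0] | t[1]) == 0; }
#endif

/* is_zero(x)      -> ptest(x, x), ZF */
static bool is_zero(vec128 x) { return testz(x, x); }

/* is_zero(~x)     -> ptest(x, -1), CF */
static bool is_zero_not(vec128 x)
{
    vec128 ones = {-1, -1};
    return testc(x, ones);
}

/* is_zero(x & k)  -> ptest(x, k), ZF */
static bool is_zero_and(vec128 x, vec128 k) { return testz(x, k); }

/* is_zero(~x & k) -> ptest(x, k), CF */
static bool is_zero_andnot(vec128 x, vec128 k) { return testc(x, k); }

/* is_zero(x < 0)  -> ptest(x, signmask), ZF (64-bit lanes assumed) */
static bool is_zero_negative(vec128 x)
{
    vec128 sign = {LLONG_MIN, LLONG_MIN};
    return testz(x, sign);
}

/* is_zero((x < 0) & k) -> ptest(x, signmask & k), ZF.
   Assumes k is a lane mask, i.e. each lane is 0 or -1. */
static bool is_zero_negative_and(vec128 x, vec128 k)
{
    vec128 sign = {LLONG_MIN, LLONG_MIN};
    return testz(x, sign & k);
}
```

Note that the last mapping is only equivalent when k is itself a mask; for an
arbitrary k the sign bit of a lane of k may be clear even though the lane is
nonzero.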

without SSE4:
__builtin_is_zero(x) -> movmsk(x == 0) == "full bitmask"  // every element
                                              // must compare equal to 0
__builtin_is_zero(mask) -> movmsk(mask) == 0  // i.e. when the argument is known
                                              // to have only 0 or -1 values
__builtin_is_zero(a == b) -> movmsk(a == b) == 0
__builtin_is_zero(~(a == b)) -> movmsk(a == b) == "full bitmask"
  // "full bitmask" is 0x3, 0xf, 0xff, 0xffff, or 0xffffffff, depending on the
  // actual movmsk instruction used.
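For the movmsk variant, a rough sketch with SSE2 could look like the following.
It uses pmovmskb (_mm_movemask_epi8), so the "full bitmask" is 0xffff here. The
helper names and the v4si typedef are again hypothetical, and a byte-wise
fallback stands in for the instruction when SSE2 is not available. An arbitrary
vector is all-zero exactly when every lane compares equal to 0, so that case
tests for the full bitmask rather than for 0:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

typedef int32_t v4si __attribute__((vector_size(16)));

enum { FULL_MASK = 0xffff }; /* 16 bytes -> 16 mask bits for pmovmskb */

/* Collect the top bit of each of the 16 bytes, as pmovmskb does. */
static unsigned movmsk(v4si m)
{
#if defined(__SSE2__)
    __m128i r;
    memcpy(&r, &m, sizeof r);
    return (unsigned)_mm_movemask_epi8(r);
#else
    unsigned char bytes[16];
    unsigned bits = 0;
    memcpy(bytes, &m, sizeof bytes);
    for (int i = 0; i < 16; ++i)
        bits |= (unsigned)(bytes[i] >> 7) << i;
    return bits;
#endif
}

/* Arbitrary x: all bits are 0 iff every lane compares equal to 0. */
static bool is_zero(v4si x)
{
    v4si zero = {0, 0, 0, 0};
    return movmsk(x == zero) == FULL_MASK;
}

/* Argument known to hold only 0 or -1 per lane: no compare needed. */
static bool is_zero_mask(v4si mask) { return movmsk(mask) == 0; }

/* is_zero(a == b)    -> movmsk(a == b) == 0 */
static bool none_equal(v4si a, v4si b) { return movmsk(a == b) == 0; }

/* is_zero(~(a == b)) -> movmsk(a == b) == full bitmask */
static bool all_equal(v4si a, v4si b) { return movmsk(a == b) == FULL_MASK; }
```

Using movmskps on the same data would give a 4-bit result and a full bitmask
of 0xf instead.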

I assume this would make PR90483 a lot more natural to implement.

