This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/56829] Feature request: "generic" builtin to support control flow in vectorized code ("movemask", "vec_any/all_*")
- From: "peter at cordes dot ca" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 07 Jul 2015 18:16:55 +0000
- Auto-submitted: auto-generated
- References: <bug-56829-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56829
Peter Cordes <peter at cordes dot ca> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |peter at cordes dot ca
--- Comment #3 from Peter Cordes <peter at cordes dot ca> ---
x86 has packed-compare and movemask instructions, but it also has a PTEST
instruction (SSE4.1) that sets flags directly from the result of a vector op.
In some cases it's more efficient than movemask + test/jcc (esp. if you can make
use of the AND / ANDN ops it does, instead of just testing a vector against
itself).
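
To illustrate the AND / ANDN point, here is a minimal sketch using the SSE4.1 intrinsics that map to PTEST. The helper names none_set and all_set are made up for this example, and the target attribute assumes GCC or Clang on x86-64; it only lets the file build without -msse4.1.

```c
#include <immintrin.h>  /* _mm_testz_si128 / _mm_testc_si128 (SSE4.1) */

/* Hypothetical helper: nonzero when v and mask share no set bits.
 * PTEST sets ZF when (v AND mask) is all-zero. */
__attribute__((target("sse4.1")))
static int none_set(__m128i v, __m128i mask)
{
    return _mm_testz_si128(v, mask);
}

/* Hypothetical helper: nonzero when every bit set in mask is also set in v.
 * PTEST sets CF when ((NOT v) AND mask) is all-zero. */
__attribute__((target("sse4.1")))
static int all_set(__m128i v, __m128i mask)
{
    return _mm_testc_si128(v, mask);
}
```

Either result can feed an if () directly, so the compiler can branch on the flags PTEST sets without a separate PAND / PANDN or a movemask.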
I recently wrote an answer on Stack Overflow comparing PTEST vs. PCMPEQB /
PMOVMSKB for comparing two vectors for equality. PTEST has lower latency, but
its uop count is only equal or lower in this case, which was ideal for PTEST.
http://stackoverflow.com/a/31198132/224132
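
For reference, the two approaches to a full-vector equality check look roughly like this with intrinsics. This is a sketch, not the code from the linked answer: the function names are invented here, and the target attributes assume GCC or Clang on x86-64.

```c
#include <immintrin.h>  /* SSE2 and SSE4.1 intrinsics */

/* Both hypothetical helpers return nonzero when all 128 bits of a and b match. */

/* SSE2: PCMPEQB + PMOVMSKB, then a scalar compare of the 16-bit mask. */
__attribute__((target("sse2")))
static int equal_movemask(__m128i a, __m128i b)
{
    __m128i eq = _mm_cmpeq_epi8(a, b);       /* 0xFF in each matching byte */
    return _mm_movemask_epi8(eq) == 0xFFFF;  /* one mask bit per byte */
}

/* SSE4.1: PXOR + PTEST of the difference against itself. */
__attribute__((target("sse4.1")))
static int equal_ptest(__m128i a, __m128i b)
{
    __m128i diff = _mm_xor_si128(a, b);      /* all-zero iff a == b */
    return _mm_testz_si128(diff, diff);      /* ZF set iff diff is all-zero */
}
```

Which one is cheaper depends on the microarchitecture and on whether the branch can consume the flags directly.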
Just something to keep in mind when designing gcc's arch-agnostic vector
support: at least x86 can branch on a vector PTEST, without needing any
compare / movemask. Requiring things to be written in terms of a movemask
wouldn't be horrible for x86, though.