[Bug c++/91940] __builtin_bswap16 loop optimization

Mon Sep 30 17:17:00 GMT 2019

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91940

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-09-30
                 CC|                            |jakub at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The loop with the rotate is vectorized, while the one with __builtin_bswap16 is
not.  For rotates, if the ISA doesn't have vector support for them, we use
vect_recog_rotate_pattern to undo the earlier matching of the hand-written
idiom into a rotate, breaking it up again into shifts + blend.
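
For illustration only, a minimal sketch of what such a hand-written rotate
looks like at the source level.  The names rotl16 and rotate8_loop are
hypothetical and not taken from the bug report, and the sketch assumes the
"blend" step above is a bitwise OR of the two shifted values:

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 16-bit value left by r bits, written as two shifts combined
   with an OR.  r must be in 1..15 so that neither shift count is out of
   range. */
uint16_t rotl16(uint16_t x, unsigned r)
{
    return (uint16_t)((x << r) | (x >> (16 - r)));
}

/* Hypothetical loop: rotate every 16-bit element by 8 bits. */
void rotate8_loop(uint16_t *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = rotl16(a[i], 8);
}
```

Both shifts are plain element-wise operations, which is presumably why this
form remains usable when the target has no vector rotate instruction.
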
For __builtin_bswap* we have vectorizable_bswap support, but it only works if
there is no type promotion in the call argument; in that case the call is not
handled using rotates etc., but as a permutation of the vector elements (if
supported).  Unfortunately, for __builtin_bswap16 the argument is promoted.
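
As a rough sketch of the permutation view (function names are hypothetical,
and this is source-level C, not what the vectorizer emits): byte-swapping each
16-bit element is the same as exchanging adjacent bytes in memory, so output
byte j is input byte j ^ 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loop: byte-swap each 16-bit element with the GCC builtin. */
void bswap16_loop(uint16_t *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = __builtin_bswap16(a[i]);
}

/* The same operation viewed as a permutation of the bytes in memory:
   each pair of adjacent bytes trades places.  A trailing odd byte, if
   any, is left untouched. */
void bswap16_as_byte_permutation(unsigned char *bytes, size_t n_bytes)
{
    for (size_t j = 0; j + 1 < n_bytes; j += 2) {
        unsigned char tmp = bytes[j];
        bytes[j] = bytes[j + 1];
        bytes[j + 1] = tmp;
    }
}
```

The two functions give the same result regardless of endianness, since
swapping the two bytes of an element in memory always swaps its high and low
halves.
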
So, the options are either to look through the argument promotion in
vectorizable_bswap, or, in tree-vect-patterns.c, to pattern match the
__builtin_bswap16 on a promoted integer to a call with a non-promoted
argument, optionally checking whether the permutation would be supported and
maybe falling back to the rotate that vect_recog_rotate_pattern can produce.
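
The rotate fallback relies on a 16-bit byte swap being the same operation as a
rotate by 8 bits.  A small sketch that checks this exhaustively (the helper
names are hypothetical, and this is not code from the vectorizer):

```c
#include <stdint.h>

/* Byte swap of a 16-bit value expressed as a rotate by 8 bits, i.e. two
   shifts combined with an OR. */
uint16_t bswap16_via_rotate(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Hypothetical helper: returns 1 if the rotate form agrees with
   __builtin_bswap16 for every 16-bit input, 0 otherwise. */
int rotate_matches_bswap16(void)
{
    for (uint32_t v = 0; v <= 0xFFFFu; v++) {
        uint16_t x = (uint16_t)v;
        if (bswap16_via_rotate(x) != __builtin_bswap16(x))
            return 0;
    }
    return 1;
}
```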