[Bug tree-optimization/61559] FAIL: gcc.dg/builtin-bswap-8.c on i686 with -mmovbe
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Thu Sep 4 08:56:00 GMT 2014
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61559
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                        |Added
----------------------------------------------------------------------------
             Status|NEW                            |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #5)
> (In reply to Eric Botcazou from comment #4)
> > I guess the transformations should accept MEMs instead of just REGs but, no,
> > I'm not particularly interested in quirks of CISC architectures, I have
> > enough to do with those of RISC architectures.
>
> The problem is that with both function arguments in memory, combine
> simplifies the sequence of bswaps with memory arguments (i.e. movbe) in
> foo7 to:
>
> Failed to match this instruction:
> (set (reg:SI 84 [ D.2318 ])
>     (xor:SI (mem/c:SI (plus:SI (reg/f:SI 16 argp)
>                 (const_int 4 [0x4])) [2 b+0 S4 A32])
>         (mem/c:SI (reg/f:SI 16 argp) [2 a+0 S4 A32])))
>
> This is invalid RTX, since both input operands are in memory.
>
> The optimized tree dump for foo7 is:
>
> <bb 2>:
>   _2 = __builtin_bswap32 (a_1(D));
>   _4 = __builtin_bswap32 (b_3(D));
>   _5 = _4 ^ _2;
>   _6 = __builtin_bswap32 (_5); [tail call]
>   return _6;
Seems to me we want
(bit_xor (bswap32 @0) (bswap32 @1)) -> (bswap32 (bit_xor @0 @1))
in match-and-simplify speak.
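The rewrite is sound because a byte swap only moves bytes around, while a bitwise operation treats every bit position independently, so it makes no difference whether the bytes are moved before or after the operation. A minimal C sketch that spot-checks this, assuming GCC or Clang for the `__builtin_bswap16/32/64` builtins; the names `COMMUTES`, `bswap_bitop_commutes` and `check_samples` and the sample values are hypothetical, chosen only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if, for one operand pair, the byte swap commutes with xor, ior
   and and, i.e. (bitop (bswap a) (bswap b)) == (bswap (bitop a b)) for
   all three bitops.  T is the operand type, BSWAP the matching builtin.
   Results are cast back to T because of C integer promotion. */
#define COMMUTES(T, BSWAP, a, b)                                   \
  ((T)(BSWAP(a) ^ BSWAP(b)) == BSWAP((T)((a) ^ (b)))               \
   && (T)(BSWAP(a) | BSWAP(b)) == BSWAP((T)((a) | (b)))            \
   && (T)(BSWAP(a) & BSWAP(b)) == BSWAP((T)((a) & (b))))

/* Checks the rewrite for the 16-, 32- and 64-bit byte swaps, using the
   low bits of A and B for the narrower widths. */
int bswap_bitop_commutes(uint64_t a, uint64_t b)
{
  return COMMUTES(uint16_t, __builtin_bswap16, (uint16_t)a, (uint16_t)b)
         && COMMUTES(uint32_t, __builtin_bswap32, (uint32_t)a, (uint32_t)b)
         && COMMUTES(uint64_t, __builtin_bswap64, a, b);
}

/* Exercises the check on all pairs of a few arbitrary sample values. */
void check_samples(void)
{
  static const uint64_t s[] = { 0, ~0ull, 0x0123456789abcdefull,
                                0xdeadbeefcafebabeull };
  for (unsigned i = 0; i < sizeof s / sizeof s[0]; i++)
    for (unsigned j = 0; j < sizeof s / sizeof s[0]; j++)
      assert(bswap_bitop_commutes(s[i], s[j]));
}
```

For example, `bswap_bitop_commutes(0x0123456789abcdefull, 0xdeadbeefcafebabeull)` returns 1, as it does for any input pair.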
On trunk this transform would go into tree-ssa-forwprop.c as a pattern.
It would apply to all bitwise binary ops and all bswap builtins
(more generally, to all bit/byte-shuffling operations that apply the
same shuffle to both operands):
(for bitop in bit_xor bit_ior bit_and
  (for bswap in BUILT_IN_BSWAP16 BUILT_IN_BSWAP32 BUILT_IN_BSWAP64
    (simplify
      (bitop (bswap @0) (bswap @1))
      (bswap (bitop @0 @1))))
  (simplify
    (bitop (vec_perm @1 @2 @0) (vec_perm @3 @4 @0))
    (vec_perm (bitop @1 @3) (bitop @2 @4) @0)))
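Not sure if the vector permute one is profitable: it trades two
permutes and one bit operation for one permute and two bit operations,
which only pays off if a permute is more expensive than a bit operation
(I guess it always is).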
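The permute variant is valid for the same reason: both operands are permuted with the same selector @0, so lane i of each permuted vector comes from the same position of its input pair, and the lane-wise bit operation can be done before the permute instead of after it. A simplified scalar model in C; `vec_perm`, `vec_xor`, `perm_xor_commutes` and `perm_example` are hypothetical helpers, the vector length of 4 is arbitrary, and only bit_xor is modelled (bit_ior and bit_and behave the same way):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N 4 /* illustrative vector length (assumption) */

/* Scalar model of VEC_PERM_EXPR <v0, v1, sel>: each selector element
   indexes the 2*N-element concatenation of v0 and v1, modulo 2*N. */
static void vec_perm(const uint32_t v0[N], const uint32_t v1[N],
                     const unsigned sel[N], uint32_t out[N])
{
  for (int i = 0; i < N; i++) {
    unsigned k = sel[i] % (2 * N);
    out[i] = k < N ? v0[k] : v1[k - N];
  }
}

/* Lane-wise bit_xor. */
static void vec_xor(const uint32_t x[N], const uint32_t y[N], uint32_t out[N])
{
  for (int i = 0; i < N; i++)
    out[i] = x[i] ^ y[i];
}

/* Nonzero if (bit_xor (vec_perm v1 v2 sel) (vec_perm v3 v4 sel)) equals
   (vec_perm (bit_xor v1 v3) (bit_xor v2 v4) sel). */
int perm_xor_commutes(const uint32_t v1[N], const uint32_t v2[N],
                      const uint32_t v3[N], const uint32_t v4[N],
                      const unsigned sel[N])
{
  uint32_t p12[N], p34[N], lhs[N]; /* two permutes, one xor */
  uint32_t x13[N], x24[N], rhs[N]; /* two xors, one permute */

  vec_perm(v1, v2, sel, p12);
  vec_perm(v3, v4, sel, p34);
  vec_xor(p12, p34, lhs);

  vec_xor(v1, v3, x13);
  vec_xor(v2, v4, x24);
  vec_perm(x13, x24, sel, rhs);

  return memcmp(lhs, rhs, sizeof lhs) == 0;
}

/* Example: a selector that interleaves the low halves of the two inputs. */
void perm_example(void)
{
  const uint32_t v1[N] = {1, 2, 3, 4}, v2[N] = {5, 6, 7, 8};
  const uint32_t v3[N] = {0xf0, 0x0f, 0xff, 0x00}, v4[N] = {9, 10, 11, 12};
  const unsigned sel[N] = {0, 4, 1, 5};
  assert(perm_xor_commutes(v1, v2, v3, v4, sel));
}
```

The left-hand side of the check does two permutes and one xor, the right-hand side two xors and one permute, which is exactly the cost trade-off in question.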
The requested transform of course relies on somebody simplifying
bswap (bswap (x)) to x and, for vec_perm, on detecting a cancelling
permutation (tree-ssa-forwprop.c can already do that, I think).
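Applied to foo7, the two rewrites chain: the xor of two byte swaps first becomes a byte swap of the xor, and the outer bswap32 then cancels it, so the function should fold to a plain a ^ b with no bswap (and hence no movbe) left for combine to deal with. A small C sketch of the three stages; `foo7` is reconstructed from the quoted tree dump with `uint32_t` arguments assumed, since the testcase source is not quoted here, and `foo7_step1`, `foo7_step2` and `foo7_example` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* foo7 as reconstructed from the optimized tree dump (assumption). */
uint32_t foo7(uint32_t a, uint32_t b)
{
  return __builtin_bswap32(__builtin_bswap32(a) ^ __builtin_bswap32(b));
}

/* After (bit_xor (bswap32 a) (bswap32 b)) -> (bswap32 (bit_xor a b)). */
uint32_t foo7_step1(uint32_t a, uint32_t b)
{
  return __builtin_bswap32(__builtin_bswap32(a ^ b));
}

/* After bswap32 (bswap32 (x)) -> x: only the xor remains. */
uint32_t foo7_step2(uint32_t a, uint32_t b)
{
  return a ^ b;
}

/* All three stages compute the same value. */
void foo7_example(void)
{
  const uint32_t a = 0x11223344u, b = 0xa0b0c0d0u;
  assert(foo7(a, b) == foo7_step1(a, b));
  assert(foo7_step1(a, b) == foo7_step2(a, b));
  assert(foo7(a, b) == 0xb192f394u);
}
```

With a = 0x11223344 and b = 0xa0b0c0d0, all three stages return 0xb192f394.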
Mine. Fixed by the above patterns on the match-and-simplify branch.
> It looks to me like the optimization has to be re-implemented as a tree
> optimization (probably by extending fold_builtin_bswap in builtins.c). This
> generic optimization will also benefit targets without a bswap RTX pattern,
> e.g. plain i386, as observed in Comment #2.
>
> I'm recategorizing the PR as a tree-optimization.