This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/82498] Missed optimization for x86 rotate instruction
- From: "jakub at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 11 Oct 2017 17:17:35 +0000
- Subject: [Bug target/82498] Missed optimization for x86 rotate instruction
- Auto-submitted: auto-generated
- References: <bug-82498-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82498
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Two further cases:
unsigned
f10 (unsigned x, unsigned char y)
{
  y %= __CHAR_BIT__ * __SIZEOF_INT__;
  return (x << y) | (x >> (-y & ((__CHAR_BIT__ * __SIZEOF_INT__) - 1)));
}
unsigned
f11 (unsigned x, unsigned short y)
{
  y %= __CHAR_BIT__ * __SIZEOF_INT__;
  return (x << y) | (x >> (-y & ((__CHAR_BIT__ * __SIZEOF_INT__) - 1)));
}
For f11 GCC also generates efficient code; for f10 it emits a useless & instruction.
I guess the f10 case would be improved by adding a
*<rotate_insn><mode>3_mask_1 define_insn_and_split (and similarly the
inefficient/non-portable f1 code would be slightly improved).
Looking at LLVM: f1/f3/f5 are worse in LLVM than in GCC, and in all of those cases
it uses branching instead of cmov; f7/f8/f9/f10/f11 all generate efficient code,
though, matching GCC in the f8 and f11 cases.