LLVM lacks an intrinsic for performing bitwise rotation, relying instead on spotting the classic C idioms for specifying rotation using two shifts. Unfortunately, when the rotation is defined by variable, its ability to spot rotation code is poor. Code that supports a variable rotation also needs to handle rotation-by-zero, which the underlying instruction has no problem with, but when translated into the classic C idiom, results in an undefined shift (because shifting a 32-bit integer by 32 bits isn't allowed). In the following code, only rotate32_undefined1, rotate32_undefined2 and _rotl32_doubleand1 compiles to a simple rotate instruction. rotl32_zerocheck also compiles to a rotate, but it contains a redundant test for zero -- a test that is necessary in the C code but not necessary for the rotate. Somewhat annoyingly, Clang 3.5 also has poor rotation detection, and only detects it for rotl_doubleand2 and rotl_doubleand3, as well as rotl32_undefined2. It is filed as http://llvm.org/bugs/show_bug.cgi?id=20750 Thus GCC and Clang differ as to which code they want for a rotate, and neither is good at recognizing variations of the rotate idiom. --- C code --- unsigned int rotl32_undefined1(unsigned int v, unsigned char r) { r = r & 31; return (v << r) | (v >> (32 - r)); } unsigned int rotl32_undefined2(unsigned int v, unsigned char r) { return (v << (r & 31)) | (v >> (32 - (r & 31))); } unsigned int rotl32_zerocheck(unsigned int v, unsigned char r) { r = r & 31; return r ? (v << r) | (v >> (32 - r)) : v; } unsigned int rotl32_doubleand1(unsigned int v, unsigned char r) { r = r & 31; return (v << r) | (v >> ((32 - r) & 31)); } unsigned int rotl32_doubleand2(unsigned int v, unsigned char r) { return (v << (r & 31)) | (v >> ((32 - (r & 31)) & 31)); } unsigned int rotl32_doubleand3(unsigned int v, unsigned char r) { return (v << (r & 31)) | (v >> ((32 - r) & 31)); } --- Assembly output, gcc 4.9.0 -O3 -fomit-frame-pointer --- _rotl32_undefined1: LFB0: movl %esi, %ecx movl %edi, %eax andl $31, %ecx roll %cl, %eax ret LFE0: .section __TEXT,__text_cold,regular,pure_instructions LCOLDE0: .text LHOTE0: .section __TEXT,__text_cold,regular,pure_instructions LCOLDB1: .text LHOTB1: .align 4,0x90 .globl _rotl32_undefined2 _rotl32_undefined2: LFB1: movl %esi, %ecx movl %edi, %eax andl $31, %ecx roll %cl, %eax ret LFE1: .section __TEXT,__text_cold,regular,pure_instructions LCOLDE1: .text LHOTE1: .section __TEXT,__text_cold,regular,pure_instructions LCOLDB2: .text LHOTB2: .align 4,0x90 .globl _rotl32_zerocheck _rotl32_zerocheck: LFB2: movl %esi, %ecx movl %edi, %eax andl $31, %ecx roll %cl, %eax testb %cl, %cl cmove %edi, %eax ret LFE2: .section __TEXT,__text_cold,regular,pure_instructions LCOLDE2: .text LHOTE2: .section __TEXT,__text_cold,regular,pure_instructions LCOLDB3: .text LHOTB3: .align 4,0x90 .globl _rotl32_doubleand1 _rotl32_doubleand1: LFB3: movl %esi, %ecx movl %edi, %eax andl $31, %ecx roll %cl, %eax ret LFE3: .section __TEXT,__text_cold,regular,pure_instructions LCOLDE3: .text LHOTE3: .section __TEXT,__text_cold,regular,pure_instructions LCOLDB4: .text LHOTB4: .align 4,0x90 .globl _rotl32_doubleand2 _rotl32_doubleand2: LFB4: movl %esi, %ecx movl %edi, %eax negl %ecx shrl %cl, %eax movl %esi, %ecx andl $31, %ecx sall %cl, %edi orl %edi, %eax ret
Possibly this should have been filed as a tree-level bug? Also, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886 which mostly is about issues with wider types, but may also cover this bug.
I believe we handle at least some "correct" forms now - Jakub?
We handle at least (x << n) | (x >> ((-n) & 31)) (N can be 0 here) since PR57157.
Problem here is conversions that happen in places where we don't handle them. A couple more patterns should do it (it could be a good test for the optional convert feature in the match branch). For the zerocheck version, it would work if r was an int, but here we get 2 insns in the branch (convert+rotate) which is quite a bit harder to handle than a single rotate insn.
Also, I would have expected the pattern *<rotate_insn><mode>3_mask to avoid generating 'andl' before 'roll'.
(In reply to Marek Polacek from comment #3) > We handle at least > (x << n) | (x >> ((-n) & 31)) > (N can be 0 here) since PR57157. Although this code does work with LLVM, testing with GCC 4.9.0, and this implementation unsigned int rotl32_doubleand4(unsigned int v, unsigned char r) { return (v << (r & 31)) | (v >> ((-r) & 31)); } (or variations with r being a char or int) produces code like this: _rotl32_doubleand4: LFB6: movl %esi, %ecx movl %edi, %eax negl %ecx shrl %cl, %eax movl %esi, %ecx sall %cl, %edi orl %edi, %eax ret ... so it failed to spot the idiom entirely.
Author: jakub Date: Sat Oct 14 18:47:14 2017 New Revision: 253760 URL: https://gcc.gnu.org/viewcvs?rev=253760&root=gcc&view=rev Log: PR middle-end/62263 PR middle-end/82498 * tree-ssa-forwprop.c (simplify_rotate): Allow def_arg1[N] to be any operand_equal_p operands. For & (B - 1) require B to be power of 2. Recognize (X << (Y & (B - 1))) | (X >> ((-Y) & (B - 1))) and similar patterns. * c-c++-common/rotate-5.c (f2): New function. Move old function to ... (f4): ... this. Use 127 instead of 128. (f3, f5, f6): New functions. (main): Test all f[1-6] functions, with both 0 and 1 as second arguments. * c-c++-common/rotate-6.c: New test. * c-c++-common/rotate-6a.c: New test. * c-c++-common/rotate-7.c: New test. * c-c++-common/rotate-7a.c: New test. * c-c++-common/rotate-8.c: New test. Added: trunk/gcc/testsuite/c-c++-common/rotate-6.c trunk/gcc/testsuite/c-c++-common/rotate-6a.c trunk/gcc/testsuite/c-c++-common/rotate-7.c trunk/gcc/testsuite/c-c++-common/rotate-7a.c trunk/gcc/testsuite/c-c++-common/rotate-8.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/c-c++-common/rotate-5.c trunk/gcc/tree-ssa-forwprop.c
Author: jakub Date: Sat Oct 14 18:48:38 2017 New Revision: 253761 URL: https://gcc.gnu.org/viewcvs?rev=253761&root=gcc&view=rev Log: PR middle-end/62263 PR middle-end/82498 * tree-ssa-phiopt.c (value_replacement): Comment fix. Handle up to 2 preparation statements for ASSIGN in MIDDLE_BB. * c-c++-common/rotate-8.c: Expect no PHIs in optimized dump. Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/c-c++-common/rotate-8.c trunk/gcc/tree-ssa-phiopt.c
With the recent changes on the trunk, I'm getting optimal generated code at least on x86_64/i686 for all reasonable and less reasonable patterns I've tried. Of course, one can write arbitrarily weirdo patterns and not everything will be recognized.