We have RROTATE_EXPR and LROTATE_EXPR, but the patterns are not reliably detected at the tree level. I'm attaching a testcase reduced from the Linux kernel, which has at least one sequence that can be rewritten using a rolw instruction: movzwl %cx, %edx movzwl %ax, %eax shrw $8, %cx movl %edx, -44(%ebp) sall $8, %edx movw %cx, -50(%ebp) orw %dx, -50(%ebp) Other opportunities exist. At the most basic level, a function like unsigned short rol8 (unsigned short word, unsigned int shift) { return (word << 8) | (word >> 8); } should be transformed at the tree level by recognizing the rotate.
Created attachment 21428 [details] A testcase which shows the problem.
pathetic... :)
Related to PR17886, where it says that: "gcc can detect the (x << y)|(x >> (bitwidth-y)) idiom for rotate and convert it into the machine rotate instruction. But it only works when y is a constant and is not long long." But apparently even this case is not detected anymore.
Fold used to detect these. Maybe we're now having different conversions inbetween.
Created attachment 21946 [details] Another testcase Here is another test case. Here on this x86_64-unknown-linux-gnu system, gcc recognizes the rotate for 32 and 64 bits, and fails for 16 and 8 bits. Constants or not doesn't seem to make a difference.
(In reply to comment #5) > Created attachment 21946 [details] > Another testcase > > Here is another test case. Here on this x86_64-unknown-linux-gnu system, gcc > recognizes the rotate for 32 and 64 bits, and fails for 16 and 8 bits. > Constants or not doesn't seem to make a difference. I should ad that I tested with "gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3".
In my case it does detect an even more complex scenario but it doesn't detect it when using integer typedefs from <sys/types.h>, more specifically: unsigned long rotate_left(unsigned long a, unsigned int shift) { return a << shift | a >> (sizeof(a) * 8 - shift); } results 080483d4 <rotate_left>: 80483d4: 55 push %ebp 80483d5: 89 e5 mov %esp,%ebp 80483d7: 53 push %ebx 80483d8: 8b 45 0c mov 0xc(%ebp),%eax 80483db: 8b 55 08 mov 0x8(%ebp),%edx 80483de: 89 d3 mov %edx,%ebx 80483e0: 89 c1 mov %eax,%ecx 80483e2: d3 c3 rol %cl,%ebx 80483e4: 89 d8 mov %ebx,%eax 80483e6: 5b pop %ebx 80483e7: 5d pop %ebp 80483e8: c3 ret but u_int64_t rotate_left(u_int64_t a, u_int32_t shift) { return a << shift | a >> (sizeof(a) * 8 - shift); } results 080483d4 <rotate_left>: 80483d4: 55 push %ebp 80483d5: 89 e5 mov %esp,%ebp 80483d7: 56 push %esi 80483d8: 53 push %ebx 80483d9: 83 ec 10 sub $0x10,%esp 80483dc: 8b 45 08 mov 0x8(%ebp),%eax 80483df: 89 45 f0 mov %eax,-0x10(%ebp) 80483e2: 8b 45 0c mov 0xc(%ebp),%eax 80483e5: 89 45 f4 mov %eax,-0xc(%ebp) 80483e8: 8b 45 f0 mov -0x10(%ebp),%eax 80483eb: 8b 55 f4 mov -0xc(%ebp),%edx 80483ee: 8b 4d 10 mov 0x10(%ebp),%ecx 80483f1: 89 c3 mov %eax,%ebx 80483f3: 89 d6 mov %edx,%esi 80483f5: 0f a5 de shld %cl,%ebx,%esi 80483f8: d3 e3 shl %cl,%ebx 80483fa: f6 c1 20 test $0x20,%cl 80483fd: 74 04 je 8048403 <ed_rotate_left+0x2f> 80483ff: 89 de mov %ebx,%esi 8048401: 31 db xor %ebx,%ebx 8048403: b9 40 00 00 00 mov $0x40,%ecx 8048408: 2b 4d 10 sub 0x10(%ebp),%ecx 804840b: 0f ad d0 shrd %cl,%edx,%eax 804840e: d3 ea shr %cl,%edx 8048410: f6 c1 20 test $0x20,%cl 8048413: 74 04 je 8048419 <ed_rotate_left+0x45> 8048415: 89 d0 mov %edx,%eax 8048417: 31 d2 xor %edx,%edx 8048419: 89 c1 mov %eax,%ecx 804841b: 09 d9 or %ebx,%ecx 804841d: 89 4d e8 mov %ecx,-0x18(%ebp) 8048420: 89 d1 mov %edx,%ecx 8048422: 09 f1 or %esi,%ecx 8048424: 89 4d ec mov %ecx,-0x14(%ebp) 8048427: 8b 45 e8 mov -0x18(%ebp),%eax 804842a: 8b 55 ec mov -0x14(%ebp),%edx 804842d: 83 c4 10 add $0x10,%esp 8048430: 5b pop %ebx 8048431: 5e pop %esi 8048432: 5d pop %ebp 8048433: c3 ret Here are the typedefs from <sys/types.h>: typedef unsigned int u_int32_t __attribute__ ((__mode__ (__SI__))); typedef unsigned int u_int64_t __attribute__ ((__mode__ (__DI__))); I also tried with u_int32_t/u_int16_t and other combinations but I get the same results. gcc version 4.5.4 (Gentoo 4.5.4 p1.0, pie-0.4.7)
(In reply to comment #7) > unsigned long rotate_left(unsigned long a, unsigned int shift) > { > return a << shift | a >> (sizeof(a) * 8 - shift); > } We have a regression in C++ in 4.8 there. Comparing -O3 -fdump-tree-optimized in gcc-4.7, g++-4.7 and gcc-4.8: D.1708_3 = a_1(D) r<< shift_2(D); and in g++-4.8: shift.0_2 = (int) shift_1(D); _4 = a_3(D) << shift.0_2; _5 = 64 - shift_1(D); _6 = (int) _5; _7 = a_3(D) >> _6; _8 = _7 | _4;
The patches for bug 57157 may fix this bug also.
They certainly mean to.
Author: jakub Date: Mon May 13 11:04:26 2013 New Revision: 198823 URL: http://gcc.gnu.org/viewcvs?rev=198823&root=gcc&view=rev Log: PR tree-optimization/45216 PR tree-optimization/57157 * tree-ssa-forwprop.c (simplify_rotate): Only recognize the (-Y) & (B - 1) variant if OP is |. * expmed.c (expand_shift_1): For rotations by const0_rtx just return shifted. Use (-op1) & (prec - 1) as other_amount instead of prec - op1. * c-c++-common/rotate-1.c: Add 32 tests with +. * c-c++-common/rotate-1a.c: Adjust. * c-c++-common/rotate-2.c: Add 32 tests with +, expect only 48 rotates. * c-c++-common/rotate-2b.c: New test. * c-c++-common/rotate-3.c: Add 32 tests with +. * c-c++-common/rotate-4.c: Add 32 tests with +, expect only 48 rotates. * c-c++-common/rotate-4b.c: New test. * c-c++-common/rotate-5.c: New test. Added: trunk/gcc/testsuite/c-c++-common/rotate-5.c Modified: trunk/gcc/ChangeLog trunk/gcc/expmed.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/c-c++-common/rotate-1.c trunk/gcc/testsuite/c-c++-common/rotate-1a.c trunk/gcc/testsuite/c-c++-common/rotate-2.c trunk/gcc/testsuite/c-c++-common/rotate-3.c trunk/gcc/testsuite/c-c++-common/rotate-4.c trunk/gcc/tree-ssa-forwprop.c
It looks like a patch was committed - can this be closed?
.
(In reply to Bernd Schmidt from comment #12) > It looks like a patch was committed - can this be closed? There is still PR62263, so not all cases are handled, but if all those mentioned in this PR are fixed, then it is enough to keep the other one open.