Bug 45216

Summary: Rotate expressions not recognized at tree level
Product: gcc Reporter: Bernd Schmidt <bernds>
Component: tree-optimizationAssignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: enhancement CC: gcc-bugs, jakub, kai.extern, rguenth
Priority: P3 Keywords: missed-optimization
Version: 4.6.0   
Target Milestone: ---   
Host: i686-pc-linux-gnu Target: i686-pc-linux-gnu
Build: i686-pc-linux-gnu Known to work:
Known to fail: Last reconfirmed: 2010-08-06 23:02:11
Bug Depends on: 57157    
Bug Blocks:    
Attachments: A testcase which shows the problem.
Another testcase

Description Bernd Schmidt 2010-08-06 22:18:56 UTC
We have RROTATE_EXPR and LROTATE_EXPR, but the patterns are not reliably detected at the tree level.  I'm attaching a testcase reduced from the Linux kernel, which has at least one sequence that can be rewritten using a rolw instruction:

	movzwl	%cx, %edx
	movzwl	%ax, %eax
	shrw	$8, %cx
	movl	%edx, -44(%ebp)
	sall	$8, %edx
	movw	%cx, -50(%ebp)
	orw	%dx, -50(%ebp)

Other opportunities exist.  At the most basic level, a function like

unsigned short rol8 (unsigned short word, unsigned int shift)
{
  return (word << 8) | (word >> 8);
}

should be transformed at the tree level by recognizing the rotate.
Comment 1 Bernd Schmidt 2010-08-06 22:19:32 UTC
Created attachment 21428 [details]
A testcase which shows the problem.
Comment 2 Steven Bosscher 2010-08-06 23:02:11 UTC
pathetic... :)
Comment 3 Steven Bosscher 2010-08-06 23:17:38 UTC
Related to PR17886, where it says that: "gcc can detect the (x << y)|(x >> (bitwidth-y)) idiom for rotate and convert it into the machine rotate instruction. But it only works when y is a constant and is not long long." But apparently even this case is not detected anymore.
Comment 4 Richard Biener 2010-08-06 23:41:39 UTC
Fold used to detect these.  Maybe we're now having different conversions inbetween.
Comment 5 Kai Henningsen 2010-10-03 08:30:20 UTC
Created attachment 21946 [details]
Another testcase

Here is another test case. Here on this x86_64-unknown-linux-gnu system, gcc recognizes the rotate for 32 and 64 bits, and fails for 16 and 8 bits. Constants or not doesn't seem to make a difference.
Comment 6 Kai Henningsen 2010-10-03 08:33:20 UTC
(In reply to comment #5)
> Created attachment 21946 [details]
> Another testcase
> 
> Here is another test case. Here on this x86_64-unknown-linux-gnu system, gcc
> recognizes the rotate for 32 and 64 bits, and fails for 16 and 8 bits.
> Constants or not doesn't seem to make a difference.

I should ad that I tested with "gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3".
Comment 7 Nick Kossifidis 2012-12-01 01:37:35 UTC
In my case it does detect an even more complex scenario but it doesn't detect it when using integer typedefs from <sys/types.h>, more specifically:

unsigned long rotate_left(unsigned long a, unsigned int shift)
{
        return a << shift | a >> (sizeof(a) * 8 - shift);
}

results

080483d4 <rotate_left>:
 80483d4:	55                   	push   %ebp
 80483d5:	89 e5                	mov    %esp,%ebp
 80483d7:	53                   	push   %ebx
 80483d8:	8b 45 0c             	mov    0xc(%ebp),%eax
 80483db:	8b 55 08             	mov    0x8(%ebp),%edx
 80483de:	89 d3                	mov    %edx,%ebx
 80483e0:	89 c1                	mov    %eax,%ecx
 80483e2:	d3 c3                	rol    %cl,%ebx
 80483e4:	89 d8                	mov    %ebx,%eax
 80483e6:	5b                   	pop    %ebx
 80483e7:	5d                   	pop    %ebp
 80483e8:	c3                   	ret  

but

u_int64_t rotate_left(u_int64_t a, u_int32_t shift)
{
        return a << shift | a >> (sizeof(a) * 8 - shift);
}

results

080483d4 <rotate_left>:
 80483d4:	55                   	push   %ebp
 80483d5:	89 e5                	mov    %esp,%ebp
 80483d7:	56                   	push   %esi
 80483d8:	53                   	push   %ebx
 80483d9:	83 ec 10             	sub    $0x10,%esp
 80483dc:	8b 45 08             	mov    0x8(%ebp),%eax
 80483df:	89 45 f0             	mov    %eax,-0x10(%ebp)
 80483e2:	8b 45 0c             	mov    0xc(%ebp),%eax
 80483e5:	89 45 f4             	mov    %eax,-0xc(%ebp)
 80483e8:	8b 45 f0             	mov    -0x10(%ebp),%eax
 80483eb:	8b 55 f4             	mov    -0xc(%ebp),%edx
 80483ee:	8b 4d 10             	mov    0x10(%ebp),%ecx
 80483f1:	89 c3                	mov    %eax,%ebx
 80483f3:	89 d6                	mov    %edx,%esi
 80483f5:	0f a5 de             	shld   %cl,%ebx,%esi
 80483f8:	d3 e3                	shl    %cl,%ebx
 80483fa:	f6 c1 20             	test   $0x20,%cl
 80483fd:	74 04                	je     8048403 <ed_rotate_left+0x2f>
 80483ff:	89 de                	mov    %ebx,%esi
 8048401:	31 db                	xor    %ebx,%ebx
 8048403:	b9 40 00 00 00       	mov    $0x40,%ecx
 8048408:	2b 4d 10             	sub    0x10(%ebp),%ecx
 804840b:	0f ad d0             	shrd   %cl,%edx,%eax
 804840e:	d3 ea                	shr    %cl,%edx
 8048410:	f6 c1 20             	test   $0x20,%cl
 8048413:	74 04                	je     8048419 <ed_rotate_left+0x45>
 8048415:	89 d0                	mov    %edx,%eax
 8048417:	31 d2                	xor    %edx,%edx
 8048419:	89 c1                	mov    %eax,%ecx
 804841b:	09 d9                	or     %ebx,%ecx
 804841d:	89 4d e8             	mov    %ecx,-0x18(%ebp)
 8048420:	89 d1                	mov    %edx,%ecx
 8048422:	09 f1                	or     %esi,%ecx
 8048424:	89 4d ec             	mov    %ecx,-0x14(%ebp)
 8048427:	8b 45 e8             	mov    -0x18(%ebp),%eax
 804842a:	8b 55 ec             	mov    -0x14(%ebp),%edx
 804842d:	83 c4 10             	add    $0x10,%esp
 8048430:	5b                   	pop    %ebx
 8048431:	5e                   	pop    %esi
 8048432:	5d                   	pop    %ebp
 8048433:	c3                   	ret  

Here are the typedefs from <sys/types.h>:

typedef unsigned int u_int32_t __attribute__ ((__mode__ (__SI__)));
typedef unsigned int u_int64_t __attribute__ ((__mode__ (__DI__)));

I also tried with u_int32_t/u_int16_t and other combinations but I get the same results.

gcc version 4.5.4 (Gentoo 4.5.4 p1.0, pie-0.4.7)
Comment 8 Marc Glisse 2012-12-01 08:51:12 UTC
(In reply to comment #7)
> unsigned long rotate_left(unsigned long a, unsigned int shift)
> {
>         return a << shift | a >> (sizeof(a) * 8 - shift);
> }

We have a regression in C++ in 4.8 there. Comparing -O3 -fdump-tree-optimized in gcc-4.7, g++-4.7 and gcc-4.8:

  D.1708_3 = a_1(D) r<< shift_2(D);

and in g++-4.8:

  shift.0_2 = (int) shift_1(D);
  _4 = a_3(D) << shift.0_2;
  _5 = 64 - shift_1(D);
  _6 = (int) _5;
  _7 = a_3(D) >> _6;
  _8 = _7 | _4;
Comment 9 Steven Bosscher 2013-05-09 14:43:06 UTC
The patches for bug 57157 may fix this bug also.
Comment 10 Jakub Jelinek 2013-05-09 14:51:19 UTC
They certainly mean to.
Comment 11 Jakub Jelinek 2013-05-13 11:13:35 UTC
Author: jakub
Date: Mon May 13 11:04:26 2013
New Revision: 198823

URL: http://gcc.gnu.org/viewcvs?rev=198823&root=gcc&view=rev
Log:
	PR tree-optimization/45216
	PR tree-optimization/57157
	* tree-ssa-forwprop.c (simplify_rotate): Only recognize
	the (-Y) & (B - 1) variant if OP is |.
	* expmed.c (expand_shift_1): For rotations by const0_rtx just
	return shifted.  Use (-op1) & (prec - 1) as other_amount
	instead of prec - op1.

	* c-c++-common/rotate-1.c: Add 32 tests with +.
	* c-c++-common/rotate-1a.c: Adjust.
	* c-c++-common/rotate-2.c: Add 32 tests with +, expect
	only 48 rotates.
	* c-c++-common/rotate-2b.c: New test.
	* c-c++-common/rotate-3.c: Add 32 tests with +.
	* c-c++-common/rotate-4.c: Add 32 tests with +, expect
	only 48 rotates.
	* c-c++-common/rotate-4b.c: New test.
	* c-c++-common/rotate-5.c: New test.

Added:
    trunk/gcc/testsuite/c-c++-common/rotate-5.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/expmed.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/c-c++-common/rotate-1.c
    trunk/gcc/testsuite/c-c++-common/rotate-1a.c
    trunk/gcc/testsuite/c-c++-common/rotate-2.c
    trunk/gcc/testsuite/c-c++-common/rotate-3.c
    trunk/gcc/testsuite/c-c++-common/rotate-4.c
    trunk/gcc/tree-ssa-forwprop.c
Comment 12 Bernd Schmidt 2016-01-26 13:26:33 UTC
It looks like a patch was committed - can this be closed?
Comment 13 Jakub Jelinek 2016-01-26 13:36:40 UTC
.
Comment 14 Marc Glisse 2016-01-26 14:26:38 UTC
(In reply to Bernd Schmidt from comment #12)
> It looks like a patch was committed - can this be closed?

There is still PR62263, so not all cases are handled, but if all those mentioned in this PR are fixed, then it is enough to keep the other one open.