Bug 23810 - missed 64-bit shift+mask optimizations on 32-bit arch
Summary: missed 64-bit shift+mask optimizations on 32-bit arch
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 4.1.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2005-09-11 00:18 UTC by Ken Raeburn
Modified: 2021-07-26 19:43 UTC
CC List: 1 user

See Also:
Host: i686-pc-linux-gnu
Target: i686-pc-linux-gnu
Build: i686-pc-linux-gnu
Known to work:
Known to fail: 4.7.0
Last reconfirmed: 2021-07-26 00:00:00


Description Ken Raeburn 2005-09-11 00:18:46 UTC

(Sources are from CVS as of about 6AM US/Eastern time today.)

I'm testing out how well gcc optimizes some code for reversing bit
strings.  It appears that on x86 at least, double-word shifts followed
by masks that zero out all the bits that crossed the word boundary are
not optimized as well as they could be.

In the included file, compiled with "-O9 -fomit-frame-pointer",
functions rt and rt2 both produce assembly code containing a
double-word shift, which brings two bits from the upper half of the
argument into the top of the lower half of the double-word value,
followed by a mask of that word with 0x33333333, which zeros out
exactly those bits:

    rt:
	    movl	8(%esp), %edx
	    movl	4(%esp), %eax
	    shrdl	$2, %edx, %eax
	    shrl	$2, %edx
	    andl	$858993459, %eax
	    andl	$858993459, %edx
	    ret
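
Since 0x33333333 has bits 30 and 31 clear, the two bits brought across
by the shrdl can never survive the mask, so each 32-bit half could be
shifted and masked independently.  A rough per-word sketch of what I
mean (rt_split is just a name for this sketch, using the same typedefs
as the testcase below; I haven't checked what gcc makes of it):

    /* Sketch only: per-word equivalent of rt.  Because 0x33333333 has
       bits 30 and 31 clear, the bits carried down from the high word by
       the double-word shift are always masked away, so neither half
       needs anything from the other.  */
    uint64_t rt_split (uint64_t n)
    {
      uint32_t lo = (uint32_t) n;
      uint32_t hi = (uint32_t) (n >> 32);
      uint32_t rlo = (lo >> 2) & 0x33333333;  /* no bits needed from hi */
      uint32_t rhi = (hi >> 2) & 0x33333333;
      return ((uint64_t) rhi << 32) | rlo;
    }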

Okay, in this case, the only optimization would be to make the shift
not reference both %edx and %eax, and to drop the reference to the
upper half from the RTL during optimization.  To highlight the issue a
little more, rt4 is like rt but returns only the lower half.  Still,
the upper half is read in from memory (and shifted!) needlessly:

    rt4:
	    movl	8(%esp), %edx
	    movl	4(%esp), %eax
	    shrdl	$2, %edx, %eax
	    andl	$858993459, %eax
	    shrl	$2, %edx
	    ret
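
Since rt4 only returns the low word, and the mask again discards the
bits brought over from %edx, I'd expect the whole thing to reduce to a
single-word shift and mask, something like this sketch (rt4_lo is just
a name I made up):

    /* Sketch only: single-word equivalent of rt4.  The result depends
       only on the low 32 bits of the argument.  */
    uint32_t rt4_lo (uint64_t n)
    {
      return ((uint32_t) n >> 2) & 0x33333333;
    }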

Function left shows the same problem, shifting in the opposite
direction:

    left:
	    movl	4(%esp), %eax
	    movl	8(%esp), %edx
	    shldl	$2, %eax, %edx
	    sall	$2, %eax
	    andl	$-858993460, %edx
	    andl	$-858993460, %eax
	    ret

The "andl" of %edx with 0xcccccccc will clobber the bits brought in
from %eax.
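
In other words, since 0xcccccccc has bits 0 and 1 clear, the high word
never needs anything from the low word here either; a per-word sketch
(left_split is again just a made-up name) would be:

    /* Sketch only: per-word equivalent of left.  0xcccccccc has bits 0
       and 1 clear, so the bits brought up from the low word by the
       double-word shift are always masked off.  */
    uint64_t left_split (uint64_t n)
    {
      uint32_t lo = (uint32_t) n;
      uint32_t hi = (uint32_t) (n >> 32);
      uint32_t rlo = (lo << 2) & 0xcccccccc;
      uint32_t rhi = (hi << 2) & 0xcccccccc;  /* no bits needed from lo */
      return ((uint64_t) rhi << 32) | rlo;
    }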

I haven't got the hang of reading ppc assembly yet, but I think the
Mac OS X compiler (10.4.2 = "gcc version 4.0.0 (Apple Computer,
Inc. build 5026)") is missing similar optimizations.  I haven't tried
the cvs code on ppc.

Environment:
System: Linux kal-el 2.4.17 #4 SMP Sun Apr 6 16:25:37 EDT 2003 i686 GNU/Linux
Architecture: i686

host: i686-pc-linux-gnu
build: i686-pc-linux-gnu
target: i686-pc-linux-gnu
configured with: ../src/configure --enable-maintainer-mode --prefix=/u3/raeburn/gcc/linux/Install --enable-languages=c,c++,java,objc --no-create --no-recursion : (reconfigured) ../src/configure --prefix=/u3/raeburn/gcc/linux/Install

How-To-Repeat:

typedef unsigned long long uint64_t;
typedef unsigned long uint32_t;

/* Shift right by two, then mask; the mask clears the two bits that
   crossed the word boundary.  */
uint64_t rt (uint64_t n) { return (n >> 2) & 0x3333333333333333ULL; }
/* Same computation as rt, with the mask applied before the shift.  */
uint64_t rt2 (uint64_t n) { return (n & (0x3333333333333333ULL << 2)) >> 2; }
/* Like rt, but only the low word of the result is returned.  */
uint32_t rt4 (uint64_t n) { return (n >> 2) & 0x33333333; }
/* Shift left by two, masking with ~0x33...33 = 0xcc...cc.  */
uint64_t left(uint64_t n) {
  return (n << 2) & (0xFFFFFFFFFFFFFFFFULL & ~0x3333333333333333ULL);
}
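
To convince myself that the per-word sketches above really are
equivalent, a little driver like the following (not part of the
testcase; it assumes the sketches above are pasted into the same file)
can compare them on a few values:

#include <stdio.h>

/* Quick check, not part of the testcase: compare the original
   functions against the hand-split sketches above on a few values.  */
int main (void)
{
  static const uint64_t tests[] = {
    0, 1, ~0ULL, 0x8000000000000001ULL,
    0x3333333333333333ULL, 0xdeadbeefcafef00dULL
  };
  unsigned int i;
  for (i = 0; i < sizeof tests / sizeof tests[0]; i++)
    {
      uint64_t n = tests[i];
      if (rt (n) != rt_split (n)
          || rt4 (n) != rt4_lo (n)
          || left (n) != left_split (n))
        {
          printf ("mismatch for %#llx\n", (unsigned long long) n);
          return 1;
        }
    }
  printf ("all match\n");
  return 0;
}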
Comment 1 Richard Biener 2012-01-11 14:33:57 UTC
Most of the issues are still present on trunk.