This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH] x86 peephole2s to optimize "1LL << x"

From: Richard Henderson <rth at redhat dot com>
To: Roger Sayle <roger at eyesopen dot com>
Cc: gcc-patches at gcc dot gnu dot org
Date: Sat, 11 Sep 2004 23:04:00 -0700
Subject: Re: [PATCH] x86 peephole2s to optimize "1LL << x"
References: <20040912003819.GA7127@redhat.com> <Pine.LNX.4.44.0409112030120.32322-100000@www.eyesopen.com>

On Sat, Sep 11, 2004 at 09:14:08PM -0600, Roger Sayle wrote:
> But the real bottom line (in my mind) is that implementing your method
> (A), 2*setcc + 2*shift, in the x86 backend results in a "1LL << x"
> implementation an astounding seven times faster than current mainline.

Thanks.  I've come up with a final sequence,

        movl    %ecx, %edi
        shrl    $5, %edi
        andl    $1, %edi
        movl    %edi, %esi
        sall    %cl, %edi
        xorl    $1, %esi
        sall    %cl, %esi

that I will use when the registers chosen aren't appropriate for setcc.
It does use one more shift, but given the disparity between 

> a	26.98s	 2.98s		 13.1%		2*setcc+2*shift
> b	30.03s	 6.03s		 26.6%		push+shift+2*cmov

I'm betting it's still a win.  You could run it through your
test harness if you like (using eax/edx, not esi/edi).
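To check by hand that the seven-instruction sequence really computes `1LL << x` as a high:low register pair, the C sketch below emulates it one instruction at a time. It is an illustrative model, not GCC's code. The names `sall32`, `one_shl_x` and `check_one_shl_x` are hypothetical, and it assumes 0 <= x < 64 with x in `%cl`. It also relies on the fact that a 32-bit `sall %cl` uses only the low 5 bits of the count. Under those assumptions `%edi` ends up holding the high word and `%esi` the low word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models "sall %cl, r32"; x86 masks a 32-bit shift count to 5 bits. */
static uint32_t sall32(uint32_t v, unsigned cl)
{
    return v << (cl & 31);
}

/* Hypothetical model of the sequence; returns the 64-bit value %edi:%esi. */
uint64_t one_shl_x(unsigned cl)
{
    uint32_t edi = cl;        /* movl %ecx, %edi                            */
    edi >>= 5;                /* shrl $5, %edi                              */
    edi &= 1;                 /* andl $1, %edi  -> 1 iff bit 5 set (x >= 32) */
    uint32_t esi = edi;       /* movl %edi, %esi                            */
    edi = sall32(edi, cl);    /* sall %cl, %edi -> high word                */
    esi ^= 1;                 /* xorl $1, %esi  -> 1 iff x < 32             */
    esi = sall32(esi, cl);    /* sall %cl, %esi -> low word                 */
    return ((uint64_t)edi << 32) | esi;
}

/* Exhaustively compares the model against the C expression for every valid count. */
void check_one_shl_x(void)
{
    for (unsigned x = 0; x < 64; x++)
        assert(one_shl_x(x) == (UINT64_C(1) << x));
}
```

Exactly one of the two 0/1 flags is set, and the hardware's 5-bit count masking then places the bit at position `x & 31` in the correct half. The extra `shrl` is the "one more shift" compared with the setcc form, which derives the same two flags with a pair of setcc instructions.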

I'm doing one more bootstrap and test before checking this in.

r~

Follow-Ups:
- Re: [PATCH] x86 peephole2s to optimize "1LL << x"
  - From: Roger Sayle

References:
- Re: [PATCH] x86 peephole2s to optimize "1LL << x"
  - From: Richard Henderson
- Re: [PATCH] x86 peephole2s to optimize "1LL << x"
  - From: Roger Sayle
