This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] x86 peephole2s to optimize "1LL << x"
- From: Roger Sayle <roger at eyesopen dot com>
- To: Richard Henderson <rth at redhat dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Sun, 12 Sep 2004 07:06:17 -0600 (MDT)
- Subject: Re: [PATCH] x86 peephole2s to optimize "1LL << x"
On Sat, 11 Sep 2004, Richard Henderson wrote:
> Thanks. I've come up with a final sequence,
>
> movl %ecx, %edi
> shrl $5, %edi
> andl $1, %edi
> movl %edi, %esi
> sall %cl, %edi
> xorl $1, %esi
> sall %cl, %esi
>
> that I will use when the registers chosen aren't appropriate for setcc.
> It does use one more shift, but given the disparity between
>
> > a 26.98s 2.98s 13.1% 2*setcc+2*shift
> > b 30.03s 6.03s 26.6% push+shift+2*cmov
>
> I'm betting it's still a win. You could run it through your
> test harness if you like (using eax/edx, not esi/edi).
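To make the quoted sequence "f" easier to follow, here is a C sketch that models it on a 32-bit target, where the 64-bit value 1LL << x is held in two 32-bit registers. Tracing the registers, %edi ends up holding the high word and %esi the low word. The model relies on the fact that x86 32-bit shifts by %cl use only the low 5 bits of the count. The function names shl1_model_f and count_mismatches are hypothetical and exist only for illustration. The model checks correctness, not timing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of sequence "f": computes 1ULL << x as two 32-bit
   halves, the way a 32-bit x86 target keeps a 64-bit value in two registers.
   The shift count x must be in 0..63, as for 1LL << x itself. */
static uint64_t shl1_model_f(unsigned x)
{
    assert(x < 64);
    uint32_t ecx = x;
    uint32_t edi = (ecx >> 5) & 1;   /* movl %ecx,%edi; shrl $5; andl $1: bit 5 of x */
    uint32_t esi = edi;              /* movl %edi,%esi                               */
    /* sall %cl uses only the low 5 bits of the count, hence & 31. */
    edi <<= (ecx & 31);              /* sall %cl,%edi: high word, nonzero iff x >= 32 */
    esi ^= 1;                        /* xorl $1,%esi: complement of bit 5             */
    esi <<= (ecx & 31);              /* sall %cl,%esi: low word, nonzero iff x < 32   */
    return ((uint64_t)edi << 32) | esi;
}

/* Count the shift counts in 0..63 where the model disagrees with 1ULL << x. */
static int count_mismatches(void)
{
    int bad = 0;
    for (unsigned x = 0; x < 64; x++)
        if (shl1_model_f(x) != (1ULL << x))
            bad++;
    return bad;
}
```

If count_mismatches() returns 0, the two halves reassemble to 1ULL << x for every count from 0 to 63. Exactly one of the two registers is nonzero, selected by bit 5 of the count.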
Indeed, the final method above, "f", times fractionally faster than "b", but the difference is probably within the noise: its median time is 29.96s. As you can see above, the problem is that it's not only shift and cmov that are relatively slow on the P4, but also setcc!
The Agner Fog timing table I have for the P4 shows:
shift by immediate 4 cycles
shift by %cl 6 cycles
conditional move 6 cycles
setcc 5 cycles
I think these figures should be taken only as a guideline, as the experimental evidence confirms that 2*setcc + 2*shift < 2*cmov + shift. Clearly, though, replacing these instructions with simple logical operations and moves is a good thing.
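A naive sum of the table's figures shows why they can only be a guideline. The sum below is a rough sketch under several assumptions that the thread does not state. It assumes that the shifts in "a" and "b" are shifts by %cl. It treats the figures as latencies that simply add up. It also leaves the cost of push, which the table does not list, as t_push:

```latex
\begin{aligned}
t_a &\approx 2\,t_{\mathrm{setcc}} + 2\,t_{\mathrm{shl}\,\%cl}
     = 2(5) + 2(6) = 22 \text{ cycles} \\
t_b &\approx t_{\mathrm{push}} + t_{\mathrm{shl}\,\%cl} + 2\,t_{\mathrm{cmov}}
     = t_{\mathrm{push}} + 6 + 2(6) = t_{\mathrm{push}} + 18 \text{ cycles}
\end{aligned}
```

On this crude model, "b" should win whenever t_push is under 4 cycles. Yet "a" measured 26.98s against 30.03s for "b". Simple summation ignores the overlap that an out-of-order core such as the P4 can extract between independent instructions.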
Many thanks again for your efforts with this.
Roger
--