This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] x86 peephole2s to optimize "1LL << x"


On Sat, 11 Sep 2004, Richard Henderson wrote:
> Thanks.  I've come up with a final sequence,
>
>         movl    %ecx, %edi
>         shrl    $5, %edi
>         andl    $1, %edi
>         movl    %edi, %esi
>         sall    %cl, %edi
>         xorl    $1, %esi
>         sall    %cl, %esi
>
> that I will use when the registers chosen aren't appropriate for setcc.
> It does use one more shift, but given the disparity between
>
> > a	26.98s	 2.98s		 13.1%		2*setcc+2*shift
> > b	30.03s	 6.03s		 26.6%		push+shift+2*cmov
>
> I'm betting it's still a win.  You could run it through your
> test harness if you like (using eax/edx, not esi/edi).

Indeed, the final method above, "f", times fractionally faster than "a",
but the difference is probably within the noise; its median time is
29.96s.  As you noticed above, the problem is that it's not only shift
and cmov that are relatively slow on the P4, but also setcc!
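For anyone checking what sequence "f" computes, here is a minimal C sketch, assuming a shift count x in the range 0..63. The helper name shl1_split is hypothetical. Each line mirrors one or two of the quoted instructions, with %edi holding the high word and %esi the low word of the result. The `& 31` models the x86 rule that a 32-bit shift by %cl uses only the low five bits of the count, which is what lets one shift amount serve both halves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: emulates the quoted sequence "f" on two 32-bit
   halves and returns the 64-bit value 1LL << x.  Assumes 0 <= x <= 63. */
static uint64_t shl1_split(unsigned x)
{
    assert(x < 64);                  /* precondition for a valid 1LL << x */
    uint32_t hi = (x >> 5) & 1;      /* shrl $5 ; andl $1 : 1 iff x >= 32 */
    uint32_t lo = hi ^ 1;            /* movl ; xorl $1    : 1 iff x < 32  */
    hi <<= (x & 31);                 /* sall %cl, %edi -> high word       */
    lo <<= (x & 31);                 /* sall %cl, %esi -> low word        */
    return ((uint64_t)hi << 32) | lo;
}
```

Exactly one of the two words is nonzero for any x in range, so neither setcc nor cmov is needed to choose between them.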

The Agner Fog table of P4 instruction timings that I have shows:

	shift by immediate	4 cycles
	shift by %cl		6 cycles
	conditional move	6 cycles
	setcc			5 cycles

I think these figures should be taken only as a guideline, since the
experimental evidence confirms that 2*setcc + 2*shift < 2*cmov + shift,
contrary to what the table would suggest.  Either way, replacing these
instructions with simple logical operations and moves is clearly a good
thing.
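To make the "guideline only" point concrete, here is a hedged C sketch that naively sums the per-instruction costs from the table for each sequence. It assumes every variable shift is a shift by %cl, and it ignores both the instructions the table does not list (mov, and, xor, push) and any overlap between independent instructions. The enum and function names are hypothetical.

```c
#include <assert.h>

/* P4 costs as quoted from the table in this message. */
enum op { SHIFT_IMM, SHIFT_CL, CMOV, SETCC };

static int p4_cycles(enum op o)
{
    switch (o) {
    case SHIFT_IMM: return 4;
    case SHIFT_CL:  return 6;
    case CMOV:      return 6;
    case SETCC:     return 5;
    }
    assert(!"unknown op");
    return 0;
}

/* Naive estimate: plain sum of listed costs, no overlap modelled and
   unlisted instructions (mov, and, xor, push) counted as free. */
static int naive_cost(const enum op *ops, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += p4_cycles(ops[i]);
    return total;
}

/* Assumption: the variable shifts in each sequence are shifts by %cl. */
static const enum op seq_a[] = { SETCC, SETCC, SHIFT_CL, SHIFT_CL }; /* 2*setcc+2*shift */
static const enum op seq_b[] = { SHIFT_CL, CMOV, CMOV };             /* push+shift+2*cmov */
static const enum op seq_f[] = { SHIFT_IMM, SHIFT_CL, SHIFT_CL };    /* shrl $5 + 2*sall %cl */
```

Under these assumptions the sums are 22 cycles for "a", 18 for "b" and 16 for "f", so the table alone would rank "b" ahead of "a", the reverse of the measured 26.98s versus 30.03s.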


Many thanks again for your efforts with this.

Roger
--

