This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fold zero extensions into bit-wise ANDs


In article <Pine.LNX.4.33.0204151729250.26035-100000@www.eyesopen.com> you write:
>
>As an excellent counter example, consider
>
>long long foo(unsigned char bar)
>{
>  return bar & 0x24;
>}
>
>with the current mainline CVS we'd generate:
>
>foo:	movzbl	4(%esp), %ecx
>	xorl	%edx, %edx
>	andb	$36, %cl
>	movzbl	%cl, %eax
>	ret
>
>with my patch applied it now generates
>
>foo:	movzbl	4(%esp), %eax
>	andl	$36, %eax
>	cltd
>	ret
>
>which by my reckoning is shorter, faster, and uses fewer registers.

Hmm, the sequence I'd _expect_ to be optimal would seem to be

	foo:
		movl	4(%esp),%eax
		xorl	%edx,%edx
		andl	$36, %eax
		ret

since:

 - there's no point in doing the movzbl when the "andl" clears the upper
   bits by hand anyway, and a plain "movl" is smaller (and faster, at
   least on some machines).  This, of course, only works when you know
   it's ok to access the data as a full word (which we know in this
   example because it's an argument slot on the stack, but maybe that's
   not the common case)

 - "xorl reg,reg" is certainly recommended by Intel over cltd, which is
   rather slow and also has an (unnecessary) data dependency.

The xorl is definitely preferred, the movl/movzbl thing is just a detail
that only works in some cases.

			Linus

