Re: x86_64 varargs setup jump table

> On 07/17/2010 07:25 AM, Bernd Schmidt wrote:
> >  	leaq	0(,%rax,4), %rcx
> >  	movl	$.L2, %eax
> >  	subq	%rcx, %rax
> >  	jmp	*%rax
> I've often thought this was over-engineering in the x86_64 abi.
> This jump table is trading memory bandwidth for unpredictability
> in the branch target.
> I've often wondered if we'd get better performance if we changed
> to a simple comparison against zero.  I.e.
> 	test	%al,%al
> 	jz	1f
> 	// 8 xmm stores
> 1:
> H.J., do you think you'd be able to measure performance on this?

The original problem was that early K8 chips had no efficient way of storing
an SSE register to memory without knowing its type, so the stores in the
prologue executed very slowly whenever reformatting happened.  The same reason
is why the ABI has no callee-saved SSE registers.

On current chips this is not a big issue, so I do not care which way we output
it.  In fact, I used to have a patch for doing the jz, but lost it.  I think we
might keep supporting both, to get some checking that the ABI is not terribly
broken (i.e. that no other compiler just feeds %rax a random value instead of
always setting it to the number of SSE register arguments).

> r~

