Alpha CPU-specific builtins

Tue Jun 4 14:40:00 GMT 2002

Richard Henderson <rth@redhat.com> writes:

> I have need of some builtins myself, so I went ahead and 
> finished up your current patch.

Cool, thanks a lot. I guess this means you will have to fix any
further bugs ;)

I noticed that minsb8(0, a) generates clr t0; minsb8 t0,a0,v0, while
minsb8(a, 0) generates minsb8 a0,zero,v0, which is weird, because I
can't find any asymmetry between the two parameters...
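
For reference, this is what I understand minsb8 to compute, written as
a plain C model: a signed minimum on each of the eight bytes. The name
minsb8_ref is made up for this sketch and is not the builtin itself;
the narrowing casts assume the usual two's complement behaviour.

```c
#include <stdint.h>

/* Hypothetical reference model of MINSB8 (not the GCC builtin):
   signed minimum of each of the eight byte lanes of a and b. */
static uint64_t minsb8_ref(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        /* Extract lane i from each operand as a signed byte. */
        int8_t x = (int8_t)(uint8_t)(a >> (8 * i));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * i));
        int8_t m = x < y ? x : y;
        /* Put the smaller value back into lane i of the result. */
        r |= (uint64_t)(uint8_t)m << (8 * i);
    }
    return r;
}
```

In this model, swapping the arguments cannot change the result, so
minsb8_ref(0, a) and minsb8_ref(a, 0) are equal for every a. That is
why I would expect both argument orders to be able to use the zero
register directly.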

>   * extql extqh mind endianness.

Hmm, just out of curiosity, was there ever any big-endian Alpha
system, or is all this purely precautionary?

> + void
> + alpha_expand_builtin_vector_binop (gen, mode, op0, op1, op2)
> +      rtx (*gen) PARAMS ((rtx, rtx, rtx));
> +      enum machine_mode mode;
> +      rtx op0, op1, op2;
> + {
> +   op0 = gen_lowpart (mode, op0);
> + 
> +   if (op1 == const0_rtx)
> +     op1 = CONST0_RTX (mode);
> +   else
> +     op1 = gen_lowpart (mode, op1);
> +   if (op1 == const0_rtx)

            ^ I think you really want op2 here...

> +     op2 = CONST0_RTX (mode);
> +   else
> +     op2 = gen_lowpart (mode, op2);
> + 
> +   emit_insn ((*gen) (op0, op1, op2));
> + }

Also, somehow the scheduling looks worse, though that might be because
of unrelated changes... For my test function, it used to be like this
(pca56; the number after each instruction is my count of its issue cycle):

ldl     a4,4(a1)	1
ldl     t2,0(a1)	1
ldq     t3,0(a0)	2
ldq     t10,8(a0)	2

unpkbw  t2,a5		3
subl    t5,0x1,t5	3
unpkbw  a4,t9		4
and     t3,t6,t0	4

and     t10,t6,t8	5
addq    t0,a5,v0	5
and     t3,t4,a3	6
addq    t8,t9,a4	6

and     t10,t4,a5	7
xor     v0,a3,t1	7
xor     a4,a5,a3	8
lda     a0,16(a0)	8

maxsw4  t1,0,at		9
maxsw4  a3,0,t3		10
minsw4  at,t7,t12	11
minsw4  t3,t7,t0	12

pkwb    t12,t11		13
pkwb    t0,v0		14
stl     t11,0(a1)	15
nop			15

stl     v0,4(a1)	16
addq    a1,a2,a1	16
bne     t5,20 <add_pixels_clamped+0x20> 17
ret

and now it is:

ldl     t2,0(a1)	1
ldq     t3,0(a0)	1
ldl     at,4(a1)	2
ldq     t8,8(a0)	2

unpkbw  t2,v0		3
and     t3,t6,t1	3
and     t3,t5,t12	4
and     t8,t5,a4	4

addq    t1,v0,t11	5
subl    t4,0x1,t4	5
xor     t11,t12,t10	6
lda     a0,16(a0)	6

maxsw4  t10,zero,t9	7
unpkbw  at,a5		8
minsw4  t9,t7,t2	9
pkwb    t2,t0		11

stl     t0,0(a1)	13
and     t8,t6,t0	13
addq    t0,a5,a3	14
xor     a3,a4,t3	15

maxsw4  t3,zero,v0	16
minsw4  v0,t7,t2	18
pkwb    t2,t0		20
nop			20

stl     t0,4(a1)	22
addq    a1,a2,a1	22
bne     t4,20 <add_pixels_clamped+0x20> 23
ret

(or perhaps my counting is wrong?)

Anyway, thanks for your work :)

-- 
	Falk