Alpha CPU-specific builtins
Falk Hueffner
falk.hueffner@student.uni-tuebingen.de
Tue Jun 4 14:40:00 GMT 2002
Richard Henderson <rth@redhat.com> writes:
> I have need of some builtins myself, so I went ahead and
> finished up your current patch.
Cool, thanks a lot. I guess this means you will have to fix any
further bugs ;)
I noticed that minsb8(0, a) generates clr t0; minsb8 t0,a0,v0, while
minsb8(a, 0) generates minsb8 a0,zero,v0, which is weird, because I
can't find any asymmetry between the two parameters...
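For reference, here is a minimal C model of what I understand minsb8 to
compute: a byte-wise signed minimum over the eight bytes of a quadword.
The names minsb8_ref and check_minsb8_symmetry are made up for this
sketch and are not part of the patch. Under this reading the operation
is commutative, so both argument orders should be able to use the zero
register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of MINSB8: for each of the eight bytes,
   take the signed minimum of the corresponding bytes of a and b.
   The cast to int8_t relies on the usual two's-complement behaviour. */
static uint64_t minsb8_ref(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        int8_t x = (int8_t)(uint8_t)(a >> (8 * i));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * i));
        int8_t m = x < y ? x : y;
        r |= (uint64_t)(uint8_t)m << (8 * i);
    }
    return r;
}

/* Usage: swapping the operands must not change the result. */
static void check_minsb8_symmetry(uint64_t a)
{
    assert(minsb8_ref(0, a) == minsb8_ref(a, 0));
}
```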
> * extql extqh mind endianness.
Hmm, just out of curiosity, was there ever any big-endian Alpha
system, or is all this purely precautionary?
> + void
> + alpha_expand_builtin_vector_binop (gen, mode, op0, op1, op2)
> + rtx (*gen) PARAMS ((rtx, rtx, rtx));
> + enum machine_mode mode;
> + rtx op0, op1, op2;
> + {
> + op0 = gen_lowpart (mode, op0);
> +
> + if (op1 == const0_rtx)
> + op1 = CONST0_RTX (mode);
> + else
> + op1 = gen_lowpart (mode, op1);
> + if (op1 == const0_rtx)
          ^ I think you really want op2 here...
> + op2 = CONST0_RTX (mode);
> + else
> + op2 = gen_lowpart (mode, op2);
> +
> + emit_insn ((*gen) (op0, op1, op2));
> + }
Also, the scheduling somehow looks worse, though that might be because
of unrelated changes. For my test function, it used to be like this
(pca56; the last column is the cycle, as I count it):
    ldl     a4,4(a1)        1
    ldl     t2,0(a1)        1
    ldq     t3,0(a0)        2
    ldq     t10,8(a0)       2
    unpkbw  t2,a5           3
    subl    t5,0x1,t5       3
    unpkbw  a4,t9           4
    and     t3,t6,t0        4
    and     t10,t6,t8       5
    addq    t0,a5,v0        5
    and     t3,t4,a3        6
    addq    t8,t9,a4        6
    and     t10,t4,a5       7
    xor     v0,a3,t1        7
    xor     a4,a5,a3        8
    lda     a0,16(a0)       8
    maxsw4  t1,0,at         9
    maxsw4  a3,0,t3         10
    minsw4  at,t7,t12       11
    minsw4  t3,t7,t0        12
    pkwb    t12,t11         13
    pkwb    t0,v0           14
    stl     t11,0(a1)       15
    nop                     15
    stl     v0,4(a1)        16
    addq    a1,a2,a1        16
    bne     t5,20 <add_pixels_clamped+0x20>  17
    ret
and now it is:
    ldl     t2,0(a1)        1
    ldq     t3,0(a0)        1
    ldl     at,4(a1)        2
    ldq     t8,8(a0)        2
    unpkbw  t2,v0           3
    and     t3,t6,t1        3
    and     t3,t5,t12       4
    and     t8,t5,a4        4
    addq    t1,v0,t11       5
    subl    t4,0x1,t4       5
    xor     t11,t12,t10     6
    lda     a0,16(a0)       6
    maxsw4  t10,zero,t9     7
    unpkbw  at,a5           8
    minsw4  t9,t7,t2        9
    pkwb    t2,t0           11
    stl     t0,0(a1)        13
    and     t8,t6,t0        13
    addq    t0,a5,a3        14
    xor     a3,a4,t3        15
    maxsw4  t3,zero,v0      16
    minsw4  v0,t7,t2        18
    pkwb    t2,t0           20
    nop                     20
    stl     t0,4(a1)        22
    addq    a1,a2,a1        22
    bne     t4,20 <add_pixels_clamped+0x20>  23
    ret
(or perhaps my counting is wrong?)
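To make it easier to see which instructions depend on which, here is a
rough scalar C model of what each lane in both listings works out to.
This is only a sketch under some assumptions: that a1 points to 8-bit
pixels, that a0 points to signed 16-bit coefficients, and that t7 holds
255 in every word. The helper names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane: widen the pixel (unpkbw), add
   the coefficient (the and/addq/xor sequence), clamp to [0, 255]
   (maxsw4 with zero, then minsw4 with an assumed 255), and narrow back
   (pkwb). Assumes the sum fits in 16 bits; wraparound is not modelled. */
static uint8_t add_clamped_ref(uint8_t pixel, int16_t coeff)
{
    int sum = (int)pixel + (int)coeff;
    if (sum < 0)
        sum = 0;
    if (sum > 255)
        sum = 255;
    return (uint8_t)sum;
}

/* Four lanes at a time, matching one ldl/ldq pair in the listings. */
static void add_pixels_clamped4_ref(uint8_t pixels[4], const int16_t block[4])
{
    for (int i = 0; i < 4; i++)
        pixels[i] = add_clamped_ref(pixels[i], block[i]);
}
```

The two groups of four lanes in each loop iteration are independent of
each other, so in principle their max/min/pack chains can be
interleaved as in the older listing.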
Anyway, thanks for your work :)
--
Falk