This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: md description for instruction that modifies multiple operands
- From: fnf at intrinsity dot com (Fred Fish)
- To: Richard dot Earnshaw at arm dot com
- Cc: fnf at intrinsity dot com, echristo at redhat dot com (Eric Christopher),gcc at gcc dot gnu dot org
- Date: Fri, 30 May 2003 12:25:10 -0500 (CDT)
- Subject: Re: md description for instruction that modifies multiple operands
- Reply-to: fnf at intrinsity dot com
I do appreciate all the input I've gotten so far. It has helped to
clarify that the hardest part of this problem is the requirement for
sequential register allocation. Given that, perhaps we can simplify
how we use concatenated vector types.
> If that's the case, then you might be able to make something like the
> following work ...
I've read the docs on "subreg" and looked at some of the code
templates in the mips.md file and it does look promising, though I'm
not sure if the vec_select that we use does essentially the same thing
or not.
Perhaps it would be useful to more closely examine the specific case
of the block2 instruction, which is similar to block4, but we only
have to deal with two matrix_t operands. Here is a source code
example:
typedef int matrix_t __attribute__((__mode__(V16SI)));

matrix_t foo (matrix_t t0, matrix_t t1)
{
  __BLOCK2_M (t0, t1);
  return (t0);
}
When compiled with our current implementation, this produces the
rather ugly code below, even though optimization (-O2) has been used:
foo:
        set.m.m  $m2,$m0    # 10 fm_block2_concat/1 [length = 8]
        set.m.m  $m3,$m1
        set.m.m  $m1,$m3    # 34 movv32si_regreg [length = 8]
        set.m.m  $m0,$m2
        block2.m $m0        # 11 fm_block2_internal [length = 4]
        j        $31        # 37 return [length = 4]
        set.m.m  $m0,$m0    # 24 fm_block2_split0/1 [length = 4]
My interpretation of what the compiler has done is:

(1) The vec_concat instruction allocates a V32SI and generates the
    code to copy the component parts into the register pair m2/m3
    allocated to hold the V32SI. Perhaps this copying would go away
    if we could use subreg?

        set.m.m  $m2,$m0    # 10 fm_block2_concat/1 [length = 8]
        set.m.m  $m3,$m1

(2) Copy the V32SI to another V32SI as input to the block2. I am not
    sure why this copy is needed.

        set.m.m  $m1,$m3    # 34 movv32si_regreg [length = 8]
        set.m.m  $m0,$m2

(3) Expand code for the block2, leaving the output V32SI in m0/m1.

        block2.m $m0        # 11 fm_block2_internal [length = 4]

(4) Split out the V16SI part of the V32SI that we want to return to
    the caller. (Note that this is a copy of $m0 to itself, so it
    can be eliminated.)

        set.m.m  $m0,$m0    # 24 fm_block2_split0/1 [length = 4]
Here are the relevant md file entries, starting with the define_expand
that is used to generate the initial RTL for the __BLOCK2_M builtin:
(define_expand "fm_block2"
  [(set (match_dup:V32SI 2)
        (vec_concat:V32SI (match_operand:V16SI 0 "move_operand" "")
                          (match_operand:V16SI 1 "move_operand" "")))
   (set (match_dup:V32SI 2)
        (unspec:V32SI [(match_dup:V32SI 2)] 281))
   (set (match_dup:V16SI 0)
        (vec_select:V16SI (match_dup:V32SI 2) (parallel [(const_int 0)])))
   (set (match_dup:V16SI 1)
        (vec_select:V16SI (match_dup:V32SI 2) (parallel [(const_int 1)])))]
  "TARGET_FM"
  "{ operands[2] = gen_reg_rtx (V32SImode); }")
(define_insn "fm_block2_concat"
  [(set (match_operand:V32SI 0 "register_operand" "=&w,&w")
        (vec_concat:V32SI (match_operand:V16SI 1 "move_operand" "v,m")
                          (match_operand:V16SI 2 "move_operand" "v,m")))]
  "TARGET_FM"
  "@
   set.m.m\\t%H0,%1\;set.m.m\\t%I0,%2
   load.m\\t%H0,%1\;load.m\\t%I0,%2"
  [(set_attr "type" "fm")
   (set_attr "length" "8,8")])
(define_insn "movv32si_regreg"
  [(set (match_operand:V32SI 0 "register_operand" "=w")
        (match_operand:V32SI 1 "register_operand" "w"))]
  "TARGET_FM"
  "set.m.m\\t%L0,%L1\;set.m.m\\t%M0,%M1"
  [(set_attr "type" "fm")
   (set_attr "length" "8")])
(define_insn "fm_block2_internal"
  [(set (match_operand:V32SI 0 "register_operand" "=w")
        (unspec:V32SI [(match_operand:V32SI 1 "register_operand" "0")] 281))]
  "TARGET_FM"
  "block2.m\\t%0"
  [(set_attr "type" "fm")])
(define_insn "fm_block2_split0"
  [(set (match_operand:V16SI 0 "nonimmediate_operand" "=v,m")
        (vec_select:V16SI (match_operand:V32SI 1 "register_operand" "w,w")
                          (parallel [(const_int 0)])))]
  "TARGET_FM"
  "@
   set.m.m\\t%0,%H1
   store.m\\t%H1,%0"
  [(set_attr "type" "fm")])
(define_insn "fm_block2_split1"
  [(set (match_operand:V16SI 0 "nonimmediate_operand" "=v,m")
        (vec_select:V16SI (match_operand:V32SI 1 "register_operand" "w,w")
                          (parallel [(const_int 1)])))]
  "TARGET_FM"
  "@
   set.m.m\\t%0,%I1
   store.m\\t%I1,%0"
  [(set_attr "type" "fm")])
Any suggestions on how to improve this implementation would be
appreciated. If using subreg can eliminate some of the explicit
packing and unpacking between the V16SI and V32SI types, that would
be great.
-Fred