This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Re: md description for intruction that modifies multiple operands

From: fnf at intrinsity dot com (Fred Fish)
To: Richard dot Earnshaw at arm dot com
Cc: fnf at intrinsity dot com, echristo at redhat dot com (Eric Christopher),gcc at gcc dot gnu dot org
Date: Fri, 30 May 2003 12:25:10 -0500 (CDT)
Subject: Re: md description for intruction that modifies multiple operands
Reply-to: fnf at intrinsity dot com

I do appreciate all the input I've gotten so far.  It has helped to
clarify that the hardest part of this problem is the requirement for
sequential register allocation.  Given that, perhaps we can simplify
how we use concatenated vector types.

> If that's the case, then you might be able to make something like the 
> following work ...

I've read the docs on "subreg" and looked at some of the code
templates in the mips.md file and it does look promising, though I'm
not sure if the vec_select that we use does essentially the same thing
or not.

Perhaps it would be useful to examine more closely the specific case
of the block2 instruction, which is similar to block4 but only has to
deal with two matrix_t operands.  Here is a source code example:

	typedef int matrix_t __attribute__((__mode__(V16SI)));

	matrix_t foo (matrix_t t0, matrix_t t1)
	{
	  __BLOCK2_M (t0, t1);
	  return (t0);
	}

When compiled with our current implementation, this produces the
rather ugly code below, even though optimization (-O2) has been used:

    foo:
        set.m.m $m2,$m0    # 10   fm_block2_concat/1  [length = 8]
        set.m.m $m3,$m1
        set.m.m $m1,$m3    # 34   movv32si_regreg     [length = 8]
        set.m.m $m0,$m2
        block2.m $m0       # 11   fm_block2_internal  [length = 4]
        j       $31        # 37   return              [length = 4]
        set.m.m $m0,$m0    # 24   fm_block2_split0/1  [length = 4]

My interpretation of what the compiler has done is:

	(1) The vec_concat instruction allocates a V32SI value
	and generates the code to copy the component parts into
	the register pair m2/m3 allocated to hold the V32SI.
	Perhaps this copying would go away if we could use subreg?

          set.m.m $m2,$m0    # 10   fm_block2_concat/1  [length = 8]
          set.m.m $m3,$m1

	(2) Copy the V32SI to another V32SI (the pair m0/m1) as input
	to the block2.  I'm not sure why this copy is needed.

          set.m.m $m1,$m3    # 34   movv32si_regreg     [length = 8]
          set.m.m $m0,$m2

	(3) Expand code for the block2, leaving the output V32SI
	in m0/m1

          block2.m $m0       # 11   fm_block2_internal  [length = 4]

	(4) Split out the V16SI part of the V32SI that we want to
	return to the caller.  (Note that this move copies $m0 onto
	itself, so it can be eliminated.)

          set.m.m $m0,$m0    # 24   fm_block2_split0/1  [length = 4]
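
To check the reading of steps (1) and (2), the four moves that precede
block2.m can be replayed in a small C model.  This is only a sketch under
stated assumptions: each $mN register is reduced to a single int tag, t0
and t1 are assumed to arrive in $m0 and $m1, and "set.m.m dst,src" is
assumed to copy src into dst (which is how the output templates use it).
The names set_m_m, replay_moves and NUM_MREGS are hypothetical and exist
only for this illustration.

```c
/* Hypothetical model: one int tag stands in for each 16-word $mN register. */
enum { NUM_MREGS = 4 };

/* Assumed semantics of "set.m.m dst,src": copy src into dst. */
static void set_m_m(int m[], int dst, int src)
{
    m[dst] = m[src];
}

/* Replay, in listing order, the four moves emitted before block2.m. */
static void replay_moves(int m[])
{
    set_m_m(m, 2, 0);  /* set.m.m $m2,$m0   # 10  fm_block2_concat */
    set_m_m(m, 3, 1);  /* set.m.m $m3,$m1                          */
    set_m_m(m, 1, 3);  /* set.m.m $m1,$m3   # 34  movv32si_regreg  */
    set_m_m(m, 0, 2);  /* set.m.m $m0,$m2                          */
}
```

Calling replay_moves on an array initialised to { 100, 200, -1, -1 }
leaves m[0] and m[1] at 100 and 200.  In this model the pack into m2/m3
followed by the copy back into m0/m1 is a round trip, so block2.m sees
the same m0/m1 contents it would have seen with no moves at all.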

Here are the relevant md file entries, starting with the define_expand
that is used to generate the initial RTL for the __BLOCK2_M builtin:

	(define_expand "fm_block2"
	  [(set (match_dup:V32SI 2)
	        (vec_concat:V32SI (match_operand:V16SI 0 "move_operand" "")
	                          (match_operand:V16SI 1 "move_operand" "")))
	   (set (match_dup:V32SI 2)
	        (unspec:V32SI [(match_dup:V32SI 2)] 281 ))
	   (set (match_dup:V16SI 0)
	        (vec_select:V16SI (match_dup:V32SI 2) (parallel [(const_int 0)])))
	   (set (match_dup:V16SI 1)
	        (vec_select:V16SI (match_dup:V32SI 2) (parallel [(const_int 1)])))]
	  "TARGET_FM"
	  "{ operands[2] = gen_reg_rtx (V32SImode); }")

	(define_insn "fm_block2_concat"
	  [(set (match_operand:V32SI 0 "register_operand" "=&w,&w")
	        (vec_concat:V32SI (match_operand:V16SI 1 "move_operand" "v,m")
	                          (match_operand:V16SI 2 "move_operand" "v,m")))]
	  "TARGET_FM"
	  "@
	   set.m.m\\t%H0,%1\;set.m.m\\t%I0,%2
	   load.m\\t%H0,%1\;load.m\\t%I0,%2"
	  [(set_attr "type" "fm")
	   (set_attr "length" "8,8")])
	
	(define_insn "movv32si_regreg"
	  [(set (match_operand:V32SI 0 "register_operand" "=w")
	        (match_operand:V32SI 1 "register_operand" "w"))]
	  "TARGET_FM"
	  "set.m.m\\t%L0,%L1\;set.m.m\\t%M0,%M1"
	  [(set_attr "type" "fm")
	   (set_attr "length" "8")])
	
	(define_insn "fm_block2_internal"
	  [(set (match_operand:V32SI 0 "register_operand" "=w" )
	        (unspec:V32SI [(match_operand:V32SI 1 "register_operand" "0")] 281))]
	  "TARGET_FM"
	  "block2.m\\t%0"
	  [(set_attr "type" "fm")])
	
	(define_insn "fm_block2_split0"
	  [(set (match_operand:V16SI 0 "nonimmediate_operand" "=v,m")
	        (vec_select:V16SI (match_operand:V32SI 1 "register_operand" "w,w")
	                          (parallel [(const_int 0)])))]
	  "TARGET_FM"
	  "@
	   set.m.m\\t%0,%H1
	   store.m\\t%H1,%0"
	  [(set_attr "type" "fm")])
	
	(define_insn "fm_block2_split1"
	  [(set (match_operand:V16SI 0 "nonimmediate_operand" "=v,m")
	        (vec_select:V16SI (match_operand:V32SI 1 "register_operand" "w,w")
	                          (parallel [(const_int 1)])))]
	  "TARGET_FM"
	  "@
	   set.m.m\\t%0,%I1
	   store.m\\t%I1,%0"
	  [(set_attr "type" "fm")])
	
Any suggestions on how to improve this implementation would be
appreciated.  If using subreg can eliminate some of the explicit
packing and unpacking between the V16SI and V32SI types, that would
be great.

-Fred

References:
- Re: md description for intruction that modifies multiple operands
  - From: Richard Earnshaw
