This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: md description for instruction that modifies multiple operands
- From: Richard Earnshaw <rearnsha at arm dot com>
- To: fnf at intrinsity dot com
- Cc: Richard dot Earnshaw at arm dot com, echristo at redhat dot com (Eric Christopher), gcc at gcc dot gnu dot org
- Date: Fri, 30 May 2003 10:23:54 +0100
- Subject: Re: md description for instruction that modifies multiple operands
- Organization: ARM Ltd.
- Reply-to: Richard dot Earnshaw at arm dot com
> > > (set (match_operand:V16SI 4 "register_operand" "=v")
> > > (unspec:V16SI [(match_dup 1) (match_dup 3) (match_dup 7)] 462))
> > > (set (match_operand:V16SI 6 "register_operand" "=v")
> > > (unspec:V16SI [(match_dup 1) (match_dup 3) (match_dup 7)] 463))]
>
> BTW, after seeing the new RTL, I realized that the last "set" above
> should probably be:
>
> (set (match_operand:V16SI 6 "register_operand" "=v")
> (unspec:V16SI [(match_dup 1) (match_dup 3) (match_dup 5)] 463))]
>
> Note the change from 7 -> 5. Hopefully this is correct.
>
> > This looks like an expansion problem. How are you calling
> > gen_fm_block4()? You need to pass 8 arguments to it now, something like
> >
> > gen_fm_block4(t0, t0, t1, t1, t2, t2, t3, t3);
>
> That was the problem. I fixed it and the generated code for the example
> is now:
>
> foo:
> j $31
> block4.m $m0,$m1,$m2,$m3
>
> which is completely optimal. The function args are passed in m0
> through m3, the block4 is called with them in the right order, and the
> function returns with the result left in m0.
>
> However, I'm not clear on whether or not the template guarantees that
> the register allocation will be sequential. I suspect not. So we may
> still have the problem of training the register allocator to ensure
> that the operands to the block4.m instruction are always some
> sequential set of four registers out of the possible 16 (m0-m15).
There's no way to make the register allocator do this, unfortunately. ARM
has a similar problem with its load-multiple instructions. There, a bitmask
of the registers to load is encoded in the instruction, and the marked
registers are filled sequentially from memory, with the lowest-numbered
register loaded from the lowest address. We work around this by using
specific hard registers in that pattern and then using peepholes to spot a
few cases opportunistically. Take a look at the movstrqi pattern in arm.md
if you want some ideas.
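
To give a rough idea of the kind of test a peephole condition would need
for the sequential case, here is a minimal C sketch. It is illustrative
only: the name regs_are_consecutive and the use of plain int register
numbers are assumptions, and this is not code taken from arm.md or from
GCC itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC: return true if regs[0..n-1]
   hold strictly consecutive hard register numbers r, r+1, ..., r+n-1.
   An empty list is treated as not matching.  */
bool
regs_are_consecutive (const int *regs, size_t n)
{
  for (size_t i = 1; i < n; i++)
    if (regs[i] != regs[i - 1] + 1)
      return false;
  return n > 0;
}
```

A peephole would only fire when the registers that happened to be
allocated pass a test like this; it does nothing to make the allocator
pick such registers in the first place.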
>
> I won't even try to think yet about the block4v instruction, which
> requires a set like {m0,m4,m8,m12} or {m1,m5,m9,m13}. :-(
>
Equally impossible for the same reasons, and maybe more ;-(
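
For what it's worth, the shape of the block4v requirement can at least be
written down as a check. The sketch below assumes the valid sets are
exactly {b, b+4, b+8, b+12} for b in 0..3, which is a guess generalized
from the two examples quoted above; the function name is hypothetical and
nothing here comes from an existing port.

```c
#include <stdbool.h>

/* Hypothetical helper: return true if the four register numbers form a
   stride-4 set {b, b+4, b+8, b+12} within m0-m15, i.e. with b in 0..3.
   The stride-4 rule is an assumption based on the examples
   {m0,m4,m8,m12} and {m1,m5,m9,m13}.  */
bool
regs_form_block4v_set (const int regs[4])
{
  if (regs[0] < 0 || regs[0] > 3)
    return false;
  for (int i = 1; i < 4; i++)
    if (regs[i] != regs[0] + 4 * i)
      return false;
  return true;
}
```

As with the sequential case, this only recognizes a suitable allocation
after the fact; it does not help the allocator produce one.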
R.