This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: moving v16sf reg with multiple sub-regs

From: "Dylan Cuthbert" <dylan at q-games dot com>
To: gcc at gcc dot gnu dot org
Date: Tue, 22 Feb 2005 01:31:23 +0900
Subject: Re: moving v16sf reg with multiple sub-regs
References: <cv7gku$5qu$1@sea.gmane.org>

Further investigation.

If I remove the define_expand for movv16sf and throw in a dummy define_insn that supports reg<->reg, mem<->reg and reg<->mem, then the redundant move is optimized away.

But of course, the store, load and move then each take 4 instructions, so this produces inefficient code.

Any idea how I can get the same removal of redundant temporaries and still get the multiple instructions for each operation interspersed nicely?

Dylan

"Dylan Cuthbert" <dylan@q-games.com> wrote in message news:cv7gku$5qu$1@sea.gmane.org...

Hi there,

I have implemented the move of a v16sf type like this, because a v16sf value is held in 4 v4sf registers:

--- snip ---
(define_expand "movv16sf"
  [(set (match_operand:V16SF 0 "nonimmediate_operand" "")
        (match_operand:V16SF 1 "general_operand" ""))]
  ""
  "
  if ((reload_in_progress | reload_completed) == 0
      && !register_operand (operands[0], V16SFmode)
      && !nonmemory_operand (operands[1], V16SFmode))
    operands[1] = force_reg (V16SFmode, operands[1]);
  move_v16sf (operands);
  DONE;
")
--- end snip ---

and in the config's .c file:

--- snip ---
void
move_v16sf (operands)
     rtx operands[];
{
  rtx op0 = operands[0];
  rtx op1 = operands[1];
  enum rtx_code code0 = GET_CODE (operands[0]);
  enum rtx_code code1 = GET_CODE (operands[1]);
  int subreg_offset0 = 0;
  int subreg_offset1 = 0;
  enum delay_type delay = DELAY_NONE;

  if (code0 == REG)
    {
      int regno0 = REGNO (op0) + subreg_offset0;

      if (code1 == REG)
        {
          int regno1 = REGNO (op1) + subreg_offset1;

          /* Just in case, don't do anything for assigning a register
             to itself, unless we are filling a delay slot.  */
          if (regno0 == regno1 && set_nomacro == 0)
            return;

          /* reg <- reg: move the four V4SF pieces one at a time.  */
          emit_move_insn (gen_rtx_SUBREG (V4SFmode, op0, 0),
                          gen_rtx_SUBREG (V4SFmode, op1, 0));
          emit_move_insn (gen_rtx_SUBREG (V4SFmode, op0, 16),
                          gen_rtx_SUBREG (V4SFmode, op1, 16));
          emit_move_insn (gen_rtx_SUBREG (V4SFmode, op0, 32),
                          gen_rtx_SUBREG (V4SFmode, op1, 32));
          emit_move_insn (gen_rtx_SUBREG (V4SFmode, op0, 48),
                          gen_rtx_SUBREG (V4SFmode, op1, 48));
        }
      else if (code1 == MEM)
        {
          rtx src_reg;

          /* reg <- mem: copy the address into a register, then load
             each piece from that address plus its byte offset.  */
          src_reg = copy_addr_to_reg (XEXP (op1, 0));

          emit_move_insn (gen_rtx_SUBREG (V4SFmode, op0, 0),
                          gen_rtx_MEM (V4SFmode, src_reg));
          emit_move_insn (gen_rtx_SUBREG (V4SFmode, op0, 16),
                          gen_rtx_MEM (V4SFmode, plus_constant (src_reg, 16)));
          emit_move_insn (gen_rtx_SUBREG (V4SFmode, op0, 32),
                          gen_rtx_MEM (V4SFmode, plus_constant (src_reg, 32)));
          emit_move_insn (gen_rtx_SUBREG (V4SFmode, op0, 48),
                          gen_rtx_MEM (V4SFmode, plus_constant (src_reg, 48)));
        }
    }
  else if (code0 == MEM)
    {
      if (code1 == REG)
        {
          rtx dest_reg;

          /* mem <- reg: copy the address into a register, then store
             each piece to that address plus its byte offset.  */
          dest_reg = copy_addr_to_reg (XEXP (op0, 0));

          emit_move_insn (gen_rtx_MEM (V4SFmode, dest_reg),
                          gen_rtx_SUBREG (V4SFmode, op1, 0));
          emit_move_insn (gen_rtx_MEM (V4SFmode, plus_constant (dest_reg, 16)),
                          gen_rtx_SUBREG (V4SFmode, op1, 16));
          emit_move_insn (gen_rtx_MEM (V4SFmode, plus_constant (dest_reg, 32)),
                          gen_rtx_SUBREG (V4SFmode, op1, 32));
          emit_move_insn (gen_rtx_MEM (V4SFmode, plus_constant (dest_reg, 48)),
                          gen_rtx_SUBREG (V4SFmode, op1, 48));
        }
    }
}
--- end snip ---
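
The byte offsets 0, 16, 32 and 48 above follow from a V4SF being four 4-byte floats, so a V16SF covers four 16-byte pieces. As a stand-alone illustration outside GCC, the same piecewise copy can be sketched in plain C. The names `v16sf_model`, `piece_offset` and `move_piecewise` are hypothetical and are not GCC functions, and 4-byte floats are assumed:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not GCC code: a V16SF value is 16 floats,
   viewed as four V4SF pieces of four floats each.  */
enum { FLOATS_PER_V4SF = 4, PIECES_PER_V16SF = 4 };

typedef struct
{
  float f[FLOATS_PER_V4SF * PIECES_PER_V16SF];
} v16sf_model;

/* Byte offset of piece N, i.e. the value that would be passed as the
   SUBREG offset: 0, 16, 32, 48 when sizeof (float) == 4.  */
size_t
piece_offset (int n)
{
  assert (n >= 0 && n < PIECES_PER_V16SF);
  return (size_t) n * FLOATS_PER_V4SF * sizeof (float);
}

/* Copy SRC to DST one V4SF-sized piece at a time, mirroring the four
   emit_move_insn calls in each branch of move_v16sf.  */
void
move_piecewise (v16sf_model *dst, const v16sf_model *src)
{
  int n;

  for (n = 0; n < PIECES_PER_V16SF; n++)
    memcpy ((char *) dst + piece_offset (n),
            (const char *) src + piece_offset (n),
            FLOATS_PER_V4SF * sizeof (float));
}
```

Calling `move_piecewise (&dst, &src)` copies all 16 floats in four steps. It only models the data layout and says nothing about how GCC allocates or coalesces the underlying registers.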

This works OK, but it produces inefficient code. Here is some sample source code:

--- snip ---
typedef int v4 __attribute__((mode(V4SF)));
typedef int m4 __attribute__((mode(V16SF)));
v4 vec1, vec2;
m4 frog;
int main( int argc, char* argv[] )
{
  m4 blob;

  asm( "some_instruction %0,%1,%2,%3" : "=&j" (blob) : "j" (vec1), "j" (vec2), "j" (frog) );
  asm( "some_instruction2 %0,%1" : "=&j" (frog) : "j" (blob) );
  return 0;
}
--- end snip ---

where "j" is the constraint for the register class that holds v4sf and v16sf types. This produces a move of the v16sf value between the two asm statements when it doesn't need to. Does anyone have any idea why this move isn't eliminated?

#APP
       some_instruction r10,r22,r20,r00
#NO_APP
       move r00,r10
       move r01,r11
       move r02,r12
       move r03,r13
#APP
       some_instruction2 r10, r00

r10 doesn't need to be preserved (it isn't written out), but the compiler seems to be making a copy anyway. Worse, if "blob" is defined at global scope like "frog", then it also writes r10 out to memory when it shouldn't.

Any ideas appreciated.

Regards

