This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

[patch] h8300.c: Improve shifts.


Hi,

Attached is a patch to improve shifts by non-constant amounts, such as
n << m, on the H8/300 target.

For such shifts, the original gcc uses a scratch register as a loop
counter to complete the required amount of shifts.

This does not work well when the variable holding the shift amount has
no use other than specifying the shift amounts.  We could simply use
that variable as a loop counter instead of relying on a scratch
register.

The patch addresses this issue by generating RTL code for shifts by
non-constant amounts.  The original code does pretty much everything
at the insn emit time, preventing optimizations on the register
allocation.

This improvement is family independent as all the ingredients emitted
by expand_a_shift () is also family independent.  That is, the code
works on H8/300, H8/300H, and H8/S.

I have a couple of questions regarding the patch.

1) Do I need to emit NOTE_INSN_LOOP_BEG and NOTE_INSN_LOOP_END as
   appropriate?  Please let me know if this is the case.

2) The patch emits 'ble' (branch if less than or equal to) for the
   first conditional branch just like the original code.  I chose
   'signed' as the 6th argument of emit_cmp_and_insns ().  Do I need
   to reflect the actual signedness of the variable holding the shift
   amount?  Please let me know if this is a problem.

Here is a code example.  dummy1 and dummy2 are used to keep registers
busy.  We can see that the unpatched version is suffering from the
shortage of registers.

/* h8300-hms-gcc -ms -Wall -O2 -fomit-frame-pointer */

int global1;
int global2;

unsigned int
test_gcc (int n, int m, int dummy1, int dummy2)
{
  n >>= m;
  global1 = dummy1;
  global2 = dummy2;
  return n;
}

/* Without the patch:
_test_gcc:
	push.l	er4          <- This will be gone with the patch
	mov.w	@(10,er7),r3
	mov.b	r1l,r4l      <- We could use r1l as the loop counter.
	ble	.Lle1
.Llt1:
	shar.w	r0
	add	#0xff,r4l
	bne	.Llt1
.Lle1:
	mov.w	r2,@_global1
	mov.w	r3,@_global2
	pop.l	er4          <- This will be gone with the patch
	rts
*/

/* With the patch:
_test_gcc:
	mov.b	r1l,r1l
	ble	.L4
.L3:
	shar	r0h
	rotxr	r0l
	add.b	#-1,r1l
	bne	.L3
.L4:
	mov.w	r2,@_global1
	mov.w	@(2,r7),r2
	mov.w	r2,@_global2
	rts
*/

One word about the patch.  As I removed support for shifts by
non-constant shifts from emit_a_shift, I reduced the indentation
nesting level of some part of the function by calling abort () for
shifts by non-constant shifts.  This why the patch for emit_a_shift ()
is so big.

Thanks,

Kazu Hirata

2000-08-07  Kazu Hirata  <kazu@hxi.com>

	* config/h8300/h8300.c (expand_a_shift): Emit RTL code for shifts
	by non-constant amounts.
	(emit_a_shift): Remove support for shifts by non-constant amounts.

	* config/h8300/h8300.md: Fix comments regarding the shift
	algorithm.

============================================================

===File /kazu@isuzu.hxi.com:/home/kazu/gnu/gcc/h8300.patch===
Index: h8300.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/config/h8300/h8300.c,v
retrieving revision 1.37
diff -c -3 -p -r1.37 h8300.c
*** h8300.c	2000/08/06 18:03:58	1.37
--- h8300.c	2000/08/07 06:01:58
*************** expand_a_shift (mode, code, operands)
*** 1734,1752 ****
       int code;
       rtx operands[];
  {
!   emit_move_insn (operands[0], operands[1]);
  
!   /* Need a loop to get all the bits we want  - we generate the
!      code at emit time, but need to allocate a scratch reg now.  */
  
!   emit_insn (gen_rtx_PARALLEL
! 	     (VOIDmode,
! 	      gen_rtvec (2,
! 			 gen_rtx_SET (VOIDmode, operands[0],
! 				      gen_rtx (code, mode, operands[0],
! 					       operands[2])),
! 			 gen_rtx_CLOBBER (VOIDmode,
! 					  gen_rtx_SCRATCH (QImode)))));
  
    return 1;
  }
--- 1734,1802 ----
       int code;
       rtx operands[];
  {
!   rtx dst = operands[0];
!   rtx src = operands[1];
!   rtx shift_amount = operands[2];
! 
!   /* We shift in place.  */
!   emit_move_insn (dst, src);
! 
!   /* If the shift amount is not a constant, we must loop.  We rely on
!      emit_a_shift () just for a shift by one bit.  */
!   if (GET_CODE (shift_amount) != CONST_INT)
!     {
!       rtx counter = gen_reg_rtx (QImode);
!       rtx start_label = gen_label_rtx ();
!       rtx end_label = gen_label_rtx ();
!       rtx tmp;
! 
!       /* If the shift amount is less than or equal to 0,
! 	 we go out of the loop.  */
!       emit_cmp_and_jump_insns (shift_amount, GEN_INT (0),
! 			       LE, NULL_RTX, QImode, 0, 0, end_label);
! 
!       /* Initialize the loop counter.  */
!       emit_move_insn (counter, shift_amount);
! 
!       emit_label (start_label);
! 
!       /* Shift by one bit.  */
!       emit_insn (gen_rtx_PARALLEL
! 		 (VOIDmode,
! 		  gen_rtvec (2,
! 			     gen_rtx_SET (VOIDmode, dst,
! 					  gen_rtx (code, mode, dst,
! 						   GEN_INT (1))),
! 			     gen_rtx_CLOBBER (VOIDmode,
! 					      gen_rtx_SCRATCH (QImode)))));
! 
!       /* Decrement the counter by 1.  */
!       tmp = gen_rtx_PLUS (QImode, counter, GEN_INT (-1));
!       emit_insn (gen_rtx_SET (VOIDmode, counter, tmp));
! 
!       /* If the loop counter is non-zero, we go back to the beginning
!          of the loop.  */
!       emit_cmp_and_jump_insns (counter, GEN_INT (0),
! 			       NE, NULL_RTX, QImode, 1, 0, start_label);
  
!       emit_label (end_label);
!     }
! 
!   /* The shift amount is a constant.  */
!   else
!     {
!       /* Need a loop to get all the bits we want - we generate the
! 	 code at emit time, but need to allocate a scratch reg now.  */
  
!       emit_insn (gen_rtx_PARALLEL
! 		 (VOIDmode,
! 		  gen_rtvec (2,
! 			     gen_rtx_SET (VOIDmode, dst,
! 					  gen_rtx (code, mode, dst,
! 						   shift_amount)),
! 			     gen_rtx_CLOBBER (VOIDmode,
! 					      gen_rtx_SCRATCH (QImode)))));
!     }
  
    return 1;
  }
*************** get_shift_alg (cpu, shift_type, mode, co
*** 2459,2465 ****
    return SHIFT_LOOP;
  }
  
! /* Emit the assembler code for doing shifts.  */
  
  const char *
  emit_a_shift (insn, operands)
--- 2509,2517 ----
    return SHIFT_LOOP;
  }
  
! /* Emit the assembler code for doing shifts.  We only accept a shift
!    by a constant amount.  A shift by a non-constant amount is taken
!    care of by expand_a_shift () at the RTL generation time.  */
  
  const char *
  emit_a_shift (insn, operands)
*************** emit_a_shift (insn, operands)
*** 2475,2480 ****
--- 2527,2534 ----
    enum rtx_code code = GET_CODE (shift);
    enum shift_type shift_type;
    enum shift_mode shift_mode;
+   int n = INTVAL (operands[2]);
+   enum shift_alg alg;
  
    loopend_lab++;
  
*************** emit_a_shift (insn, operands)
*** 2508,2664 ****
        abort ();
      }
  
    if (GET_CODE (operands[2]) != CONST_INT)
!     {
!       /* Indexing by reg, so have to loop and test at top.  */
!       output_asm_insn ("mov.b	%X2,%X4", operands);
!       fprintf (asm_out_file, "\tble	.Lle%d\n", loopend_lab);
! 
!       /* Get the assembler code to do one shift.  */
!       get_shift_alg (cpu_type, shift_type, mode, 1, &assembler,
! 		     &assembler2, &cc_valid);
  
!       fprintf (asm_out_file, ".Llt%d:\n", loopend_lab);
!       output_asm_insn (assembler, operands);
!       output_asm_insn ("add	#0xff,%X4", operands);
!       fprintf (asm_out_file, "\tbne	.Llt%d\n", loopend_lab);
!       fprintf (asm_out_file, ".Lle%d:\n", loopend_lab);
  
!       return "";
!     }
!   else
!     {
!       int n = INTVAL (operands[2]);
!       enum shift_alg alg;
  
!       /* If the count is negative, make it 0.  */
!       if (n < 0)
! 	n = 0;
!       /* If the count is too big, truncate it.
!          ANSI says shifts of GET_MODE_BITSIZE are undefined - we choose to
! 	 do the intuitive thing.  */
!       else if (n > GET_MODE_BITSIZE (mode))
! 	n = GET_MODE_BITSIZE (mode);
  
!       alg = get_shift_alg (cpu_type, shift_type, mode, n, &assembler,
! 			   &assembler2, &cc_valid);
  
!       switch (alg)
  	{
! 	case SHIFT_INLINE:
! 	  /* Emit two bit shifts first.  */
! 	  while (n > 1 && assembler2 != NULL)
! 	    {
! 	      output_asm_insn (assembler2, operands);
! 	      n -= 2;
! 	    }
  
! 	  /* Now emit one bit shifts for any residual.  */
! 	  while (n > 0)
! 	    {
! 	      output_asm_insn (assembler, operands);
! 	      n -= 1;
! 	    }
  
! 	  /* Keep track of CC.  */
! 	  if (cc_valid)
! 	    {
! 	      cc_status.value1 = operands[0];
! 	      cc_status.flags |= cc_valid;
! 	    }
! 	  return "";
  
! 	case SHIFT_ROT_AND:
  	  {
! 	    int m = GET_MODE_BITSIZE (mode) - n;
! 	    int mask = (shift_type == SHIFT_ASHIFT
! 			? ((1 << (GET_MODE_BITSIZE (mode) - n)) - 1) << n
! 			: (1 << (GET_MODE_BITSIZE (mode) - n)) - 1);
! 	    char insn_buf[200];
! 
! 	    /* Not all possibilities of rotate are supported.  They shouldn't
! 	       be generated, but let's watch for 'em.  */
! 	    if (assembler == 0)
! 	      abort ();
! 
! 	    /* Emit two bit rotates first.  */
! 	    while (m > 1 && assembler2 != NULL)
! 	      {
! 		output_asm_insn (assembler2, operands);
! 		m -= 2;
! 	      }
  
! 	    /* Now single bit rotates for any residual.  */
! 	    while (m > 0)
! 	      {
! 		output_asm_insn (assembler, operands);
! 		m -= 1;
! 	      }
  
! 	    /* Now mask off the high bits.  */
! 	    if (TARGET_H8300)
! 	      {
! 		switch (mode)
! 		  {
! 		  case QImode:
! 		    sprintf (insn_buf, "and #%d,%%X0", mask);
! 		    cc_status.value1 = operands[0];
! 		    cc_status.flags |= CC_NO_CARRY;
! 		    break;
! 		  case HImode:
! 		    sprintf (insn_buf, "and #%d,%%s0\n\tand #%d,%%t0",
! 			     mask & 255, mask >> 8);
! 		    break;
! 		  case SImode:
! 		    abort ();
! 		  default:
! 		    break;
! 		  }
! 	      }
! 	    else
  	      {
! 		sprintf (insn_buf, "and.%c #%d,%%%c0",
! 			 "bwl"[shift_mode], mask,
! 			 mode == QImode ? 'X' : mode == HImode ? 'T' : 'S');
  		cc_status.value1 = operands[0];
  		cc_status.flags |= CC_NO_CARRY;
  	      }
- 	    output_asm_insn (insn_buf, operands);
- 	    return "";
  	  }
! 
! 	case SHIFT_SPECIAL:
! 	  output_asm_insn (assembler, operands);
! 	  return "";
  
! 	case SHIFT_LOOP:
! 	  /* A loop to shift by a "large" constant value.
! 	     If we have shift-by-2 insns, use them.  */
! 	  if (assembler2 != NULL)
! 	    {
! 	      fprintf (asm_out_file, "\tmov.b	#%d,%sl\n", n / 2,
! 		       names_big[REGNO (operands[4])]);
! 	      fprintf (asm_out_file, ".Llt%d:\n", loopend_lab);
! 	      output_asm_insn (assembler2, operands);
! 	      output_asm_insn ("add	#0xff,%X4", operands);
! 	      fprintf (asm_out_file, "\tbne	.Llt%d\n", loopend_lab);
! 	      if (n % 2)
! 		output_asm_insn (assembler, operands);
! 	    }
! 	  else
! 	    {
! 	      fprintf (asm_out_file, "\tmov.b	#%d,%sl\n", n,
! 		       names_big[REGNO (operands[4])]);
! 	      fprintf (asm_out_file, ".Llt%d:\n", loopend_lab);
! 	      output_asm_insn (assembler, operands);
! 	      output_asm_insn ("add	#0xff,%X4", operands);
! 	      fprintf (asm_out_file, "\tbne	.Llt%d\n", loopend_lab);
! 	    }
! 	  return "";
  
! 	default:
! 	  abort ();
  	}
      }
  }
  
--- 2562,2699 ----
        abort ();
      }
  
+   /* We don't accept a non-constant amount.  */
    if (GET_CODE (operands[2]) != CONST_INT)
!     abort ();
  
!   /* If the count is negative, make it 0.  */
!   if (n < 0)
!     n = 0;
  
!   /* If the count is too big, truncate it.
!      ANSI says shifts of GET_MODE_BITSIZE are undefined - we choose to
!      do the intuitive thing.  */
!   else if (n > GET_MODE_BITSIZE (mode))
!     n = GET_MODE_BITSIZE (mode);
  
!   alg = get_shift_alg (cpu_type, shift_type, mode, n, &assembler,
! 		       &assembler2, &cc_valid);
  
!   switch (alg)
!     {
!     case SHIFT_INLINE:
!       /* Emit two bit shifts first.  */
!       while (n > 1 && assembler2 != NULL)
! 	{
! 	  output_asm_insn (assembler2, operands);
! 	  n -= 2;
! 	}
  
!       /* Now emit one bit shifts for any residual.  */
!       while (n > 0)
  	{
! 	  output_asm_insn (assembler, operands);
! 	  n -= 1;
! 	}
  
!       /* Keep track of CC.  */
!       if (cc_valid)
! 	{
! 	  cc_status.value1 = operands[0];
! 	  cc_status.flags |= cc_valid;
! 	}
!       return "";
  
!     case SHIFT_ROT_AND:
!       {
! 	int m = GET_MODE_BITSIZE (mode) - n;
! 	int mask = (shift_type == SHIFT_ASHIFT
! 		    ? ((1 << (GET_MODE_BITSIZE (mode) - n)) - 1) << n
! 		    : (1 << (GET_MODE_BITSIZE (mode) - n)) - 1);
! 	char insn_buf[200];
! 
! 	/* Not all possibilities of rotate are supported.  They shouldn't
! 	   be generated, but let's watch for 'em.  */
! 	if (assembler == 0)
! 	  abort ();
  
! 	/* Emit two bit rotates first.  */
! 	while (m > 1 && assembler2 != NULL)
  	  {
! 	    output_asm_insn (assembler2, operands);
! 	    m -= 2;
! 	  }
  
! 	/* Now single bit rotates for any residual.  */
! 	while (m > 0)
! 	  {
! 	    output_asm_insn (assembler, operands);
! 	    m -= 1;
! 	  }
  
! 	/* Now mask off the high bits.  */
! 	if (TARGET_H8300)
! 	  {
! 	    switch (mode)
  	      {
! 	      case QImode:
! 		sprintf (insn_buf, "and #%d,%%X0", mask);
  		cc_status.value1 = operands[0];
  		cc_status.flags |= CC_NO_CARRY;
+ 		break;
+ 	      case HImode:
+ 		sprintf (insn_buf, "and #%d,%%s0\n\tand #%d,%%t0",
+ 			 mask & 255, mask >> 8);
+ 		break;
+ 	      case SImode:
+ 		abort ();
+ 	      default:
+ 		break;
  	      }
  	  }
! 	else
! 	  {
! 	    sprintf (insn_buf, "and.%c #%d,%%%c0",
! 		     "bwl"[shift_mode], mask,
! 		     mode == QImode ? 'X' : mode == HImode ? 'T' : 'S');
! 	    cc_status.value1 = operands[0];
! 	    cc_status.flags |= CC_NO_CARRY;
! 	  }
! 	output_asm_insn (insn_buf, operands);
! 	return "";
!       }
  
!     case SHIFT_SPECIAL:
!       output_asm_insn (assembler, operands);
!       return "";
  
!     case SHIFT_LOOP:
!       /* A loop to shift by a "large" constant value.
! 	 If we have shift-by-2 insns, use them.  */
!       if (assembler2 != NULL)
! 	{
! 	  fprintf (asm_out_file, "\tmov.b	#%d,%sl\n", n / 2,
! 		   names_big[REGNO (operands[4])]);
! 	  fprintf (asm_out_file, ".Llt%d:\n", loopend_lab);
! 	  output_asm_insn (assembler2, operands);
! 	  output_asm_insn ("add	#0xff,%X4", operands);
! 	  fprintf (asm_out_file, "\tbne	.Llt%d\n", loopend_lab);
! 	  if (n % 2)
! 	    output_asm_insn (assembler, operands);
! 	}
!       else
! 	{
! 	  fprintf (asm_out_file, "\tmov.b	#%d,%sl\n", n,
! 		   names_big[REGNO (operands[4])]);
! 	  fprintf (asm_out_file, ".Llt%d:\n", loopend_lab);
! 	  output_asm_insn (assembler, operands);
! 	  output_asm_insn ("add	#0xff,%X4", operands);
! 	  fprintf (asm_out_file, "\tbne	.Llt%d\n", loopend_lab);
  	}
+       return "";
+ 
+     default:
+       abort ();
      }
  }
  
Index: h8300.md
===================================================================
RCS file: /cvs/gcc/egcs/gcc/config/h8300/h8300.md,v
retrieving revision 1.14
diff -c -3 -p -r1.14 h8300.md
*** h8300.md	2000/08/01 03:05:52	1.14
--- h8300.md	2000/08/07 06:01:59
***************
*** 1762,1781 ****
  ;; SHIFTS
  ;; ----------------------------------------------------------------------
  ;;
! ;; We make some attempt to provide real efficient shifting.  One example is
! ;; doing an 8 bit shift of a 16 bit value by moving a byte reg into the other
! ;; reg and moving 0 into the former reg.
  ;;
! ;; We also try to achieve this in a uniform way.  IE: We don't try to achieve
! ;; this in both rtl and at insn emit time.  Ideally, we'd use rtl as that would
! ;; give the optimizer more cracks at the code.  However, we wish to do things
! ;; like optimizing shifting the sign bit to bit 0 by rotating the other way.
! ;; There is rtl to handle this (rotate + and), but the h8/300 doesn't handle
! ;; 16 bit rotates.  Also, if we emit complicated rtl, combine may not be able
! ;; to detect cases it can optimize.
  ;;
! ;; For these and other fuzzy reasons, I've decided to go the less pretty but
! ;; easier "do it at insn emit time" route.
  
  ;; QI BIT SHIFTS
  
--- 1762,1783 ----
  ;; SHIFTS
  ;; ----------------------------------------------------------------------
  ;;
! ;; We make some attempt to provide real efficient shifting.  One
! ;; example is doing an 8 bit shift of a 16 bit value by moving a byte
! ;; reg into the other reg and moving 0 into the former reg.
  ;;
! ;; We also try to achieve this in a fairly uniform way.  At the RTL
! ;; generation time, we convert a shift by a non-constant amount to a
! ;; loop in which a shift by one bit is used, leaving no shifts by
! ;; non-constant amounts any further.  At the insn emit time, we
! ;; actually emit the shift code.
  ;;
! ;; Ideally, we'd use rtl for all the shift code as that would give the
! ;; optimizer more cracks at the code.  However, we wish to do things
! ;; like optimizing shifting the sign bit to bit 0 by rotating the
! ;; other way.  There is rtl to handle this (rotate + and), but the
! ;; h8/300 doesn't handle 16 bit rotates.  Also, if we emit complicated
! ;; rtl, combine may not be able to detect cases it can optimize.
  
  ;; QI BIT SHIFTS
  
============================================================


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]