This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH] H8300 Shift Optimization


Hi Kazu,

Based on the analysis below, some of the HImode shifts that are currently implemented as loops could be optimized for speed.

For H8300
=========

1. SHIFT_ASHIFT by 5 - This could be inlined. The current loop
implementation takes 42 clock cycles and requires 4 instructions. If
inlined, it takes 10 cycles and 5 instructions.

2. SHIFT_ASHIFT by 6 - If inlined, this would execute in 12 cycles
instead of 50, at the cost of two more instructions.

3. SHIFT_ASHIFT by 13 - Moving the low byte to the high byte and
inlining the remaining shift by 5 takes 7 instructions and 14 clock
cycles, while the loop takes 4 instructions but 106 clock cycles.

4. SHIFT_ASHIFT by 14 - The loop takes 4 instructions and 114 clock
cycles. Moving the low byte to the high byte, rotating it right twice,
ANDing it with 0xC0 and clearing the low byte takes 5 instructions and
executes in 10 clock cycles.

5. SHIFT_LSHIFTRT by 13 - The loop takes 5 instructions and 132 clock
cycles, while moving the high byte to the low byte and inlining the
remaining shift by 5 takes 7 instructions and 14 clock cycles.

6. SHIFT_LSHIFTRT by 14 - The loop takes 5 instructions and 142 clock
cycles, while moving the high byte to the low byte, rotating it left
twice, ANDing it with 0x03 and clearing the high byte takes
5 instructions and 10 clock cycles.

7. SHIFT_ASHIFTRT by 13 - The loop takes 5 instructions and 132 clock
cycles, while moving the high byte to the low byte and inlining the
remaining shift by 5 takes 7 instructions and 14 clock cycles.

8. SHIFT_ASHIFTRT by 14 - The loop takes 5 instructions and 142 clock
cycles, whereas the sequence

mov.b     r0h,r0l
shll.b    r0l
subx.b    r0h,r0h
shll.b    r0l
mov.b     r0h,r0l
bst.b     #0,r0l

takes 6 instructions and 12 clock cycles.
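
As a sanity check on the three shift-by-14 sequences for H8300, here is a
minimal host-side C sketch that models each instruction on two bytes and
compares the result with a plain 16-bit shift. The function names are
hypothetical and not part of h8300.c, and the flag behaviour (shll.b sets
carry from the top bit, subx.b subtracts the carry, mov.b leaves carry
untouched, bst.b stores carry into a bit) is my reading of the
instructions, so treat it as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate an 8-bit value by one bit (stand-ins for rotr.b / rotl.b). */
static uint8_t rotr8(uint8_t b) { return (uint8_t)((b >> 1) | (b << 7)); }
static uint8_t rotl8(uint8_t b) { return (uint8_t)((b << 1) | (b >> 7)); }

/* SHIFT_ASHIFT by 14: mov.b, rotr.b x2, and.b #0xC0, sub.b. */
uint16_t h8300_ashift14(uint16_t x)
{
  uint8_t lo = (uint8_t)(x & 0xFF);
  uint8_t hi = lo;                 /* mov.b  r0l,r0h   */
  hi = rotr8(hi);                  /* rotr.b r0h       */
  hi = rotr8(hi);                  /* rotr.b r0h       */
  hi &= 0xC0;                      /* and.b  #0xC0,r0h */
  lo = 0;                          /* sub.b  r0l,r0l   */
  return (uint16_t)(((unsigned)hi << 8) | lo);
}

/* SHIFT_LSHIFTRT by 14: mov.b, rotl.b x2, and.b #3, sub.b. */
uint16_t h8300_lshiftrt14(uint16_t x)
{
  uint8_t hi = (uint8_t)(x >> 8);
  uint8_t lo = hi;                 /* mov.b  r0h,r0l */
  lo = rotl8(lo);                  /* rotl.b r0l     */
  lo = rotl8(lo);                  /* rotl.b r0l     */
  lo &= 0x03;                      /* and.b  #3,r0l  */
  hi = 0;                          /* sub.b  r0h,r0h */
  return (uint16_t)(((unsigned)hi << 8) | lo);
}

/* SHIFT_ASHIFTRT by 14: the six-instruction sequence listed above. */
uint16_t h8300_ashiftrt14(uint16_t x)
{
  uint8_t hi = (uint8_t)(x >> 8);
  uint8_t lo;
  unsigned carry;

  lo = hi;                                   /* mov.b  r0h,r0l */
  carry = lo >> 7; lo = (uint8_t)(lo << 1);  /* shll.b r0l     */
  hi = (uint8_t)(0u - carry);                /* subx.b r0h,r0h */
  carry = lo >> 7; lo = (uint8_t)(lo << 1);  /* shll.b r0l     */
  lo = hi;                                   /* mov.b  r0h,r0l */
  lo = (uint8_t)((lo & 0xFE) | carry);       /* bst.b  #0,r0l  */
  return (uint16_t)(((unsigned)hi << 8) | lo);
}

/* Compare every 16-bit input against the plain shift. */
void check_h8300_shift14(void)
{
  for (uint32_t i = 0; i <= 0xFFFF; i++) {
    uint16_t x = (uint16_t)i;
    uint16_t sign_fill = (x & 0x8000) ? 0xFFFC : 0;
    assert(h8300_ashift14(x) == (uint16_t)((unsigned)x << 14));
    assert(h8300_lshiftrt14(x) == (uint16_t)(x >> 14));
    assert(h8300_ashiftrt14(x) == (uint16_t)((x >> 14) | sign_fill));
  }
}
```

This only checks the arithmetic of the sequences; it says nothing about
the cycle counts, which come from the analysis above.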

For H8300H
==========

1. SHIFT_ASHIFT by 5 - This could be inlined. The current loop
implementation takes 42 clock cycles and requires 4 instructions. If
inlined, it takes 10 cycles and 5 instructions.

2. SHIFT_ASHIFT by 6 - If inlined, this would execute in 12 cycles
instead of 50, at the cost of two more instructions.

3. SHIFT_LSHIFTRT by 5 - This could be inlined. The current loop
implementation takes 42 clock cycles and requires 4 instructions. If
inlined, it takes 10 cycles and 5 instructions.

4. SHIFT_LSHIFTRT by 6 - If inlined, this would execute in 12 cycles
instead of 50, at the cost of two more instructions.

5. SHIFT_ASHIFTRT by 5 - This could be inlined. The current loop
implementation takes 42 clock cycles and requires 4 instructions. If
inlined, it takes 10 cycles and 5 instructions.

6. SHIFT_ASHIFTRT by 6 - If inlined, this would execute in 12 cycles
instead of 50, at the cost of two more instructions.

7. SHIFT_ASHIFTRT by 13 - The loop takes 4 instructions and 106 clock
cycles, while moving the high byte to the low byte and inlining the
remaining shift by 5 takes 7 instructions and 14 clock cycles.

8. SHIFT_ASHIFTRT by 14 - The loop takes 4 instructions and 114 clock
cycles, whereas the sequence

shll.b          r0h
subx.b          r0l,r0l 
shll.b          r0h    
rotxl.b         r0l
exts.w          r0

takes 5 instructions and 10 clock cycles.
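
The H8300H sequence above can be checked the same way. The sketch below is
a host-side C model, not code from h8300.c; the function names are
hypothetical, and it assumes that shll.b puts the shifted-out bit in carry,
that subx.b r0l,r0l leaves 0x00 or 0xFF depending on carry, that rotxl.b
rotates left through carry, and that exts.w sign-extends the low byte.

```c
#include <assert.h>
#include <stdint.h>

/* SHIFT_ASHIFTRT by 14 on H8300H, one statement per instruction. */
uint16_t h8300h_ashiftrt14(uint16_t x)
{
  uint8_t hi = (uint8_t)(x >> 8);
  uint8_t lo;
  unsigned carry;

  carry = hi >> 7; hi = (uint8_t)(hi << 1);   /* shll.b  r0h     */
  lo = (uint8_t)(0u - carry);                 /* subx.b  r0l,r0l */
  carry = hi >> 7; hi = (uint8_t)(hi << 1);   /* shll.b  r0h     */
  lo = (uint8_t)((lo << 1) | carry);          /* rotxl.b r0l     */
  /* exts.w r0: copy bit 7 of the low byte into the whole high byte. */
  return (lo & 0x80) ? (uint16_t)(0xFF00u | lo) : (uint16_t)lo;
}

/* Compare every 16-bit input against an arithmetic shift right by 14. */
void check_h8300h_ashiftrt14(void)
{
  for (uint32_t i = 0; i <= 0xFFFF; i++) {
    uint16_t x = (uint16_t)i;
    uint16_t sign_fill = (x & 0x8000) ? 0xFFFC : 0;
    assert(h8300h_ashiftrt14(x) == (uint16_t)((x >> 14) | sign_fill));
  }
}
```

The first shll.b captures the sign bit and the second captures bit 14, so
the low byte ends up holding the sign fill plus bit 14 before exts.w
widens it.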

For H8S
========

1. SHIFT_ASHIFTRT by 13 - The loop takes 5 instructions and 26 clock
cycles, while moving the high byte to the low byte and inlining the
rest takes 5 instructions and 5 clock cycles.

2. SHIFT_ASHIFTRT by 14 - The loop takes 4 instructions and 29 clock
cycles, whereas the sequence

mov.b       r0h,r0l
exts.w      r0
shar.w      #2,r0
shar.w      #2,r0
shar.w      #2,r0

takes 5 instructions and 5 clock cycles.
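
For H8S the idea is a byte move (worth a shift by 8) followed by three
word shifts by 2. A small host-side C model of that sequence, with
hypothetical helper names and assuming shar.w #2 is an arithmetic right
shift of the 16-bit register by two bits:

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic right shift of a 16-bit value by n bits (1 <= n <= 15),
   written without relying on signed-shift behaviour of the host. */
static uint16_t shar_w(uint16_t v, unsigned n)
{
  uint16_t fill = (v & 0x8000) ? (uint16_t)~(0xFFFFu >> n) : 0;
  return (uint16_t)((v >> n) | fill);
}

/* SHIFT_ASHIFTRT by 14 on H8S, one statement per instruction. */
uint16_t h8s_ashiftrt14(uint16_t x)
{
  uint8_t lo = (uint8_t)(x >> 8);                 /* mov.b  r0h,r0l */
  uint16_t r = (lo & 0x80) ? (uint16_t)(0xFF00u | lo)
                           : (uint16_t)lo;        /* exts.w r0      */
  r = shar_w(r, 2);                               /* shar.w #2,r0   */
  r = shar_w(r, 2);                               /* shar.w #2,r0   */
  r = shar_w(r, 2);                               /* shar.w #2,r0   */
  return r;
}

/* Compare every 16-bit input against an arithmetic shift right by 14. */
void check_h8s_ashiftrt14(void)
{
  for (uint32_t i = 0; i <= 0xFFFF; i++) {
    uint16_t x = (uint16_t)i;
    uint16_t sign_fill = (x & 0x8000) ? 0xFFFC : 0;
    assert(h8s_ashiftrt14(x) == (uint16_t)((x >> 14) | sign_fill));
  }
}
```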

So, at the cost of 1 or 2 instructions, most of the loops could be
avoided and a good speed increase could be achieved.
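
To put the trade-off in numbers, the sketch below tabulates only those
H8300 cases above for which both instruction counts and both cycle counts
are stated, and computes the cycles saved and the extra instructions for
each. The struct and function names are hypothetical; the figures are
copied from the analysis, not measured again.

```c
#include <assert.h>
#include <stddef.h>

struct shift_cost {
  const char *name;
  int loop_insns, loop_cycles;      /* current loop implementation */
  int inline_insns, inline_cycles;  /* proposed replacement */
};

/* H8300 cases from the analysis with complete figures. */
static const struct shift_cost h8300_costs[] = {
  { "SHIFT_ASHIFT by 5",    4,  42, 5, 10 },
  { "SHIFT_ASHIFT by 13",   4, 106, 7, 14 },
  { "SHIFT_ASHIFT by 14",   4, 114, 5, 10 },
  { "SHIFT_LSHIFTRT by 13", 5, 132, 7, 14 },
  { "SHIFT_LSHIFTRT by 14", 5, 142, 5, 10 },
  { "SHIFT_ASHIFTRT by 13", 5, 132, 7, 14 },
  { "SHIFT_ASHIFTRT by 14", 5, 142, 6, 12 },
};

#define N_COSTS (sizeof h8300_costs / sizeof h8300_costs[0])

/* Clock cycles saved by replacing the loop in case i. */
int cycles_saved(size_t i)
{
  assert(i < N_COSTS);
  return h8300_costs[i].loop_cycles - h8300_costs[i].inline_cycles;
}

/* Additional instructions the replacement needs in case i. */
int extra_insns(size_t i)
{
  assert(i < N_COSTS);
  return h8300_costs[i].inline_insns - h8300_costs[i].loop_insns;
}

/* Sum of the cycles saved over all tabulated cases. */
int total_cycles_saved(void)
{
  int total = 0;
  for (size_t i = 0; i < N_COSTS; i++)
    total += cycles_saved(i);
  return total;
}

/* Largest code-size increase, in instructions, over all tabulated cases. */
int max_extra_insns(void)
{
  int max = 0;
  for (size_t i = 0; i < N_COSTS; i++)
    if (extra_insns(i) > max)
      max = extra_insns(i);
  return max;
}
```

The overhead varies per case: the shifts by 14 cost zero to one extra
instruction, while the shifts by 13 cost two or three.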

The following patch implements the above cases.

========================================================================
*** h8300.c.orig	Tue Jul 23 12:17:18 2002
--- h8300.c	Wed Jul 24 10:35:46 2002
***************
*** 2047,2056 ****
      /*  0    1    2    3    4    5    6    7  */
      /*  8    9   10   11   12   13   14   15  */
      { INL, INL, INL, INL, INL, LOP, LOP, SPC,
!       SPC, SPC, SPC, SPC, SPC, LOP, LOP, SPC }, /* SHIFT_ASHIFT   */
      { INL, INL, INL, INL, INL, LOP, LOP, SPC,
!       SPC, SPC, SPC, SPC, SPC, LOP, LOP, SPC }, /* SHIFT_LSHIFTRT */
!     { INL, INL, INL, INL, INL, LOP, LOP, SPC,
!       SPC, SPC, SPC, SPC, SPC, LOP, LOP, SPC }, /* SHIFT_ASHIFTRT */
    },
    {
--- 2047,2056 ----
      /*  0    1    2    3    4    5    6    7  */
      /*  8    9   10   11   12   13   14   15  */
+     { INL, INL, INL, INL, INL, INL, INL, SPC,
+       SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC }, /* SHIFT_ASHIFT   */
      { INL, INL, INL, INL, INL, LOP, LOP, SPC,
!       SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC }, /* SHIFT_LSHIFTRT */
      { INL, INL, INL, INL, INL, LOP, LOP, SPC,
!       SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC }, /* SHIFT_ASHIFTRT */
    },
    {
***************
*** 2058,2067 ****
      /*  0    1    2    3    4    5    6    7  */
      /*  8    9   10   11   12   13   14   15  */
!     { INL, INL, INL, INL, INL, LOP, LOP, SPC,
        SPC, SPC, SPC, SPC, SPC, ROT, ROT, ROT }, /* SHIFT_ASHIFT   */
!     { INL, INL, INL, INL, INL, LOP, LOP, SPC,
        SPC, SPC, SPC, SPC, SPC, ROT, ROT, ROT }, /* SHIFT_LSHIFTRT */
!     { INL, INL, INL, INL, INL, LOP, LOP, SPC,
!       SPC, SPC, SPC, SPC, SPC, LOP, LOP, SPC }, /* SHIFT_ASHIFTRT */
    },
    {
--- 2058,2067 ----
      /*  0    1    2    3    4    5    6    7  */
      /*  8    9   10   11   12   13   14   15  */
!     { INL, INL, INL, INL, INL, INL, INL, SPC,
        SPC, SPC, SPC, SPC, SPC, ROT, ROT, ROT }, /* SHIFT_ASHIFT   */
!     { INL, INL, INL, INL, INL, INL, INL, SPC,
        SPC, SPC, SPC, SPC, SPC, ROT, ROT, ROT }, /* SHIFT_LSHIFTRT */
!     { INL, INL, INL, INL, INL, INL, INL, SPC,
!       SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC }, /* SHIFT_ASHIFTRT */
    },
    {
***************
*** 2074,2078 ****
        SPC, SPC, SPC, SPC, SPC, ROT, ROT, ROT }, /* SHIFT_LSHIFTRT */
      { INL, INL, INL, INL, INL, INL, INL, INL,
!       SPC, SPC, SPC, SPC, SPC, LOP, LOP, SPC }, /* SHIFT_ASHIFTRT */
    }
  };
--- 2074,2078 ----
        SPC, SPC, SPC, SPC, SPC, ROT, ROT, ROT }, /* SHIFT_LSHIFTRT */
      { INL, INL, INL, INL, INL, INL, INL, INL,
!       SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC }, /* SHIFT_ASHIFTRT */
    }
  };
***************
*** 2292,2296 ****
  	    }
  	}
!       else if (8 <= count && count <= 12)
  	{
  	  info->remainder = count - 8;
--- 2292,2296 ----
  	    }
  	}
!       else if (8 <= count && count <= 13)
  	{
  	  info->remainder = count - 8;
***************
*** 2315,2318 ****
--- 2315,2340 ----
  	      info->shift1 = "shar.b\t%s0";
  	      info->shift2 = "shar.b\t#2,%s0";
+ 	      goto end;
+ 	    }
+ 	}
+       else if (count == 14)
+ 	{
+ 	  switch (shift_type)
+ 	    {
+ 	    case SHIFT_ASHIFT:
+ 	      if (TARGET_H8300)
+ 		info->special = "mov.b\t%s0,%t0\n\trotr.b\t%t0\n\trotr.b\t%t0\n\tand.b\t#0xC0,%t0\n\tsub.b\t%s0,%s0";
+ 	      goto end;
+ 	    case SHIFT_LSHIFTRT:
+ 	      if (TARGET_H8300)
+ 		info->special = "mov.b\t%t0,%s0\n\trotl.b\t%s0\n\trotl.b\t%s0\n\tand.b\t#3,%s0\n\tsub.b\t%t0,%t0";
+ 	      goto end;
+ 	    case SHIFT_ASHIFTRT:
+ 	      if (TARGET_H8300)
+ 		info->special = "mov.b\t%t0,%s0\n\tshll.b\t%s0\n\tsubx.b\t%t0,%t0\n\tshll.b\t%s0\n\tmov.b\t%t0,%s0\n\tbst.b\t#0,%s0";
+ 	      else if (TARGET_H8300H)
+ 		info->special = "shll.b\t%t0\n\tsubx.b\t%s0,%s0\n\tshll.b\t%t0\n\trotxl.b\t%s0\n\texts.w\t%T0";
+ 	      else /* TARGET_H8300S */
+ 		info->special = "mov.b\t%t0,%s0\n\texts.w\t%T0\n\tshar.w\t#2,%T0\n\tshar.w\t#2,%T0\n\tshar.w\t#2,%T0";
  	      goto end;
  	    }
========================================================================
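
For readers who do not have h8300.c at hand, here is a rough standalone
sketch of how a table like the one patched above can drive the choice of
strategy. The rows are copied from the new side of the first hunk, which
appears to cover H8300 HImode. The enum, array and function names are
hypothetical, and reading INL, LOP, SPC and ROT as inline, loop, special
sequence and rotate-and-mask is my assumption, not something the patch
states.

```c
#include <assert.h>

/* Assumed meaning: INL = inline, LOP = loop, SPC = special, ROT = rotate. */
enum shift_alg { INL, LOP, SPC, ROT };
enum shift_type { SHIFT_ASHIFT, SHIFT_LSHIFTRT, SHIFT_ASHIFTRT };

/* One row per shift type, one column per shift count 0..15. */
static const enum shift_alg himode_alg[3][16] = {
  /*  0    1    2    3    4    5    6    7  */
  /*  8    9   10   11   12   13   14   15  */
  { INL, INL, INL, INL, INL, INL, INL, SPC,
    SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC }, /* SHIFT_ASHIFT   */
  { INL, INL, INL, INL, INL, LOP, LOP, SPC,
    SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC }, /* SHIFT_LSHIFTRT */
  { INL, INL, INL, INL, INL, LOP, LOP, SPC,
    SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC }, /* SHIFT_ASHIFTRT */
};

/* Look up the strategy for a 16-bit shift of the given type and count. */
enum shift_alg himode_alg_for(enum shift_type type, unsigned count)
{
  assert(count < 16);
  return himode_alg[type][count];
}
```

With these rows, himode_alg_for(SHIFT_ASHIFT, 5) yields INL, while
himode_alg_for(SHIFT_LSHIFTRT, 5) still yields LOP, and counts 13 and 14
yield SPC for all three shift types.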

Regards,
Dhananjay


