This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH] H8300 Shift Optimization
- From: "Dhananjay R. Deshpande" <dhananjayd at kpit dot com>
- To: "Kazu Hirata" <kazu at cs dot umass dot edu>
- Cc: <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 31 Jul 2002 15:17:02 +0530
- Subject: [PATCH] H8300 Shift Optimization
Hi Kazu,
Based on the analysis given below, some of the HImode shifts that are currently implemented as loops could be optimized for speed.
For H8300
=========
1. SHIFT_ASHIFT by 5 - This could be inlined. Current implementation
takes 42 clock cycles for execution and requires 4 instructions. If
inlined it takes 10 cycles and 5 instructions.
2. SHIFT_ASHIFT by 6 - If inlined this would execute in 12 cycles as
compared to 50 at the cost of two more instructions.
3. SHIFT_ASHIFT by 13 - Moving the low byte to the high byte and inlining
the remaining 5 shifts takes 7 instructions and 14 clock cycles, while the
loop takes 4 instructions but 106 clock cycles.
4. SHIFT_ASHIFT by 14 - The loop takes 4 instructions and 114 clock cycles.
Instead, moving the low byte to the high byte, rotating right twice and
ANDing with 0xC0 takes 5 instructions and executes in 10 clock cycles.
5. SHIFT_LSHIFTRT by 13 - The loop takes 5 instructions and 132 clock cycles,
while moving the high byte to the low byte and inlining the rest takes 7
instructions and 14 clock cycles.
6. SHIFT_LSHIFTRT by 14 - The loop takes 5 instructions and 142 clock cycles,
while moving the high byte to the low byte, rotating left twice and ANDing
with 0x03 takes 5 instructions and 10 clock cycles.
7. SHIFT_ASHIFTRT by 13 - Loop takes 5 instructions and 132 clock cycles,
while moving high byte to low byte and inlining rest takes 7 instructions
and 14 clock cycles.
8. SHIFT_ASHIFTRT by 14 - The loop takes 5 instructions and 142 clock cycles,
while the following sequence takes 6 instructions and 12 clock cycles:
	mov.b	r0h,r0l
	shll.b	r0l
	subx.b	r0h,r0h
	shll.b	r0l
	mov.b	r0h,r0l
	bst.b	#0,r0l
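As a sanity check on the three count-14 sequences for the H8300, here is a minimal C sketch that models the byte registers and the carry flag and compares each sequence with a plain 16-bit shift over all 65536 inputs. The function names are made up for illustration, and the flag behaviour (shll.b puts the old bit 7 in C, mov.b leaves C alone, bst copies C into the named bit) is my reading of the instruction set, so treat it as an assumption rather than a reference model.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers modelling 8-bit rotates; names are illustrative only. */
static uint8_t rotr8(uint8_t v) { return (uint8_t)((v >> 1) | (v << 7)); }
static uint8_t rotl8(uint8_t v) { return (uint8_t)((v << 1) | (v >> 7)); }

/* SHIFT_ASHIFT by 14 on the H8300 (5 instructions). */
static uint16_t h8300_ashift14(uint16_t x)
{
    uint8_t hi = (uint8_t)(x & 0xFF);   /* mov.b  r0l,r0h */
    hi = rotr8(hi);                     /* rotr.b r0h */
    hi = rotr8(hi);                     /* rotr.b r0h */
    hi &= 0xC0;                         /* and.b  #0xC0,r0h */
    return (uint16_t)(hi << 8);         /* sub.b  r0l,r0l clears the low byte */
}

/* SHIFT_LSHIFTRT by 14 on the H8300 (5 instructions). */
static uint16_t h8300_lshiftrt14(uint16_t x)
{
    uint8_t lo = (uint8_t)(x >> 8);     /* mov.b  r0h,r0l */
    lo = rotl8(lo);                     /* rotl.b r0l */
    lo = rotl8(lo);                     /* rotl.b r0l */
    lo &= 0x03;                         /* and.b  #3,r0l */
    return lo;                          /* sub.b  r0h,r0h clears the high byte */
}

/* SHIFT_ASHIFTRT by 14 on the H8300 (6 instructions). */
static uint16_t h8300_ashiftrt14(uint16_t x)
{
    uint8_t hi = (uint8_t)(x >> 8);
    uint8_t lo = hi;                    /* mov.b  r0h,r0l */
    unsigned c = lo >> 7;               /* shll.b r0l: C = old bit 7 (the sign) */
    lo = (uint8_t)(lo << 1);
    hi = (uint8_t)(0u - c);             /* subx.b r0h,r0h: 0x00 or 0xFF */
    c = lo >> 7;                        /* shll.b r0l: C = original bit 14 */
    lo = hi;                            /* mov.b  r0h,r0l (assumed to leave C alone) */
    lo = (uint8_t)((lo & 0xFE) | c);    /* bst.b  #0,r0l: bit 0 = C */
    return (uint16_t)((hi << 8) | lo);
}

/* Compare every 16-bit input against a portable reference shift. */
static void check_h8300_shift14(void)
{
    for (uint32_t v = 0; v <= 0xFFFF; v++) {
        uint16_t x = (uint16_t)v;
        uint16_t sign_fill = (x & 0x8000) ? 0xFFFC : 0;
        assert(h8300_ashift14(x) == (uint16_t)((uint32_t)x << 14));
        assert(h8300_lshiftrt14(x) == (uint16_t)(x >> 14));
        assert(h8300_ashiftrt14(x) == (uint16_t)((x >> 14) | sign_fill));
    }
}
```

Calling check_h8300_shift14() from a small test driver exercises all inputs; it only checks the arithmetic and says nothing about the cycle counts quoted above.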
For H8300H
==========
1. SHIFT_ASHIFT by 5 - This could be inlined. Current implementation
takes 42 clock cycles for execution and requires 4 instructions. If
inlined it takes 10 cycles and 5 instructions.
2. SHIFT_ASHIFT by 6 - If inlined this would execute in 12 cycles as
compared to 50 at the cost of two more instructions.
3. SHIFT_LSHIFTRT by 5 - This could be inlined. Current implementation
takes 42 clock cycles for execution and requires 4 instructions. If
inlined it takes 10 cycles and 5 instructions.
4. SHIFT_LSHIFTRT by 6 - If inlined this would execute in 12 cycles as
compared to 50 at the cost of two more instructions.
5. SHIFT_ASHIFTRT by 5 - This could be inlined. Current implementation
takes 42 clock cycles for execution and requires 4 instructions. If
inlined it takes 10 cycles and 5 instructions.
6. SHIFT_ASHIFTRT by 6 - If inlined this would execute in 12 cycles as
compared to 50 at the cost of two more instructions.
7. SHIFT_ASHIFTRT by 13 - Loop takes 4 instructions and 106 clock cycles,
while moving high byte to low byte and inlining rest takes 7 instructions
and 14 clock cycles.
8. SHIFT_ASHIFTRT by 14 - The loop takes 4 instructions and 114 clock cycles,
while the following sequence takes 5 instructions and 10 clock cycles:
	shll.b	r0h
	subx.b	r0l,r0l
	shll.b	r0h
	rotxl.b	r0l
	exts.w	r0
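The H8300H sequence for SHIFT_ASHIFTRT by 14 can be checked the same way. The sketch below is a rough C model, assuming that shll.b moves the old bit 7 into C, that subx.b of a register with itself yields 0x00 or 0xFF depending on C, that rotxl.b rotates left through C, and that exts.w sign-extends the low byte into the word. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* SHIFT_ASHIFTRT by 14 on the H8300H (5 instructions), modelled bytewise. */
static uint16_t h8300h_ashiftrt14(uint16_t x)
{
    uint8_t hi = (uint8_t)(x >> 8);
    uint8_t lo;
    unsigned c;

    c = hi >> 7;                        /* shll.b  r0h: C = bit 15 (the sign) */
    hi = (uint8_t)(hi << 1);
    lo = (uint8_t)(0u - c);             /* subx.b  r0l,r0l: 0x00 or 0xFF */
    c = hi >> 7;                        /* shll.b  r0h: C = original bit 14 */
    lo = (uint8_t)((lo << 1) | c);      /* rotxl.b r0l: shift C into bit 0 */
    /* exts.w r0: sign-extend the low byte into the whole word */
    return (lo & 0x80) ? (uint16_t)(0xFF00 | lo) : (uint16_t)lo;
}

/* Compare every 16-bit input against a portable arithmetic shift by 14. */
static void check_h8300h_ashiftrt14(void)
{
    for (uint32_t v = 0; v <= 0xFFFF; v++) {
        uint16_t x = (uint16_t)v;
        uint16_t sign_fill = (x & 0x8000) ? 0xFFFC : 0;
        assert(h8300h_ashiftrt14(x) == (uint16_t)((x >> 14) | sign_fill));
    }
}
```

The reference result is built from a logical shift plus an explicit sign fill, because right-shifting a negative signed value is implementation-defined in C.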
For H8S
========
1. SHIFT_ASHIFTRT by 13 - Loop takes 5 instructions and 26 clock cycles,
while moving high byte to low byte and inlining rest takes 5 instructions
and 5 clock cycles.
2. SHIFT_ASHIFTRT by 14 - The loop takes 4 instructions and 29 clock cycles,
while the following sequence takes 5 instructions and 5 clock cycles:
	mov.b	r0h,r0l
	exts.w	r0
	shar.w	#2,r0
	shar.w	#2,r0
	shar.w	#2,r0
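For the H8S the idea is simply 14 = 8 + 2 + 2 + 2: the byte move plus exts.w acts as an arithmetic shift by 8, and the three shar.w #2 instructions supply the remaining 6. A minimal C sketch of that decomposition follows. The helper names are invented for illustration, and the count-13 variant (two shar.w #2 followed by one single-bit shar.w) is my guess at what "inlining the rest" means in item 1, not something spelled out above.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 16-bit arithmetic right shift by n single-bit steps (models shar.w). */
static uint16_t shar_w(uint16_t v, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        v = (uint16_t)((v >> 1) | (v & 0x8000));  /* keep the sign bit */
    return v;
}

/* mov.b r0h,r0l followed by exts.w r0: an arithmetic shift right by 8. */
static uint16_t move_and_extend(uint16_t x)
{
    uint8_t lo = (uint8_t)(x >> 8);                /* mov.b  r0h,r0l */
    return (lo & 0x80) ? (uint16_t)(0xFF00 | lo)   /* exts.w r0 */
                       : (uint16_t)lo;
}

/* SHIFT_ASHIFTRT by 14 on the H8S: 8 + 2 + 2 + 2. */
static uint16_t h8s_ashiftrt14(uint16_t x)
{
    uint16_t r = move_and_extend(x);
    r = shar_w(r, 2);                              /* shar.w #2,r0 */
    r = shar_w(r, 2);                              /* shar.w #2,r0 */
    r = shar_w(r, 2);                              /* shar.w #2,r0 */
    return r;
}

/* SHIFT_ASHIFTRT by 13 on the H8S, assumed to be 8 + 2 + 2 + 1. */
static uint16_t h8s_ashiftrt13(uint16_t x)
{
    uint16_t r = move_and_extend(x);
    r = shar_w(r, 2);
    r = shar_w(r, 2);
    r = shar_w(r, 1);
    return r;
}

/* Reference: logical shift plus explicit sign fill, valid for 1 <= n <= 15. */
static uint16_t asr16(uint16_t x, unsigned n)
{
    uint16_t r = (uint16_t)(x >> n);
    if (x & 0x8000)
        r |= (uint16_t)(0xFFFFu << (16 - n));
    return r;
}

static void check_h8s_ashiftrt(void)
{
    for (uint32_t v = 0; v <= 0xFFFF; v++) {
        uint16_t x = (uint16_t)v;
        assert(h8s_ashiftrt14(x) == asr16(x, 14));
        assert(h8s_ashiftrt13(x) == asr16(x, 13));
    }
}
```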
So, at the cost of 1 or 2 extra instructions, most of the loops can be
avoided and a substantial speed increase can be achieved.
The following patch implements the above cases.
========================================================================
*** h8300.c.orig Tue Jul 23 12:17:18 2002
--- h8300.c Wed Jul 24 10:35:46 2002
***************
*** 2047,2056 ****
/* 0 1 2 3 4 5 6 7 */
/* 8 9 10 11 12 13 14 15 */
{ INL, INL, INL, INL, INL, LOP, LOP, SPC,
! SPC, SPC, SPC, SPC, SPC, LOP, LOP, SPC }, /* SHIFT_ASHIFT */
{ INL, INL, INL, INL, INL, LOP, LOP, SPC,
! SPC, SPC, SPC, SPC, SPC, LOP, LOP, SPC }, /* SHIFT_LSHIFTRT */
! { INL, INL, INL, INL, INL, LOP, LOP, SPC,
! SPC, SPC, SPC, SPC, SPC, LOP, LOP, SPC }, /* SHIFT_ASHIFTRT */
},
{
--- 2047,2056 ----
/* 0 1 2 3 4 5 6 7 */
/* 8 9 10 11 12 13 14 15 */
+ { INL, INL, INL, INL, INL, INL, INL, SPC,
+ SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC }, /* SHIFT_ASHIFT */
{ INL, INL, INL, INL, INL, LOP, LOP, SPC,
! SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC }, /* SHIFT_LSHIFTRT */
{ INL, INL, INL, INL, INL, LOP, LOP, SPC,
! SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC }, /* SHIFT_ASHIFTRT */
},
{
***************
*** 2058,2067 ****
/* 0 1 2 3 4 5 6 7 */
/* 8 9 10 11 12 13 14 15 */
! { INL, INL, INL, INL, INL, LOP, LOP, SPC,
SPC, SPC, SPC, SPC, SPC, ROT, ROT, ROT }, /* SHIFT_ASHIFT */
! { INL, INL, INL, INL, INL, LOP, LOP, SPC,
SPC, SPC, SPC, SPC, SPC, ROT, ROT, ROT }, /* SHIFT_LSHIFTRT */
! { INL, INL, INL, INL, INL, LOP, LOP, SPC,
! SPC, SPC, SPC, SPC, SPC, LOP, LOP, SPC }, /* SHIFT_ASHIFTRT */
},
{
--- 2058,2067 ----
/* 0 1 2 3 4 5 6 7 */
/* 8 9 10 11 12 13 14 15 */
! { INL, INL, INL, INL, INL, INL, INL, SPC,
SPC, SPC, SPC, SPC, SPC, ROT, ROT, ROT }, /* SHIFT_ASHIFT */
! { INL, INL, INL, INL, INL, INL, INL, SPC,
SPC, SPC, SPC, SPC, SPC, ROT, ROT, ROT }, /* SHIFT_LSHIFTRT */
! { INL, INL, INL, INL, INL, INL, INL, SPC,
! SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC }, /* SHIFT_ASHIFTRT */
},
{
***************
*** 2074,2078 ****
SPC, SPC, SPC, SPC, SPC, ROT, ROT, ROT }, /* SHIFT_LSHIFTRT */
{ INL, INL, INL, INL, INL, INL, INL, INL,
! SPC, SPC, SPC, SPC, SPC, LOP, LOP, SPC }, /* SHIFT_ASHIFTRT */
}
};
--- 2074,2078 ----
SPC, SPC, SPC, SPC, SPC, ROT, ROT, ROT }, /* SHIFT_LSHIFTRT */
{ INL, INL, INL, INL, INL, INL, INL, INL,
! SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC }, /* SHIFT_ASHIFTRT */
}
};
***************
*** 2292,2296 ****
}
}
! else if (8 <= count && count <= 12)
{
info->remainder = count - 8;
--- 2292,2296 ----
}
}
! else if (8 <= count && count <= 13)
{
info->remainder = count - 8;
***************
*** 2315,2318 ****
--- 2315,2340 ----
info->shift1 = "shar.b\t%s0";
info->shift2 = "shar.b\t#2,%s0";
+ goto end;
+ }
+ }
+ else if (count == 14)
+ {
+ switch (shift_type)
+ {
+ case SHIFT_ASHIFT:
+ if (TARGET_H8300)
+ info->special = "mov.b\t%s0,%t0\n\trotr.b\t%t0\n\trotr.b\t%t0\n\tand.b\t#0xC0,%t0\n\tsub.b\t%s0,%s0";
+ goto end;
+ case SHIFT_LSHIFTRT:
+ if (TARGET_H8300)
+ info->special = "mov.b\t%t0,%s0\n\trotl.b\t%s0\n\trotl.b\t%s0\n\tand.b\t#3,%s0\n\tsub.b\t%t0,%t0";
+ goto end;
+ case SHIFT_ASHIFTRT:
+ if (TARGET_H8300)
+ info->special = "mov.b\t%t0,%s0\n\tshll.b\t%s0\n\tsubx.b\t%t0,%t0\n\tshll.b\t%s0\n\tmov.b\t%t0,%s0\n\tbst.b\t#0,%s0";
+ else if (TARGET_H8300H)
+ info->special = "shll.b\t%t0\n\tsubx.b\t%s0,%s0\n\tshll.b\t%t0\n\trotxl.b\t%s0\n\texts.w\t%T0";
+ else /* TARGET_H8300S */
+ info->special = "mov.b\t%t0,%s0\n\texts.w\t%T0\n\tshar.w\t#2,%T0\n\tshar.w\t#2,%T0\n\tshar.w\t#2,%T0";
goto end;
}
========================================================================
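To illustrate the change from `count <= 12` to `count <= 13`, here is a small C sketch of the decomposition behind `info->remainder = count - 8` for SHIFT_ASHIFT: one byte move accounts for 8 bit positions and the remainder is done with single-bit shifts on the high byte. The function name is hypothetical, and the mov.b/sub.b pair in the comments is an assumption based on the count-14 string in the patch, not a quote of the existing special-case code.

```c
#include <assert.h>
#include <stdint.h>

/* Left shift by 8..13 done as a byte move plus (count - 8) single-bit shifts. */
static uint16_t ashift_via_byte_move(uint16_t x, unsigned count)
{
    assert(8 <= count && count <= 13);
    unsigned remainder = count - 8;     /* mirrors info->remainder = count - 8 */
    uint8_t hi = (uint8_t)(x & 0xFF);   /* assumed: mov.b r0l,r0h */
    uint8_t lo = 0;                     /* assumed: sub.b r0l,r0l */

    while (remainder-- > 0)
        hi = (uint8_t)(hi << 1);        /* one single-bit shift of the high byte */

    return (uint16_t)((hi << 8) | lo);
}

/* Check all inputs for every count the byte-move case now covers. */
static void check_ashift_via_byte_move(void)
{
    for (unsigned count = 8; count <= 13; count++)
        for (uint32_t v = 0; v <= 0xFFFF; v++) {
            uint16_t x = (uint16_t)v;
            assert(ashift_via_byte_move(x, count)
                   == (uint16_t)((uint32_t)x << count));
        }
}
```

For count 13 this gives 2 instructions for the move plus 5 single-bit shifts, which matches the 7 instructions quoted for SHIFT_ASHIFT by 13 on the H8300.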
Regards,
Dhananjay