[PATCH, ARM] Cortex-A8 backend fixes

Dmitry Melnik dm@ispras.ru
Thu Feb 9 15:09:00 GMT 2012


This patch fixes a few things in the pipeline description of ARM Cortex-A8.

1) arm_no_early_alu_shift_value_dep() checks the early dependence for 
only one argument, ignoring the dependence on the register used as the 
shift amount.  For example, this function is used as the condition of a 
bypass that sets dep_cost=0 between mov and ALU operations:

   mov r0, r1
   add r3, r4, r5, asr r0

This results in dep_cost returning 0 for these insns, while according
to the Technical Reference Manual it should be 1
(http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0344k/Babcagee.html). 
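
As a rough illustration of the difference (a toy model only, not the 
GCC code: the struct and all function names below are hypothetical and 
do not appear in the patch), consider a check that looks only at the 
shifted value versus one that also looks at the shift amount:

```c
#include <stdbool.h>

/* Hypothetical, simplified model of "OP rd, rn, rm, <shift> rs".
   rs is -1 when the shift amount is an immediate.  */
struct shifted_alu_insn {
  int rd;  /* destination register */
  int rn;  /* plain (late) operand */
  int rm;  /* register being shifted, needed early */
  int rs;  /* register holding the shift amount, also needed early */
};

/* Models the old check: only the shifted value counts as early.  */
static bool no_early_dep_value_only(int producer_rd,
                                    const struct shifted_alu_insn *c)
{
  return producer_rd != c->rm;
}

/* Models the fixed check: the shifted value and the shift amount
   are both treated as early operands.  */
static bool no_early_dep_value_and_amount(int producer_rd,
                                          const struct shifted_alu_insn *c)
{
  return producer_rd != c->rm && producer_rd != c->rs;
}

/* Toy cost model for this one bypass: 0 when its condition holds,
   1 otherwise (the values quoted in the text above).  */
static int dep_cost(bool bypass_applies)
{
  return bypass_applies ? 0 : 1;
}
```

For the pair above (mov writes r0; add uses rd=3, rn=4, rm=5, rs=0) the 
value-only check lets the bypass apply, giving cost 0, while the check 
that includes the shift amount rejects it, giving cost 1.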


Also, the order of operands differs between PLUS and MINUS rtx 
expressions: PLUS has the shift expression as its first argument, while 
MINUS usually has the shift as its second argument.  But 
arm_no_early_alu_shift_value_dep() checks only the first argument as 
EARLY_OP.  We modified arm_no_early_alu_shift_dep() so that it uses 
rtx_search() to find the SHIFT expression.  As all registers of the 
SHIFT expression are required at stage E1, it makes no difference 
whether the dependence is on the shift's first or second argument, so 
the Cortex-A8 bypasses now use the modified function instead of 
arm_no_early_alu_shift_value_dep().  Because the functions 
arm_no_early_alu_shift_[value_]dep() are also used in the Cortex-A5, 
Cortex-R4 and ARM1136JFS descriptions, the modified function is a 
separate one named arm_cortex_a8_no_early_alu_shift_dep().
Besides SHIFTs and ROTATE, the function also handles MULT (which is used 
to represent shifts by a constant) and ZERO_EXTEND and SIGN_EXTEND (they 
also have the alu_shift type).
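
The idea of searching for the early subexpression instead of assuming 
its position can be sketched on a toy expression tree.  This is not GCC 
rtl: the enum, struct and function names are made up for illustration, 
and every MULT node is treated as a shift here, which is a 
simplification.

```c
#include <stddef.h>

/* Hypothetical stand-ins for rtx codes.  */
enum ex_code {
  EX_REG, EX_PLUS, EX_MINUS,
  EX_ASHIFT, EX_ASHIFTRT, EX_LSHIFTRT, EX_ROTATERT,
  EX_MULT, EX_ZERO_EXTEND, EX_SIGN_EXTEND
};

struct expr {
  enum ex_code code;
  int regno;                    /* meaningful for EX_REG only */
  const struct expr *op0, *op1; /* NULL when absent */
};

/* Codes whose register operands are all needed early in this model.  */
static int is_early_code(enum ex_code c)
{
  return c == EX_ASHIFT || c == EX_ASHIFTRT || c == EX_LSHIFTRT
         || c == EX_ROTATERT || c == EX_MULT
         || c == EX_ZERO_EXTEND || c == EX_SIGN_EXTEND;
}

/* Depth-first search for the first early subexpression, wherever it
   sits among the operands.  */
static const struct expr *find_early_op(const struct expr *x)
{
  const struct expr *found;

  if (x == NULL)
    return NULL;
  if (is_early_code(x->code))
    return x;
  found = find_early_op(x->op0);
  return found != NULL ? found : find_early_op(x->op1);
}

/* Nonzero if register REGNO appears anywhere inside X.  */
static int mentions_reg(const struct expr *x, int regno)
{
  if (x == NULL)
    return 0;
  if (x->code == EX_REG)
    return x->regno == regno;
  return mentions_reg(x->op0, regno) || mentions_reg(x->op1, regno);
}

/* Nonzero if the consumer's source expression has no early dependence
   on PRODUCER_REG.  */
static int no_early_shift_dep(int producer_reg, const struct expr *src)
{
  const struct expr *early = find_early_op(src);
  return early == NULL || !mentions_reg(early, producer_reg);
}
```

With this shape, (plus (ashiftrt r5 r0) r4) and (minus r4 (ashiftrt r5 
r0)) are treated the same way: a producer of r0 or r5 has an early 
dependence in both, and a producer of r4 has none.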

2) The MUL to ALU bypass has an incorrect delay of 4 cycles, while 
according to the TRM it has to be 5 for MUL and 6 for MULL.  The patch 
splits this bypass in two and sets the correct delay values.

3) In cortex-a8.md, MOV with shift instructions are matched to the wrong 
reservations (cortex_a8_alu_shift, cortex_a8_alu_shift_reg).  Adding the 
insn attribute "mov" to the arm_shiftsi3 pattern in arm.md fixes that.

4) SMLALxy was moved from the cortex_a8_mull reservation to 
cortex_a8_smlald, which according to the TRM has the proper timing for 
this insn (1 cycle less than MULL).

5) The ARM Cortex-A8 TRM itself contains inaccurate timings for the 
availability of RdLo in some multiply instructions.  Namely, the lower 
part of the result of the (S|U)MULL, (S|U)MLAL, UMAAL, SMLALxy, SMLALD 
and SMLSLD instructions is already available at stage E4 (instead of E5 
as stated in the TRM).

This information was initially found on the BeagleBoard mailing list, and 
it is confirmed by our tests and by these sites: 
http://www.avison.me.uk/ben/programming/cortex-a8.html and 
http://hilbert-space.de/?p=66

The patch adds two bypasses between these instructions and the MOV 
instruction.  They use arm_mull_low_part_dep() to check whether the 
dependency is only on the low part of the MUL destination.  Bypasses 
between MULL and ALU insns for RdLo can't be added, because bypasses 
already exist between this pair of reservations.  However, in practice 
these multiply insns are rare, and in SPEC2K INT code the low part of 
the result of such insns is never used.
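
The kind of condition such a bypass needs can be sketched as follows.  
This is a toy model under stated assumptions, not the body of 
arm_mull_low_part_dep(): it treats RdLo and RdHi as two separate 
register numbers, and the struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of a long multiply producing RdHi:RdLo.  */
struct mull_insn {
  int rdlo;
  int rdhi;
};

/* Hypothetical model of "mov rd, rm".  */
struct mov_insn {
  int rd;
  int rm;
};

/* True when the MOV reads only the low half of the multiply result,
   so the earlier availability of RdLo can be used.  */
static bool mull_low_part_only_dep(const struct mull_insn *producer,
                                   const struct mov_insn *consumer)
{
  return consumer->rm == producer->rdlo
         && consumer->rm != producer->rdhi;
}
```

A MOV that reads RdHi, or that reads neither half, does not satisfy the 
condition and keeps the regular latency.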

-- 
Best regards,
   Dmitry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cortex-a8-fixes.diff
Type: text/x-diff
Size: 9167 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20120209/95dce85f/attachment.bin>

