[PATCH: ARM] PR 45335 Use ldrd and strd to access two consecutive words
Carrot Wei
carrot@google.com
Tue Jan 4 08:57:00 GMT 2011
Happy New Year!
I hope I can check this patch in during 2011.
On Wed, Dec 15, 2010 at 6:00 AM, Carrot Wei <carrot@google.com> wrote:
> ping
>
> On Mon, Nov 29, 2010 at 2:32 PM, Carrot Wei <carrot@google.com> wrote:
>> ping
>>
>> On Mon, Nov 22, 2010 at 3:16 PM, Carrot Wei <carrot@google.com> wrote:
>>> ping
>>>
>>> On Sun, Oct 31, 2010 at 2:22 AM, Carrot Wei <carrot@google.com> wrote:
>>>> Ping
>>>>
>>>> On Sun, Oct 24, 2010 at 9:46 PM, Carrot Wei <carrot@google.com> wrote:
>>>>> Ping
>>>>>
>>>>> On Sat, Oct 16, 2010 at 8:27 PM, Carrot Wei <carrot@google.com> wrote:
>>>>>> On Wed, Oct 13, 2010 at 7:01 PM, Paul Brook <paul@codesourcery.com> wrote:
>>>>>>>> ChangeLog:
>>>>>>>> 2010-09-04 Wei Guozhi <carrot@google.com>
>>>>>>>>
>>>>>>>> PR target/45335
>>>>>>>> * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>>>>>> thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>>>>>> peephole2.
>>>>>>>> (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>>>>>> New insn pattern and related peephole2.
>>>>>>>> * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>>>>>> (thumb2_check_ldrd_operands): New function.
>>>>>>>> (thumb2_prefer_ldmstm): New function.
>>>>>>>> * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New
>>>>>>>> prototype. (thumb2_check_ldrd_operands): New prototype.
>>>>>>>> (thumb2_prefer_ldmstm): New prototype.
>>>>>>>> * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>>>>>> Change the ldm/stm patterns with 2 words to ARM only.
>>>>>>>> * gcc/config/arm/constraints.md (Py): New Thumb-2 constant
>>>>>>>> constraint suitable for ldrd/strd instructions.
>>>>>>>
>>>>>>> Not ok.
>>>>>>>
>>>>>>> Why is this restricted to Thumb mode? The ARM variant of ldrd isn't quite as
>>>>>>> flexible, but still provides a useful improvement over ldm.
>>>>>>>
>>>>>> I agree that the ARM version is also useful, but it brings much less
>>>>>> benefit for much more complexity (ARM-state ldrd has more operand
>>>>>> restrictions, and its insn patterns conflict with ldm), so I will leave
>>>>>> it as a future improvement.
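To make "more restriction" concrete, here is a simplified C sketch of the encodability rules for LDRD with an immediate offset and no writeback, as I understand them from the ARMv7 architecture manual. The helper names `arm_ldrd_ok` and `thumb2_ldrd_ok` are hypothetical and not part of the patch. The sketch deliberately ignores writeback forms and alignment.

```c
#include <assert.h>
#include <stdbool.h>

/* Register numbers 0..15; 13 = SP, 14 = LR, 15 = PC.  */
enum { REG_SP = 13, REG_LR = 14, REG_PC = 15 };

/* ARM state (immediate offset, no writeback): Rt must be even,
   Rt2 must be Rt + 1, and Rt must not be LR (otherwise Rt2 would be
   PC).  The offset is an unscaled 8-bit magnitude plus a sign bit,
   so any value in -255..255 is encodable.  */
bool
arm_ldrd_ok (int rt, int rt2, long offset)
{
  assert (rt >= 0 && rt <= 15 && rt2 >= 0 && rt2 <= 15);
  if (rt % 2 != 0 || rt2 != rt + 1 || rt == REG_LR)
    return false;
  return offset >= -255 && offset <= 255;
}

/* Thumb-2 (immediate offset, no writeback): Rt and Rt2 are encoded
   independently, must differ, and must not be SP or PC.  The offset
   is an 8-bit magnitude scaled by 4: a multiple of 4 in -1020..1020.  */
bool
thumb2_ldrd_ok (int rt, int rt2, long offset)
{
  assert (rt >= 0 && rt <= 15 && rt2 >= 0 && rt2 <= 15);
  if (rt == rt2 || rt == REG_SP || rt == REG_PC
      || rt2 == REG_SP || rt2 == REG_PC)
    return false;
  return offset >= -1020 && offset <= 1020 && offset % 4 == 0;
}
```

For example, `ldrd r1, r2, [rN, #512]` passes the Thumb-2 check but fails the ARM-state check on two counts: r1 is odd, and 512 does not fit the unscaled 8-bit offset.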
>>>>>>
>>>>>>> This transformation is only valid on ARMv7 cores. On earlier hardware
>>>>>>> (depending on system configuration) it may cause undefined behavior or an
>>>>>>> alignment trap.
>>>>>>>
>>>>>> done.
>>>>>>
>>>>>>> The range of -1020 to +1024 is used in several places, but without any
>>>>>>> apparent explanation of why it's different from the range of an ldrd
>>>>>>> instruction. I figured it out eventually, but it deserves a comment.
>>>>>>>
>>>>>> Comments added.
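The reasoning behind the [-1020, 1024] range can be checked mechanically. LDRD encodes only the lower of the two offsets, which must lie in [-1020, 1020]. The higher offset is the lower one plus 4, so it lies in [-1016, 1024]. Accepting the union [-1020, 1024] for each offset and then requiring a difference of exactly 4 therefore admits only encodable pairs. The C sketch below restates that test on plain integers and verifies the claim exhaustively. `offsets_form_pair` and `check_pair_range` are hypothetical names; the first mirrors the logic of `thumb2_check_ldrd_operands` in the patch but is not GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Same test as in the patch: both offsets are multiples of 4 in
   [-1020, 1024] and they differ by exactly 4.  */
bool
offsets_form_pair (long off1, long off2)
{
  if (off1 < -1020 || off1 > 1024 || off1 % 4 != 0)
    return false;
  if (off2 < -1020 || off2 > 1024 || off2 % 4 != 0)
    return false;
  return off1 + 4 == off2 || off2 + 4 == off1;
}

/* Exhaustively confirms that whenever offsets_form_pair accepts a pair,
   the lower offset (the one LDRD actually encodes) is in [-1020, 1020].
   Returns the number of accepted ordered pairs.  */
long
check_pair_range (void)
{
  long accepted = 0;
  for (long a = -2048; a <= 2048; a++)
    for (long b = a - 8; b <= a + 8; b++)
      if (offsets_form_pair (a, b))
        {
          long lower = a < b ? a : b;
          assert (lower >= -1020 && lower <= 1020);
          accepted++;
        }
  return accepted;
}
```

There are 511 possible lower offsets (-1020, -1016, ..., 1020), each of which can appear in either order, so `check_pair_range` should report 1022 accepted pairs. The pair (1020, 1024) is accepted, while (1024, 1028) and (-1024, -1020) are not.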
>>>>>>
>>>>>>>> + "TARGET_THUMB2 && thumb2_check_ldrd_operands (operands[0], operands[1],
>>>>>>>> + operands[2], 0, operands[3], 1)"
>>>>>>>
>>>>>>> Passed operands do not match expected types. Specifically "0" is not an rtx
>>>>>>> (should be "NULL_RTX"), and "1" is not a boolean value (should be "true").
>>>>>>> Many other occurrences.
>>>>>>>
>>>>>> Fixed.
>>>>>>
>>>>>>>> +(define_constraint "Py"
>>>>>>>> + "@internal In Thumb-2 state a constant that is a multiple of 4 in the
>>>>>>>> + range -1020 to 1024"
>>>>>>>
>>>>>>> This comment seems particularly pointless. You should mention why this
>>>>>>> exists/where it is used.
>>>>>>>
>>>>>>> I think you're better off enforcing this in the insn condition, and remove
>>>>>>> this constraint. At least half the uses (the -reg[12] insns) are incorrect,
>>>>>>> and you already need the condition to enforce the dependency between the
>>>>>>> operands.
>>>>>>>
>>>>>> I removed this constraint and added the check to the insn condition.
>>>>>>
>>>>>>>> +thumb2_check_ldrd_operands (rtx reg1, rtx reg2, rtx base,
>>>>>>>>...
>>>>>>>> + if (ldrd && (reg1 == reg2))
>>>>>>>> + return false;
>>>>>>>
>>>>>>> This function is part of the instruction condition. Instruction conditions
>>>>>>> must not be used to enforce register allocation.
>>>>>>>
>>>>>> removed.
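The register checks that remain in `thumb2_legitimate_ldrd_p` belong to the peephole2 condition, which as far as I understand runs after register allocation. A toy model helps explain why they are needed. In the sketch below, `struct machine`, `two_loads`, `fused_load` and `fusion_preserves` are hypothetical and are not GCC code. The model shows that merging two loads is unsafe when the first destination is the base register, because the second original load then addresses memory through the new base value. The `reg1 == reg2` rejection is a separate issue. As I read the ARM ARM, LDRD with identical destination registers is UNPREDICTABLE, and a model like this cannot capture that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NREGS 16
#define NWORDS 64  /* power of two, so addresses can wrap with a mask */

/* A toy machine with word-addressed memory: an offset of 1 here stands
   for a byte offset of 4 on real hardware.  */
struct machine
{
  uint32_t reg[NREGS];
  uint32_t mem[NWORDS];
};

/* Word address base + off, wrapped into the toy memory.  */
static uint32_t
wrap_addr (uint32_t base, int off)
{
  return (base + (uint32_t) off) & (NWORDS - 1);
}

/* Original code: two separate loads in program order.  The second
   load sees any update the first one made to the base register.  */
static void
two_loads (struct machine *m, int rd1, int rd2, int base, int off1, int off2)
{
  m->reg[rd1] = m->mem[wrap_addr (m->reg[base], off1)];
  m->reg[rd2] = m->mem[wrap_addr (m->reg[base], off2)];
}

/* Fused form: both addresses come from the original base value,
   as with a single ldrd.  */
static void
fused_load (struct machine *m, int rd1, int rd2, int base, int off1, int off2)
{
  uint32_t b = m->reg[base];
  uint32_t v1 = m->mem[wrap_addr (b, off1)];
  uint32_t v2 = m->mem[wrap_addr (b, off2)];
  m->reg[rd1] = v1;
  m->reg[rd2] = v2;
}

/* True if fusing the two loads leaves every register with the same
   value as the original sequence, starting from state INIT.  */
bool
fusion_preserves (const struct machine *init, int rd1, int rd2, int base,
                  int off1, int off2)
{
  assert (rd1 >= 0 && rd1 < NREGS && rd2 >= 0 && rd2 < NREGS
          && base >= 0 && base < NREGS);
  struct machine seq = *init, fused = *init;
  two_loads (&seq, rd1, rd2, base, off1, off2);
  fused_load (&fused, rd1, rd2, base, off1, off2);
  for (int i = 0; i < NREGS; i++)
    if (seq.reg[i] != fused.reg[i])
      return false;
  return true;
}
```

With the base in r2, fusing `r2 <- [r2]; r1 <- [r2, #4]` changes the value that ends up in r1. Fusing `r0 <- [r2]; r2 <- [r2, #4]` produces the same result either way. This matches the patch, which rejects only `reg1 == base1`.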
>>>>>>
>>>>>>>> +thumb2_legitimate_ldrd_p (
>>>>>>>>...
>>>>>>>> + if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>>>>>>>> + return false;
>>>>>>>
>>>>>>> You're incorrectly assuming offset1 < offset2, which might not be true at this
>>>>>>> point.
>>>>>>>
>>>>>> The following check handles the case offset1 < offset2:
>>>>>> + if ((offset1 + 4) == offset2)
>>>>>> + return true;
>>>>>>
>>>>>> and this one handles offset2 < offset1, so both orderings are covered:
>>>>>> + if ((offset2 + 4) == offset1)
>>>>>> + return true;
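Because the two adjacency checks are symmetric, neither ordering of the offsets is assumed. The `*thumb2_ldrd` output template handles ordering the same way: it swaps operands so that the lower address is loaded into operands[0]. Below is a minimal C sketch of that normalization. `struct access` and `order_pair` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One word access: the destination (or source) register and its byte
   offset from a shared base register.  */
struct access
{
  int reg;
  long offset;
};

/* Puts the two accesses in ascending address order, swapping them if
   needed, and reports whether they are adjacent words.  After the call,
   lo->reg is the first LDRD/STRD register and lo->offset is the offset
   that gets encoded.  */
bool
order_pair (struct access *lo, struct access *hi)
{
  assert (lo != NULL && hi != NULL);
  if (lo->offset > hi->offset)
    {
      struct access tmp = *lo;
      *lo = *hi;
      *hi = tmp;
    }
  return lo->offset + 4 == hi->offset;
}
```

For instance, the accesses `r3 <- [base, #8]` and `r5 <- [base, #4]` are reordered so that r5 and offset 4 come first, corresponding to `ldrd r5, r3, [base, #4]`.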
>>>>>>
>>>>>>>> + /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
>>>>>>>> + cost. */
>>>>>>>> + return false;
>>>>>>>
>>>>>>> Code clearly doesn't match the comment. In fact this function always returns
>>>>>>> false.
>>>>>>>
>>>>>> Richard mentioned that in some cases (specifically on Cortex-A9) ldm is
>>>>>> cheaper than ldrd, and that we should model this in the insn patterns.
>>>>>> This function is the hook for that. But I don't know the Cortex-A9
>>>>>> microarchitecture in enough detail, so the cost check should be filled
>>>>>> in later by somebody with more knowledge of it.
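Whatever cost model eventually goes into `thumb2_prefer_ldmstm`, the eligibility test it already performs can be stated on its own. A two-register ldmia/stmia transfers the words at base and base + 4, and ldmdb/stmdb transfers those at base - 8 and base - 4. Registers in the list map in ascending order to ascending addresses. The C sketch below covers only that eligibility part and says nothing about relative cost on Cortex-A9. `enum ldm_form` and `ldm_form_for_pair` are hypothetical names.

```c
#include <assert.h>

/* Possible two-register multiple-transfer forms.  */
enum ldm_form
{
  LDM_NONE, /* not expressible as a two-register ldm/stm */
  LDM_IA,   /* ldmia/stmia base, {lo, hi}: words at base and base + 4 */
  LDM_DB    /* ldmdb/stmdb base, {lo, hi}: words at base - 8 and base - 4 */
};

/* Given the register and offset of the lower-addressed word (lo_*) and
   of the higher-addressed word (hi_*), report which ldm/stm form could
   replace the pair.  Because register lists are transferred in
   ascending register order, the lower-addressed word must use the
   lower-numbered register.  */
enum ldm_form
ldm_form_for_pair (int lo_reg, long lo_off, int hi_reg, long hi_off)
{
  assert (lo_reg >= 0 && lo_reg <= 15 && hi_reg >= 0 && hi_reg <= 15);
  if (hi_off != lo_off + 4 || lo_reg >= hi_reg)
    return LDM_NONE;
  if (lo_off == 0)
    return LDM_IA;
  if (lo_off == -8)
    return LDM_DB;
  return LDM_NONE;
}
```

A real `thumb2_prefer_ldmstm` would then need to weigh `LDM_IA` or `LDM_DB` against ldrd for the target core. That weighing is exactly the part left open above.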
>>>>>>
>>>>>> Wei Guozhi
>>>>>>
>>>>>>
>>>>>> ChangeLog:
>>>>>> 2010-10-16 Wei Guozhi <carrot@google.com>
>>>>>>
>>>>>> PR target/45335
>>>>>> * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>>>> thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>>>> peephole2.
>>>>>> (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>>>> New insn pattern and related peephole2.
>>>>>> * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>>>> (thumb2_check_ldrd_operands): New function.
>>>>>> (thumb2_prefer_ldmstm): New function.
>>>>>> * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New prototype.
>>>>>> (thumb2_check_ldrd_operands): New prototype.
>>>>>> (thumb2_prefer_ldmstm): New prototype.
>>>>>> * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>>>> Change the ldm/stm patterns with 2 words to ARM only.
>>>>>>
>>>>>>
>>>>>> 2010-10-16 Wei Guozhi <carrot@google.com>
>>>>>>
>>>>>> PR target/45335
>>>>>> * gcc.target/arm/pr45335.c: New test.
>>>>>> * gcc.target/arm/pr40457-1.c: Changed to load 3 words.
>>>>>> * gcc.target/arm/pr40457-2.c: Changed to store 3 words.
>>>>>> * gcc.target/arm/pr40457-3.c: Changed to store 3 words.
>>>>>>
>>>>>>
>>>>>> Index: thumb2.md
>>>>>> ===================================================================
>>>>>> --- thumb2.md (revision 165492)
>>>>>> +++ thumb2.md (working copy)
>>>>>> @@ -1118,3 +1118,228 @@ (define_peephole2
>>>>>> "
>>>>>> operands[2] = GEN_INT (32 - INTVAL (operands[2]));
>>>>>> ")
>>>>>> +
>>>>>> +(define_insn "*thumb2_ldrd"
>>>>>> + [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>>> + (mem:SI (plus:SI
>>>>>> + (match_operand:SI 2 "s_register_operand" "rk")
>>>>>> + (match_operand:SI 3 "const_int_operand" ""))))
>>>>>> + (set (match_operand:SI 1 "s_register_operand" "")
>>>>>> + (mem:SI (plus:SI (match_dup 2)
>>>>>> + (match_operand:SI 4 "const_int_operand" ""))))])]
>>>>>> + "TARGET_THUMB2 && arm_arch7
>>>>>> + && thumb2_check_ldrd_operands (operands[3], operands[4])"
>>>>>> + "*
>>>>>> + {
>>>>>> + HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>>> + HOST_WIDE_INT offset2 = INTVAL (operands[4]);
>>>>>> + if (offset1 > offset2)
>>>>>> + {
>>>>>> + /* Swap the operands so that memory [base+offset1] is loaded into
>>>>>> + operands[0]. */
>>>>>> + rtx tmp = operands[0];
>>>>>> + operands[0] = operands[1];
>>>>>> + operands[1] = tmp;
>>>>>> + tmp = operands[3];
>>>>>> + operands[3] = operands[4];
>>>>>> + operands[4] = tmp;
>>>>>> + offset1 = INTVAL (operands[3]);
>>>>>> + offset2 = INTVAL (operands[4]);
>>>>>> + }
>>>>>> + if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>>> + operands[2], operands[3], operands[4], true))
>>>>>> + return \"ldmdb\\t%2, {%0, %1}\";
>>>>>> + else if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>>> + {
>>>>>> + if (offset1 <= -256)
>>>>>> + {
>>>>>> + output_asm_insn (\"sub\\t%2, %2, %n3\", operands);
>>>>>> + output_asm_insn (\"ldr\\t%1, [%2, #4]\", operands);
>>>>>> + output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>>> + }
>>>>>> + else
>>>>>> + {
>>>>>> + output_asm_insn (\"ldr\\t%1, [%2, %4]\", operands);
>>>>>> + output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>>> + }
>>>>>> + return \"\";
>>>>>> + }
>>>>>> + else
>>>>>> + return \"ldrd\\t%0, %1, [%2, %3]\";
>>>>>> + }"
>>>>>> +)
>>>>>> +
>>>>>> +(define_insn "*thumb2_ldrd_reg1"
>>>>>> + [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>>> + (mem:SI (match_operand:SI 2 "s_register_operand" "rk")))
>>>>>> + (set (match_operand:SI 1 "s_register_operand" "")
>>>>>> + (mem:SI (plus:SI (match_dup 2)
>>>>>> + (match_operand:SI 3 "const_int_operand" ""))))])]
>>>>>> + "TARGET_THUMB2 && arm_arch7
>>>>>> + && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
>>>>>> + "*
>>>>>> + {
>>>>>> + HOST_WIDE_INT offset2 = INTVAL (operands[3]);
>>>>>> + if (offset2 == 4)
>>>>>> + {
>>>>>> + if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>>> + operands[2], NULL_RTX, operands[3], true))
>>>>>> + return \"ldmia\\t%2, {%0, %1}\";
>>>>>> + if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>>> + {
>>>>>> + output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
>>>>>> + output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>>> + return \"\";
>>>>>> + }
>>>>>> + return \"ldrd\\t%0, %1, [%2]\";
>>>>>> + }
>>>>>> + else
>>>>>> + {
>>>>>> + if (fix_cm3_ldrd && (operands[2] == operands[1]))
>>>>>> + {
>>>>>> + output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>>> + output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
>>>>>> + return \"\";
>>>>>> + }
>>>>>> + return \"ldrd\\t%1, %0, [%2, %3]\";
>>>>>> + }
>>>>>> + }"
>>>>>> +)
>>>>>> +
>>>>>> +(define_insn "*thumb2_ldrd_reg2"
>>>>>> + [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>>> + (mem:SI (plus:SI
>>>>>> + (match_operand:SI 2 "s_register_operand" "rk")
>>>>>> + (match_operand:SI 3 "const_int_operand" ""))))
>>>>>> + (set (match_operand:SI 1 "s_register_operand" "")
>>>>>> + (mem:SI (match_dup 2)))])]
>>>>>> + "TARGET_THUMB2 && arm_arch7
>>>>>> + && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
>>>>>> + "*
>>>>>> + {
>>>>>> + HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>>> + if (offset1 == -4)
>>>>>> + {
>>>>>> + if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>>> + {
>>>>>> + output_asm_insn (\"ldr\\t%1, [%2]\", operands);
>>>>>> + output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>>> + return \"\";
>>>>>> + }
>>>>>> + return \"ldrd\\t%0, %1, [%2, %3]\";
>>>>>> + }
>>>>>> + else
>>>>>> + {
>>>>>> + if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>>> + operands[2], operands[3], NULL_RTX, true))
>>>>>> + return \"ldmia\\t%2, {%1, %0}\";
>>>>>> + if (fix_cm3_ldrd && (operands[2] == operands[1]))
>>>>>> + {
>>>>>> + output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>>> + output_asm_insn (\"ldr\\t%1, [%2]\", operands);
>>>>>> + return \"\";
>>>>>> + }
>>>>>> + return \"ldrd\\t%1, %0, [%2]\";
>>>>>> + }
>>>>>> + }"
>>>>>> +)
>>>>>> +
>>>>>> +(define_peephole2
>>>>>> + [(set (match_operand:SI 0 "s_register_operand" "")
>>>>>> + (match_operand:SI 2 "memory_operand" ""))
>>>>>> + (set (match_operand:SI 1 "s_register_operand" "")
>>>>>> + (match_operand:SI 3 "memory_operand" ""))]
>>>>>> + "TARGET_THUMB2 && arm_arch7
>>>>>> + && thumb2_legitimate_ldrd_p (operands[0], operands[1],
>>>>>> + operands[2], operands[3], true)"
>>>>>> + [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>>> + (match_operand:SI 2 "memory_operand" ""))
>>>>>> + (set (match_operand:SI 1 "s_register_operand" "")
>>>>>> + (match_operand:SI 3 "memory_operand" ""))])]
>>>>>> + ""
>>>>>> +)
>>>>>> +
>>>>>> +(define_insn "*thumb2_strd"
>>>>>> + [(parallel [(set (mem:SI
>>>>>> + (plus:SI (match_operand:SI 2 "s_register_operand" "rk")
>>>>>> + (match_operand:SI 3 "const_int_operand" "")))
>>>>>> + (match_operand:SI 0 "s_register_operand" ""))
>>>>>> + (set (mem:SI (plus:SI (match_dup 2)
>>>>>> + (match_operand:SI 4 "const_int_operand" "")))
>>>>>> + (match_operand:SI 1 "s_register_operand" ""))])]
>>>>>> + "TARGET_THUMB2 && arm_arch7
>>>>>> + && thumb2_check_ldrd_operands (operands[3], operands[4])"
>>>>>> + "*
>>>>>> + {
>>>>>> + HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>>> + HOST_WIDE_INT offset2 = INTVAL (operands[4]);
>>>>>> + if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>>> + operands[2], operands[3], operands[4], false))
>>>>>> + return \"stmdb\\t%2, {%0, %1}\";
>>>>>> + if (offset1 < offset2)
>>>>>> + return \"strd\\t%0, %1, [%2, %3]\";
>>>>>> + else
>>>>>> + return \"strd\\t%1, %0, [%2, %4]\";
>>>>>> + }"
>>>>>> +)
>>>>>> +
>>>>>> +(define_insn "*thumb2_strd_reg1"
>>>>>> + [(parallel [(set (mem:SI (match_operand:SI 2 "s_register_operand" "rk"))
>>>>>> + (match_operand:SI 0 "s_register_operand" ""))
>>>>>> + (set (mem:SI (plus:SI (match_dup 2)
>>>>>> + (match_operand:SI 3 "const_int_operand" "")))
>>>>>> + (match_operand:SI 1 "s_register_operand" ""))])]
>>>>>> + "TARGET_THUMB2 && arm_arch7
>>>>>> + && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
>>>>>> + "*
>>>>>> + {
>>>>>> + HOST_WIDE_INT offset2 = INTVAL (operands[3]);
>>>>>> + if (offset2 == 4)
>>>>>> + {
>>>>>> + if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>>> + operands[2], NULL_RTX, operands[3], false))
>>>>>> + return \"stmia\\t%2, {%0, %1}\";
>>>>>> + return \"strd\\t%0, %1, [%2]\";
>>>>>> + }
>>>>>> + else
>>>>>> + return \"strd\\t%1, %0, [%2, %3]\";
>>>>>> + }"
>>>>>> +)
>>>>>> +
>>>>>> +(define_insn "*thumb2_strd_reg2"
>>>>>> + [(parallel [(set (mem:SI (plus:SI
>>>>>> + (match_operand:SI 2 "s_register_operand" "rk")
>>>>>> + (match_operand:SI 3 "const_int_operand" "")))
>>>>>> + (match_operand:SI 0 "s_register_operand" ""))
>>>>>> + (set (mem:SI (match_dup 2))
>>>>>> + (match_operand:SI 1 "s_register_operand" ""))])]
>>>>>> + "TARGET_THUMB2 && arm_arch7
>>>>>> + && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
>>>>>> + "*
>>>>>> + {
>>>>>> + HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>>> + if (offset1 == -4)
>>>>>> + return \"strd\\t%0, %1, [%2, %3]\";
>>>>>> + else
>>>>>> + {
>>>>>> + if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>>> + operands[2], operands[3], NULL_RTX, false))
>>>>>> + return \"stmia\\t%2, {%1, %0}\";
>>>>>> + return \"strd\\t%1, %0, [%2]\";
>>>>>> + }
>>>>>> + }"
>>>>>> +)
>>>>>> +
>>>>>> +(define_peephole2
>>>>>> + [(set (match_operand:SI 2 "memory_operand" "")
>>>>>> + (match_operand:SI 0 "s_register_operand" ""))
>>>>>> + (set (match_operand:SI 3 "memory_operand" "")
>>>>>> + (match_operand:SI 1 "s_register_operand" ""))]
>>>>>> + "TARGET_THUMB2 && arm_arch7
>>>>>> + && thumb2_legitimate_ldrd_p (operands[0], operands[1],
>>>>>> + operands[2], operands[3], false)"
>>>>>> + [(parallel [(set (match_operand:SI 2 "memory_operand" "")
>>>>>> + (match_operand:SI 0 "s_register_operand" ""))
>>>>>> + (set (match_operand:SI 3 "memory_operand" "")
>>>>>> + (match_operand:SI 1 "s_register_operand" ""))])]
>>>>>> + ""
>>>>>> +)
>>>>>> Index: arm.c
>>>>>> ===================================================================
>>>>>> --- arm.c (revision 165492)
>>>>>> +++ arm.c (working copy)
>>>>>> @@ -23254,4 +23254,134 @@ arm_builtin_support_vector_misalignment
>>>>>> is_packed);
>>>>>> }
>>>>>>
>>>>>> +/* Check that the offsets OFF1 and OFF2 (NULL_RTX meaning 0) are valid for an ldrd/strd instruction. */
>>>>>> +bool
>>>>>> +thumb2_check_ldrd_operands (rtx off1, rtx off2)
>>>>>> +{
>>>>>> + HOST_WIDE_INT offset1 = 0;
>>>>>> + HOST_WIDE_INT offset2 = 0;
>>>>>> +
>>>>>> + if (off1 != NULL_RTX)
>>>>>> + offset1 = INTVAL (off1);
>>>>>> + if (off2 != NULL_RTX)
>>>>>> + offset2 = INTVAL (off2);
>>>>>> +
>>>>>> + /* The offset range of LDRD is [-1020, 1020]. Here we check that both
>>>>>> + offsets lie in [-1020, 1024]. If one offset is 1024, the adjacency
>>>>>> + check below (the two offsets differ by 4) forces the other to be
>>>>>> + 1020, which is the offset LDRD actually encodes. */
>>>>>> + if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
>>>>>> + return false;
>>>>>> + if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
>>>>>> + return false;
>>>>>> +
>>>>>> + if ((offset1 + 4) == offset2)
>>>>>> + return true;
>>>>>> + if ((offset2 + 4) == offset1)
>>>>>> + return true;
>>>>>> +
>>>>>> + return false;
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether the two memory accesses can be merged into an ldrd/strd
>>>>>> + instruction, that is, whether they use the same base register and their
>>>>>> + constant offsets differ by 4. */
>>>>>> +bool
>>>>>> +thumb2_legitimate_ldrd_p (rtx reg1, rtx reg2, rtx mem1, rtx mem2, bool ldrd)
>>>>>> +{
>>>>>> + rtx base1, base2, op1;
>>>>>> + rtx addr1 = XEXP (mem1, 0);
>>>>>> + rtx addr2 = XEXP (mem2, 0);
>>>>>> + HOST_WIDE_INT offset1 = 0;
>>>>>> + HOST_WIDE_INT offset2 = 0;
>>>>>> +
>>>>>> + if (MEM_VOLATILE_P (mem1) || MEM_VOLATILE_P (mem2))
>>>>>> + return false;
>>>>>> +
>>>>>> + if (REG_P (addr1))
>>>>>> + base1 = addr1;
>>>>>> + else if (GET_CODE (addr1) == PLUS)
>>>>>> + {
>>>>>> + base1 = XEXP (addr1, 0);
>>>>>> + op1 = XEXP (addr1, 1);
>>>>>> + if (!REG_P (base1) || (GET_CODE (op1) != CONST_INT))
>>>>>> + return false;
>>>>>> + offset1 = INTVAL (op1);
>>>>>> + }
>>>>>> + else
>>>>>> + return false;
>>>>>> +
>>>>>> + if (REG_P (addr2))
>>>>>> + base2 = addr2;
>>>>>> + else if (GET_CODE (addr2) == PLUS)
>>>>>> + {
>>>>>> + base2 = XEXP (addr2, 0);
>>>>>> + op1 = XEXP (addr2, 1);
>>>>>> + if (!REG_P (base2) || (GET_CODE (op1) != CONST_INT))
>>>>>> + return false;
>>>>>> + offset2 = INTVAL (op1);
>>>>>> + }
>>>>>> + else
>>>>>> + return false;
>>>>>> +
>>>>>> + if (base1 != base2)
>>>>>> + return false;
>>>>>> +
>>>>>> + /* The offset range of LDRD is [-1020, 1020]. Here we check that both
>>>>>> + offsets lie in [-1020, 1024]. If one offset is 1024, the adjacency
>>>>>> + check below (the two offsets differ by 4) forces the other to be
>>>>>> + 1020, which is the offset LDRD actually encodes. */
>>>>>> + if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
>>>>>> + return false;
>>>>>> + if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
>>>>>> + return false;
>>>>>> +
>>>>>> + if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>>>>>> + return false;
>>>>>> +
>>>>>> + if ((offset1 + 4) == offset2)
>>>>>> + return true;
>>>>>> + if ((offset2 + 4) == offset1)
>>>>>> + return true;
>>>>>> +
>>>>>> + return false;
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether the insn can be expressed as an ldm/stm with lower cost. */
>>>>>> +bool
>>>>>> +thumb2_prefer_ldmstm (rtx reg1, rtx reg2, rtx base,
>>>>>> + rtx off1, rtx off2, bool ldrd)
>>>>>> +{
>>>>>> + HOST_WIDE_INT offset1 = 0;
>>>>>> + HOST_WIDE_INT offset2 = 0;
>>>>>> +
>>>>>> + if (off1 != NULL_RTX)
>>>>>> + offset1 = INTVAL (off1);
>>>>>> + if (off2 != NULL_RTX)
>>>>>> + offset2 = INTVAL (off2);
>>>>>> +
>>>>>> + if (offset1 > offset2)
>>>>>> + {
>>>>>> + rtx tmp;
>>>>>> + HOST_WIDE_INT t = offset1;
>>>>>> + offset1 = offset2;
>>>>>> + offset2 = t;
>>>>>> + tmp = reg1;
>>>>>> + reg1 = reg2;
>>>>>> + reg2 = tmp;
>>>>>> + }
>>>>>> +
>>>>>> + /* The offset of ldmdb is -8, the offset of ldmia is 0. */
>>>>>> + if ((offset1 != -8) && (offset1 != 0))
>>>>>> + return false;
>>>>>> +
>>>>>> + /* Lower register corresponds to lower memory. */
>>>>>> + if (REGNO (reg1) > REGNO (reg2))
>>>>>> + return false;
>>>>>> +
>>>>>> + /* Now ldm/stm is possible. Cases where ldm/stm has lower cost are not
>>>>>> + modelled yet, so prefer ldrd/strd. */
>>>>>> + return false;
>>>>>> +}
>>>>>> +
>>>>>> #include "gt-arm.h"
>>>>>> Index: arm-protos.h
>>>>>> ===================================================================
>>>>>> --- arm-protos.h (revision 165492)
>>>>>> +++ arm-protos.h (working copy)
>>>>>> @@ -150,6 +150,9 @@ extern void arm_expand_sync (enum machin
>>>>>> extern const char *arm_output_memory_barrier (rtx *);
>>>>>> extern const char *arm_output_sync_insn (rtx, rtx *);
>>>>>> extern unsigned int arm_sync_loop_insns (rtx , rtx *);
>>>>>> +extern bool thumb2_check_ldrd_operands (rtx, rtx);
>>>>>> +extern bool thumb2_legitimate_ldrd_p (rtx, rtx, rtx, rtx, bool);
>>>>>> +extern bool thumb2_prefer_ldmstm (rtx, rtx, rtx, rtx, rtx, bool);
>>>>>>
>>>>>> #if defined TREE_CODE
>>>>>> extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
>>>>>> Index: ldmstm.md
>>>>>> ===================================================================
>>>>>> --- ldmstm.md (revision 165492)
>>>>>> +++ ldmstm.md (working copy)
>>>>>> @@ -852,7 +852,7 @@ (define_insn "*ldm2_ia"
>>>>>> (set (match_operand:SI 2 "arm_hard_register_operand" "")
>>>>>> (mem:SI (plus:SI (match_dup 3)
>>>>>> (const_int 4))))])]
>>>>>> - "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>>> + "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>>> "ldm%(ia%)\t%3, {%1, %2}"
>>>>>> [(set_attr "type" "load2")
>>>>>> (set_attr "predicable" "yes")])
>>>>>> @@ -901,7 +901,7 @@ (define_insn "*stm2_ia"
>>>>>> (match_operand:SI 1 "arm_hard_register_operand" ""))
>>>>>> (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
>>>>>> (match_operand:SI 2 "arm_hard_register_operand" ""))])]
>>>>>> - "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>>> + "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>>> "stm%(ia%)\t%3, {%1, %2}"
>>>>>> [(set_attr "type" "store2")
>>>>>> (set_attr "predicable" "yes")])
>>>>>> @@ -1041,7 +1041,7 @@ (define_insn "*ldm2_db"
>>>>>> (set (match_operand:SI 2 "arm_hard_register_operand" "")
>>>>>> (mem:SI (plus:SI (match_dup 3)
>>>>>> (const_int -4))))])]
>>>>>> - "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>>> + "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>>> "ldm%(db%)\t%3, {%1, %2}"
>>>>>> [(set_attr "type" "load2")
>>>>>> (set_attr "predicable" "yes")])
>>>>>> @@ -1067,7 +1067,7 @@ (define_insn "*stm2_db"
>>>>>> (match_operand:SI 1 "arm_hard_register_operand" ""))
>>>>>> (set (mem:SI (plus:SI (match_dup 3) (const_int -4)))
>>>>>> (match_operand:SI 2 "arm_hard_register_operand" ""))])]
>>>>>> - "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>>> + "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>>> "stm%(db%)\t%3, {%1, %2}"
>>>>>> [(set_attr "type" "store2")
>>>>>> (set_attr "predicable" "yes")])
>>>>>>
>>>>>>
>>>>>> Index: pr40457-3.c
>>>>>> ===================================================================
>>>>>> --- pr40457-3.c (revision 165492)
>>>>>> +++ pr40457-3.c (working copy)
>>>>>> @@ -5,6 +5,7 @@ void foo(int* p)
>>>>>> {
>>>>>> p[0] = 1;
>>>>>> p[1] = 0;
>>>>>> + p[2] = 2;
>>>>>> }
>>>>>>
>>>>>> /* { dg-final { scan-assembler "stm" } } */
>>>>>> Index: pr40457-1.c
>>>>>> ===================================================================
>>>>>> --- pr40457-1.c (revision 165492)
>>>>>> +++ pr40457-1.c (working copy)
>>>>>> @@ -1,9 +1,9 @@
>>>>>> -/* { dg-options "-Os" } */
>>>>>> +/* { dg-options "-O2" } */
>>>>>> /* { dg-do compile } */
>>>>>>
>>>>>> int bar(int* p)
>>>>>> {
>>>>>> - int x = p[0] + p[1];
>>>>>> + int x = p[0] + p[1] + p[2];
>>>>>> return x;
>>>>>> }
>>>>>>
>>>>>> Index: pr40457-2.c
>>>>>> ===================================================================
>>>>>> --- pr40457-2.c (revision 165492)
>>>>>> +++ pr40457-2.c (working copy)
>>>>>> @@ -5,6 +5,7 @@ void foo(int* p)
>>>>>> {
>>>>>> p[0] = 1;
>>>>>> p[1] = 0;
>>>>>> + p[2] = 2;
>>>>>> }
>>>>>>
>>>>>> /* { dg-final { scan-assembler "stm" } } */
>>>>>> Index: pr45335.c
>>>>>> ===================================================================
>>>>>> --- pr45335.c (revision 0)
>>>>>> +++ pr45335.c (revision 0)
>>>>>> @@ -0,0 +1,22 @@
>>>>>> +/* { dg-options "-mthumb -O2" } */
>>>>>> +/* { dg-require-effective-target arm_thumb2_ok } */
>>>>>> +/* { dg-final { scan-assembler "ldrd" } } */
>>>>>> +/* { dg-final { scan-assembler "strd" } } */
>>>>>> +
>>>>>> +struct S
>>>>>> +{
>>>>>> + void* p1;
>>>>>> + void* p2;
>>>>>> + void* p3;
>>>>>> + void* p4;
>>>>>> +};
>>>>>> +
>>>>>> +extern int printf(const char*, ...);
>>>>>> +
>>>>>> +void foo1(struct S* fp, struct S* otherSaveArea)
>>>>>> +{
>>>>>> + struct S* saveA = fp - 1;
>>>>>> + printf("StackSaveArea for fp %p [%p/%p]:\n", fp, saveA, otherSaveArea);
>>>>>> + printf("prevFrame=%p savedPc=%p meth=%p curPc=%p fp[0]=0x%08x\n",
>>>>>> + saveA->p1, saveA->p2, saveA->p3, saveA->p4, *(unsigned int*)fp);
>>>>>> +}
>>>>>>
>>>>>
>>>>
>>>
>>
>
More information about the Gcc-patches mailing list