[PATCH: ARM] PR 45335 Use ldrd and strd to access two consecutive words
Carrot Wei
carrot@google.com
Tue Dec 14 22:58:00 GMT 2010
ping
On Mon, Nov 29, 2010 at 2:32 PM, Carrot Wei <carrot@google.com> wrote:
> ping
>
> On Mon, Nov 22, 2010 at 3:16 PM, Carrot Wei <carrot@google.com> wrote:
>> ping
>>
>> On Sun, Oct 31, 2010 at 2:22 AM, Carrot Wei <carrot@google.com> wrote:
>>> Ping
>>>
>>> On Sun, Oct 24, 2010 at 9:46 PM, Carrot Wei <carrot@google.com> wrote:
>>>> Ping
>>>>
>>>> On Sat, Oct 16, 2010 at 8:27 PM, Carrot Wei <carrot@google.com> wrote:
>>>>> On Wed, Oct 13, 2010 at 7:01 PM, Paul Brook <paul@codesourcery.com> wrote:
>>>>>>> ChangeLog:
>>>>>>> 2010-09-04 Wei Guozhi <carrot@google.com>
>>>>>>>
>>>>>>> PR target/45335
>>>>>>> * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>>>>> thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>>>>> peephole2.
>>>>>>> (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>>>>> New insn pattern and related peephole2.
>>>>>>> * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>>>>> (thumb2_check_ldrd_operands): New function.
>>>>>>> (thumb2_prefer_ldmstm): New function.
>>>>>>> * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New
>>>>>>> prototype. (thumb2_check_ldrd_operands): New prototype.
>>>>>>> (thumb2_prefer_ldmstm): New prototype.
>>>>>>> * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>>>>> Change the ldm/stm patterns with 2 words to ARM only.
>>>>>>> * gcc/config/arm/constraints.md (Py): New thumb2 constant
>>>>>>> constraint suitable for ldrd/strd instructions.
>>>>>>
>>>>>> Not ok.
>>>>>>
>>>>>> Why is this restricted to Thumb mode? The ARM variant of ldrd isn't quite as
>>>>>> flexible, but still provides a useful improvement over ldm.
>>>>>>
>>>>> I agree the ARM version is also useful, but it brings much less
>>>>> benefit at the cost of much more complexity (due to its tighter
>>>>> restrictions and insn pattern conflicts with ldm), so I will leave it
>>>>> as a future improvement.
>>>>>
>>>>>> This transformation is only valid on ARMv7 cores. On earlier hardware
>>>>>> (depending on system configuration) it may cause undefined behavior or an
>>>>>> alignment trap.
>>>>>>
>>>>> Done.
>>>>>
>>>>>> The range of -1020 to +1024 is used in several places, but without any
>>>>>> apparent explanation of why it's different to the range of an ldrd
>>>>>> instruction. I figured it out eventually, but it deserves a comment.
>>>>>>
>>>>> Comments added.
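The reason for the asymmetric range can be checked mechanically. The C sketch below is a stand-alone model, not code from the patch: `ldrd_offsets_ok`, `ldrd_encoded_offset` and `all_accepted_pairs_encodable` are hypothetical helpers. It assumes, as the patch comment states, that Thumb-2 LDRD encodes offsets in [-1020, 1020] and that only the lower of the two word offsets is encoded, and it confirms that every pair accepted by the [-1020, 1024] test is encodable.

```c
#include <stdbool.h>

/* Hypothetical model of the offset test in thumb2_check_ldrd_operands:
   each offset must be a multiple of 4 in [-1020, 1024], and the two
   offsets must address adjacent words.  */
static bool
ldrd_offsets_ok (long offset1, long offset2)
{
  if (offset1 > 1024 || offset1 < -1020 || (offset1 & 3) != 0)
    return false;
  if (offset2 > 1024 || offset2 < -1020 || (offset2 & 3) != 0)
    return false;
  return offset1 + 4 == offset2 || offset2 + 4 == offset1;
}

/* ldrd encodes only the lower of the two offsets.  */
static long
ldrd_encoded_offset (long offset1, long offset2)
{
  return offset1 < offset2 ? offset1 : offset2;
}

/* Exhaustively check that every accepted pair yields an encoded offset
   inside the assumed Thumb-2 ldrd range [-1020, 1020].  */
static bool
all_accepted_pairs_encodable (void)
{
  for (long a = -1100; a <= 1100; a++)
    for (long b = a - 8; b <= a + 8; b++)
      if (ldrd_offsets_ok (a, b))
        {
          long enc = ldrd_encoded_offset (a, b);
          if (enc < -1020 || enc > 1020)
            return false;
        }
  return true;
}
```

For instance, the pair (1024, 1020) is accepted and encodes as 1020, while (-1024, -1020) is rejected because -1024 would have to be encoded.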
>>>>>
>>>>>>> + "TARGET_THUMB2 && thumb2_check_ldrd_operands (operands[0], operands[1],
>>>>>>> + operands[2], 0, operands[3], 1)"
>>>>>>
>>>>>> Passed operands do not match expected types. Specifically "0" is not an rtx
>>>>>> (should be "NULL_RTX"), and "1" is not a boolean value (should be "true").
>>>>>> Many other occurrences.
>>>>>>
>>>>> Fixed.
>>>>>
>>>>>>> +(define_constraint "Py"
>>>>>>> + "@internal In Thumb-2 state a constant that is a multiple of 4 in the
>>>>>>> + range -1020 to 1024"
>>>>>>
>>>>>> This comment seems particularly pointless. You should mention why this
>>>>>> exists/where it is used.
>>>>>>
>>>>>> I think you're better off enforcing this in the insn condition, and remove
>>>>>> this constraint. At least half the uses (the -reg[12] insns) are incorrect,
>>>>>> and you already need the condition to enforce the dependency between the
>>>>>> operands.
>>>>>>
>>>>> I removed this constraint and added the check to the insn condition.
>>>>>
>>>>>>> +thumb2_check_ldrd_operands (rtx reg1, rtx reg2, rtx base,
>>>>>>>...
>>>>>>> + if (ldrd && (reg1 == reg2))
>>>>>>> + return false;
>>>>>>
>>>>>> This function is part of the instruction condition. Instruction conditions
>>>>>> must not be used to enforce register allocation.
>>>>>>
>>>>> Removed.
>>>>>
>>>>>>> +thumb2_legitimate_ldrd_p (
>>>>>>>...
>>>>>>> + if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>>>>>>> + return false;
>>>>>>
>>>>>> You're incorrectly assuming offset1 < offset2, which might not be true at this
>>>>>> point.
>>>>>>
>>>>> The following check covers offset1 < offset2:
>>>>> + if ((offset1 + 4) == offset2)
>>>>> + return true;
>>>>>
>>>>> and the other check covers offset2 < offset1, so both orderings are handled:
>>>>> + if ((offset2 + 4) == offset1)
>>>>> + return true;
>>>>>
>>>>>>> + /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
>>>>>>> + cost. */
>>>>>>> + return false;
>>>>>>
>>>>>> Code clearly doesn't match the comment. In fact this function always returns
>>>>>> false.
>>>>>>
>>>>> Richard mentioned that in some cases (specifically on Cortex-A9) ldm is
>>>>> cheaper than ldrd, and that we should model this in the insn pattern.
>>>>> This function is the hook for that. However, I don't know the Cortex-A9
>>>>> microarchitecture in enough detail, so it should be filled in by somebody
>>>>> with more knowledge of it in the future.
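The part of thumb2_prefer_ldmstm that decides whether an ldm/stm form is possible at all can be sketched as follows. This is a hypothetical stand-alone model (`ldm_form_for_pair` and `enum ldm_form` are illustrative names) that works on register numbers and offsets instead of rtx; unlike the patch it also checks adjacency itself and rejects equal register numbers. The Cortex-A9 cost decision discussed above is deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical result type: which two-register multiple-transfer form
   covers the pair, if any.  */
enum ldm_form { LDM_NONE, LDM_IA, LDM_DB };

/* Order the two accesses by offset, then require adjacent words, the
   lower-numbered register at the lower address, and a lower offset of 0
   (ia: base+0, base+4) or -8 (db: base-8, base-4).  */
static enum ldm_form
ldm_form_for_pair (unsigned regno1, long offset1,
                   unsigned regno2, long offset2)
{
  if (offset1 > offset2)
    {
      unsigned tmp_reg = regno1;
      long tmp_off = offset1;
      regno1 = regno2;
      regno2 = tmp_reg;
      offset1 = offset2;
      offset2 = tmp_off;
    }
  if (offset2 != offset1 + 4)
    return LDM_NONE;
  /* ldm/stm always pair the lowest-numbered register with the lowest
     address, so the register order must follow the address order.  */
  if (regno1 >= regno2)
    return LDM_NONE;
  if (offset1 == 0)
    return LDM_IA;
  if (offset1 == -8)
    return LDM_DB;
  return LDM_NONE;
}
```

A real cost model would only be consulted after this returns something other than LDM_NONE.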
>>>>>
>>>>> Wei Guozhi
>>>>>
>>>>>
>>>>> ChangeLog:
>>>>> 2010-10-16 Wei Guozhi <carrot@google.com>
>>>>>
>>>>> PR target/45335
>>>>> * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>>> thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>>> peephole2.
>>>>> (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>>> New insn pattern and related peephole2.
>>>>> * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>>> (thumb2_check_ldrd_operands): New function.
>>>>> (thumb2_prefer_ldmstm): New function.
>>>>> * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New prototype.
>>>>> (thumb2_check_ldrd_operands): New prototype.
>>>>> (thumb2_prefer_ldmstm): New prototype.
>>>>> * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>>> Change the ldm/stm patterns with 2 words to ARM only.
>>>>>
>>>>>
>>>>> 2010-10-16 Wei Guozhi <carrot@google.com>
>>>>>
>>>>> PR target/45335
>>>>> * gcc.target/arm/pr45335.c: New test.
>>>>> * gcc.target/arm/pr40457-1.c: Changed to load 3 words and to compile at -O2.
>>>>> * gcc.target/arm/pr40457-2.c: Changed to store 3 words.
>>>>> * gcc.target/arm/pr40457-3.c: Changed to store 3 words.
>>>>>
>>>>>
>>>>> Index: thumb2.md
>>>>> ===================================================================
>>>>> --- thumb2.md (revision 165492)
>>>>> +++ thumb2.md (working copy)
>>>>> @@ -1118,3 +1118,228 @@ (define_peephole2
>>>>> "
>>>>> operands[2] = GEN_INT (32 - INTVAL (operands[2]));
>>>>> ")
>>>>> +
>>>>> +(define_insn "*thumb2_ldrd"
>>>>> + [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>> + (mem:SI (plus:SI
>>>>> + (match_operand:SI 2 "s_register_operand" "rk")
>>>>> + (match_operand:SI 3 "const_int_operand" ""))))
>>>>> + (set (match_operand:SI 1 "s_register_operand" "")
>>>>> + (mem:SI (plus:SI (match_dup 2)
>>>>> + (match_operand:SI 4 "const_int_operand" ""))))])]
>>>>> + "TARGET_THUMB2 && arm_arch7
>>>>> + && thumb2_check_ldrd_operands (operands[3], operands[4])"
>>>>> + "*
>>>>> + {
>>>>> + HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>> + HOST_WIDE_INT offset2 = INTVAL (operands[4]);
>>>>> + if (offset1 > offset2)
>>>>> + {
>>>>> + /* Swap the operands so that memory [base+offset1] is loaded into
>>>>> + operands[0]. */
>>>>> + rtx tmp = operands[0];
>>>>> + operands[0] = operands[1];
>>>>> + operands[1] = tmp;
>>>>> + tmp = operands[3];
>>>>> + operands[3] = operands[4];
>>>>> + operands[4] = tmp;
>>>>> + offset1 = INTVAL (operands[3]);
>>>>> + offset2 = INTVAL (operands[4]);
>>>>> + }
>>>>> + if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> + operands[2], operands[3], operands[4], true))
>>>>> + return \"ldmdb\\t%2, {%0, %1}\";
>>>>> + else if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>> + {
>>>>> + if (offset1 <= -256)
>>>>> + {
>>>>> + output_asm_insn (\"sub\\t%2, %2, %n3\", operands);
>>>>> + output_asm_insn (\"ldr\\t%1, [%2, #4]\", operands);
>>>>> + output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>> + }
>>>>> + else
>>>>> + {
>>>>> + output_asm_insn (\"ldr\\t%1, [%2, %4]\", operands);
>>>>> + output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>> + }
>>>>> + return \"\";
>>>>> + }
>>>>> + else
>>>>> + return \"ldrd\\t%0, %1, [%2, %3]\";
>>>>> + }"
>>>>> +)
>>>>> +
>>>>> +(define_insn "*thumb2_ldrd_reg1"
>>>>> + [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>> + (mem:SI (match_operand:SI 2 "s_register_operand" "rk")))
>>>>> + (set (match_operand:SI 1 "s_register_operand" "")
>>>>> + (mem:SI (plus:SI (match_dup 2)
>>>>> + (match_operand:SI 3 "const_int_operand" ""))))])]
>>>>> + "TARGET_THUMB2 && arm_arch7
>>>>> + && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
>>>>> + "*
>>>>> + {
>>>>> + HOST_WIDE_INT offset2 = INTVAL (operands[3]);
>>>>> + if (offset2 == 4)
>>>>> + {
>>>>> + if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> + operands[2], NULL_RTX, operands[3], true))
>>>>> + return \"ldmia\\t%2, {%0, %1}\";
>>>>> + if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>> + {
>>>>> + output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
>>>>> + output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>> + return \"\";
>>>>> + }
>>>>> + return \"ldrd\\t%0, %1, [%2]\";
>>>>> + }
>>>>> + else
>>>>> + {
>>>>> + if (fix_cm3_ldrd && (operands[2] == operands[1]))
>>>>> + {
>>>>> + output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>> + output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
>>>>> + return \"\";
>>>>> + }
>>>>> + return \"ldrd\\t%1, %0, [%2, %3]\";
>>>>> + }
>>>>> + }"
>>>>> +)
>>>>> +
>>>>> +(define_insn "*thumb2_ldrd_reg2"
>>>>> + [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>> + (mem:SI (plus:SI
>>>>> + (match_operand:SI 2 "s_register_operand" "rk")
>>>>> + (match_operand:SI 3 "const_int_operand" ""))))
>>>>> + (set (match_operand:SI 1 "s_register_operand" "")
>>>>> + (mem:SI (match_dup 2)))])]
>>>>> + "TARGET_THUMB2 && arm_arch7
>>>>> + && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
>>>>> + "*
>>>>> + {
>>>>> + HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>> + if (offset1 == -4)
>>>>> + {
>>>>> + if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>> + {
>>>>> + output_asm_insn (\"ldr\\t%1, [%2]\", operands);
>>>>> + output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>> + return \"\";
>>>>> + }
>>>>> + return \"ldrd\\t%0, %1, [%2, %3]\";
>>>>> + }
>>>>> + else
>>>>> + {
>>>>> + if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> + operands[2], operands[3], NULL_RTX, true))
>>>>> + return \"ldmia\\t%2, {%1, %0}\";
>>>>> + if (fix_cm3_ldrd && (operands[2] == operands[1]))
>>>>> + {
>>>>> + output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>> + output_asm_insn (\"ldr\\t%1, [%2]\", operands);
>>>>> + return \"\";
>>>>> + }
>>>>> + return \"ldrd\\t%1, %0, [%2]\";
>>>>> + }
>>>>> + }"
>>>>> +)
>>>>> +
>>>>> +(define_peephole2
>>>>> + [(set (match_operand:SI 0 "s_register_operand" "")
>>>>> + (match_operand:SI 2 "memory_operand" ""))
>>>>> + (set (match_operand:SI 1 "s_register_operand" "")
>>>>> + (match_operand:SI 3 "memory_operand" ""))]
>>>>> + "TARGET_THUMB2 && arm_arch7
>>>>> + && thumb2_legitimate_ldrd_p (operands[0], operands[1],
>>>>> + operands[2], operands[3], true)"
>>>>> + [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>> + (match_operand:SI 2 "memory_operand" ""))
>>>>> + (set (match_operand:SI 1 "s_register_operand" "")
>>>>> + (match_operand:SI 3 "memory_operand" ""))])]
>>>>> + ""
>>>>> +)
>>>>> +
>>>>> +(define_insn "*thumb2_strd"
>>>>> + [(parallel [(set (mem:SI
>>>>> + (plus:SI (match_operand:SI 2 "s_register_operand" "rk")
>>>>> + (match_operand:SI 3 "const_int_operand" "")))
>>>>> + (match_operand:SI 0 "s_register_operand" ""))
>>>>> + (set (mem:SI (plus:SI (match_dup 2)
>>>>> + (match_operand:SI 4 "const_int_operand" "")))
>>>>> + (match_operand:SI 1 "s_register_operand" ""))])]
>>>>> + "TARGET_THUMB2 && arm_arch7
>>>>> + && thumb2_check_ldrd_operands (operands[3], operands[4])"
>>>>> + "*
>>>>> + {
>>>>> + HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>> + HOST_WIDE_INT offset2 = INTVAL (operands[4]);
>>>>> + if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> + operands[2], operands[3], operands[4], false))
>>>>> + return \"stmdb\\t%2, {%0, %1}\";
>>>>> + if (offset1 < offset2)
>>>>> + return \"strd\\t%0, %1, [%2, %3]\";
>>>>> + else
>>>>> + return \"strd\\t%1, %0, [%2, %4]\";
>>>>> + }"
>>>>> +)
>>>>> +
>>>>> +(define_insn "*thumb2_strd_reg1"
>>>>> + [(parallel [(set (mem:SI (match_operand:SI 2 "s_register_operand" "rk"))
>>>>> + (match_operand:SI 0 "s_register_operand" ""))
>>>>> + (set (mem:SI (plus:SI (match_dup 2)
>>>>> + (match_operand:SI 3 "const_int_operand" "")))
>>>>> + (match_operand:SI 1 "s_register_operand" ""))])]
>>>>> + "TARGET_THUMB2 && arm_arch7
>>>>> + && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
>>>>> + "*
>>>>> + {
>>>>> + HOST_WIDE_INT offset2 = INTVAL (operands[3]);
>>>>> + if (offset2 == 4)
>>>>> + {
>>>>> + if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> + operands[2], NULL_RTX, operands[3], false))
>>>>> + return \"stmia\\t%2, {%0, %1}\";
>>>>> + return \"strd\\t%0, %1, [%2]\";
>>>>> + }
>>>>> + else
>>>>> + return \"strd\\t%1, %0, [%2, %3]\";
>>>>> + }"
>>>>> +)
>>>>> +
>>>>> +(define_insn "*thumb2_strd_reg2"
>>>>> + [(parallel [(set (mem:SI (plus:SI
>>>>> + (match_operand:SI 2 "s_register_operand" "rk")
>>>>> + (match_operand:SI 3 "const_int_operand" "")))
>>>>> + (match_operand:SI 0 "s_register_operand" ""))
>>>>> + (set (mem:SI (match_dup 2))
>>>>> + (match_operand:SI 1 "s_register_operand" ""))])]
>>>>> + "TARGET_THUMB2 && arm_arch7
>>>>> + && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
>>>>> + "*
>>>>> + {
>>>>> + HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>> + if (offset1 == -4)
>>>>> + return \"strd\\t%0, %1, [%2, %3]\";
>>>>> + else
>>>>> + {
>>>>> + if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> + operands[2], operands[3], NULL_RTX, false))
>>>>> + return \"stmia\\t%2, {%1, %0}\";
>>>>> + return \"strd\\t%1, %0, [%2]\";
>>>>> + }
>>>>> + }"
>>>>> +)
>>>>> +
>>>>> +(define_peephole2
>>>>> + [(set (match_operand:SI 2 "memory_operand" "")
>>>>> + (match_operand:SI 0 "s_register_operand" ""))
>>>>> + (set (match_operand:SI 3 "memory_operand" "")
>>>>> + (match_operand:SI 1 "s_register_operand" ""))]
>>>>> + "TARGET_THUMB2 && arm_arch7
>>>>> + && thumb2_legitimate_ldrd_p (operands[0], operands[1],
>>>>> + operands[2], operands[3], false)"
>>>>> + [(parallel [(set (match_operand:SI 2 "memory_operand" "")
>>>>> + (match_operand:SI 0 "s_register_operand" ""))
>>>>> + (set (match_operand:SI 3 "memory_operand" "")
>>>>> + (match_operand:SI 1 "s_register_operand" ""))])]
>>>>> + ""
>>>>> +)
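To make the *thumb2_ldrd output template easier to follow, here is a hypothetical C model of the text it emits (`thumb2_ldrd_asm` is an illustrative name; registers are plain numbers and the offsets are assumed to be already validated as adjacent words). The ldm branch is omitted because thumb2_prefer_ldmstm currently always returns false. The -256 threshold reflects that a single Thumb-2 ldr can encode negative immediate offsets only down to -255.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h> /* strcmp, for callers comparing the emitted text.  */

/* Write the assembly (one instruction per line) that *thumb2_ldrd would
   produce into BUF.  RT0/RT1 receive the words at OFFSET1/OFFSET2 from
   base register RN.  */
static void
thumb2_ldrd_asm (char *buf, size_t len, int rt0, int rt1, int rn,
                 long offset1, long offset2, bool fix_cm3_ldrd)
{
  if (offset1 > offset2)
    {
      /* Swap so that rt0 receives the word at the lower address.  */
      int tmp_reg = rt0;
      long tmp_off = offset1;
      rt0 = rt1;
      rt1 = tmp_reg;
      offset1 = offset2;
      offset2 = tmp_off;
    }
  if (fix_cm3_ldrd && rn == rt0)
    {
      /* Cortex-M3 workaround: avoid an ldrd whose first destination is the
         base; load the other word first so the base is overwritten last.  */
      if (offset1 <= -256)
        snprintf (buf, len,
                  "sub\tr%d, r%d, #%ld\nldr\tr%d, [r%d, #4]\nldr\tr%d, [r%d]",
                  rn, rn, -offset1, rt1, rn, rt0, rn);
      else
        snprintf (buf, len, "ldr\tr%d, [r%d, #%ld]\nldr\tr%d, [r%d, #%ld]",
                  rt1, rn, offset2, rt0, rn, offset1);
    }
  else
    snprintf (buf, len, "ldrd\tr%d, r%d, [r%d, #%ld]", rt0, rt1, rn, offset1);
}
```

For example, with the Cortex-M3 fix enabled and the base equal to the first destination (r1 = [r1, #8], r2 = [r1, #12]), the model yields `ldr r2, [r1, #12]` followed by `ldr r1, [r1, #8]`, so the base is overwritten last.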
>>>>> Index: arm.c
>>>>> ===================================================================
>>>>> --- arm.c (revision 165492)
>>>>> +++ arm.c (working copy)
>>>>> @@ -23254,4 +23254,134 @@ arm_builtin_support_vector_misalignment
>>>>> is_packed);
>>>>> }
>>>>>
>>>>> +/* Check that the constant offsets OFF1 and OFF2 (NULL_RTX meaning zero) are
>>>>> + valid for an ldrd/strd instruction. */
>>>>> +bool
>>>>> +thumb2_check_ldrd_operands (rtx off1, rtx off2)
>>>>> +{
>>>>> + HOST_WIDE_INT offset1 = 0;
>>>>> + HOST_WIDE_INT offset2 = 0;
>>>>> +
>>>>> + if (off1 != NULL_RTX)
>>>>> + offset1 = INTVAL (off1);
>>>>> + if (off2 != NULL_RTX)
>>>>> + offset2 = INTVAL (off2);
>>>>> +
>>>>> + /* The offset range of LDRD is [-1020, 1020]. Here we check whether both
>>>>> + offsets lie in the range [-1020, 1024]. If one of the offsets is
>>>>> + 1024, the adjacency check below forces the other one to be 1020,
>>>>> + which is a valid LDRD offset (only the lower offset is encoded). */
>>>>> + if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
>>>>> + return false;
>>>>> + if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
>>>>> + return false;
>>>>> +
>>>>> + if ((offset1 + 4) == offset2)
>>>>> + return true;
>>>>> + if ((offset2 + 4) == offset1)
>>>>> + return true;
>>>>> +
>>>>> + return false;
>>>>> +}
>>>>> +
>>>>> +/* Check whether the two memory accesses MEM1 and MEM2 can be merged into a
>>>>> + single ldrd/strd instruction, i.e. whether they use the same base register
>>>>> + and their constant offsets differ by exactly 4. */
>>>>> +bool
>>>>> +thumb2_legitimate_ldrd_p (rtx reg1, rtx reg2, rtx mem1, rtx mem2, bool ldrd)
>>>>> +{
>>>>> + rtx base1, base2, op1;
>>>>> + rtx addr1 = XEXP (mem1, 0);
>>>>> + rtx addr2 = XEXP (mem2, 0);
>>>>> + HOST_WIDE_INT offset1 = 0;
>>>>> + HOST_WIDE_INT offset2 = 0;
>>>>> +
>>>>> + if (MEM_VOLATILE_P (mem1) || MEM_VOLATILE_P (mem2))
>>>>> + return false;
>>>>> +
>>>>> + if (REG_P (addr1))
>>>>> + base1 = addr1;
>>>>> + else if (GET_CODE (addr1) == PLUS)
>>>>> + {
>>>>> + base1 = XEXP (addr1, 0);
>>>>> + op1 = XEXP (addr1, 1);
>>>>> + if (!REG_P (base1) || (GET_CODE (op1) != CONST_INT))
>>>>> + return false;
>>>>> + offset1 = INTVAL (op1);
>>>>> + }
>>>>> + else
>>>>> + return false;
>>>>> +
>>>>> + if (REG_P (addr2))
>>>>> + base2 = addr2;
>>>>> + else if (GET_CODE (addr2) == PLUS)
>>>>> + {
>>>>> + base2 = XEXP (addr2, 0);
>>>>> + op1 = XEXP (addr2, 1);
>>>>> + if (!REG_P (base2) || (GET_CODE (op1) != CONST_INT))
>>>>> + return false;
>>>>> + offset2 = INTVAL (op1);
>>>>> + }
>>>>> + else
>>>>> + return false;
>>>>> +
>>>>> + if (base1 != base2)
>>>>> + return false;
>>>>> +
>>>>> + /* The offset range of LDRD is [-1020, 1020]. Here we check whether both
>>>>> + offsets lie in the range [-1020, 1024]. If one of the offsets is
>>>>> + 1024, the adjacency check below forces the other one to be 1020,
>>>>> + which is a valid LDRD offset (only the lower offset is encoded). */
>>>>> + if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
>>>>> + return false;
>>>>> + if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
>>>>> + return false;
>>>>> +
>>>>> + if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>>>>> + return false;
>>>>> +
>>>>> + if ((offset1 + 4) == offset2)
>>>>> + return true;
>>>>> + if ((offset2 + 4) == offset1)
>>>>> + return true;
>>>>> +
>>>>> + return false;
>>>>> +}
>>>>> +
>>>>> +/* Check if the insn can be expressed as ldm/stm with less cost. */
>>>>> +bool
>>>>> +thumb2_prefer_ldmstm (rtx reg1, rtx reg2, rtx base,
>>>>> + rtx off1, rtx off2, bool ldrd)
>>>>> +{
>>>>> + HOST_WIDE_INT offset1 = 0;
>>>>> + HOST_WIDE_INT offset2 = 0;
>>>>> +
>>>>> + if (off1 != NULL_RTX)
>>>>> + offset1 = INTVAL (off1);
>>>>> + if (off2 != NULL_RTX)
>>>>> + offset2 = INTVAL (off2);
>>>>> +
>>>>> + if (offset1 > offset2)
>>>>> + {
>>>>> + rtx tmp;
>>>>> + HOST_WIDE_INT t = offset1;
>>>>> + offset1 = offset2;
>>>>> + offset2 = t;
>>>>> + tmp = reg1;
>>>>> + reg1 = reg2;
>>>>> + reg2 = tmp;
>>>>> + }
>>>>> +
>>>>> + /* A two-register ldmdb accesses offsets -8 and -4; ldmia accesses 0 and 4. */
>>>>> + if ((offset1 != -8) && (offset1 != 0))
>>>>> + return false;
>>>>> +
>>>>> + /* Lower register corresponds to lower memory. */
>>>>> + if (REGNO (reg1) > REGNO (reg2))
>>>>> + return false;
>>>>> +
>>>>> + /* ldm/stm is possible here. Cases where it is cheaper than ldrd/strd
>>>>> + (e.g. on Cortex-A9) are not modelled yet, so always prefer ldrd/strd. */
>>>>> + return false;
>>>>> +}
>>>>> +
>>>>> #include "gt-arm.h"
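The merge test in thumb2_legitimate_ldrd_p can likewise be modelled without RTL. In this hypothetical sketch, `struct word_access` stands in for a post-reload SImode MEM of the form [base + offset], and `can_merge_to_ldrd` mirrors the checks in the order the patch performs them. The comments on the register checks give one reading of why they are needed, not a statement from the patch author.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one SImode memory access after reload:
   [base_regno + offset], plus its volatile flag.  */
struct word_access
{
  int base_regno;
  long offset;
  bool is_volatile;
};

/* Same per-offset test as the patch: multiple of 4 in [-1020, 1024].  */
static bool
offset_in_range (long offset)
{
  return offset >= -1020 && offset <= 1024 && (offset & 3) == 0;
}

/* Model of the merge test for two consecutive insns, either
   "reg1 = mem1; reg2 = mem2" (LOAD) or "mem1 = reg1; mem2 = reg2".  */
static bool
can_merge_to_ldrd (int reg1, int reg2, struct word_access mem1,
                   struct word_access mem2, bool load)
{
  if (mem1.is_volatile || mem2.is_volatile)
    return false;
  if (mem1.base_regno != mem2.base_regno)
    return false;
  if (!offset_in_range (mem1.offset) || !offset_in_range (mem2.offset))
    return false;
  /* Loads: the destinations must differ, and the first load must not
     overwrite the base register that the second load still uses.  */
  if (load && (reg1 == reg2 || reg1 == mem1.base_regno))
    return false;
  return mem1.offset + 4 == mem2.offset || mem2.offset + 4 == mem1.offset;
}
```

Note that the store case skips the register checks entirely, matching the `ldrd &&` guard in the patch.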
>>>>> Index: arm-protos.h
>>>>> ===================================================================
>>>>> --- arm-protos.h (revision 165492)
>>>>> +++ arm-protos.h (working copy)
>>>>> @@ -150,6 +150,9 @@ extern void arm_expand_sync (enum machin
>>>>> extern const char *arm_output_memory_barrier (rtx *);
>>>>> extern const char *arm_output_sync_insn (rtx, rtx *);
>>>>> extern unsigned int arm_sync_loop_insns (rtx , rtx *);
>>>>> +extern bool thumb2_check_ldrd_operands (rtx, rtx);
>>>>> +extern bool thumb2_legitimate_ldrd_p (rtx, rtx, rtx, rtx, bool);
>>>>> +extern bool thumb2_prefer_ldmstm (rtx, rtx, rtx, rtx, rtx, bool);
>>>>>
>>>>> #if defined TREE_CODE
>>>>> extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
>>>>> Index: ldmstm.md
>>>>> ===================================================================
>>>>> --- ldmstm.md (revision 165492)
>>>>> +++ ldmstm.md (working copy)
>>>>> @@ -852,7 +852,7 @@ (define_insn "*ldm2_ia"
>>>>> (set (match_operand:SI 2 "arm_hard_register_operand" "")
>>>>> (mem:SI (plus:SI (match_dup 3)
>>>>> (const_int 4))))])]
>>>>> - "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>> + "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>> "ldm%(ia%)\t%3, {%1, %2}"
>>>>> [(set_attr "type" "load2")
>>>>> (set_attr "predicable" "yes")])
>>>>> @@ -901,7 +901,7 @@ (define_insn "*stm2_ia"
>>>>> (match_operand:SI 1 "arm_hard_register_operand" ""))
>>>>> (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
>>>>> (match_operand:SI 2 "arm_hard_register_operand" ""))])]
>>>>> - "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>> + "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>> "stm%(ia%)\t%3, {%1, %2}"
>>>>> [(set_attr "type" "store2")
>>>>> (set_attr "predicable" "yes")])
>>>>> @@ -1041,7 +1041,7 @@ (define_insn "*ldm2_db"
>>>>> (set (match_operand:SI 2 "arm_hard_register_operand" "")
>>>>> (mem:SI (plus:SI (match_dup 3)
>>>>> (const_int -4))))])]
>>>>> - "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>> + "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>> "ldm%(db%)\t%3, {%1, %2}"
>>>>> [(set_attr "type" "load2")
>>>>> (set_attr "predicable" "yes")])
>>>>> @@ -1067,7 +1067,7 @@ (define_insn "*stm2_db"
>>>>> (match_operand:SI 1 "arm_hard_register_operand" ""))
>>>>> (set (mem:SI (plus:SI (match_dup 3) (const_int -4)))
>>>>> (match_operand:SI 2 "arm_hard_register_operand" ""))])]
>>>>> - "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>> + "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>> "stm%(db%)\t%3, {%1, %2}"
>>>>> [(set_attr "type" "store2")
>>>>> (set_attr "predicable" "yes")])
>>>>>
>>>>>
>>>>> Index: pr40457-3.c
>>>>> ===================================================================
>>>>> --- pr40457-3.c (revision 165492)
>>>>> +++ pr40457-3.c (working copy)
>>>>> @@ -5,6 +5,7 @@ void foo(int* p)
>>>>> {
>>>>> p[0] = 1;
>>>>> p[1] = 0;
>>>>> + p[2] = 2;
>>>>> }
>>>>>
>>>>> /* { dg-final { scan-assembler "stm" } } */
>>>>> Index: pr40457-1.c
>>>>> ===================================================================
>>>>> --- pr40457-1.c (revision 165492)
>>>>> +++ pr40457-1.c (working copy)
>>>>> @@ -1,9 +1,9 @@
>>>>> -/* { dg-options "-Os" } */
>>>>> +/* { dg-options "-O2" } */
>>>>> /* { dg-do compile } */
>>>>>
>>>>> int bar(int* p)
>>>>> {
>>>>> - int x = p[0] + p[1];
>>>>> + int x = p[0] + p[1] + p[2];
>>>>> return x;
>>>>> }
>>>>>
>>>>> Index: pr40457-2.c
>>>>> ===================================================================
>>>>> --- pr40457-2.c (revision 165492)
>>>>> +++ pr40457-2.c (working copy)
>>>>> @@ -5,6 +5,7 @@ void foo(int* p)
>>>>> {
>>>>> p[0] = 1;
>>>>> p[1] = 0;
>>>>> + p[2] = 2;
>>>>> }
>>>>>
>>>>> /* { dg-final { scan-assembler "stm" } } */
>>>>> Index: pr45335.c
>>>>> ===================================================================
>>>>> --- pr45335.c (revision 0)
>>>>> +++ pr45335.c (revision 0)
>>>>> @@ -0,0 +1,22 @@
>>>>> +/* { dg-options "-mthumb -O2" } */
>>>>> +/* { dg-require-effective-target arm_thumb2_ok } */
>>>>> +/* { dg-final { scan-assembler "ldrd" } } */
>>>>> +/* { dg-final { scan-assembler "strd" } } */
>>>>> +
>>>>> +struct S
>>>>> +{
>>>>> + void* p1;
>>>>> + void* p2;
>>>>> + void* p3;
>>>>> + void* p4;
>>>>> +};
>>>>> +
>>>>> +extern int printf (const char *, ...);
>>>>> +
>>>>> +void foo1(struct S* fp, struct S* otherSaveArea)
>>>>> +{
>>>>> + struct S* saveA = fp - 1;
>>>>> + printf("StackSaveArea for fp %p [%p/%p]:\n", fp, saveA, otherSaveArea);
>>>>> + printf("prevFrame=%p savedPc=%p meth=%p curPc=%p fp[0]=0x%08x\n",
>>>>> + saveA->p1, saveA->p2, saveA->p3, saveA->p4, *(unsigned int*)fp);
>>>>> +}
>>>>>
>>>>
>>>
>>
>