This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH: ARM] PR 45335 Use ldrd and strd to access two consecutive words


ping

On Mon, Nov 29, 2010 at 2:32 PM, Carrot Wei <carrot@google.com> wrote:
> ping
>
> On Mon, Nov 22, 2010 at 3:16 PM, Carrot Wei <carrot@google.com> wrote:
>> ping
>>
>> On Sun, Oct 31, 2010 at 2:22 AM, Carrot Wei <carrot@google.com> wrote:
>>> Ping
>>>
>>> On Sun, Oct 24, 2010 at 9:46 PM, Carrot Wei <carrot@google.com> wrote:
>>>> Ping
>>>>
>>>> On Sat, Oct 16, 2010 at 8:27 PM, Carrot Wei <carrot@google.com> wrote:
>>>>> On Wed, Oct 13, 2010 at 7:01 PM, Paul Brook <paul@codesourcery.com> wrote:
>>>>>>> ChangeLog:
>>>>>>> 2010-09-04 ?Wei Guozhi ?<carrot@google.com>
>>>>>>>
>>>>>>> ? ? ? ? PR target/45335
>>>>>>> ? ? ? ? * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>>>>> ? ? ? ? thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>>>>> ? ? ? ? peephole2.
>>>>>>> ? ? ? ? (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>>>>> ? ? ? ? New insn pattern and related peephole2.
>>>>>>> ? ? ? ? * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>>>>> ? ? ? ? (thumb2_check_ldrd_operands): New function.
>>>>>>> ? ? ? ? (thumb2_prefer_ldmstm): New function.
>>>>>>> ? ? ? ? * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New
>>>>>>> prototype. (thumb2_check_ldrd_operands): New prototype.
>>>>>>> ? ? ? ? (thumb2_prefer_ldmstm): New prototype.
>>>>>>> ? ? ? ? * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>>>>> ? ? ? ? Change the ldm/stm patterns with 2 words to ARM only.
>>>>>>> ? ? ? ? * gcc/config/arm/constraints.md (Py): New thumb2 constant
>>>>>>> constraint suitable to ldrd/strd instructions.
>>>>>>
>>>>>> Not ok.
>>>>>>
>>>>>> Why is this restricted to Thumb mode? The ARM variant of ldrd isn't quite as
>>>>>> flexible, but still provides a useful improvement over ldm.
>>>>>>
>>>>> I agree the ARM version is also useful. But it brings much less
>>>>> benefit with too much complexity (due to more restriction and insn
>>>>> pattern conflict with ldm). So I will leave it as a future
>>>>> improvement.
>>>>>
>>>>>> This transformation is only valid on ARMv7 cores. On earlier hardware
>>>>>> (depending on system configuration) it may cause undefined behavior or an
>>>>>> alignment trap.
>>>>>>
>>>>> done.
>>>>>
>>>>>> The range on -1020 to +1024 is used in several places, but without any
>>>>>> apparent explanation of why it's different to the range of an ldrd
>>>>>> instruction. ?I figured it out eventually, but it deserves a comment.
>>>>>>
>>>>> Comments added.
>>>>>
>>>>>>> + ?"TARGET_THUMB2 && thumb2_check_ldrd_operands (operands[0], operands[1],
>>>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? operands[2], 0, operands[3], 1)"
>>>>>>
>>>>>> Passed operands do not match expected types. Specifically "0" is not an rtx
>>>>>> (should be "NULL_RTX"), and "1" is not a boolean value (should be "true").
>>>>>> Many other occurrences.
>>>>>>
>>>>> Fixed.
>>>>>
>>>>>>> +(define_constraint "Py"
>>>>>>> + ?"@internal In Thumb-2 state a constant that is a multiple of 4 in the
>>>>>>> + ? range -1020 to 1024"
>>>>>>
>>>>>> This comment seems particularly pointless. You should mention why this
>>>>>> exists/where it is used.
>>>>>>
>>>>>> I think you're better off enforcing this in the insn condition, and remove
>>>>>> this constraint. At least half the uses (the -reg[12] insns) are incorrect,
>>>>>> and you already need the condition to enforce the dependency between the
>>>>>> operands.
>>>>>>
>>>>> I removed this constraint and add the check to insn condition.
>>>>>
>>>>>>> +thumb2_check_ldrd_operands (rtx reg1, rtx reg2, rtx base,
>>>>>>>...
>>>>>>> + ?if (ldrd && (reg1 == reg2))
>>>>>>> + ? ?return false;
>>>>>>
>>>>>> This function is part of the instruction condition. ?Instruction conditions
>>>>>> must not be used to enforce register allocation.
>>>>>>
>>>>> removed.
>>>>>
>>>>>>> +thumb2_legitimate_ldrd_p (
>>>>>>>...
>>>>>>> + ?if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>>>>>>> + ? ?return false;
>>>>>>
>>>>>> You're incorrectly assuming offset1 < offset2, which might not be true at this
>>>>>> point.
>>>>>>
>>>>> The following check assumes offset1 < offset2
>>>>> + ?if ((offset1 + 4) == offset2)
>>>>> + ? ?return true;
>>>>>
>>>>> And another check assumes offset2 < offset1, so both cases are covered.
>>>>> + ?if ((offset2 + 4) == offset1)
>>>>> + ? ?return true;
>>>>>
>>>>>>> + ?/* Now ldm/stm is possible. Check for special cases ldm/stm has lower
>>>>>>> + ? ? cost. ?*/
>>>>>>> + ?return false;
>>>>>>
>>>>>> Code clearly doesn't match the comment. ?In fact this function always returns
>>>>>> false.
>>>>>>
>>>>> Richard mentioned that in some cases (specifically cortex A9) ldm has
>>>>> less cost than ldrd and we should model this in the insn pattern. This
>>>>> function is used for this. But I don't know the cortex A9 architecture
>>>>> detail, so it should be filled by somebody with more knowledge about
>>>>> it in future.
>>>>>
>>>>> Wei Guozhi
>>>>>
>>>>>
>>>>> ChangeLog:
>>>>> 2010-10-16 ?Wei Guozhi ?<carrot@google.com>
>>>>>
>>>>> ? ? ? ?PR target/45335
>>>>> ? ? ? ?* gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>>> ? ? ? ?thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>>> ? ? ? ?peephole2.
>>>>> ? ? ? ?(thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>>> ? ? ? ?New insn pattern and related peephole2.
>>>>> ? ? ? ?* gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>>> ? ? ? ?(thumb2_check_ldrd_operands): New function.
>>>>> ? ? ? ?(thumb2_prefer_ldmstm): New function.
>>>>> ? ? ? ?* gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New prototype.
>>>>> ? ? ? ?(thumb2_check_ldrd_operands): New prototype.
>>>>> ? ? ? ?(thumb2_prefer_ldmstm): New prototype.
>>>>> ? ? ? ?* gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>>> ? ? ? ?Change the ldm/stm patterns with 2 words to ARM only.
>>>>>
>>>>>
>>>>> 2010-10-16 ?Wei Guozhi ?<carrot@google.com>
>>>>>
>>>>> ? ? ? ?PR target/45335
>>>>> ? ? ? ?* gcc.target/arm/pr45335.c: New test.
>>>>> ? ? ? ?* gcc.target/arm/pr40457-1.c: Changed to load 3 words.
>>>>> ? ? ? ?* gcc.target/arm/pr40457-2.c: Changed to store 3 words.
>>>>> ? ? ? ?* gcc.target/arm/pr40457-3.c: Changed to store 3 words.
>>>>>
>>>>>
>>>>> Index: thumb2.md
>>>>> ===================================================================
>>>>> --- thumb2.md ? (revision 165492)
>>>>> +++ thumb2.md ? (working copy)
>>>>> @@ -1118,3 +1118,228 @@ (define_peephole2
>>>>> ? "
>>>>> ? operands[2] = GEN_INT (32 - INTVAL (operands[2]));
>>>>> ? ")
>>>>> +
>>>>> +(define_insn "*thumb2_ldrd"
>>>>> + ?[(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>> + ? ? ? ? ? ? ? ? ?(mem:SI (plus:SI
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? (match_operand:SI 2 "s_register_operand" "rk")
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? (match_operand:SI 3 "const_int_operand" ""))))
>>>>> + ? ? ? ? ? ? (set (match_operand:SI 1 "s_register_operand" "")
>>>>> + ? ? ? ? ? ? ? ? ?(mem:SI (plus:SI (match_dup 2)
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?(match_operand:SI 4 "const_int_operand" ""))))])]
>>>>> + ?"TARGET_THUMB2 && arm_arch7
>>>>> + ? && thumb2_check_ldrd_operands (operands[3], operands[4])"
>>>>> + ?"*
>>>>> + ?{
>>>>> + ? ?HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>> + ? ?HOST_WIDE_INT offset2 = INTVAL (operands[4]);
>>>>> + ? ?if (offset1 > offset2)
>>>>> + ? ? ?{
>>>>> + ? ? ? /* Swap the operands so that memory [base+offset1] is loaded into
>>>>> + ? ? ? ? ?operands[0]. ?*/
>>>>> + ? ? ? rtx tmp = operands[0];
>>>>> + ? ? ? operands[0] = operands[1];
>>>>> + ? ? ? operands[1] = tmp;
>>>>> + ? ? ? tmp = operands[3];
>>>>> + ? ? ? operands[3] = operands[4];
>>>>> + ? ? ? operands[4] = tmp;
>>>>> + ? ? ? offset1 = INTVAL (operands[3]);
>>>>> + ? ? ? offset2 = INTVAL (operands[4]);
>>>>> + ? ? ?}
>>>>> + ? ?if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? operands[2], operands[3], operands[4], true))
>>>>> + ? ? ?return \"ldmdb\\t%2, {%0, %1}\";
>>>>> + ? ?else if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>> + ? ? ?{
>>>>> + ? ? ? if (offset1 <= -256)
>>>>> + ? ? ? ? {
>>>>> + ? ? ? ? ? output_asm_insn (\"sub\\t%2, %2, %n3\", operands);
>>>>> + ? ? ? ? ? output_asm_insn (\"ldr\\t%1, [%2, #4]\", operands);
>>>>> + ? ? ? ? ? output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>> + ? ? ? ? }
>>>>> + ? ? ? else
>>>>> + ? ? ? ? {
>>>>> + ? ? ? ? ? output_asm_insn (\"ldr\\t%1, [%2, %4]\", operands);
>>>>> + ? ? ? ? ? output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>> + ? ? ? ? }
>>>>> + ? ? ? return \"\";
>>>>> + ? ? ?}
>>>>> + ? ?else
>>>>> + ? ? ?return \"ldrd\\t%0, %1, [%2, %3]\";
>>>>> + ?}"
>>>>> +)
>>>>> +
>>>>> +(define_insn "*thumb2_ldrd_reg1"
>>>>> + ?[(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>> + ? ? ? ? ? ? ? ? ?(mem:SI (match_operand:SI 2 "s_register_operand" "rk")))
>>>>> + ? ? ? ? ? ? (set (match_operand:SI 1 "s_register_operand" "")
>>>>> + ? ? ? ? ? ? ? ? ?(mem:SI (plus:SI (match_dup 2)
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?(match_operand:SI 3 "const_int_operand" ""))))])]
>>>>> + ?"TARGET_THUMB2 && arm_arch7
>>>>> + ? && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
>>>>> + ?"*
>>>>> + ?{
>>>>> + ? ?HOST_WIDE_INT offset2 = INTVAL (operands[3]);
>>>>> + ? ?if (offset2 == 4)
>>>>> + ? ? ?{
>>>>> + ? ? ? if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? operands[2], NULL_RTX, operands[3], true))
>>>>> + ? ? ? ? return \"ldmia\\t%2, {%0, %1}\";
>>>>> + ? ? ? if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>> + ? ? ? ? {
>>>>> + ? ? ? ? ? output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
>>>>> + ? ? ? ? ? output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>> + ? ? ? ? ? return \"\";
>>>>> + ? ? ? ? }
>>>>> + ? ? ? return \"ldrd\\t%0, %1, [%2]\";
>>>>> + ? ? ?}
>>>>> + ? ?else
>>>>> + ? ? ?{
>>>>> + ? ? ? if (fix_cm3_ldrd && (operands[2] == operands[1]))
>>>>> + ? ? ? ? {
>>>>> + ? ? ? ? ? output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>> + ? ? ? ? ? output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
>>>>> + ? ? ? ? }
>>>>> + ? ? ? return \"ldrd\\t%1, %0, [%2, %3]\";
>>>>> + ? ? ?}
>>>>> + ?}"
>>>>> +)
>>>>> +
>>>>> +(define_insn "*thumb2_ldrd_reg2"
>>>>> + ?[(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>> + ? ? ? ? ? ? ? ? ?(mem:SI (plus:SI
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? (match_operand:SI 2 "s_register_operand" "rk")
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? (match_operand:SI 3 "const_int_operand" ""))))
>>>>> + ? ? ? ? ? ? (set (match_operand:SI 1 "s_register_operand" "")
>>>>> + ? ? ? ? ? ? ? ? ?(mem:SI (match_dup 2)))])]
>>>>> + ?"TARGET_THUMB2 && arm_arch7
>>>>> + ? && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
>>>>> + ?"*
>>>>> + ?{
>>>>> + ? ?HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>> + ? ?if (offset1 == -4)
>>>>> + ? ? ?{
>>>>> + ? ? ? if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>> + ? ? ? ? {
>>>>> + ? ? ? ? ? output_asm_insn (\"ldr\\t%1, [%2]\", operands);
>>>>> + ? ? ? ? ? output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>> + ? ? ? ? ? return \"\";
>>>>> + ? ? ? ? }
>>>>> + ? ? ? return \"ldrd\\t%0, %1, [%2, %3]\";
>>>>> + ? ? ?}
>>>>> + ? ?else
>>>>> + ? ? ?{
>>>>> + ? ? ? if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? operands[2], operands[3], NULL_RTX, true))
>>>>> + ? ? ? ? return \"ldmia\\t%2, {%1, %0}\";
>>>>> + ? ? ? if (fix_cm3_ldrd && (operands[2] == operands[1]))
>>>>> + ? ? ? ? {
>>>>> + ? ? ? ? ? output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>> + ? ? ? ? ? output_asm_insn (\"ldr\\t%1, [%2]\", operands);
>>>>> + ? ? ? ? ? return \"\";
>>>>> + ? ? ? ? }
>>>>> + ? ? ? return \"ldrd\\t%1, %0, [%2]\";
>>>>> + ? ? ?}
>>>>> + ?}"
>>>>> +)
>>>>> +
>>>>> +(define_peephole2
>>>>> + ?[(set (match_operand:SI 0 "s_register_operand" "")
>>>>> + ? ? ? (match_operand:SI 2 "memory_operand" ""))
>>>>> + ? (set (match_operand:SI 1 "s_register_operand" "")
>>>>> + ? ? ? (match_operand:SI 3 "memory_operand" ""))]
>>>>> + ?"TARGET_THUMB2 && arm_arch7
>>>>> + ? && thumb2_legitimate_ldrd_p (operands[0], operands[1],
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? operands[2], operands[3], true)"
>>>>> + ?[(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>> + ? ? ? ? ? ? ? ? ?(match_operand:SI 2 "memory_operand" ""))
>>>>> + ? ? ? ? ? ? (set (match_operand:SI 1 "s_register_operand" "")
>>>>> + ? ? ? ? ? ? ? ? ?(match_operand:SI 3 "memory_operand" ""))])]
>>>>> + ?""
>>>>> +)
>>>>> +
>>>>> +(define_insn "*thumb2_strd"
>>>>> + ?[(parallel [(set (mem:SI
>>>>> + ? ? ? ? ? ? ? ? ? ? ? (plus:SI (match_operand:SI 2 "s_register_operand" "rk")
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?(match_operand:SI 3 "const_int_operand" "")))
>>>>> + ? ? ? ? ? ? ? ? ?(match_operand:SI 0 "s_register_operand" ""))
>>>>> + ? ? ? ? ? ? (set (mem:SI (plus:SI (match_dup 2)
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?(match_operand:SI 4 "const_int_operand" "")))
>>>>> + ? ? ? ? ? ? ? ? ?(match_operand:SI 1 "s_register_operand" ""))])]
>>>>> + ?"TARGET_THUMB2 && arm_arch7
>>>>> + ? && thumb2_check_ldrd_operands (operands[3], operands[4])"
>>>>> + ?"*
>>>>> + ?{
>>>>> + ? ?HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>> + ? ?HOST_WIDE_INT offset2 = INTVAL (operands[4]);
>>>>> + ? ?if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? operands[2], operands[3], operands[4], false))
>>>>> + ? ? ?return \"stmdb\\t%2, {%0, %1}\";
>>>>> + ? ?if (offset1 < offset2)
>>>>> + ? ? ?return \"strd\\t%0, %1, [%2, %3]\";
>>>>> + ? ?else
>>>>> + ? ? ?return \"strd\\t%1, %0, [%2, %4]\";
>>>>> + ?}"
>>>>> +)
>>>>> +
>>>>> +(define_insn "*thumb2_strd_reg1"
>>>>> + ?[(parallel [(set (mem:SI (match_operand:SI 2 "s_register_operand" "rk"))
>>>>> + ? ? ? ? ? ? ? ? ?(match_operand:SI 0 "s_register_operand" ""))
>>>>> + ? ? ? ? ? ? (set (mem:SI (plus:SI (match_dup 2)
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? (match_operand:SI 3 "const_int_operand" "")))
>>>>> + ? ? ? ? ? ? ? ? ?(match_operand:SI 1 "s_register_operand" ""))])]
>>>>> + ?"TARGET_THUMB2 && arm_arch7
>>>>> + ? && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
>>>>> + ?"*
>>>>> + ?{
>>>>> + ? ?HOST_WIDE_INT offset2 = INTVAL (operands[3]);
>>>>> + ? ?if (offset2 == 4)
>>>>> + ? ? ?{
>>>>> + ? ? ? if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? operands[2], NULL_RTX, operands[3], false))
>>>>> + ? ? ? ? return \"stmia\\t%2, {%0, %1}\";
>>>>> + ? ? ? return \"strd\\t%0, %1, [%2]\";
>>>>> + ? ? ?}
>>>>> + ? ?else
>>>>> + ? ? ?return \"strd\\t%1, %0, [%2, %3]\";
>>>>> + ?}"
>>>>> +)
>>>>> +
>>>>> +(define_insn "*thumb2_strd_reg2"
>>>>> + ?[(parallel [(set (mem:SI (plus:SI
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? (match_operand:SI 2 "s_register_operand" "rk")
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? (match_operand:SI 3 "const_int_operand" "")))
>>>>> + ? ? ? ? ? ? ? ? ?(match_operand:SI 0 "s_register_operand" ""))
>>>>> + ? ? ? ? ? ? (set (mem:SI (match_dup 2))
>>>>> + ? ? ? ? ? ? ? ? ?(match_operand:SI 1 "s_register_operand" ""))])]
>>>>> + ?"TARGET_THUMB2 && arm_arch7
>>>>> + ? && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
>>>>> + ?"*
>>>>> + ?{
>>>>> + ? ?HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>> + ? ?if (offset1 == -4)
>>>>> + ? ? ?return \"strd\\t%0, %1, [%2, %3]\";
>>>>> + ? ?else
>>>>> + ? ? ?{
>>>>> + ? ? ? if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? operands[2], operands[3], NULL_RTX, false))
>>>>> + ? ? ? ? return \"stmia\\t%2, {%1, %0}\";
>>>>> + ? ? ? return \"strd\\t%1, %0, [%2]\";
>>>>> + ? ? ?}
>>>>> + ?}"
>>>>> +)
>>>>> +
>>>>> +(define_peephole2
>>>>> + ?[(set (match_operand:SI 2 "memory_operand" "")
>>>>> + ? ? ? (match_operand:SI 0 "s_register_operand" ""))
>>>>> + ? (set (match_operand:SI 3 "memory_operand" "")
>>>>> + ? ? ? (match_operand:SI 1 "s_register_operand" ""))]
>>>>> + ?"TARGET_THUMB2 && arm_arch7
>>>>> + ? && thumb2_legitimate_ldrd_p (operands[0], operands[1],
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? operands[2], operands[3], false)"
>>>>> + ?[(parallel [(set (match_operand:SI 2 "memory_operand" "")
>>>>> + ? ? ? ? ? ? ? ? ?(match_operand:SI 0 "s_register_operand" ""))
>>>>> + ? ? ? ? ? ? (set (match_operand:SI 3 "memory_operand" "")
>>>>> + ? ? ? ? ? ? ? ? ?(match_operand:SI 1 "s_register_operand" ""))])]
>>>>> + ?""
>>>>> +)
>>>>> Index: arm.c
>>>>> ===================================================================
>>>>> --- arm.c ? ? ? (revision 165492)
>>>>> +++ arm.c ? ? ? (working copy)
>>>>> @@ -23254,4 +23254,134 @@ arm_builtin_support_vector_misalignment
>>>>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?is_packed);
>>>>> ?}
>>>>>
>>>>> +/* Check the validity of operands in an ldrd/strd instruction. ?*/
>>>>> +bool
>>>>> +thumb2_check_ldrd_operands (rtx off1, rtx off2)
>>>>> +{
>>>>> + ?HOST_WIDE_INT offset1 = 0;
>>>>> + ?HOST_WIDE_INT offset2 = 0;
>>>>> +
>>>>> + ?if (off1 != NULL_RTX)
>>>>> + ? ?offset1 = INTVAL (off1);
>>>>> + ?if (off2 != NULL_RTX)
>>>>> + ? ?offset2 = INTVAL (off2);
>>>>> +
>>>>> + ?/* The offset range of LDRD is [-1020, 1020]. Here we check if both
>>>>> + ? ? offsets lie in the range [-1020, 1024]. If one of the offsets is
>>>>> + ? ? 1024, the following condition ((offset1 + 4) == offset2) will ensure
>>>>> + ? ? offset1 to be 1020, suitable for instruction LDRD. ?*/
>>>>> + ?if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
>>>>> + ? ?return false;
>>>>> + ?if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
>>>>> + ? ?return false;
>>>>> +
>>>>> + ?if ((offset1 + 4) == offset2)
>>>>> + ? ?return true;
>>>>> + ?if ((offset2 + 4) == offset1)
>>>>> + ? ?return true;
>>>>> +
>>>>> + ?return false;
>>>>> +}
>>>>> +
>>>>> +/* Check if the two memory accesses can be merged to an ldrd/strd instruction.
>>>>> + ? That is they use the same base register, and the gap between constant
>>>>> + ? offsets should be 4. ?*/
>>>>> +bool
>>>>> +thumb2_legitimate_ldrd_p (rtx reg1, rtx reg2, rtx mem1, rtx mem2, bool ldrd)
>>>>> +{
>>>>> + ?rtx base1, base2, op1;
>>>>> + ?rtx addr1 = XEXP (mem1, 0);
>>>>> + ?rtx addr2 = XEXP (mem2, 0);
>>>>> + ?HOST_WIDE_INT offset1 = 0;
>>>>> + ?HOST_WIDE_INT offset2 = 0;
>>>>> +
>>>>> + ?if (MEM_VOLATILE_P (mem1) || MEM_VOLATILE_P (mem2))
>>>>> + ? ?return false;
>>>>> +
>>>>> + ?if (REG_P (addr1))
>>>>> + ? ?base1 = addr1;
>>>>> + ?else if (GET_CODE (addr1) == PLUS)
>>>>> + ? ?{
>>>>> + ? ? ?base1 = XEXP (addr1, 0);
>>>>> + ? ? ?op1 = XEXP (addr1, 1);
>>>>> + ? ? ?if (!REG_P (base1) || (GET_CODE (op1) != CONST_INT))
>>>>> + ? ? ? return false;
>>>>> + ? ? ?offset1 = INTVAL (op1);
>>>>> + ? ?}
>>>>> + ?else
>>>>> + ? ?return false;
>>>>> +
>>>>> + ?if (REG_P (addr2))
>>>>> + ? ?base2 = addr2;
>>>>> + ?else if (GET_CODE (addr2) == PLUS)
>>>>> + ? ?{
>>>>> + ? ? ?base2 = XEXP (addr2, 0);
>>>>> + ? ? ?op1 = XEXP (addr2, 1);
>>>>> + ? ? ?if (!REG_P (base2) || (GET_CODE (op1) != CONST_INT))
>>>>> + ? ? ? return false;
>>>>> + ? ? ?offset2 = INTVAL (op1);
>>>>> + ? ?}
>>>>> + ?else
>>>>> + ? ?return false;
>>>>> +
>>>>> + ?if (base1 != base2)
>>>>> + ? ?return false;
>>>>> +
>>>>> + ?/* The offset range of LDRD is [-1020, 1020]. Here we check if both
>>>>> + ? ? offsets lie in the range [-1020, 1024]. If one of the offsets is
>>>>> + ? ? 1024, the following condition ((offset1 + 4) == offset2) will ensure
>>>>> + ? ? offset1 to be 1020, suitable for instruction LDRD. ?*/
>>>>> + ?if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
>>>>> + ? ?return false;
>>>>> + ?if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
>>>>> + ? ?return false;
>>>>> +
>>>>> + ?if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>>>>> + ? ?return false;
>>>>> +
>>>>> + ?if ((offset1 + 4) == offset2)
>>>>> + ? ?return true;
>>>>> + ?if ((offset2 + 4) == offset1)
>>>>> + ? ?return true;
>>>>> +
>>>>> + ?return false;
>>>>> +}
>>>>> +
>>>>> +/* Check if the insn can be expressed as ldm/stm with less cost. ?*/
>>>>> +bool
>>>>> +thumb2_prefer_ldmstm (rtx reg1, rtx reg2, rtx base,
>>>>> + ? ? ? ? ? ? ? ? ? ? rtx off1, rtx off2, bool ldrd)
>>>>> +{
>>>>> + ?HOST_WIDE_INT offset1 = 0;
>>>>> + ?HOST_WIDE_INT offset2 = 0;
>>>>> +
>>>>> + ?if (off1 != NULL_RTX)
>>>>> + ? ?offset1 = INTVAL (off1);
>>>>> + ?if (off2 != NULL_RTX)
>>>>> + ? ?offset2 = INTVAL (off2);
>>>>> +
>>>>> + ?if (offset1 > offset2)
>>>>> + ? ?{
>>>>> + ? ? ?rtx tmp;
>>>>> + ? ? ?HOST_WIDE_INT t = offset1;
>>>>> + ? ? ?offset1 = offset2;
>>>>> + ? ? ?offset2 = t;
>>>>> + ? ? ?tmp = reg1;
>>>>> + ? ? ?reg1 = reg2;
>>>>> + ? ? ?reg2 = tmp;
>>>>> + ? ?}
>>>>> +
>>>>> + ?/* The offset of ldmdb is -8, the offset of ldmia is 0. ?*/
>>>>> + ?if ((offset1 != -8) && (offset1 != 0))
>>>>> + ? ?return false;
>>>>> +
>>>>> + ?/* Lower register corresponds to lower memory. ?*/
>>>>> + ?if (REGNO (reg1) > REGNO (reg2))
>>>>> + ? ?return false;
>>>>> +
>>>>> + ?/* Now ldm/stm is possible. Check for special cases ldm/stm has lower
>>>>> + ? ? cost. ?*/
>>>>> + ?return false;
>>>>> +}
>>>>> +
>>>>> ?#include "gt-arm.h"
>>>>> Index: arm-protos.h
>>>>> ===================================================================
>>>>> --- arm-protos.h ? ? ? ?(revision 165492)
>>>>> +++ arm-protos.h ? ? ? ?(working copy)
>>>>> @@ -150,6 +150,9 @@ extern void arm_expand_sync (enum machin
>>>>> ?extern const char *arm_output_memory_barrier (rtx *);
>>>>> ?extern const char *arm_output_sync_insn (rtx, rtx *);
>>>>> ?extern unsigned int arm_sync_loop_insns (rtx , rtx *);
>>>>> +extern bool thumb2_check_ldrd_operands (rtx, rtx);
>>>>> +extern bool thumb2_legitimate_ldrd_p (rtx, rtx, rtx, rtx, bool);
>>>>> +extern bool thumb2_prefer_ldmstm (rtx, rtx, rtx, rtx, rtx, bool);
>>>>>
>>>>> ?#if defined TREE_CODE
>>>>> ?extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
>>>>> Index: ldmstm.md
>>>>> ===================================================================
>>>>> --- ldmstm.md ? (revision 165492)
>>>>> +++ ldmstm.md ? (working copy)
>>>>> @@ -852,7 +852,7 @@ (define_insn "*ldm2_ia"
>>>>> ? ? ?(set (match_operand:SI 2 "arm_hard_register_operand" "")
>>>>> ? ? ? ? ? (mem:SI (plus:SI (match_dup 3)
>>>>> ? ? ? ? ? ? ? ? ? (const_int 4))))])]
>>>>> - ?"TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>> + ?"TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>> ? "ldm%(ia%)\t%3, {%1, %2}"
>>>>> ? [(set_attr "type" "load2")
>>>>> ? ?(set_attr "predicable" "yes")])
>>>>> @@ -901,7 +901,7 @@ (define_insn "*stm2_ia"
>>>>> ? ? ? ? ? (match_operand:SI 1 "arm_hard_register_operand" ""))
>>>>> ? ? ?(set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
>>>>> ? ? ? ? ? (match_operand:SI 2 "arm_hard_register_operand" ""))])]
>>>>> - ?"TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>> + ?"TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>> ? "stm%(ia%)\t%3, {%1, %2}"
>>>>> ? [(set_attr "type" "store2")
>>>>> ? ?(set_attr "predicable" "yes")])
>>>>> @@ -1041,7 +1041,7 @@ (define_insn "*ldm2_db"
>>>>> ? ? ?(set (match_operand:SI 2 "arm_hard_register_operand" "")
>>>>> ? ? ? ? ? (mem:SI (plus:SI (match_dup 3)
>>>>> ? ? ? ? ? ? ? ? ? (const_int -4))))])]
>>>>> - ?"TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>> + ?"TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>> ? "ldm%(db%)\t%3, {%1, %2}"
>>>>> ? [(set_attr "type" "load2")
>>>>> ? ?(set_attr "predicable" "yes")])
>>>>> @@ -1067,7 +1067,7 @@ (define_insn "*stm2_db"
>>>>> ? ? ? ? ? (match_operand:SI 1 "arm_hard_register_operand" ""))
>>>>> ? ? ?(set (mem:SI (plus:SI (match_dup 3) (const_int -4)))
>>>>> ? ? ? ? ? (match_operand:SI 2 "arm_hard_register_operand" ""))])]
>>>>> - ?"TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>> + ?"TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>> ? "stm%(db%)\t%3, {%1, %2}"
>>>>> ? [(set_attr "type" "store2")
>>>>> ? ?(set_attr "predicable" "yes")])
>>>>>
>>>>>
>>>>> Index: pr40457-3.c
>>>>> ===================================================================
>>>>> --- pr40457-3.c (revision 165492)
>>>>> +++ pr40457-3.c (working copy)
>>>>> @@ -5,6 +5,7 @@ void foo(int* p)
>>>>> ?{
>>>>> ? p[0] = 1;
>>>>> ? p[1] = 0;
>>>>> + ?p[2] = 2;
>>>>> ?}
>>>>>
>>>>> ?/* { dg-final { scan-assembler "stm" } } */
>>>>> Index: pr40457-1.c
>>>>> ===================================================================
>>>>> --- pr40457-1.c (revision 165492)
>>>>> +++ pr40457-1.c (working copy)
>>>>> @@ -1,9 +1,9 @@
>>>>> -/* { dg-options "-Os" } ?*/
>>>>> +/* { dg-options "-O2" } ?*/
>>>>> ?/* { dg-do compile } */
>>>>>
>>>>> ?int bar(int* p)
>>>>> ?{
>>>>> - ?int x = p[0] + p[1];
>>>>> + ?int x = p[0] + p[1] + p[2];
>>>>> ? return x;
>>>>> ?}
>>>>>
>>>>> Index: pr40457-2.c
>>>>> ===================================================================
>>>>> --- pr40457-2.c (revision 165492)
>>>>> +++ pr40457-2.c (working copy)
>>>>> @@ -5,6 +5,7 @@ void foo(int* p)
>>>>> ?{
>>>>> ? p[0] = 1;
>>>>> ? p[1] = 0;
>>>>> + ?p[2] = 2;
>>>>> ?}
>>>>>
>>>>> ?/* { dg-final { scan-assembler "stm" } } */
>>>>> Index: pr45335.c
>>>>> ===================================================================
>>>>> --- pr45335.c ? (revision 0)
>>>>> +++ pr45335.c ? (revision 0)
>>>>> @@ -0,0 +1,22 @@
>>>>> +/* { dg-options "-mthumb -O2" } */
>>>>> +/* { dg-require-effective-target arm_thumb2_ok } */
>>>>> +/* { dg-final { scan-assembler "ldrd" } } */
>>>>> +/* { dg-final { scan-assembler "strd" } } */
>>>>> +
>>>>> +struct S
>>>>> +{
>>>>> + ? ?void* p1;
>>>>> + ? ?void* p2;
>>>>> + ? ?void* p3;
>>>>> + ? ?void* p4;
>>>>> +};
>>>>> +
>>>>> +extern printf(char*, ...);
>>>>> +
>>>>> +void foo1(struct S* fp, struct S* otherSaveArea)
>>>>> +{
>>>>> + ? ?struct S* saveA = fp - 1;
>>>>> + ? ?printf("StackSaveArea for fp %p [%p/%p]:\n", fp, saveA, otherSaveArea);
>>>>> + ? ?printf("prevFrame=%p savedPc=%p meth=%p curPc=%p fp[0]=0x%08x\n",
>>>>> + ? ? ? ?saveA->p1, saveA->p2, saveA->p3, saveA->p4, *(unsigned int*)fp);
>>>>> +}
>>>>>
>>>>
>>>
>>
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]