[PATCH: ARM] PR 45335 Use ldrd and strd to access two consecutive words

Carrot Wei carrot@google.com
Tue Dec 14 22:58:00 GMT 2010


ping

On Mon, Nov 29, 2010 at 2:32 PM, Carrot Wei <carrot@google.com> wrote:
> ping
>
> On Mon, Nov 22, 2010 at 3:16 PM, Carrot Wei <carrot@google.com> wrote:
>> ping
>>
>> On Sun, Oct 31, 2010 at 2:22 AM, Carrot Wei <carrot@google.com> wrote:
>>> Ping
>>>
>>> On Sun, Oct 24, 2010 at 9:46 PM, Carrot Wei <carrot@google.com> wrote:
>>>> Ping
>>>>
>>>> On Sat, Oct 16, 2010 at 8:27 PM, Carrot Wei <carrot@google.com> wrote:
>>>>> On Wed, Oct 13, 2010 at 7:01 PM, Paul Brook <paul@codesourcery.com> wrote:
>>>>>>> ChangeLog:
>>>>>>> 2010-09-04  Wei Guozhi  <carrot@google.com>
>>>>>>>
>>>>>>>         PR target/45335
>>>>>>>         * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>>>>>         thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>>>>>         peephole2.
>>>>>>>         (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>>>>>         New insn pattern and related peephole2.
>>>>>>>         * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>>>>>         (thumb2_check_ldrd_operands): New function.
>>>>>>>         (thumb2_prefer_ldmstm): New function.
>>>>>>>         * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New
>>>>>>>         prototype.
>>>>>>>         (thumb2_check_ldrd_operands): New prototype.
>>>>>>>         (thumb2_prefer_ldmstm): New prototype.
>>>>>>>         * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>>>>>         Change the ldm/stm patterns with 2 words to ARM only.
>>>>>>>         * gcc/config/arm/constraints.md (Py): New thumb2 constant
>>>>>>>         constraint suitable to ldrd/strd instructions.
>>>>>>
>>>>>> Not ok.
>>>>>>
>>>>>> Why is this restricted to Thumb mode? The ARM variant of ldrd isn't quite as
>>>>>> flexible, but still provides a useful improvement over ldm.
>>>>>>
>>>>> I agree the ARM version is also useful. But it brings much less
>>>>> benefit at the cost of considerable complexity (due to its tighter
>>>>> restrictions and the insn pattern conflict with ldm). So I will
>>>>> leave it as a future improvement.
>>>>>
>>>>>> This transformation is only valid on ARMv7 cores. On earlier hardware
>>>>>> (depending on system configuration) it may cause undefined behavior or an
>>>>>> alignment trap.
>>>>>>
>>>>> done.
>>>>>
>>>>>> The range on -1020 to +1024 is used in several places, but without any
>>>>>> apparent explanation of why it's different to the range of an ldrd
>>>>>> instruction.  I figured it out eventually, but it deserves a comment.
>>>>>>
>>>>> Comments added.
>>>>>
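
For readers following the range discussion, here is a minimal standalone
sketch of the offset test described in the revised patch comment. It is
plain C on integers rather than rtx operands, and the name pair_fits_ldrd
is hypothetical, not part of the patch. Each offset may lie in
[-1020, 1024] and must be a multiple of 4, and the two offsets must differ
by exactly 4; taken together this keeps the lower offset inside
[-1020, 1020], the range the patch comment gives for LDRD.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the offset test in the patch.  It takes
   plain integers instead of rtx operands.  */
bool
pair_fits_ldrd (long offset1, long offset2)
{
  /* Either offset may be as large as 1024, because it might belong to
     the upper word of the pair.  */
  if (offset1 > 1024 || offset1 < -1020 || (offset1 & 3) != 0)
    return false;
  if (offset2 > 1024 || offset2 < -1020 || (offset2 & 3) != 0)
    return false;

  /* The two words must be adjacent, in either order.  An offset of
     1024 therefore only passes when the other offset is 1020.  */
  return offset1 + 4 == offset2 || offset2 + 4 == offset1;
}
```

Under these assumptions pair_fits_ldrd (1020, 1024) and
pair_fits_ldrd (1024, 1020) both hold, while pair_fits_ldrd (1024, 1028)
is rejected by the range test on the second offset.
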
>>>>>>> +  "TARGET_THUMB2 && thumb2_check_ldrd_operands (operands[0], operands[1],
>>>>>>> +                                           operands[2], 0, operands[3], 1)"
>>>>>>
>>>>>> Passed operands do not match expected types. Specifically "0" is not an rtx
>>>>>> (should be "NULL_RTX"), and "1" is not a boolean value (should be "true").
>>>>>> Many other occurrences.
>>>>>>
>>>>> Fixed.
>>>>>
>>>>>>> +(define_constraint "Py"
>>>>>>> +  "@internal In Thumb-2 state a constant that is a multiple of 4 in the
>>>>>>> +   range -1020 to 1024"
>>>>>>
>>>>>> This comment seems particularly pointless. You should mention why this
>>>>>> exists/where it is used.
>>>>>>
>>>>>> I think you're better off enforcing this in the insn condition, and remove
>>>>>> this constraint. At least half the uses (the -reg[12] insns) are incorrect,
>>>>>> and you already need the condition to enforce the dependency between the
>>>>>> operands.
>>>>>>
>>>>> I removed this constraint and added the check to the insn condition.
>>>>>
>>>>>>> +thumb2_check_ldrd_operands (rtx reg1, rtx reg2, rtx base,
>>>>>>>...
>>>>>>> +  if (ldrd && (reg1 == reg2))
>>>>>>> +    return false;
>>>>>>
>>>>>> This function is part of the instruction condition.  Instruction conditions
>>>>>> must not be used to enforce register allocation.
>>>>>>
>>>>> removed.
>>>>>
>>>>>>> +thumb2_legitimate_ldrd_p (
>>>>>>>...
>>>>>>> +  if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>>>>>>> +    return false;
>>>>>>
>>>>>> You're incorrectly assuming offset1 < offset2, which might not be true at this
>>>>>> point.
>>>>>>
>>>>> The following check assumes offset1 < offset2
>>>>> +  if ((offset1 + 4) == offset2)
>>>>> +    return true;
>>>>>
>>>>> And another check assumes offset2 < offset1, so both cases are covered.
>>>>> +  if ((offset2 + 4) == offset1)
>>>>> +    return true;
>>>>>
>>>>>>> +  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
>>>>>>> +     cost.  */
>>>>>>> +  return false;
>>>>>>
>>>>>> Code clearly doesn't match the comment.  In fact this function always returns
>>>>>> false.
>>>>>>
>>>>> Richard mentioned that in some cases (specifically Cortex-A9) ldm
>>>>> costs less than ldrd, and that we should model this in the insn
>>>>> pattern. This function is a placeholder for that. I don't know the
>>>>> Cortex-A9 microarchitecture in detail, so it should be filled in
>>>>> later by somebody who knows it better.
>>>>>
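
As a rough illustration of the structural part of that placeholder, the
sketch below classifies a two-word access as a candidate for ldmia/stmia,
for ldmdb/stmdb, or for neither. It follows the two conditions stated in
the patch comments (the lower offset is 0 for the "ia" form or -8 for the
"db" form, and the lower register number goes with the lower address).
The enum, the function name ldm_form_for_pair and the use of plain
register numbers are assumptions made for this example. Unlike the patch,
which only rejects REGNO (reg1) > REGNO (reg2), it also rejects equal
register numbers. It says nothing about cost, which is the part left
open above.

```c
/* Which load/store-multiple form, if any, could express the pair.  */
enum ldm_form { LDM_NONE, LDM_IA, LDM_DB };

/* Hypothetical sketch: regno1 is accessed at base + offset1 and regno2
   at base + offset2.  Only structural feasibility is checked.  */
enum ldm_form
ldm_form_for_pair (unsigned regno1, long offset1,
                   unsigned regno2, long offset2)
{
  /* Normalize so that (regno1, offset1) is the lower address.  */
  if (offset1 > offset2)
    {
      long t = offset1;
      unsigned r = regno1;
      offset1 = offset2;
      offset2 = t;
      regno1 = regno2;
      regno2 = r;
    }

  /* The two words must be consecutive.  */
  if (offset1 + 4 != offset2)
    return LDM_NONE;

  /* ldm/stm transfer registers in ascending order of register number,
     so the lower register must go with the lower address.  */
  if (regno1 >= regno2)
    return LDM_NONE;

  if (offset1 == 0)
    return LDM_IA;   /* words at base + 0 and base + 4 */
  if (offset1 == -8)
    return LDM_DB;   /* words at base - 8 and base - 4 */
  return LDM_NONE;
}
```

A cost model for a particular core would sit on top of this: only when
the result is not LDM_NONE does the question of ldm versus ldrd arise.
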
>>>>> Wei Guozhi
>>>>>
>>>>>
>>>>> ChangeLog:
>>>>> 2010-10-16  Wei Guozhi  <carrot@google.com>
>>>>>
>>>>>        PR target/45335
>>>>>        * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>>>        thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>>>        peephole2.
>>>>>        (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>>>        New insn pattern and related peephole2.
>>>>>        * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>>>        (thumb2_check_ldrd_operands): New function.
>>>>>        (thumb2_prefer_ldmstm): New function.
>>>>>        * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New prototype.
>>>>>        (thumb2_check_ldrd_operands): New prototype.
>>>>>        (thumb2_prefer_ldmstm): New prototype.
>>>>>        * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>>>        Change the ldm/stm patterns with 2 words to ARM only.
>>>>>
>>>>>
>>>>> 2010-10-16  Wei Guozhi  <carrot@google.com>
>>>>>
>>>>>        PR target/45335
>>>>>        * gcc.target/arm/pr45335.c: New test.
>>>>>        * gcc.target/arm/pr40457-1.c: Changed to load 3 words.
>>>>>        * gcc.target/arm/pr40457-2.c: Changed to store 3 words.
>>>>>        * gcc.target/arm/pr40457-3.c: Changed to store 3 words.
>>>>>
>>>>>
>>>>> Index: thumb2.md
>>>>> ===================================================================
>>>>> --- thumb2.md   (revision 165492)
>>>>> +++ thumb2.md   (working copy)
>>>>> @@ -1118,3 +1118,228 @@ (define_peephole2
>>>>>   "
>>>>>   operands[2] = GEN_INT (32 - INTVAL (operands[2]));
>>>>>   ")
>>>>> +
>>>>> +(define_insn "*thumb2_ldrd"
>>>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>> +                  (mem:SI (plus:SI
>>>>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>>>>> +                               (match_operand:SI 3 "const_int_operand" ""))))
>>>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>>>> +                  (mem:SI (plus:SI (match_dup 2)
>>>>> +                            (match_operand:SI 4 "const_int_operand" ""))))])]
>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>> +   && thumb2_check_ldrd_operands (operands[3], operands[4])"
>>>>> +  "*
>>>>> +  {
>>>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
>>>>> +    if (offset1 > offset2)
>>>>> +      {
>>>>> +       /* Swap the operands so that memory [base+offset1] is loaded into
>>>>> +          operands[0].  */
>>>>> +       rtx tmp = operands[0];
>>>>> +       operands[0] = operands[1];
>>>>> +       operands[1] = tmp;
>>>>> +       tmp = operands[3];
>>>>> +       operands[3] = operands[4];
>>>>> +       operands[4] = tmp;
>>>>> +       offset1 = INTVAL (operands[3]);
>>>>> +       offset2 = INTVAL (operands[4]);
>>>>> +      }
>>>>> +    if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> +                             operands[2], operands[3], operands[4], true))
>>>>> +      return \"ldmdb\\t%2, {%0, %1}\";
>>>>> +    else if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>> +      {
>>>>> +       if (offset1 <= -256)
>>>>> +         {
>>>>> +           output_asm_insn (\"sub\\t%2, %2, %n3\", operands);
>>>>> +           output_asm_insn (\"ldr\\t%1, [%2, #4]\", operands);
>>>>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>> +         }
>>>>> +       else
>>>>> +         {
>>>>> +           output_asm_insn (\"ldr\\t%1, [%2, %4]\", operands);
>>>>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>> +         }
>>>>> +       return \"\";
>>>>> +      }
>>>>> +    else
>>>>> +      return \"ldrd\\t%0, %1, [%2, %3]\";
>>>>> +  }"
>>>>> +)
>>>>> +
>>>>> +(define_insn "*thumb2_ldrd_reg1"
>>>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>> +                  (mem:SI (match_operand:SI 2 "s_register_operand" "rk")))
>>>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>>>> +                  (mem:SI (plus:SI (match_dup 2)
>>>>> +                            (match_operand:SI 3 "const_int_operand" ""))))])]
>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>> +   && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
>>>>> +  "*
>>>>> +  {
>>>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
>>>>> +    if (offset2 == 4)
>>>>> +      {
>>>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> +                                 operands[2], NULL_RTX, operands[3], true))
>>>>> +         return \"ldmia\\t%2, {%0, %1}\";
>>>>> +       if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>> +         {
>>>>> +           output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
>>>>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>> +           return \"\";
>>>>> +         }
>>>>> +       return \"ldrd\\t%0, %1, [%2]\";
>>>>> +      }
>>>>> +    else
>>>>> +      {
>>>>> +       if (fix_cm3_ldrd && (operands[2] == operands[1]))
>>>>> +         {
>>>>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>> +           output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
>>>>> +           return \"\";
>>>>> +         }
>>>>> +       return \"ldrd\\t%1, %0, [%2, %3]\";
>>>>> +      }
>>>>> +  }"
>>>>> +)
>>>>> +
>>>>> +(define_insn "*thumb2_ldrd_reg2"
>>>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>> +                  (mem:SI (plus:SI
>>>>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>>>>> +                               (match_operand:SI 3 "const_int_operand" ""))))
>>>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>>>> +                  (mem:SI (match_dup 2)))])]
>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>> +   && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
>>>>> +  "*
>>>>> +  {
>>>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>> +    if (offset1 == -4)
>>>>> +      {
>>>>> +       if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>> +         {
>>>>> +           output_asm_insn (\"ldr\\t%1, [%2]\", operands);
>>>>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>> +           return \"\";
>>>>> +         }
>>>>> +       return \"ldrd\\t%0, %1, [%2, %3]\";
>>>>> +      }
>>>>> +    else
>>>>> +      {
>>>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> +                                 operands[2], operands[3], NULL_RTX, true))
>>>>> +         return \"ldmia\\t%2, {%1, %0}\";
>>>>> +       if (fix_cm3_ldrd && (operands[2] == operands[1]))
>>>>> +         {
>>>>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>> +           output_asm_insn (\"ldr\\t%1, [%2]\", operands);
>>>>> +           return \"\";
>>>>> +         }
>>>>> +       return \"ldrd\\t%1, %0, [%2]\";
>>>>> +      }
>>>>> +  }"
>>>>> +)
>>>>> +
>>>>> +(define_peephole2
>>>>> +  [(set (match_operand:SI 0 "s_register_operand" "")
>>>>> +       (match_operand:SI 2 "memory_operand" ""))
>>>>> +   (set (match_operand:SI 1 "s_register_operand" "")
>>>>> +       (match_operand:SI 3 "memory_operand" ""))]
>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>> +   && thumb2_legitimate_ldrd_p (operands[0], operands[1],
>>>>> +                               operands[2], operands[3], true)"
>>>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>> +                  (match_operand:SI 2 "memory_operand" ""))
>>>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>>>> +                  (match_operand:SI 3 "memory_operand" ""))])]
>>>>> +  ""
>>>>> +)
>>>>> +
>>>>> +(define_insn "*thumb2_strd"
>>>>> +  [(parallel [(set (mem:SI
>>>>> +                       (plus:SI (match_operand:SI 2 "s_register_operand" "rk")
>>>>> +                                (match_operand:SI 3 "const_int_operand" "")))
>>>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>>>> +             (set (mem:SI (plus:SI (match_dup 2)
>>>>> +                                (match_operand:SI 4 "const_int_operand" "")))
>>>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>> +   && thumb2_check_ldrd_operands (operands[3], operands[4])"
>>>>> +  "*
>>>>> +  {
>>>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
>>>>> +    if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> +                             operands[2], operands[3], operands[4], false))
>>>>> +      return \"stmdb\\t%2, {%0, %1}\";
>>>>> +    if (offset1 < offset2)
>>>>> +      return \"strd\\t%0, %1, [%2, %3]\";
>>>>> +    else
>>>>> +      return \"strd\\t%1, %0, [%2, %4]\";
>>>>> +  }"
>>>>> +)
>>>>> +
>>>>> +(define_insn "*thumb2_strd_reg1"
>>>>> +  [(parallel [(set (mem:SI (match_operand:SI 2 "s_register_operand" "rk"))
>>>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>>>> +             (set (mem:SI (plus:SI (match_dup 2)
>>>>> +                               (match_operand:SI 3 "const_int_operand" "")))
>>>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>> +   && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
>>>>> +  "*
>>>>> +  {
>>>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
>>>>> +    if (offset2 == 4)
>>>>> +      {
>>>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> +                                 operands[2], NULL_RTX, operands[3], false))
>>>>> +         return \"stmia\\t%2, {%0, %1}\";
>>>>> +       return \"strd\\t%0, %1, [%2]\";
>>>>> +      }
>>>>> +    else
>>>>> +      return \"strd\\t%1, %0, [%2, %3]\";
>>>>> +  }"
>>>>> +)
>>>>> +
>>>>> +(define_insn "*thumb2_strd_reg2"
>>>>> +  [(parallel [(set (mem:SI (plus:SI
>>>>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>>>>> +                               (match_operand:SI 3 "const_int_operand" "")))
>>>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>>>> +             (set (mem:SI (match_dup 2))
>>>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>> +   && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
>>>>> +  "*
>>>>> +  {
>>>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>> +    if (offset1 == -4)
>>>>> +      return \"strd\\t%0, %1, [%2, %3]\";
>>>>> +    else
>>>>> +      {
>>>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>> +                                 operands[2], operands[3], NULL_RTX, false))
>>>>> +         return \"stmia\\t%2, {%1, %0}\";
>>>>> +       return \"strd\\t%1, %0, [%2]\";
>>>>> +      }
>>>>> +  }"
>>>>> +)
>>>>> +
>>>>> +(define_peephole2
>>>>> +  [(set (match_operand:SI 2 "memory_operand" "")
>>>>> +       (match_operand:SI 0 "s_register_operand" ""))
>>>>> +   (set (match_operand:SI 3 "memory_operand" "")
>>>>> +       (match_operand:SI 1 "s_register_operand" ""))]
>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>> +   && thumb2_legitimate_ldrd_p (operands[0], operands[1],
>>>>> +                               operands[2], operands[3], false)"
>>>>> +  [(parallel [(set (match_operand:SI 2 "memory_operand" "")
>>>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>>>> +             (set (match_operand:SI 3 "memory_operand" "")
>>>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>>>> +  ""
>>>>> +)
>>>>> Index: arm.c
>>>>> ===================================================================
>>>>> --- arm.c       (revision 165492)
>>>>> +++ arm.c       (working copy)
>>>>> @@ -23254,4 +23254,134 @@ arm_builtin_support_vector_misalignment
>>>>>                                                      is_packed);
>>>>>  }
>>>>>
>>>>> +/* Check the validity of operands in an ldrd/strd instruction.  */
>>>>> +bool
>>>>> +thumb2_check_ldrd_operands (rtx off1, rtx off2)
>>>>> +{
>>>>> +  HOST_WIDE_INT offset1 = 0;
>>>>> +  HOST_WIDE_INT offset2 = 0;
>>>>> +
>>>>> +  if (off1 != NULL_RTX)
>>>>> +    offset1 = INTVAL (off1);
>>>>> +  if (off2 != NULL_RTX)
>>>>> +    offset2 = INTVAL (off2);
>>>>> +
>>>>> +  /* The offset range of LDRD is [-1020, 1020].  Here we accept offsets
>>>>> +     in [-1020, 1024], since an offset may belong to the upper word.
>>>>> +     If one offset is 1024, the adjacency check below forces the other
>>>>> +     to be 1020, which LDRD can encode.  */
>>>>> +  if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
>>>>> +    return false;
>>>>> +  if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
>>>>> +    return false;
>>>>> +
>>>>> +  if ((offset1 + 4) == offset2)
>>>>> +    return true;
>>>>> +  if ((offset2 + 4) == offset1)
>>>>> +    return true;
>>>>> +
>>>>> +  return false;
>>>>> +}
>>>>> +
>>>>> +/* Check if the two memory accesses can be merged to an ldrd/strd instruction.
>>>>> +   That is they use the same base register, and the gap between constant
>>>>> +   offsets should be 4.  */
>>>>> +bool
>>>>> +thumb2_legitimate_ldrd_p (rtx reg1, rtx reg2, rtx mem1, rtx mem2, bool ldrd)
>>>>> +{
>>>>> +  rtx base1, base2, op1;
>>>>> +  rtx addr1 = XEXP (mem1, 0);
>>>>> +  rtx addr2 = XEXP (mem2, 0);
>>>>> +  HOST_WIDE_INT offset1 = 0;
>>>>> +  HOST_WIDE_INT offset2 = 0;
>>>>> +
>>>>> +  if (MEM_VOLATILE_P (mem1) || MEM_VOLATILE_P (mem2))
>>>>> +    return false;
>>>>> +
>>>>> +  if (REG_P (addr1))
>>>>> +    base1 = addr1;
>>>>> +  else if (GET_CODE (addr1) == PLUS)
>>>>> +    {
>>>>> +      base1 = XEXP (addr1, 0);
>>>>> +      op1 = XEXP (addr1, 1);
>>>>> +      if (!REG_P (base1) || (GET_CODE (op1) != CONST_INT))
>>>>> +       return false;
>>>>> +      offset1 = INTVAL (op1);
>>>>> +    }
>>>>> +  else
>>>>> +    return false;
>>>>> +
>>>>> +  if (REG_P (addr2))
>>>>> +    base2 = addr2;
>>>>> +  else if (GET_CODE (addr2) == PLUS)
>>>>> +    {
>>>>> +      base2 = XEXP (addr2, 0);
>>>>> +      op1 = XEXP (addr2, 1);
>>>>> +      if (!REG_P (base2) || (GET_CODE (op1) != CONST_INT))
>>>>> +       return false;
>>>>> +      offset2 = INTVAL (op1);
>>>>> +    }
>>>>> +  else
>>>>> +    return false;
>>>>> +
>>>>> +  if (base1 != base2)
>>>>> +    return false;
>>>>> +
>>>>> +  /* The offset range of LDRD is [-1020, 1020].  Here we accept offsets
>>>>> +     in [-1020, 1024], since an offset may belong to the upper word.
>>>>> +     If one offset is 1024, the adjacency check below forces the other
>>>>> +     to be 1020, which LDRD can encode.  */
>>>>> +  if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
>>>>> +    return false;
>>>>> +  if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
>>>>> +    return false;
>>>>> +
>>>>> +  if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>>>>> +    return false;
>>>>> +
>>>>> +  if ((offset1 + 4) == offset2)
>>>>> +    return true;
>>>>> +  if ((offset2 + 4) == offset1)
>>>>> +    return true;
>>>>> +
>>>>> +  return false;
>>>>> +}
>>>>> +
>>>>> +/* Check if the insn can be expressed as ldm/stm with less cost.  */
>>>>> +bool
>>>>> +thumb2_prefer_ldmstm (rtx reg1, rtx reg2, rtx base,
>>>>> +                     rtx off1, rtx off2, bool ldrd)
>>>>> +{
>>>>> +  HOST_WIDE_INT offset1 = 0;
>>>>> +  HOST_WIDE_INT offset2 = 0;
>>>>> +
>>>>> +  if (off1 != NULL_RTX)
>>>>> +    offset1 = INTVAL (off1);
>>>>> +  if (off2 != NULL_RTX)
>>>>> +    offset2 = INTVAL (off2);
>>>>> +
>>>>> +  if (offset1 > offset2)
>>>>> +    {
>>>>> +      rtx tmp;
>>>>> +      HOST_WIDE_INT t = offset1;
>>>>> +      offset1 = offset2;
>>>>> +      offset2 = t;
>>>>> +      tmp = reg1;
>>>>> +      reg1 = reg2;
>>>>> +      reg2 = tmp;
>>>>> +    }
>>>>> +
>>>>> +  /* The offset of ldmdb is -8, the offset of ldmia is 0.  */
>>>>> +  if ((offset1 != -8) && (offset1 != 0))
>>>>> +    return false;
>>>>> +
>>>>> +  /* Lower register corresponds to lower memory.  */
>>>>> +  if (REGNO (reg1) > REGNO (reg2))
>>>>> +    return false;
>>>>> +
>>>>> +  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
>>>>> +     cost.  */
>>>>> +  return false;
>>>>> +}
>>>>> +
>>>>>  #include "gt-arm.h"
>>>>> Index: arm-protos.h
>>>>> ===================================================================
>>>>> --- arm-protos.h        (revision 165492)
>>>>> +++ arm-protos.h        (working copy)
>>>>> @@ -150,6 +150,9 @@ extern void arm_expand_sync (enum machin
>>>>>  extern const char *arm_output_memory_barrier (rtx *);
>>>>>  extern const char *arm_output_sync_insn (rtx, rtx *);
>>>>>  extern unsigned int arm_sync_loop_insns (rtx , rtx *);
>>>>> +extern bool thumb2_check_ldrd_operands (rtx, rtx);
>>>>> +extern bool thumb2_legitimate_ldrd_p (rtx, rtx, rtx, rtx, bool);
>>>>> +extern bool thumb2_prefer_ldmstm (rtx, rtx, rtx, rtx, rtx, bool);
>>>>>
>>>>>  #if defined TREE_CODE
>>>>>  extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
>>>>> Index: ldmstm.md
>>>>> ===================================================================
>>>>> --- ldmstm.md   (revision 165492)
>>>>> +++ ldmstm.md   (working copy)
>>>>> @@ -852,7 +852,7 @@ (define_insn "*ldm2_ia"
>>>>>      (set (match_operand:SI 2 "arm_hard_register_operand" "")
>>>>>           (mem:SI (plus:SI (match_dup 3)
>>>>>                   (const_int 4))))])]
>>>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>>   "ldm%(ia%)\t%3, {%1, %2}"
>>>>>   [(set_attr "type" "load2")
>>>>>    (set_attr "predicable" "yes")])
>>>>> @@ -901,7 +901,7 @@ (define_insn "*stm2_ia"
>>>>>           (match_operand:SI 1 "arm_hard_register_operand" ""))
>>>>>      (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
>>>>>           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
>>>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>>   "stm%(ia%)\t%3, {%1, %2}"
>>>>>   [(set_attr "type" "store2")
>>>>>    (set_attr "predicable" "yes")])
>>>>> @@ -1041,7 +1041,7 @@ (define_insn "*ldm2_db"
>>>>>      (set (match_operand:SI 2 "arm_hard_register_operand" "")
>>>>>           (mem:SI (plus:SI (match_dup 3)
>>>>>                   (const_int -4))))])]
>>>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>>   "ldm%(db%)\t%3, {%1, %2}"
>>>>>   [(set_attr "type" "load2")
>>>>>    (set_attr "predicable" "yes")])
>>>>> @@ -1067,7 +1067,7 @@ (define_insn "*stm2_db"
>>>>>           (match_operand:SI 1 "arm_hard_register_operand" ""))
>>>>>      (set (mem:SI (plus:SI (match_dup 3) (const_int -4)))
>>>>>           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
>>>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>>   "stm%(db%)\t%3, {%1, %2}"
>>>>>   [(set_attr "type" "store2")
>>>>>    (set_attr "predicable" "yes")])
>>>>>
>>>>>
>>>>> Index: pr40457-3.c
>>>>> ===================================================================
>>>>> --- pr40457-3.c (revision 165492)
>>>>> +++ pr40457-3.c (working copy)
>>>>> @@ -5,6 +5,7 @@ void foo(int* p)
>>>>>  {
>>>>>   p[0] = 1;
>>>>>   p[1] = 0;
>>>>> +  p[2] = 2;
>>>>>  }
>>>>>
>>>>>  /* { dg-final { scan-assembler "stm" } } */
>>>>> Index: pr40457-1.c
>>>>> ===================================================================
>>>>> --- pr40457-1.c (revision 165492)
>>>>> +++ pr40457-1.c (working copy)
>>>>> @@ -1,9 +1,9 @@
>>>>> -/* { dg-options "-Os" }  */
>>>>> +/* { dg-options "-O2" }  */
>>>>>  /* { dg-do compile } */
>>>>>
>>>>>  int bar(int* p)
>>>>>  {
>>>>> -  int x = p[0] + p[1];
>>>>> +  int x = p[0] + p[1] + p[2];
>>>>>   return x;
>>>>>  }
>>>>>
>>>>> Index: pr40457-2.c
>>>>> ===================================================================
>>>>> --- pr40457-2.c (revision 165492)
>>>>> +++ pr40457-2.c (working copy)
>>>>> @@ -5,6 +5,7 @@ void foo(int* p)
>>>>>  {
>>>>>   p[0] = 1;
>>>>>   p[1] = 0;
>>>>> +  p[2] = 2;
>>>>>  }
>>>>>
>>>>>  /* { dg-final { scan-assembler "stm" } } */
>>>>> Index: pr45335.c
>>>>> ===================================================================
>>>>> --- pr45335.c   (revision 0)
>>>>> +++ pr45335.c   (revision 0)
>>>>> @@ -0,0 +1,22 @@
>>>>> +/* { dg-options "-mthumb -O2" } */
>>>>> +/* { dg-require-effective-target arm_thumb2_ok } */
>>>>> +/* { dg-final { scan-assembler "ldrd" } } */
>>>>> +/* { dg-final { scan-assembler "strd" } } */
>>>>> +
>>>>> +struct S
>>>>> +{
>>>>> +    void* p1;
>>>>> +    void* p2;
>>>>> +    void* p3;
>>>>> +    void* p4;
>>>>> +};
>>>>> +
>>>>> +extern int printf(const char*, ...);
>>>>> +
>>>>> +void foo1(struct S* fp, struct S* otherSaveArea)
>>>>> +{
>>>>> +    struct S* saveA = fp - 1;
>>>>> +    printf("StackSaveArea for fp %p [%p/%p]:\n", fp, saveA, otherSaveArea);
>>>>> +    printf("prevFrame=%p savedPc=%p meth=%p curPc=%p fp[0]=0x%08x\n",
>>>>> +        saveA->p1, saveA->p2, saveA->p3, saveA->p4, *(unsigned int*)fp);
>>>>> +}
>>>>>
>>>>
>>>
>>
>


