[PATCH: ARM] PR 45335 Use ldrd and strd to access two consecutive words

Carrot Wei carrot@google.com
Tue Jan 4 08:57:00 GMT 2011


Happy new year!
I hope I can check in this patch in 2011.


On Wed, Dec 15, 2010 at 6:00 AM, Carrot Wei <carrot@google.com> wrote:
> ping
>
> On Mon, Nov 29, 2010 at 2:32 PM, Carrot Wei <carrot@google.com> wrote:
>> ping
>>
>> On Mon, Nov 22, 2010 at 3:16 PM, Carrot Wei <carrot@google.com> wrote:
>>> ping
>>>
>>> On Sun, Oct 31, 2010 at 2:22 AM, Carrot Wei <carrot@google.com> wrote:
>>>> Ping
>>>>
>>>> On Sun, Oct 24, 2010 at 9:46 PM, Carrot Wei <carrot@google.com> wrote:
>>>>> Ping
>>>>>
>>>>> On Sat, Oct 16, 2010 at 8:27 PM, Carrot Wei <carrot@google.com> wrote:
>>>>>> On Wed, Oct 13, 2010 at 7:01 PM, Paul Brook <paul@codesourcery.com> wrote:
>>>>>>>> ChangeLog:
>>>>>>>> 2010-09-04  Wei Guozhi  <carrot@google.com>
>>>>>>>>
>>>>>>>>         PR target/45335
>>>>>>>>         * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>>>>>>         thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>>>>>>         peephole2.
>>>>>>>>         (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>>>>>>         New insn pattern and related peephole2.
>>>>>>>>         * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>>>>>>         (thumb2_check_ldrd_operands): New function.
>>>>>>>>         (thumb2_prefer_ldmstm): New function.
>>>>>>>>         * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New
>>>>>>>>         prototype.
>>>>>>>>         (thumb2_check_ldrd_operands): New prototype.
>>>>>>>>         (thumb2_prefer_ldmstm): New prototype.
>>>>>>>>         * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>>>>>>         Change the ldm/stm patterns with 2 words to ARM only.
>>>>>>>>         * gcc/config/arm/constraints.md (Py): New thumb2 constant
>>>>>>>>         constraint suitable for ldrd/strd instructions.
>>>>>>>
>>>>>>> Not ok.
>>>>>>>
>>>>>>> Why is this restricted to Thumb mode? The ARM variant of ldrd isn't quite as
>>>>>>> flexible, but still provides a useful improvement over ldm.
>>>>>>>
>>>>>> I agree the ARM version is also useful. But it brings much less
>>>>>> benefit at the cost of too much complexity (due to more restrictions
>>>>>> and insn pattern conflicts with ldm). So I will leave it as a future
>>>>>> improvement.
>>>>>>
>>>>>>> This transformation is only valid on ARMv7 cores. On earlier hardware
>>>>>>> (depending on system configuration) it may cause undefined behavior or an
>>>>>>> alignment trap.
>>>>>>>
>>>>>> done.
>>>>>>
>>>>>>> The range of -1020 to +1024 is used in several places, but without any
>>>>>>> apparent explanation of why it's different to the range of an ldrd
>>>>>>> instruction.  I figured it out eventually, but it deserves a comment.
>>>>>>>
>>>>>> Comments added.
>>>>>>
>>>>>>>> +  "TARGET_THUMB2 && thumb2_check_ldrd_operands (operands[0], operands[1],
>>>>>>>> +                                           operands[2], 0, operands[3], 1)"
>>>>>>>
>>>>>>> Passed operands do not match expected types. Specifically "0" is not an rtx
>>>>>>> (should be "NULL_RTX"), and "1" is not a boolean value (should be "true").
>>>>>>> Many other occurrences.
>>>>>>>
>>>>>> Fixed.
>>>>>>
>>>>>>>> +(define_constraint "Py"
>>>>>>>> +  "@internal In Thumb-2 state a constant that is a multiple of 4 in the
>>>>>>>> +   range -1020 to 1024"
>>>>>>>
>>>>>>> This comment seems particularly pointless. You should mention why this
>>>>>>> exists/where it is used.
>>>>>>>
>>>>>>> I think you're better off enforcing this in the insn condition, and remove
>>>>>>> this constraint. At least half the uses (the -reg[12] insns) are incorrect,
>>>>>>> and you already need the condition to enforce the dependency between the
>>>>>>> operands.
>>>>>>>
>>>>>> I removed this constraint and added the check to the insn condition.
>>>>>>
>>>>>>>> +thumb2_check_ldrd_operands (rtx reg1, rtx reg2, rtx base,
>>>>>>>>...
>>>>>>>> +  if (ldrd && (reg1 == reg2))
>>>>>>>> +    return false;
>>>>>>>
>>>>>>> This function is part of the instruction condition.  Instruction conditions
>>>>>>> must not be used to enforce register allocation.
>>>>>>>
>>>>>> removed.
>>>>>>
>>>>>>>> +thumb2_legitimate_ldrd_p (
>>>>>>>>...
>>>>>>>> +  if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>>>>>>>> +    return false;
>>>>>>>
>>>>>>> You're incorrectly assuming offset1 < offset2, which might not be true at this
>>>>>>> point.
>>>>>>>
>>>>>> The following check assumes offset1 < offset2
>>>>>> +  if ((offset1 + 4) == offset2)
>>>>>> +    return true;
>>>>>>
>>>>>> And another check assumes offset2 < offset1, so both cases are covered.
>>>>>> +  if ((offset2 + 4) == offset1)
>>>>>> +    return true;
>>>>>>
>>>>>>>> +  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
>>>>>>>> +     cost.  */
>>>>>>>> +  return false;
>>>>>>>
>>>>>>> Code clearly doesn't match the comment.  In fact this function always returns
>>>>>>> false.
>>>>>>>
>>>>>> Richard mentioned that in some cases (specifically Cortex-A9) ldm
>>>>>> costs less than ldrd and we should model this in the insn pattern. This
>>>>>> function is used for that. But I don't know the Cortex-A9 architecture
>>>>>> in detail, so it should be filled in by somebody with more knowledge
>>>>>> of it in the future.
>>>>>>
>>>>>> Wei Guozhi
>>>>>>
>>>>>>
>>>>>> ChangeLog:
>>>>>> 2010-10-16  Wei Guozhi  <carrot@google.com>
>>>>>>
>>>>>>        PR target/45335
>>>>>>        * gcc/config/arm/thumb2.md (thumb2_ldrd, thumb2_ldrd_reg1,
>>>>>>        thumb2_ldrd_reg2 and peephole2): New insn pattern and related
>>>>>>        peephole2.
>>>>>>        (thumb2_strd, thumb2_strd_reg1, thumb2_strd_reg2 and peephole2):
>>>>>>        New insn pattern and related peephole2.
>>>>>>        * gcc/config/arm/arm.c (thumb2_legitimate_ldrd_p): New function.
>>>>>>        (thumb2_check_ldrd_operands): New function.
>>>>>>        (thumb2_prefer_ldmstm): New function.
>>>>>>        * gcc/config/arm/arm-protos.h (thumb2_legitimate_ldrd_p): New prototype.
>>>>>>        (thumb2_check_ldrd_operands): New prototype.
>>>>>>        (thumb2_prefer_ldmstm): New prototype.
>>>>>>        * gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_db, stm2_db):
>>>>>>        Change the ldm/stm patterns with 2 words to ARM only.
>>>>>>
>>>>>>
>>>>>> 2010-10-16  Wei Guozhi  <carrot@google.com>
>>>>>>
>>>>>>        PR target/45335
>>>>>>        * gcc.target/arm/pr45335.c: New test.
>>>>>>        * gcc.target/arm/pr40457-1.c: Changed to load 3 words.
>>>>>>        * gcc.target/arm/pr40457-2.c: Changed to store 3 words.
>>>>>>        * gcc.target/arm/pr40457-3.c: Changed to store 3 words.
>>>>>>
>>>>>>
>>>>>> Index: thumb2.md
>>>>>> ===================================================================
>>>>>> --- thumb2.md   (revision 165492)
>>>>>> +++ thumb2.md   (working copy)
>>>>>> @@ -1118,3 +1118,228 @@ (define_peephole2
>>>>>>   "
>>>>>>   operands[2] = GEN_INT (32 - INTVAL (operands[2]));
>>>>>>   ")
>>>>>> +
>>>>>> +(define_insn "*thumb2_ldrd"
>>>>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>>> +                  (mem:SI (plus:SI
>>>>>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>>>>>> +                               (match_operand:SI 3 "const_int_operand" ""))))
>>>>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>>>>> +                  (mem:SI (plus:SI (match_dup 2)
>>>>>> +                            (match_operand:SI 4 "const_int_operand" ""))))])]
>>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>>> +   && thumb2_check_ldrd_operands (operands[3], operands[4])"
>>>>>> +  "*
>>>>>> +  {
>>>>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
>>>>>> +    if (offset1 > offset2)
>>>>>> +      {
>>>>>> +       /* Swap the operands so that memory [base+offset1] is loaded into
>>>>>> +          operands[0].  */
>>>>>> +       rtx tmp = operands[0];
>>>>>> +       operands[0] = operands[1];
>>>>>> +       operands[1] = tmp;
>>>>>> +       tmp = operands[3];
>>>>>> +       operands[3] = operands[4];
>>>>>> +       operands[4] = tmp;
>>>>>> +       offset1 = INTVAL (operands[3]);
>>>>>> +       offset2 = INTVAL (operands[4]);
>>>>>> +      }
>>>>>> +    if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>>> +                             operands[2], operands[3], operands[4], true))
>>>>>> +      return \"ldmdb\\t%2, {%0, %1}\";
>>>>>> +    else if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>>> +      {
>>>>>> +       if (offset1 <= -256)
>>>>>> +         {
>>>>>> +           output_asm_insn (\"sub\\t%2, %2, %n3\", operands);
>>>>>> +           output_asm_insn (\"ldr\\t%1, [%2, #4]\", operands);
>>>>>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>>> +         }
>>>>>> +       else
>>>>>> +         {
>>>>>> +           output_asm_insn (\"ldr\\t%1, [%2, %4]\", operands);
>>>>>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>>> +         }
>>>>>> +       return \"\";
>>>>>> +      }
>>>>>> +    else
>>>>>> +      return \"ldrd\\t%0, %1, [%2, %3]\";
>>>>>> +  }"
>>>>>> +)
>>>>>> +
>>>>>> +(define_insn "*thumb2_ldrd_reg1"
>>>>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>>> +                  (mem:SI (match_operand:SI 2 "s_register_operand" "rk")))
>>>>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>>>>> +                  (mem:SI (plus:SI (match_dup 2)
>>>>>> +                            (match_operand:SI 3 "const_int_operand" ""))))])]
>>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>>> +   && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
>>>>>> +  "*
>>>>>> +  {
>>>>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
>>>>>> +    if (offset2 == 4)
>>>>>> +      {
>>>>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>>> +                                 operands[2], NULL_RTX, operands[3], true))
>>>>>> +         return \"ldmia\\t%2, {%0, %1}\";
>>>>>> +       if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>>> +         {
>>>>>> +           output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
>>>>>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>>> +           return \"\";
>>>>>> +         }
>>>>>> +       return \"ldrd\\t%0, %1, [%2]\";
>>>>>> +      }
>>>>>> +    else
>>>>>> +      {
>>>>>> +       if (fix_cm3_ldrd && (operands[2] == operands[1]))
>>>>>> +         {
>>>>>> +           output_asm_insn (\"ldr\\t%0, [%2]\", operands);
>>>>>> +           output_asm_insn (\"ldr\\t%1, [%2, %3]\", operands);
>>>>>> +         }
>>>>>> +       return \"ldrd\\t%1, %0, [%2, %3]\";
>>>>>> +      }
>>>>>> +  }"
>>>>>> +)
>>>>>> +
>>>>>> +(define_insn "*thumb2_ldrd_reg2"
>>>>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>>> +                  (mem:SI (plus:SI
>>>>>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>>>>>> +                               (match_operand:SI 3 "const_int_operand" ""))))
>>>>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>>>>> +                  (mem:SI (match_dup 2)))])]
>>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>>> +   && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
>>>>>> +  "*
>>>>>> +  {
>>>>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>>> +    if (offset1 == -4)
>>>>>> +      {
>>>>>> +       if (fix_cm3_ldrd && (operands[2] == operands[0]))
>>>>>> +         {
>>>>>> +           output_asm_insn (\"ldr\\t%1, [%2]\", operands);
>>>>>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>>> +           return \"\";
>>>>>> +         }
>>>>>> +       return \"ldrd\\t%0, %1, [%2, %3]\";
>>>>>> +      }
>>>>>> +    else
>>>>>> +      {
>>>>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>>> +                                 operands[2], operands[3], NULL_RTX, true))
>>>>>> +         return \"ldmia\\t%2, {%1, %0}\";
>>>>>> +       if (fix_cm3_ldrd && (operands[2] == operands[1]))
>>>>>> +         {
>>>>>> +           output_asm_insn (\"ldr\\t%0, [%2, %3]\", operands);
>>>>>> +           output_asm_insn (\"ldr\\t%1, [%2]\", operands);
>>>>>> +           return \"\";
>>>>>> +         }
>>>>>> +       return \"ldrd\\t%1, %0, [%2]\";
>>>>>> +      }
>>>>>> +  }"
>>>>>> +)
>>>>>> +
>>>>>> +(define_peephole2
>>>>>> +  [(set (match_operand:SI 0 "s_register_operand" "")
>>>>>> +       (match_operand:SI 2 "memory_operand" ""))
>>>>>> +   (set (match_operand:SI 1 "s_register_operand" "")
>>>>>> +       (match_operand:SI 3 "memory_operand" ""))]
>>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>>> +   && thumb2_legitimate_ldrd_p (operands[0], operands[1],
>>>>>> +                               operands[2], operands[3], true)"
>>>>>> +  [(parallel [(set (match_operand:SI 0 "s_register_operand" "")
>>>>>> +                  (match_operand:SI 2 "memory_operand" ""))
>>>>>> +             (set (match_operand:SI 1 "s_register_operand" "")
>>>>>> +                  (match_operand:SI 3 "memory_operand" ""))])]
>>>>>> +  ""
>>>>>> +)
>>>>>> +
>>>>>> +(define_insn "*thumb2_strd"
>>>>>> +  [(parallel [(set (mem:SI
>>>>>> +                       (plus:SI (match_operand:SI 2 "s_register_operand" "rk")
>>>>>> +                                (match_operand:SI 3 "const_int_operand" "")))
>>>>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>>>>> +             (set (mem:SI (plus:SI (match_dup 2)
>>>>>> +                                (match_operand:SI 4 "const_int_operand" "")))
>>>>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>>> +   && thumb2_check_ldrd_operands (operands[3], operands[4])"
>>>>>> +  "*
>>>>>> +  {
>>>>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[4]);
>>>>>> +    if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>>> +                             operands[2], operands[3], operands[4], false))
>>>>>> +      return \"stmdb\\t%2, {%0, %1}\";
>>>>>> +    if (offset1 < offset2)
>>>>>> +      return \"strd\\t%0, %1, [%2, %3]\";
>>>>>> +    else
>>>>>> +      return \"strd\\t%1, %0, [%2, %4]\";
>>>>>> +  }"
>>>>>> +)
>>>>>> +
>>>>>> +(define_insn "*thumb2_strd_reg1"
>>>>>> +  [(parallel [(set (mem:SI (match_operand:SI 2 "s_register_operand" "rk"))
>>>>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>>>>> +             (set (mem:SI (plus:SI (match_dup 2)
>>>>>> +                               (match_operand:SI 3 "const_int_operand" "")))
>>>>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>>> +   && thumb2_check_ldrd_operands (NULL_RTX, operands[3])"
>>>>>> +  "*
>>>>>> +  {
>>>>>> +    HOST_WIDE_INT offset2 = INTVAL (operands[3]);
>>>>>> +    if (offset2 == 4)
>>>>>> +      {
>>>>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>>> +                                 operands[2], NULL_RTX, operands[3], false))
>>>>>> +         return \"stmia\\t%2, {%0, %1}\";
>>>>>> +       return \"strd\\t%0, %1, [%2]\";
>>>>>> +      }
>>>>>> +    else
>>>>>> +      return \"strd\\t%1, %0, [%2, %3]\";
>>>>>> +  }"
>>>>>> +)
>>>>>> +
>>>>>> +(define_insn "*thumb2_strd_reg2"
>>>>>> +  [(parallel [(set (mem:SI (plus:SI
>>>>>> +                               (match_operand:SI 2 "s_register_operand" "rk")
>>>>>> +                               (match_operand:SI 3 "const_int_operand" "")))
>>>>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>>>>> +             (set (mem:SI (match_dup 2))
>>>>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>>> +   && thumb2_check_ldrd_operands (operands[3], NULL_RTX)"
>>>>>> +  "*
>>>>>> +  {
>>>>>> +    HOST_WIDE_INT offset1 = INTVAL (operands[3]);
>>>>>> +    if (offset1 == -4)
>>>>>> +      return \"strd\\t%0, %1, [%2, %3]\";
>>>>>> +    else
>>>>>> +      {
>>>>>> +       if (thumb2_prefer_ldmstm (operands[0], operands[1],
>>>>>> +                                 operands[2], operands[3], NULL_RTX, false))
>>>>>> +         return \"stmia\\t%2, {%1, %0}\";
>>>>>> +       return \"strd\\t%1, %0, [%2]\";
>>>>>> +      }
>>>>>> +  }"
>>>>>> +)
>>>>>> +
>>>>>> +(define_peephole2
>>>>>> +  [(set (match_operand:SI 2 "memory_operand" "")
>>>>>> +       (match_operand:SI 0 "s_register_operand" ""))
>>>>>> +   (set (match_operand:SI 3 "memory_operand" "")
>>>>>> +       (match_operand:SI 1 "s_register_operand" ""))]
>>>>>> +  "TARGET_THUMB2 && arm_arch7
>>>>>> +   && thumb2_legitimate_ldrd_p (operands[0], operands[1],
>>>>>> +                               operands[2], operands[3], false)"
>>>>>> +  [(parallel [(set (match_operand:SI 2 "memory_operand" "")
>>>>>> +                  (match_operand:SI 0 "s_register_operand" ""))
>>>>>> +             (set (match_operand:SI 3 "memory_operand" "")
>>>>>> +                  (match_operand:SI 1 "s_register_operand" ""))])]
>>>>>> +  ""
>>>>>> +)
>>>>>> Index: arm.c
>>>>>> ===================================================================
>>>>>> --- arm.c       (revision 165492)
>>>>>> +++ arm.c       (working copy)
>>>>>> @@ -23254,4 +23254,134 @@ arm_builtin_support_vector_misalignment
>>>>>>                                                      is_packed);
>>>>>>  }
>>>>>>
>>>>>> +/* Check the validity of operands in an ldrd/strd instruction.  */
>>>>>> +bool
>>>>>> +thumb2_check_ldrd_operands (rtx off1, rtx off2)
>>>>>> +{
>>>>>> +  HOST_WIDE_INT offset1 = 0;
>>>>>> +  HOST_WIDE_INT offset2 = 0;
>>>>>> +
>>>>>> +  if (off1 != NULL_RTX)
>>>>>> +    offset1 = INTVAL (off1);
>>>>>> +  if (off2 != NULL_RTX)
>>>>>> +    offset2 = INTVAL (off2);
>>>>>> +
>>>>>> +  /* The offset range of LDRD is [-1020, 1020]. Here we check if both
>>>>>> +     offsets lie in the range [-1020, 1024]. If one of the offsets is
>>>>>> +     1024, the following condition ((offset1 + 4) == offset2) will ensure
>>>>>> +     offset1 to be 1020, suitable for instruction LDRD.  */
>>>>>> +  if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
>>>>>> +    return false;
>>>>>> +  if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
>>>>>> +    return false;
>>>>>> +
>>>>>> +  if ((offset1 + 4) == offset2)
>>>>>> +    return true;
>>>>>> +  if ((offset2 + 4) == offset1)
>>>>>> +    return true;
>>>>>> +
>>>>>> +  return false;
>>>>>> +}
>>>>>> +
>>>>>> +/* Check if the two memory accesses can be merged to an ldrd/strd instruction.
>>>>>> +   That is they use the same base register, and the gap between constant
>>>>>> +   offsets should be 4.  */
>>>>>> +bool
>>>>>> +thumb2_legitimate_ldrd_p (rtx reg1, rtx reg2, rtx mem1, rtx mem2, bool ldrd)
>>>>>> +{
>>>>>> +  rtx base1, base2, op1;
>>>>>> +  rtx addr1 = XEXP (mem1, 0);
>>>>>> +  rtx addr2 = XEXP (mem2, 0);
>>>>>> +  HOST_WIDE_INT offset1 = 0;
>>>>>> +  HOST_WIDE_INT offset2 = 0;
>>>>>> +
>>>>>> +  if (MEM_VOLATILE_P (mem1) || MEM_VOLATILE_P (mem2))
>>>>>> +    return false;
>>>>>> +
>>>>>> +  if (REG_P (addr1))
>>>>>> +    base1 = addr1;
>>>>>> +  else if (GET_CODE (addr1) == PLUS)
>>>>>> +    {
>>>>>> +      base1 = XEXP (addr1, 0);
>>>>>> +      op1 = XEXP (addr1, 1);
>>>>>> +      if (!REG_P (base1) || (GET_CODE (op1) != CONST_INT))
>>>>>> +       return false;
>>>>>> +      offset1 = INTVAL (op1);
>>>>>> +    }
>>>>>> +  else
>>>>>> +    return false;
>>>>>> +
>>>>>> +  if (REG_P (addr2))
>>>>>> +    base2 = addr2;
>>>>>> +  else if (GET_CODE (addr2) == PLUS)
>>>>>> +    {
>>>>>> +      base2 = XEXP (addr2, 0);
>>>>>> +      op1 = XEXP (addr2, 1);
>>>>>> +      if (!REG_P (base2) || (GET_CODE (op1) != CONST_INT))
>>>>>> +       return false;
>>>>>> +      offset2 = INTVAL (op1);
>>>>>> +    }
>>>>>> +  else
>>>>>> +    return false;
>>>>>> +
>>>>>> +  if (base1 != base2)
>>>>>> +    return false;
>>>>>> +
>>>>>> +  /* The offset range of LDRD is [-1020, 1020]. Here we check if both
>>>>>> +     offsets lie in the range [-1020, 1024]. If one of the offsets is
>>>>>> +     1024, the following condition ((offset1 + 4) == offset2) will ensure
>>>>>> +     offset1 to be 1020, suitable for instruction LDRD.  */
>>>>>> +  if ((offset1 > 1024) || (offset1 < -1020) || ((offset1 & 3) != 0))
>>>>>> +    return false;
>>>>>> +  if ((offset2 > 1024) || (offset2 < -1020) || ((offset2 & 3) != 0))
>>>>>> +    return false;
>>>>>> +
>>>>>> +  if (ldrd && ((reg1 == reg2) || (reg1 == base1)))
>>>>>> +    return false;
>>>>>> +
>>>>>> +  if ((offset1 + 4) == offset2)
>>>>>> +    return true;
>>>>>> +  if ((offset2 + 4) == offset1)
>>>>>> +    return true;
>>>>>> +
>>>>>> +  return false;
>>>>>> +}
>>>>>> +
>>>>>> +/* Check if the insn can be expressed as ldm/stm with less cost.  */
>>>>>> +bool
>>>>>> +thumb2_prefer_ldmstm (rtx reg1, rtx reg2, rtx base,
>>>>>> +                     rtx off1, rtx off2, bool ldrd)
>>>>>> +{
>>>>>> +  HOST_WIDE_INT offset1 = 0;
>>>>>> +  HOST_WIDE_INT offset2 = 0;
>>>>>> +
>>>>>> +  if (off1 != NULL_RTX)
>>>>>> +    offset1 = INTVAL (off1);
>>>>>> +  if (off2 != NULL_RTX)
>>>>>> +    offset2 = INTVAL (off2);
>>>>>> +
>>>>>> +  if (offset1 > offset2)
>>>>>> +    {
>>>>>> +      rtx tmp;
>>>>>> +      HOST_WIDE_INT t = offset1;
>>>>>> +      offset1 = offset2;
>>>>>> +      offset2 = t;
>>>>>> +      tmp = reg1;
>>>>>> +      reg1 = reg2;
>>>>>> +      reg2 = tmp;
>>>>>> +    }
>>>>>> +
>>>>>> +  /* The offset of ldmdb is -8, the offset of ldmia is 0.  */
>>>>>> +  if ((offset1 != -8) && (offset1 != 0))
>>>>>> +    return false;
>>>>>> +
>>>>>> +  /* Lower register corresponds to lower memory.  */
>>>>>> +  if (REGNO (reg1) > REGNO (reg2))
>>>>>> +    return false;
>>>>>> +
>>>>>> +  /* Now ldm/stm is possible. Check for special cases ldm/stm has lower
>>>>>> +     cost.  */
>>>>>> +  return false;
>>>>>> +}
>>>>>> +
>>>>>>  #include "gt-arm.h"
>>>>>> Index: arm-protos.h
>>>>>> ===================================================================
>>>>>> --- arm-protos.h        (revision 165492)
>>>>>> +++ arm-protos.h        (working copy)
>>>>>> @@ -150,6 +150,9 @@ extern void arm_expand_sync (enum machin
>>>>>>  extern const char *arm_output_memory_barrier (rtx *);
>>>>>>  extern const char *arm_output_sync_insn (rtx, rtx *);
>>>>>>  extern unsigned int arm_sync_loop_insns (rtx , rtx *);
>>>>>> +extern bool thumb2_check_ldrd_operands (rtx, rtx);
>>>>>> +extern bool thumb2_legitimate_ldrd_p (rtx, rtx, rtx, rtx, bool);
>>>>>> +extern bool thumb2_prefer_ldmstm (rtx, rtx, rtx, rtx, rtx, bool);
>>>>>>
>>>>>>  #if defined TREE_CODE
>>>>>>  extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
>>>>>> Index: ldmstm.md
>>>>>> ===================================================================
>>>>>> --- ldmstm.md   (revision 165492)
>>>>>> +++ ldmstm.md   (working copy)
>>>>>> @@ -852,7 +852,7 @@ (define_insn "*ldm2_ia"
>>>>>>      (set (match_operand:SI 2 "arm_hard_register_operand" "")
>>>>>>           (mem:SI (plus:SI (match_dup 3)
>>>>>>                   (const_int 4))))])]
>>>>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>>>   "ldm%(ia%)\t%3, {%1, %2}"
>>>>>>   [(set_attr "type" "load2")
>>>>>>    (set_attr "predicable" "yes")])
>>>>>> @@ -901,7 +901,7 @@ (define_insn "*stm2_ia"
>>>>>>           (match_operand:SI 1 "arm_hard_register_operand" ""))
>>>>>>      (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
>>>>>>           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
>>>>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>>>   "stm%(ia%)\t%3, {%1, %2}"
>>>>>>   [(set_attr "type" "store2")
>>>>>>    (set_attr "predicable" "yes")])
>>>>>> @@ -1041,7 +1041,7 @@ (define_insn "*ldm2_db"
>>>>>>      (set (match_operand:SI 2 "arm_hard_register_operand" "")
>>>>>>           (mem:SI (plus:SI (match_dup 3)
>>>>>>                   (const_int -4))))])]
>>>>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>>>   "ldm%(db%)\t%3, {%1, %2}"
>>>>>>   [(set_attr "type" "load2")
>>>>>>    (set_attr "predicable" "yes")])
>>>>>> @@ -1067,7 +1067,7 @@ (define_insn "*stm2_db"
>>>>>>           (match_operand:SI 1 "arm_hard_register_operand" ""))
>>>>>>      (set (mem:SI (plus:SI (match_dup 3) (const_int -4)))
>>>>>>           (match_operand:SI 2 "arm_hard_register_operand" ""))])]
>>>>>> -  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
>>>>>> +  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
>>>>>>   "stm%(db%)\t%3, {%1, %2}"
>>>>>>   [(set_attr "type" "store2")
>>>>>>    (set_attr "predicable" "yes")])
>>>>>>
>>>>>>
>>>>>> Index: pr40457-3.c
>>>>>> ===================================================================
>>>>>> --- pr40457-3.c (revision 165492)
>>>>>> +++ pr40457-3.c (working copy)
>>>>>> @@ -5,6 +5,7 @@ void foo(int* p)
>>>>>>  {
>>>>>>   p[0] = 1;
>>>>>>   p[1] = 0;
>>>>>> +  p[2] = 2;
>>>>>>  }
>>>>>>
>>>>>>  /* { dg-final { scan-assembler "stm" } } */
>>>>>> Index: pr40457-1.c
>>>>>> ===================================================================
>>>>>> --- pr40457-1.c (revision 165492)
>>>>>> +++ pr40457-1.c (working copy)
>>>>>> @@ -1,9 +1,9 @@
>>>>>> -/* { dg-options "-Os" }  */
>>>>>> +/* { dg-options "-O2" }  */
>>>>>>  /* { dg-do compile } */
>>>>>>
>>>>>>  int bar(int* p)
>>>>>>  {
>>>>>> -  int x = p[0] + p[1];
>>>>>> +  int x = p[0] + p[1] + p[2];
>>>>>>   return x;
>>>>>>  }
>>>>>>
>>>>>> Index: pr40457-2.c
>>>>>> ===================================================================
>>>>>> --- pr40457-2.c (revision 165492)
>>>>>> +++ pr40457-2.c (working copy)
>>>>>> @@ -5,6 +5,7 @@ void foo(int* p)
>>>>>>  {
>>>>>>   p[0] = 1;
>>>>>>   p[1] = 0;
>>>>>> +  p[2] = 2;
>>>>>>  }
>>>>>>
>>>>>>  /* { dg-final { scan-assembler "stm" } } */
>>>>>> Index: pr45335.c
>>>>>> ===================================================================
>>>>>> --- pr45335.c   (revision 0)
>>>>>> +++ pr45335.c   (revision 0)
>>>>>> @@ -0,0 +1,22 @@
>>>>>> +/* { dg-options "-mthumb -O2" } */
>>>>>> +/* { dg-require-effective-target arm_thumb2_ok } */
>>>>>> +/* { dg-final { scan-assembler "ldrd" } } */
>>>>>> +/* { dg-final { scan-assembler "strd" } } */
>>>>>> +
>>>>>> +struct S
>>>>>> +{
>>>>>> +    void* p1;
>>>>>> +    void* p2;
>>>>>> +    void* p3;
>>>>>> +    void* p4;
>>>>>> +};
>>>>>> +
>>>>>> +extern printf(char*, ...);
>>>>>> +
>>>>>> +void foo1(struct S* fp, struct S* otherSaveArea)
>>>>>> +{
>>>>>> +    struct S* saveA = fp - 1;
>>>>>> +    printf("StackSaveArea for fp %p [%p/%p]:\n", fp, saveA, otherSaveArea);
>>>>>> +    printf("prevFrame=%p savedPc=%p meth=%p curPc=%p fp[0]=0x%08x\n",
>>>>>> +        saveA->p1, saveA->p2, saveA->p3, saveA->p4, *(unsigned int*)fp);
>>>>>> +}
>>>>>>
>>>>>
>>>>
>>>
>>
>
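
For anyone reading the review comments about the -1020 to +1024 range and
about ldm/stm, here is a minimal standalone sketch of the two tests as they
appear in the quoted patch. It is only a model: plain integers stand in for
rtx, INTVAL and REGNO, and the names check_ldrd_offsets and ldmstm_possible
are hypothetical, not functions from the patch. Note also that
thumb2_prefer_ldmstm in the patch always returns false after its structural
tests (the cost decision is left open), whereas the second function below
reports only whether ldm/stm is structurally possible.

```c
#include <stdbool.h>

/* Hypothetical model of the offset test in thumb2_check_ldrd_operands.
   Plain longs stand in for rtx/INTVAL.  */
static bool
check_ldrd_offsets (long offset1, long offset2)
{
  /* Each offset must be a multiple of 4 in the range [-1020, 1024].  */
  if (offset1 > 1024 || offset1 < -1020 || (offset1 & 3) != 0)
    return false;
  if (offset2 > 1024 || offset2 < -1020 || (offset2 & 3) != 0)
    return false;

  /* The two words must be adjacent, in either order.  If one offset is
     1024, the other must be 1020, which LDRD can encode.  */
  return offset1 + 4 == offset2 || offset2 + 4 == offset1;
}

/* Hypothetical model of the structural tests in thumb2_prefer_ldmstm.
   Returns true when a two-word ldm/stm could express the access; the
   patch itself then returns false because no cost model is filled in.  */
static bool
ldmstm_possible (unsigned regno1, unsigned regno2,
                 long offset1, long offset2)
{
  /* Normalize so that (regno1, offset1) is the lower address.  */
  if (offset1 > offset2)
    {
      long t = offset1;
      unsigned r = regno1;
      offset1 = offset2;
      offset2 = t;
      regno1 = regno2;
      regno2 = r;
    }

  /* ldmdb starts at base - 8, ldmia starts at base + 0.  */
  if (offset1 != -8 && offset1 != 0)
    return false;

  /* The lower register must correspond to the lower memory address.  */
  if (regno1 > regno2)
    return false;

  return true;
}
```

With these definitions, the pair (1020, 1024) passes the offset test while
(1024, 1028) does not, and r1 at offset 4 with r0 at offset 0 is accepted by
the ldm/stm test after the swap.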


