
Re: [PATCH AARCH64] Load/store pair optimization using sched_fusion pass.


On Sun, Dec 7, 2014 at 6:24 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Sat, Dec 6, 2014 at 6:35 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Sat, Dec 6, 2014 at 5:54 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>>> On Fri, Dec 5, 2014 at 9:08 AM, Marcus Shawcroft
>>> <marcus.shawcroft@gmail.com> wrote:
>>>> On 18 November 2014 at 08:34, Bin Cheng <bin.cheng@arm.com> wrote:
>>>>
>>>>> 2014-11-18  Bin Cheng  <bin.cheng@arm.com>
>>>>>
>>>>>         * config/aarch64/aarch64.md (load_pair<mode>): Split to
>>>>>         load_pairsi, load_pairdi, load_pairsf and load_pairdf.
>>>>>         (load_pairsi, load_pairdi, load_pairsf, load_pairdf): Split
>>>>>         from load_pair<mode>.  New alternative to support int/fp
>>>>>         registers in fp/int mode patterns.
>>>>>         (store_pair<mode>): Split to store_pairsi, store_pairdi,
>>>>>         store_pairsf and store_pairdf.
>>>>>         (store_pairsi, store_pairdi, store_pairsf, store_pairdf): Split
>>>>>         from store_pair<mode>.  New alternative to support int/fp
>>>>>         registers in fp/int mode patterns.
>>>>>         (*load_pair_extendsidi2_aarch64): New pattern.
>>>>>         (*load_pair_zero_extendsidi2_aarch64): New pattern.
>>>>>         (aarch64-ldpstp.md): Include.
>>>>>         * config/aarch64/aarch64-ldpstp.md: New file.
>>>>>         * config/aarch64/aarch64-protos.h
>>>>>         (aarch64_gen_adjusted_ldpstp): New.
>>>>>         (extract_base_offset_in_addr): New.
>>>>>         (aarch64_operands_ok_for_ldpstp): New.
>>>>>         (aarch64_operands_adjust_ok_for_ldpstp): New.
>>>>>         * config/aarch64/aarch64.c (enum sched_fusion_type): New enum.
>>>>>         (TARGET_SCHED_FUSION_PRIORITY): New hook.
>>>>>         (fusion_load_store): New function.
>>>>>         (extract_base_offset_in_addr): New function.
>>>>>         (aarch64_gen_adjusted_ldpstp): New function.
>>>>>         (aarch64_sched_fusion_priority): New function.
>>>>>         (aarch64_operands_ok_for_ldpstp): New function.
>>>>>         (aarch64_operands_adjust_ok_for_ldpstp): New function.
>>>>>
>>>>> 2014-11-18  Bin Cheng  <bin.cheng@arm.com>
>>>>>
>>>>>         * gcc.target/aarch64/ldp-stp-1.c: New test.
>>>>>         * gcc.target/aarch64/ldp-stp-2.c: New test.
>>>>>         * gcc.target/aarch64/ldp-stp-3.c: New test.
>>>>>         * gcc.target/aarch64/ldp-stp-4.c: New test.
>>>>>         * gcc.target/aarch64/ldp-stp-5.c: New test.
>>>>>         * gcc.target/aarch64/lr_free_1.c: Disable scheduling fusion
>>>>>         and peephole2 pass.
>>>>
>>>> Committed, thanks. /Marcus
>>>
>>>
>>> This patch has a bug in its handling of volatile mems.  It will
>>> peephole two volatile mems into a pair, which is not correct: the
>>> architecture does not define the order of the two accesses within a
>>> pair, so it might even swap the order of the load/store incorrectly.
>>> I noticed this while merging in my changes for an improved peephole,
>>> which already has a check for volatile.
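>>>
>>> For illustration (a made-up example, not one of the failing tests),
>>> two adjacent volatile loads such as:
>>>
>>> int
>>> read2 (volatile int *p)
>>> {
>>>   int a = p[0];  /* volatile load, must happen first */
>>>   int b = p[1];  /* volatile load, must happen second */
>>>   return a + b;
>>> }
>>>
>>> could be fused into a single load pair, e.g. "ldp w0, w1, [x0]"
>>> (register choice hypothetical), and the architecture does not define
>>> which of the two words the ldp accesses first.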
>>
>> Something like:
>> @@ -10555,6 +10863,10 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool load,
>>        reg_2 = operands[3];
>>      }
>>
>> +  /* The mems cannot be volatile.  */
>> +  if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2))
>> +    return false;
>> +
>>    /* Check if the addresses are in the form of [base+offset].  */
>>    extract_base_offset_in_addr (mem_1, &base_1, &offset_1);
>>    if (base_1 == NULL_RTX || offset_1 == NULL_RTX)
>>
>>
>> ---- CUT ----
>> gcc.target/aarch64/rev16_1.c is the testcase where two volatile loads
>> are merged into one load pair.
>
>
> And this one:
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 40881e7..da36dc8 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -10974,6 +10974,10 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
>    if (!MEM_P (mem_1) || aarch64_mem_pair_operand (mem_1, mode))
>      return false;
>
> +  if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2)
> +      || MEM_VOLATILE_P (mem_3) || MEM_VOLATILE_P (mem_4))
> +    return false;
> +
>    /* Check if the addresses are in the form of [base+offset].  */
>    extract_base_offset_in_addr (mem_1, &base_1, &offset_1);
>    if (base_1 == NULL_RTX || offset_1 == NULL_RTX)
> --- CUT ---
> This one happens with c-c++-common/torture/complex-sign-add.c.
> I will post a full patch tomorrow.
>
> Thanks,
> Andrew
>
>>
>> Thanks,
>> Andrew
>>
>>>
>>> Thanks,
>>> Andrew Pinski

Hi Andrew,
Thanks for reporting this.  When I ran into this volatile problem, I
thought it was fine as long as no memory access was removed
accidentally.  Could you please explain further why the order of the
volatile accesses also matters?  Is it related to some barrier between
the two accesses?  If so, does it imply that the dependence created by
GCC isn't correct?  I am still not sure why the case passed in my
regression test, or how I missed it.
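
To check my understanding with a concrete (hypothetical) case: for a
memory-mapped device where the data register is only valid after the
status register has been read, something like

  /* Addresses made up purely for illustration.  */
  volatile unsigned int *status = (volatile unsigned int *) 0x1000;
  volatile unsigned int *data   = (volatile unsigned int *) 0x1004;

  unsigned int
  read_device (void)
  {
    unsigned int s = *status;  /* must be read first */
    unsigned int d = *data;    /* must be read second */
    return s | d;
  }

would be miscompiled if the two adjacent loads were fused, since an
ldp no longer guarantees that *status is accessed before *data.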

Anyway, volatile cases should be handled conservatively here.  Thanks again.

Thanks,
bin

