This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH AARCH64] load store pair optimization using sched_fusion pass.


On Sun, Dec 7, 2014 at 7:35 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Mon, Dec 8, 2014 at 11:26 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Mon, Dec 8, 2014 at 11:16 AM, Andrew Pinski <pinskia@gmail.com> wrote:
>>> On Sun, Dec 7, 2014 at 5:47 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>> On Sun, Dec 7, 2014 at 6:24 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>>>>> On Sat, Dec 6, 2014 at 6:35 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>>>>>> On Sat, Dec 6, 2014 at 5:54 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>>>>>>> On Fri, Dec 5, 2014 at 9:08 AM, Marcus Shawcroft
>>>>>>> <marcus.shawcroft@gmail.com> wrote:
>>>>>>>> On 18 November 2014 at 08:34, Bin Cheng <bin.cheng@arm.com> wrote:
>>>>>>>>
>>>>>>>>> 2014-11-18  Bin Cheng  <bin.cheng@arm.com>
>>>>>>>>>
>>>>>>>>>         * config/aarch64/aarch64.md (load_pair<mode>): Split to
>>>>>>>>>         load_pairsi, load_pairdi, load_pairsf and load_pairdf.
>>>>>>>>>         (load_pairsi, load_pairdi, load_pairsf, load_pairdf): Split
>>>>>>>>>         from load_pair<mode>.  New alternative to support int/fp
>>>>>>>>>         registers in fp/int mode patterns.
>>>>>>>>>         (store_pair<mode>): Split to store_pairsi, store_pairdi,
>>>>>>>>>         store_pairsf and store_pairdf.
>>>>>>>>>         (store_pairsi, store_pairdi, store_pairsf, store_pairdf): Split
>>>>>>>>>         from store_pair<mode>.  New alternative to support int/fp
>>>>>>>>>         registers in fp/int mode patterns.
>>>>>>>>>         (*load_pair_extendsidi2_aarch64): New pattern.
>>>>>>>>>         (*load_pair_zero_extendsidi2_aarch64): New pattern.
>>>>>>>>>         (aarch64-ldpstp.md): Include.
>>>>>>>>>         * config/aarch64/aarch64-ldpstp.md: New file.
>>>>>>>>>         * config/aarch64/aarch64-protos.h (aarch64_gen_adjusted_ldpstp):
>>>>>>>>> New.
>>>>>>>>>         (extract_base_offset_in_addr): New.
>>>>>>>>>         (aarch64_operands_ok_for_ldpstp): New.
>>>>>>>>>         (aarch64_operands_adjust_ok_for_ldpstp): New.
>>>>>>>>>         * config/aarch64/aarch64.c (enum sched_fusion_type): New enum.
>>>>>>>>>         (TARGET_SCHED_FUSION_PRIORITY): New hook.
>>>>>>>>>         (fusion_load_store): New function.
>>>>>>>>>         (extract_base_offset_in_addr): New function.
>>>>>>>>>         (aarch64_gen_adjusted_ldpstp): New function.
>>>>>>>>>         (aarch64_sched_fusion_priority): New function.
>>>>>>>>>         (aarch64_operands_ok_for_ldpstp): New function.
>>>>>>>>>         (aarch64_operands_adjust_ok_for_ldpstp): New function.
>>>>>>>>>
>>>>>>>>> 2014-11-18  Bin Cheng  <bin.cheng@arm.com>
>>>>>>>>>
>>>>>>>>>         * gcc.target/aarch64/ldp-stp-1.c: New test.
>>>>>>>>>         * gcc.target/aarch64/ldp-stp-2.c: New test.
>>>>>>>>>         * gcc.target/aarch64/ldp-stp-3.c: New test.
>>>>>>>>>         * gcc.target/aarch64/ldp-stp-4.c: New test.
>>>>>>>>>         * gcc.target/aarch64/ldp-stp-5.c: New test.
>>>>>>>>>         * gcc.target/aarch64/lr_free_1.c: Disable scheduling fusion
>>>>>>>>>         and peephole2 pass.
>>>>>>>>
>>>>>>>> Committed, thanks. /Marcus
>>>>>>>
>>>>>>>
>>>>>>> This patch has a bug in it dealing with volatile mems.  It will do a
>>>>>>> peephole of two volatile mems into a pair which is not correct as the
>>>>>>> order in the architecture is not defined and might even swap the order
>>>>>>> of the load/store incorrectly.
>>>>>>> I noticed this when I was merging in my changes for an improved
>>>>>>> peephole, which already has a check for volatile.
>>>>>>
>>>>>> Something like:
>>>>>> @@ -10555,6 +10863,10 @@ aarch64_operands_ok_for_ldpstp (rtx
>>>>>> *operands, bool load,
>>>>>>        reg_2 = operands[3];
>>>>>>      }
>>>>>>
>>>>>> +  /* The mems cannot be volatile.  */
>>>>>> +  if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2))
>>>>>> +    return false;
>>>>>> +
>>>>>>    /* Check if the addresses are in the form of [base+offset].  */
>>>>>>    extract_base_offset_in_addr (mem_1, &base_1, &offset_1);
>>>>>>    if (base_1 == NULL_RTX || offset_1 == NULL_RTX)
>>>>>>
>>>>>>
>>>>>> ---- CUT ----
>>>>>> gcc.target/aarch64/rev16_1.c is the testcase where two volatile loads
>>>>>> being merged into one load pair.
>>>>>
>>>>>
>>>>> And this one:
>>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>>> index 40881e7..da36dc8 100644
>>>>> --- a/gcc/config/aarch64/aarch64.c
>>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>>> @@ -10974,6 +10974,10 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx
>>>>> *operands, bool load,
>>>>>    if (!MEM_P (mem_1) || aarch64_mem_pair_operand (mem_1, mode))
>>>>>      return false;
>>>>>
>>>>> +  if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2)
>>>>> +      || MEM_VOLATILE_P (mem_3) || MEM_VOLATILE_P (mem_4))
>>>>> +    return false;
>>>>> +
>>>>>    /* Check if the addresses are in the form of [base+offset].  */
>>>>>    extract_base_offset_in_addr (mem_1, &base_1, &offset_1);
>>>>>    if (base_1 == NULL_RTX || offset_1 == NULL_RTX)
>>>>> --- CUT ---
>>>>> This one happens with c-c++-common/torture/complex-sign-add.c.
>>>>> I will post a full patch tomorrow.
>>>>>
>>>>> Thanks,
>>>>> Andrew
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Andrew
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Andrew Pinski
>>>> Hi Andrew,
>>>> Thanks for reporting this.  When I was facing this volatile problem, I
>>>> thought it was fine as long as the memory accesses weren't removed
>>>> accidentally.  Could you please explain more about why the order of
>>>> volatile accesses also matters?
>>>
>>> Because volatile access between sequence points cannot be reordered.
>>> See https://gcc.gnu.org/onlinedocs/gcc/Volatiles.html .
>>> Mainly:
>>> " The minimum requirement is that at a sequence point all previous
>>> accesses to volatile objects have stabilized and no subsequent
>>> accesses have occurred. Thus an implementation is free to reorder and
>>> combine volatile accesses that occur between sequence points, but
>>> cannot do so for accesses across a sequence point".
>>>
>>>> Is it related to some barriers between the two
>>>> accesses?
>>>
>>> Volatile accesses are not allowed to be merged (as I pointed out
>>> above) or reordered, which gives an implicit barrier between the two
>>> accesses.
>> Ah, yes, it is an implicit barrier that can't be crossed.
>> Thanks very much for the explanation.
>>
>> Thanks,
>> bin
>>>
>>>> If that's the case, does it imply that dependence created
>>>> by GCC isn't correct?
>>>
>>> It is correct just not the peepholes you created.
>>>
>>>> I am still not sure why the case passed in my
>>>> regression test, or how I missed it.
>>>
>>> Well that is because the ICE only shows up with my improved
>>> peephole2/load_pair/store_pair patterns.  My load_pair/store_pair
>>> patterns that I had created were rejecting volatile memory locations
>>> to make sure nobody used these patterns with volatile memory
>>> locations.  Once I get some more time, I will post a clean up patch of
>>> my peephole2/load_pair/store_pair which I think are slightly easier to
>>> understand.
>>>
>>> Thanks,
>>> Andrew Pinski
>>>
>>>>
>>>> Anyway, volatile cases should be handled conservatively here.  Thanks again.
>>>>
>>>> Thanks,
>>>> bin
>
> As for the fix, I think it is TARGET_SCHED_FUSION_PRIORITY that needs
> to be fixed.  The scheduler reorders loads/stores according to the
> priorities set by that hook function.  After fixing the hook, even
> volatile accesses can be paired if they are next to each other (or a
> volatile access and a normal one) in the first place (which means no
> sequence point between the volatile accesses).  Makes sense?

I think the scheduler should already have code to handle volatile
memory locations and keep the implicit barrier.
And the sequence points in the documentation are source-code sequence
points (that is, the end of a statement is considered a sequence
point).

Thanks,
Andrew Pinski

>
> Thanks,
> bin

