This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH AARCH64]load store pair optimization using sched_fusion pass.
- From: "Bin.Cheng" <amker dot cheng at gmail dot com>
- To: Andrew Pinski <pinskia at gmail dot com>
- Cc: Marcus Shawcroft <marcus dot shawcroft at gmail dot com>, Bin Cheng <bin dot cheng at arm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 8 Dec 2014 09:47:15 +0800
- Subject: Re: [PATCH AARCH64]load store pair optimization using sched_fusion pass.
- Authentication-results: sourceware.org; auth=none
- References: <000401d0030a$6e8dafc0$4ba90f40$ at arm dot com> <CAFqB+Py2Adx8TVNfNxr+fu-FFOijkucoqD2Jgu3RhyA11TurJg at mail dot gmail dot com> <CA+=Sn1ntj506=z0TuqfcATnmp_Psy14EvLY-zESjsL9LPx+f=Q at mail dot gmail dot com> <CA+=Sn1muw_jLTHbKuv--pTfk+SJu2cNPcWoRRGp8t9Es7oP91Q at mail dot gmail dot com> <CA+=Sn1=nkDbC4OD0huR4L3c38AwM0rXNk+1ySOMP-2A3K714Lg at mail dot gmail dot com>
On Sun, Dec 7, 2014 at 6:24 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Sat, Dec 6, 2014 at 6:35 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Sat, Dec 6, 2014 at 5:54 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>>> On Fri, Dec 5, 2014 at 9:08 AM, Marcus Shawcroft
>>> <marcus.shawcroft@gmail.com> wrote:
>>>> On 18 November 2014 at 08:34, Bin Cheng <bin.cheng@arm.com> wrote:
>>>>
>>>>> 2014-11-18 Bin Cheng <bin.cheng@arm.com>
>>>>>
>>>>> * config/aarch64/aarch64.md (load_pair<mode>): Split to
>>>>> load_pairsi, load_pairdi, load_pairsf and load_pairdf.
>>>>> (load_pairsi, load_pairdi, load_pairsf, load_pairdf): Split
>>>>> from load_pair<mode>. New alternative to support int/fp
>>>>> registers in fp/int mode patterns.
>>>>> (store_pair<mode>): Split to store_pairsi, store_pairdi,
>>>>> store_pairsf and store_pairdf.
>>>>> (store_pairsi, store_pairdi, store_pairsf, store_pairdf): Split
>>>>> from store_pair<mode>. New alternative to support int/fp
>>>>> registers in fp/int mode patterns.
>>>>> (*load_pair_extendsidi2_aarch64): New pattern.
>>>>> (*load_pair_zero_extendsidi2_aarch64): New pattern.
>>>>> (aarch64-ldpstp.md): Include.
>>>>> * config/aarch64/aarch64-ldpstp.md: New file.
>>>>> * config/aarch64/aarch64-protos.h (aarch64_gen_adjusted_ldpstp):
>>>>> New.
>>>>> (extract_base_offset_in_addr): New.
>>>>> (aarch64_operands_ok_for_ldpstp): New.
>>>>> (aarch64_operands_adjust_ok_for_ldpstp): New.
>>>>> * config/aarch64/aarch64.c (enum sched_fusion_type): New enum.
>>>>> (TARGET_SCHED_FUSION_PRIORITY): New hook.
>>>>> (fusion_load_store): New function.
>>>>> (extract_base_offset_in_addr): New function.
>>>>> (aarch64_gen_adjusted_ldpstp): New function.
>>>>> (aarch64_sched_fusion_priority): New function.
>>>>> (aarch64_operands_ok_for_ldpstp): New function.
>>>>> (aarch64_operands_adjust_ok_for_ldpstp): New function.
>>>>>
>>>>> 2014-11-18 Bin Cheng <bin.cheng@arm.com>
>>>>>
>>>>> * gcc.target/aarch64/ldp-stp-1.c: New test.
>>>>> * gcc.target/aarch64/ldp-stp-2.c: New test.
>>>>> * gcc.target/aarch64/ldp-stp-3.c: New test.
>>>>> * gcc.target/aarch64/ldp-stp-4.c: New test.
>>>>> * gcc.target/aarch64/ldp-stp-5.c: New test.
>>>>> * gcc.target/aarch64/lr_free_1.c: Disable scheduling fusion
>>>>> and peephole2 pass.
>>>>
>>>> Committed, thanks. /Marcus
>>>
>>>
>>> This patch has a bug in its handling of volatile mems. It will
>>> peephole two volatile mems into a pair, which is not correct: the
>>> order of the two accesses within the pair is not defined by the
>>> architecture, so the load/store order may effectively be swapped.
>>> I noticed this while merging in my changes for an improved
>>> peephole, which already has a check for volatile.
>>
>> Something like:
>> @@ -10555,6 +10863,10 @@ aarch64_operands_ok_for_ldpstp (rtx
>> *operands, bool load,
>> reg_2 = operands[3];
>> }
>>
>> + /* The mems cannot be volatile. */
>> + if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2))
>> + return false;
>> +
>> /* Check if the addresses are in the form of [base+offset]. */
>> extract_base_offset_in_addr (mem_1, &base_1, &offset_1);
>> if (base_1 == NULL_RTX || offset_1 == NULL_RTX)
>>
>>
>> ---- CUT ----
>> gcc.target/aarch64/rev16_1.c is the testcase where two volatile loads
>> being merged into one load pair.
>
>
> And this one:
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 40881e7..da36dc8 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -10974,6 +10974,10 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx
> *operands, bool load,
> if (!MEM_P (mem_1) || aarch64_mem_pair_operand (mem_1, mode))
> return false;
>
> + if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2)
> + || MEM_VOLATILE_P (mem_3) || MEM_VOLATILE_P (mem_4))
> + return false;
> +
> /* Check if the addresses are in the form of [base+offset]. */
> extract_base_offset_in_addr (mem_1, &base_1, &offset_1);
> if (base_1 == NULL_RTX || offset_1 == NULL_RTX)
> --- CUT ---
> This one happens with c-c++-common/torture/complex-sign-add.c.
> I will post a full patch tomorrow.
>
> Thanks,
> Andrew
>
>>
>> Thanks,
>> Andrew
>>
>>>
>>> Thanks,
>>> Andrew Pinski
Hi Andrew,
Thanks for reporting this. When I ran into the volatile problem, I
thought it was fine as long as the memory accesses themselves were not
removed accidentally. Could you please explain further why the order of
volatile accesses also matters? Is it related to barriers between the
two accesses? If so, does that imply the dependences created by GCC
are incorrect? I am also not sure why the case passed in my regression
test, or how I missed it.
Anyway, volatile cases should be handled conservatively here. Thanks again.
Thanks,
bin