This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH AARCH64]load store pair optimization using sched_fusion pass.
- From: Andrew Pinski <pinskia at gmail dot com>
- To: Marcus Shawcroft <marcus dot shawcroft at gmail dot com>
- Cc: Bin Cheng <bin dot cheng at arm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Sun, 7 Dec 2014 02:24:47 -0800
- Subject: Re: [PATCH AARCH64]load store pair optimization using sched_fusion pass.
- Authentication-results: sourceware.org; auth=none
- References: <000401d0030a$6e8dafc0$4ba90f40$ at arm dot com> <CAFqB+Py2Adx8TVNfNxr+fu-FFOijkucoqD2Jgu3RhyA11TurJg at mail dot gmail dot com> <CA+=Sn1ntj506=z0TuqfcATnmp_Psy14EvLY-zESjsL9LPx+f=Q at mail dot gmail dot com> <CA+=Sn1muw_jLTHbKuv--pTfk+SJu2cNPcWoRRGp8t9Es7oP91Q at mail dot gmail dot com>
On Sat, Dec 6, 2014 at 6:35 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Sat, Dec 6, 2014 at 5:54 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Fri, Dec 5, 2014 at 9:08 AM, Marcus Shawcroft
>> <marcus.shawcroft@gmail.com> wrote:
>>> On 18 November 2014 at 08:34, Bin Cheng <bin.cheng@arm.com> wrote:
>>>
>>>> 2014-11-18 Bin Cheng <bin.cheng@arm.com>
>>>>
>>>> * config/aarch64/aarch64.md (load_pair<mode>): Split to
>>>> load_pairsi, load_pairdi, load_pairsf and load_pairdf.
>>>> (load_pairsi, load_pairdi, load_pairsf, load_pairdf): Split
>>>> from load_pair<mode>. New alternative to support int/fp
>>>> registers in fp/int mode patterns.
>>>> (store_pair<mode>:): Split to store_pairsi, store_pairdi,
>>>> store_pairsf and store_pairdi.
>>>> (store_pairsi, store_pairdi, store_pairsf, store_pairdf): Split
>>>> from store_pair<mode>. New alternative to support int/fp
>>>> registers in fp/int mode patterns.
>>>> (*load_pair_extendsidi2_aarch64): New pattern.
>>>> (*load_pair_zero_extendsidi2_aarch64): New pattern.
>>>> (aarch64-ldpstp.md): Include.
>>>> * config/aarch64/aarch64-ldpstp.md: New file.
>>>> * config/aarch64/aarch64-protos.h (aarch64_gen_adjusted_ldpstp):
>>>> New.
>>>> (extract_base_offset_in_addr): New.
>>>> (aarch64_operands_ok_for_ldpstp): New.
>>>> (aarch64_operands_adjust_ok_for_ldpstp): New.
>>>> * config/aarch64/aarch64.c (enum sched_fusion_type): New enum.
>>>> (TARGET_SCHED_FUSION_PRIORITY): New hook.
>>>> (fusion_load_store): New functon.
>>>> (extract_base_offset_in_addr): New function.
>>>> (aarch64_gen_adjusted_ldpstp): New function.
>>>> (aarch64_sched_fusion_priority): New function.
>>>> (aarch64_operands_ok_for_ldpstp): New function.
>>>> (aarch64_operands_adjust_ok_for_ldpstp): New function.
>>>>
>>>> 2014-11-18 Bin Cheng <bin.cheng@arm.com>
>>>>
>>>> * gcc.target/aarch64/ldp-stp-1.c: New test.
>>>> * gcc.target/aarch64/ldp-stp-2.c: New test.
>>>> * gcc.target/aarch64/ldp-stp-3.c: New test.
>>>> * gcc.target/aarch64/ldp-stp-4.c: New test.
>>>> * gcc.target/aarch64/ldp-stp-5.c: New test.
>>>> * gcc.target/aarch64/lr_free_1.c: Disable scheduling fusion
>>>> and peephole2 pass.
>>>
>>> Committed, thanks. /Marcus
>>
>>
>> This patch has a bug in it dealing with volatile mems. It will do a
>> peephole of two volatile mems into a pair which is not correct as the
>> order in the architecture is not defined and might even swap the order
>> of the load/store incorrectly.
>> I noticed this when I was merging in my changes for an improved
>> peephone which already has a check for volatile.
>
> Something like:
> @@ -10555,6 +10863,10 @@ aarch64_operands_ok_for_ldpstp (rtx
> *operands, bool load,
> reg_2 = operands[3];
> }
>
> + /* The mems cannot be volatile. */
> + if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2))
> + return false;
> +
> /* Check if the addresses are in the form of [base+offset]. */
> extract_base_offset_in_addr (mem_1, &base_1, &offset_1);
> if (base_1 == NULL_RTX || offset_1 == NULL_RTX)
>
>
> ---- CUT ----
> gcc.target/aarch64/rev16_1.c is the testcase where two volatile loads
> being merged into one load pair.
And this one:
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 40881e7..da36dc8 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10974,6 +10974,10 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx
*operands, bool load,
if (!MEM_P (mem_1) || aarch64_mem_pair_operand (mem_1, mode))
return false;
+ if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2)
+ || MEM_VOLATILE_P (mem_3) ||MEM_VOLATILE_P (mem_4))
+ return false;
+
/* Check if the addresses are in the form of [base+offset]. */
extract_base_offset_in_addr (mem_1, &base_1, &offset_1);
if (base_1 == NULL_RTX || offset_1 == NULL_RTX)
--- CUT ---
This one happens with c-c++-common/torture/complex-sign-add.c.
I will post a full patch tomorrow.
Thanks,
Andrew
>
> Thanks,
> Andrew
>
>>
>> Thanks,
>> Andrew Pinski