This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: "Bin.Cheng" <amker dot cheng at gmail dot com>
- Cc: Jeff Law <law at redhat dot com>, Bin Cheng <Bin dot Cheng at arm dot com>, gcc-patches List <gcc-patches at gcc dot gnu dot org>, Mike Stump <mikestump at comcast dot net>
- Date: Mon, 24 Nov 2014 14:28:31 +0000
- Subject: Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass
- Authentication-results: sourceware.org; auth=none
- References: <000001cfdc90$1d95c670$58c15350$ at arm dot com> <5453F12B dot 6080106 at redhat dot com> <CAHFci29sr2CN-RfO3SN8RgOYBzAw4owdYw-Ed-ih6AvhKVUpHg at mail dot gmail dot com> <545C009D dot 4020702 at redhat dot com> <CAHFci28vB-ZiEweX0Ub3p9HkjYj_UKz41LsUC=F1+sejYFvafg at mail dot gmail dot com>
On Fri, Nov 14, 2014 at 02:43:12AM +0000, Bin.Cheng wrote:
> On Fri, Nov 7, 2014 at 7:13 AM, Jeff Law <law@redhat.com> wrote:
> > On 11/05/14 02:30, Bin.Cheng wrote:
> >> Thanks very much for reviewing. I refined the patch according to your
> >> comments. Also made two small changes: a) skip breaking dependency
> >> between memory access and the corresponding base-reg modifying
> >> instruction. This feature doesn't help load/store pair that much and
> >> only increases compilation time. b) a minor bug fix in arm backend
> >> hook when calculating priority for memory accesses with minus offset.
> >>
> >> I am running bootstrap/test against latest trunk, and will adapt
> >> ChangeLog once get approved generally. So how about this one?
> >
> > OK for the trunk. Thanks for your patience.
> >
> > Jeff
> >
>
> Thanks for reviewing. For the record, attached patch is committed.
> The only update is I disabled the pass if peephole2 isn't in effect
> because it relies on peephole2 to do real fusion work.
Hi Bin,
The documentation for TARGET_SCHED_FUSION_PRIORITY doesn't look
right to me (see: https://gcc.gnu.org/onlinedocs/gccint/Scheduling.html ).
I think you'll need to wrap your examples in something like @smallexample
tags if you want to maintain their formatting.
Thanks,
James
> Index: gcc/target.def
> ===================================================================
> --- gcc/target.def (revision 217474)
> +++ gcc/target.def (working copy)
> @@ -1526,6 +1526,79 @@ parallelism required in output calculations chain.
> int, (unsigned int opc, machine_mode mode),
> hook_int_uint_mode_1)
>
> +/* The following member value is a function that returns priority for
> + fusion of each instruction via pointer parameters. */
> +DEFHOOK
> +(fusion_priority,
> +"This hook is called by the scheduling fusion pass. It calculates fusion\n\
> +priorities for each instruction passed in by parameter. The priorities\n\
> +are returned via pointer parameters.\n\
> +\n\
> +@var{insn} is the instruction whose priorities need to be calculated.\n\
> +@var{max_pri} is the maximum priority that can be returned in any case.\n\
> +@var{fusion_pri} is the pointer parameter through which @var{insn}'s\n\
> +fusion priority should be calculated and returned.\n\
> +@var{pri} is the pointer parameter through which @var{insn}'s priority\n\
> +should be calculated and returned.\n\
> +\n\
> +The same @var{fusion_pri} should be returned for instructions that should\n\
> +be scheduled together. Different @var{pri} values should be returned for\n\
> +instructions with the same @var{fusion_pri}. @var{fusion_pri} is the major\n\
> +sort key, @var{pri} is the minor sort key. All instructions will be\n\
> +scheduled according to the two priorities. All priorities calculated\n\
> +should be between 0 (exclusive) and @var{max_pri} (inclusive). To avoid\n\
> +false dependencies, the @var{fusion_pri} of instructions that need to be\n\
> +scheduled together should be smaller than the @var{fusion_pri} of unrelated\n\
> +instructions.\n\
> +\n\
> +Consider the following example:\n\
> +\n\
> + ldr r10, [r1, 4]\n\
> + add r4, r4, r10\n\
> + ldr r15, [r2, 8]\n\
> + sub r5, r5, r15\n\
> + ldr r11, [r1, 0]\n\
> + add r4, r4, r11\n\
> + ldr r16, [r2, 12]\n\
> + sub r5, r5, r16\n\
> +\n\
> +On targets like ARM/AArch64, the two pairs of consecutive loads should be\n\
> +merged. However, the peephole2 pass can't help in this case unless the\n\
> +consecutive loads are actually next to each other in the instruction flow.\n\
> +That's where this scheduling fusion pass works. This hook calculates the\n\
> +priority for each instruction based on its fusion type, like:\n\
> +\n\
> + ldr r10, [r1, 4] ; fusion_pri=99, pri=96 \n\
> + add r4, r4, r10 ; fusion_pri=100, pri=100 \n\
> + ldr r15, [r2, 8] ; fusion_pri=98, pri=92 \n\
> + sub r5, r5, r15 ; fusion_pri=100, pri=100 \n\
> + ldr r11, [r1, 0] ; fusion_pri=99, pri=100 \n\
> + add r4, r4, r11 ; fusion_pri=100, pri=100 \n\
> + ldr r16, [r2, 12] ; fusion_pri=98, pri=88 \n\
> + sub r5, r5, r16 ; fusion_pri=100, pri=100 \n\
> +\n\
> +The scheduling fusion pass then sorts all ready-to-issue instructions\n\
> +according to the priorities. As a result, instructions of the same fusion\n\
> +type will be pushed together in the instruction flow, like:\n\
> +\n\
> + ldr r11, [r1, 0]\n\
> + ldr r10, [r1, 4]\n\
> + ldr r15, [r2, 8]\n\
> + ldr r16, [r2, 12]\n\
> + add r4, r4, r10\n\
> + sub r5, r5, r15\n\
> + add r4, r4, r11\n\
> + sub r5, r5, r16\n\
> +\n\
> +Now the peephole2 pass can simply merge the two pairs of loads.\n\
> +\n\
> +Since the scheduling fusion pass relies on peephole2 to do the real fusion\n\
> +work, it is only enabled by default when peephole2 is in effect.\n\
> +\n\
> +This hook was first introduced on ARM/AArch64 targets; please refer to\n\
> +the hook implementation for how different fusion types are supported.",
> +void, (rtx_insn *insn, int max_pri, int *fusion_pri, int *pri), NULL)
> +
> HOOK_VECTOR_END (sched)
>
> /* Functions relating to OpenMP and Cilk Plus SIMD clones. */