This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Modulo-scheduling improvements.

From: Ayal Zaks <ZAKS at il dot ibm dot com>
To: "Alexander Monakov" <monoid at ispras dot ru>
Cc: "Andrey Belevantsev" <abel at ispras dot ru>, andrew_pinski at playstation dot sony dot com, dje at watson dot ibm dot com, gcc-patches at gcc dot gnu dot org, Mircea Namolaru <NAMOLARU at il dot ibm dot com>, "Zdenek Dvorak" <rakdver at kam dot mff dot cuni dot cz>, Revital1 Eres <ERES at il dot ibm dot com>, "Dmitry Zhurikhin" <zhur at ispras dot ru>
Date: Tue, 21 Aug 2007 01:46:42 +0300
Subject: Re: [PATCH] Modulo-scheduling improvements.

Hi Alexander,

"Alexander Monakov" <monoid@ispras.ru> wrote on 20/08/2007 18:26:23:

> Hello,
>
> Please consider the following two patches.
>
> 1.  Use MAX_ASAP in MAX_II estimation.
>
> Currently SMS estimates maximum initiation interval by multiplying
minimal
> II estimation by hard-coded constant, MAXII_FACTOR, which is currently
> defined as 2.  This may sometimes forbid performing SMS on loops which
> could benefit from modulo-scheduling.  However, MAX_ASAP calculated in
> calculate_order_params is equal to length of critical path for the basic

> block, so this basic block cannot be scheduled in less than MAX_ASAP
ticks
> by scheduler not performing inter-iteration code motion.  Therefore, we
> can use MAX_ASAP as more accurate estimation of maximal II for SMS.  The

> patch below takes a hybrid approach, allowing II as great as MAXII_FACTOR

> * min_ii, even if MAX_ASAP is smaller.
>

This patch is a very good idea; maxii is supposed to be an upper-bound
telling us to quit trying to sms the loop -- we're better off leaving it
for the regular scheduler to handle. The old maxii was way too generous,
revealing a problem that caused sms to fail regardless of how large ii
grows (identified the cause of the problem recently, working on a
solution). However, if the current maxii is found to be too conservative,
it suggests that minii (mii) is too small. You may want to try and improve
mii, as that would reduce futile attempts and improve compile-time,
regardless of the change to maxii.

I prefer not to have a global_max_asap (especially for such a short-lived
variable), but have calculate_order_params() return the critical path
length, say using an int* parameter. If you can test on PowerPC please do
so, otherwise Revital - could you check this patch on PowerPC and SPU?
Patch approved with the above change if additional tests pass.


> ---cut---
> --- gcc/modulo-sched.c  (/gcc-local/trunk)      (revision 30596)
> +++ gcc/modulo-sched.c  (/gcc-local/export-ddg) (revision 30596)
> @@ -793,6 +803,10 @@ canon_loop (struct loop *loop)
>   /* Used to calculate the upper bound of ii.  */
>   #define MAXII_FACTOR 2
>
> +/* Maximal ASAP saved for estimating maxII.  */
> +static int global_max_asap;
> +"Revital1 Eres" <ERES@il.ibm.com>,
> +
>   /* Main entry point, perform SMS scheduling on the loops of the
function
>      that consist of single basic blocks.  */
>   static void
> @@ -1019,9 +1033,10 @@ sms_schedule (void)
>         node_order = XNEWVEC (int, g->num_nodes);
>
>         mii = 1; /* Need to pass some estimate of mii.  */
> +      global_max_asap = 0;
>         rec_mii = sms_order_nodes (g, mii, node_order);
>         mii = MAX (res_MII (g), rec_mii);
> -      maxii = MAXII_FACTOR * mii;
> +      maxii = MAX (global_max_asap, MAXII_FACTOR * mii);
>
>         if (dump_file)
>          fprintf (dump_file, "SMS iis %d %d %d (rec_mii, mii, maxii)\n",
> @@ -1681,6 +1703,7 @@ calculate_order_params (ddg_ptr g, int m
>            }
>       }
>
> +  global_max_asap = max_asap;
>     return node_order_params_arr;
>   }
>
> ---cut---
>
> 2.  Change logic for MAX_PRECEDE, MAX_FOLLOW calculation.
>
> These two sbitmaps are used to determine order of instructions that are
> scheduled on the same row (and, therefore, in the same instruction
> group).  However, the way they are calculated is questionable: their
> calculation is performed out of the loop that iterates over possible
> positions in scheduling window, and uses different clock ticks for
> incoming and outgoing arcs; also, all instructions from incoming edges
are
> marked in must_precede bitmap, and all insns from outgoing edges are
> marked in must_follow bitmap, which may lead to same insn marked in both

> bitmaps.  We propose moving calculation of these bitmaps into loop that
> iterates over positions in scheduling window, and fill the bitmaps in the

> way so precedence is forced according to anti-dependencies.  Patch
> attached (sms-fix-must_precede_follow.patch).
>

The idea behind must_precede and must_follow is to more accurately define
the lintel/sill of the scheduling window, given by a range of 'start' ..
'end' cycles. The window starts on cycle 'start', but possibly not at the
"beginning" of this cycle - there may be insns already scheduled in this
row, which must_precede the current node being scheduled. Likewise for
cycle 'end' (or rather one cycle before 'end', as 'end' marks the first
cycle outside the window) - there may be insns already scheduled in this
row, that must_follow the current node being scheduled. For all cycles
properly inside the window, we are free to place the insn in any position
(dependence-wise). This issue mostly involves zero-latency dependences such
as anti-deps, where producer and consumer are allowed to reside in same row
but their order in the row must conserve the direction of the dependence
(at-least for super-scalar).
One should check not only if pro and con reside in same 'start' or 'end'
row (as currenly done), but also if the dependence equation between them
becomes tight when both are scheduled in 'start' or 'end'.

If you have a testcase demonstrating this problem, that could be helpful.

I suggest to try and peel the for (c = start; c != end; c += step) loop, so
that when c == start we call ps_add_node_check_conflicts with must_precede
but an empty bitmap for must_follow, call ps_add_node_check_conflicts with
two empty bitmaps until one before last cycle (end - 2*step), and call
ps_add_node_check_conflicts with must_follow and an empty must_precede. for
last cycle (end - step).

Please try and see if this works for you.
Thanks, Ayal.

PS. I will be out of touch until Thursday.


> These two patches together with fix posted earlier by Andrey Belevantsev

> allow fine modulo-scheduling of simple loops on Itanium, where it would
> previously fail.  To quantify, number of successfully modulo-scheduled
> loops (without -fmodulo-sched-allow-regmoves) raises from 32 to 129 on
> files from tree-vectorizer testsuite (number of failures lowers from 108

> to 21; the difference in totals is due to ICE's in trunk's version of
SMS,
> which are fixed by Andrey's patch).
>
> Thanks.
>
> --
> Alexander Monakov, Dmitry Zhurikhin[attachment
"sms-use-max_asap-for-maxii-
> estimation.patch" deleted by Ayal Zaks/Haifa/IBM] [attachment "sms-fix-
> must_precede_follow.patch" deleted by Ayal Zaks/Haifa/IBM]

Follow-Ups:
- Re: [PATCH] Modulo-scheduling improvements.
  - From: Revital1 Eres

References:
- Re: [PATCH] Modulo-scheduling improvements.
  - From: Alexander Monakov

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]