This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH 1/2][Modulo-sched] Fix the direction of the scheduling window
- From: Ayal Zaks <ZAKS at il dot ibm dot com>
- To: Revital1 Eres <ERES at il dot ibm dot com>
- Cc: Andrey Belevantsev <abel at ispras dot ru>, "Alexander Monakov" <amonakov at ispras dot ru>, gcc-patches at gcc dot gnu dot org
- Date: Wed, 21 Nov 2007 17:37:20 +0200
- Subject: Re: [PATCH 1/2][Modulo-sched] Fix the direction of the scheduling window
Revital1 Eres/Haifa/IBM wrote on 20/11/2007 15:47:59:
> Hello,
>
> When determining the scheduling window of a node all types of edges are
> taken into account. Currently when a node has both predecessors and
> successors it will be set close to it's predecessors. In fact it can
> be that a node has only one successor with true dep edge but it will be
> scheduled close to it's allegedly predecessor; because of the anti-dep
> edge between the two nodes. Trying to avoid this confusion and reduce
the
> life range of registers we choose to set the scheduled node close to it's
> predecessors or close to it's successors based on true deps edges only.
>
> This change makes SMS succeed on the attached testcase with
-funroll-loops
> on SPU. The testcase was provided by Vladimir.
>
> Bootstrapped and tested together with the follow on patch (2/2) on ppc,
> SPU and x86 with no new regressions.
>
> :ADDPATCH modulo-sched:
>
> OK for mainline?
Yes, with minor comments below.
Note that it may be better to set the directions (step = 1/-1) once when
fixing the order of nodes, instead of counting all true register deps
repeatedly every time we try to schedule a node, but otoh we have to
traverse all succs and preds to compute the window anyhow.
Ayal.
>
> Thanks,
> Revital
>
> 2007-11-20 Ayal Zaks <zaks@il.ibm.com>
> Revital Eres <eres@il.ibm.com>
>
> * modulo-sched.c (get_sched_window): Fix the direction of the
> scheduling window and add dump info.
>
> testsuite:
>
> 2007-11-20 Vladimir Yanovsky <yanov@il.ibm.com>
>
> * gcc.dg/sms-3.c: New testcase.
>
attachment "patch_win_dir_again.txt":
> Index: modulo-sched.c
> ===================================================================
> --- modulo-sched.c (revision 129721)
> +++ modulo-sched.c (working copy)
> @@ -1344,8 +1344,8 @@
> MAX (early_start, p_st + e->latency - (e->distance *
ii));
>
> if (dump_file)
> - fprintf (dump_file, "pred st = %d; early_start = %d; ",
p_st,
> - early_start);
> + fprintf (dump_file, "pred st = %d; early_start = %d; "
> + " latency: %d", p_st, early_start, e->latency);
Not sure what's the proper way to split a string; the following looks
better I think:
+ fprintf (dump_file,
+ "pred st = %d; early_start = %d; latency: %d",
+ p_st, early_start, e->latency);
>
> if (e->data_type == MEM_DEP)
> end = MIN (end, SCHED_TIME (v_node) + ii - 1);
> @@ -1355,6 +1355,7 @@
> }
> start = early_start;
> end = MIN (end, early_start + ii);
> + /* Schedule the node close to it's processors. */
^^^^^^^^^^
predecessors
> step = 1;
>
> if (dump_file)
> @@ -1391,8 +1392,8 @@
> s_st - e->latency + (e->distance * ii));
>
> if (dump_file)
> - fprintf (dump_file, "succ st = %d; late_start = %d;",
s_st,
> - late_start);
> + fprintf (dump_file, "succ st = %d; late_start = %d;"
> + " latency = %d", s_st, late_start, e->latency);
Same as above.
>
> if (e->data_type == MEM_DEP)
> end = MAX (end, SCHED_TIME (v_node) - ii + 1);
> @@ -1406,6 +1407,7 @@
> }
> start = late_start;
> end = MAX (end, late_start - ii);
> + /* Schedule the node close to it's successors. */
> step = -1;
>
> if (dump_file)
> @@ -1419,6 +1421,8 @@
> {
> int early_start = INT_MIN;
> int late_start = INT_MAX;
> + int count_preds = 0;
> + int count_succs = 0;
>
> start = INT_MIN;
> end = INT_MAX;
> @@ -1446,9 +1450,12 @@
> - (e->distance * ii));
>
> if (dump_file)
> - fprintf (dump_file, "pred st = %d; early_start = %d;",
p_st,
> - early_start);
> + fprintf (dump_file, "pred st = %d; early_start = %d;"
> + " latency = %d", p_st, early_start,
e->latency);
Same as above.
>
> + if (e->type == TRUE_DEP && e->data_type == REG_DEP)
> + count_preds++;
> +
> if (e->data_type == MEM_DEP)
> end = MIN (end, SCHED_TIME (v_node) + ii - 1);
> }
> @@ -1479,10 +1486,13 @@
> s_st - e->latency
> + (e->distance * ii));
>
> - if (dump_file)
> - fprintf (dump_file, "succ st = %d; late_start = %d;",
s_st,
> - late_start);
> + if (dump_file)
> + fprintf (dump_file, "succ st = %d; late_start = %d;"
> + " latency = %d", s_st, late_start, e->latency);
Same as above.
>
> + if (e->type == TRUE_DEP && e->data_type == REG_DEP)
> + count_succs++;
> +
> if (e->data_type == MEM_DEP)
> start = MAX (start, SCHED_TIME (v_node) - ii + 1);
> }
> @@ -1493,6 +1503,16 @@
> start = MAX (start, early_start);
> end = MIN (end, MIN (early_start + ii, late_start + 1));
> step = 1;
> + /* If there are more successors than processors schedule the
^^^^^^^^^^
> + node close to it's successors. */
> + if (count_succs >= count_preds)
> + {
> + int old_start = start;
> +
> + start = end - 1;
> + end = old_start - 1;
> + step = -1;
> + }
> }
> else /* psp is empty && pss is empty. */
> {
attachment "sms-3.c.txt":
> /* { dg-do compile } */
> /* { dg-options "-O2 -fmodulo-sched -funroll-loops -dm" } */
>
> int X[100];
> int Y[100];
>
> int
> foo (int len, long *result)
> {
> int i;
>
> len = 1000;
> long res = *result;
> for (i = 0; i < len; i++)
> res += X[i] * Y[i];
>
> *result = res;
> }
>
> /* { dg-final { cleanup-rtl-dump "sms" } } */
Better check that result is correct on some data, in addition to the fact
that the loop is sms'ed.