This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 1/2][Modulo-sched] Fix the direction of the scheduling window


Revital1 Eres/Haifa/IBM wrote on 20/11/2007 15:47:59:

> Hello,
>
> When determining the scheduling window of a node all types of edges are
> taken into account.  Currently when a node has both predecessors and
> successors it will be set close to it's predecessors. In fact it can
> be that a node has only one successor with true dep edge but it will be
> scheduled close to it's allegedly predecessor; because of the anti-dep
> edge between the two nodes.  Trying to avoid this confusion and reduce
the
> life range of registers we choose to set the scheduled node close to it's
> predecessors or close to it's successors based on true deps edges only.
>
> This change makes SMS succeed on the attached testcase with
-funroll-loops
> on SPU.  The testcase was provided by Vladimir.
>
> Bootstrapped and tested together with the follow on patch (2/2) on ppc,
> SPU and x86 with no new regressions.
>
> :ADDPATCH modulo-sched:
>
> OK for mainline?

Yes, with minor comments below.

Note that it may be better to set the directions (step = 1/-1) once when
fixing the order of nodes, instead of counting all true register deps
repeatedly every time we try to schedule a node, but otoh we have to
traverse all succs and preds to compute the window anyhow.

Ayal.


>
> Thanks,
> Revital
>
> 2007-11-20  Ayal Zaks  <zaks@il.ibm.com>
>             Revital Eres  <eres@il.ibm.com>
>
>         * modulo-sched.c (get_sched_window): Fix the direction of the
>         scheduling window and add dump info.
>
> testsuite:
>
> 2007-11-20  Vladimir Yanovsky  <yanov@il.ibm.com>
>
>         * gcc.dg/sms-3.c: New testcase.
>

attachment "patch_win_dir_again.txt":

> Index: modulo-sched.c
> ===================================================================
> --- modulo-sched.c (revision 129721)
> +++ modulo-sched.c (working copy)
> @@ -1344,8 +1344,8 @@
>                  MAX (early_start, p_st + e->latency - (e->distance *
ii));
>
>                if (dump_file)
> -                fprintf (dump_file, "pred st = %d; early_start = %d; ",
p_st,
> -                         early_start);
> +                fprintf (dump_file, "pred st = %d; early_start = %d; "
> +                         " latency: %d", p_st, early_start, e->latency);


Not sure what's the proper way to split a string; the following looks
better I think:

+                fprintf (dump_file,
+                         "pred st = %d; early_start = %d; latency: %d",
+                         p_st, early_start, e->latency);


>
>         if (e->data_type == MEM_DEP)
>    end = MIN (end, SCHED_TIME (v_node) + ii - 1);
> @@ -1355,6 +1355,7 @@
>   }
>        start = early_start;
>        end = MIN (end, early_start + ii);
> +      /* Schedule the node close to it's processors.  */
                                            ^^^^^^^^^^
                                            predecessors

>        step = 1;
>
>        if (dump_file)
> @@ -1391,8 +1392,8 @@
>                                  s_st - e->latency + (e->distance * ii));
>
>                if (dump_file)
> -                fprintf (dump_file, "succ st = %d; late_start = %d;",
s_st,
> -                         late_start);
> +                fprintf (dump_file, "succ st = %d; late_start = %d;"
> +                         " latency = %d", s_st, late_start, e->latency);

Same as above.

>
>         if (e->data_type == MEM_DEP)
>    end = MAX (end, SCHED_TIME (v_node) - ii + 1);
> @@ -1406,6 +1407,7 @@
>   }
>        start = late_start;
>        end = MAX (end, late_start - ii);
> +      /* Schedule the node close to it's successors.  */
>        step = -1;
>
>        if (dump_file)
> @@ -1419,6 +1421,8 @@
>      {
>        int early_start = INT_MIN;
>        int late_start = INT_MAX;
> +      int count_preds = 0;
> +      int count_succs = 0;
>
>        start = INT_MIN;
>        end = INT_MAX;
> @@ -1446,9 +1450,12 @@
>       - (e->distance * ii));
>
>                if (dump_file)
> -                fprintf (dump_file, "pred st = %d; early_start = %d;",
p_st,
> -                         early_start);
> +                fprintf (dump_file, "pred st = %d; early_start = %d;"
> +                         " latency = %d", p_st, early_start,
e->latency);

Same as above.

>
> +              if (e->type == TRUE_DEP && e->data_type == REG_DEP)
> +                count_preds++;
> +
>         if (e->data_type == MEM_DEP)
>    end = MIN (end, SCHED_TIME (v_node) + ii - 1);
>       }
> @@ -1479,10 +1486,13 @@
>      s_st - e->latency
>      + (e->distance * ii));
>
> -               if (dump_file)
> -                 fprintf (dump_file, "succ st = %d; late_start = %d;",
s_st,
> -                          late_start);
> +              if (dump_file)
> +                fprintf (dump_file, "succ st = %d; late_start = %d;"
> +                         " latency = %d", s_st, late_start, e->latency);

Same as above.

>
> +               if (e->type == TRUE_DEP && e->data_type == REG_DEP)
> +                 count_succs++;
> +
>         if (e->data_type == MEM_DEP)
>    start = MAX (start, SCHED_TIME (v_node) - ii + 1);
>       }
> @@ -1493,6 +1503,16 @@
>        start = MAX (start, early_start);
>        end = MIN (end, MIN (early_start + ii, late_start + 1));
>        step = 1;
> +      /* If there are more successors than processors schedule the
                                              ^^^^^^^^^^

> +         node close to it's successors.  */
> +      if (count_succs >= count_preds)
> +        {
> +          int old_start = start;
> +
> +          start = end - 1;
> +          end = old_start - 1;
> +          step = -1;
> +        }
>      }
>    else /* psp is empty && pss is empty.  */
>      {


attachment "sms-3.c.txt":

> /* { dg-do compile } */
> /* { dg-options "-O2 -fmodulo-sched -funroll-loops -dm" } */
>
> int X[100];
> int Y[100];
>
> int
> foo (int len, long *result)
> {
>   int i;
>
>   len = 1000;
>   long res = *result;
>   for (i = 0; i < len; i++)
>     res += X[i] * Y[i];
>
>   *result = res;
> }
>
> /* { dg-final { cleanup-rtl-dump "sms" } } */

Better check that result is correct on some data, in addition to the fact
that the loop is sms'ed.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]