This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] loop distribution bug fix


On Thu, Nov 10, 2016 at 6:25 AM, Jim Wilson <jim.wilson@linaro.org> wrote:
> This fixes a bug in code adding edges to the dependence graph. Values
> for this_dir can be -1 (backward edge), 0 (no edge), 1 (forward edge),
> and 2 (both backward and forward edges). There can be multiple
> dependencies checked, creating multiple edges that have to be merged
> together. this_dir contains the current edge. dir contains the
> previous edges. The code fails to handle the case where this_dir is 2,
> in which case we can return immediately. This is a minor optimization
> to improve compile time. The code handles the case where dir is
> non-zero and this_dir is zero by returning 2, which is incorrect. dir
> should be unmodified in this case. We can return 2 only if both dir
> and this_dir are non-zero and unequal, i.e. one is -1 and the other is
> 1. This problem creates extra unnecessary edges, which can prevent
> loops from being distributed. The patch fixes both problems.
>
> This passed an x86_64 gcc bootstrap with -ftree-loop-distribution
> added to BOOT_CFLAGS and a testsuite regression check.  Curiously, I
> see that I get different results for the C/C++ ubsan tests every time
> I run them, but this has nothing to do with my patch, as it happens
> with or without my patch.  I haven't tried debugging this yet.  Might
> be related to my recent upgrade to Ubuntu 16.04 LTS.  Otherwise, there
> are no regressions.
>
> On SPEC CPU2006, on aarch64, I see 5879 loops distributed without the
> patch, and 5906 loops distributed with the patch. So 27 extra loops
> are distributed which is about 0.5% more loop distributions. There is
> no measurable performance gain from the bug fix on the CPU2006 run
> time though I plan to spend some more time looking at this code to see
> if I can find other improvements.

The biggest "lack" of loop distribution is the ability to undo CSE so for

 for (;;)
  {
     a[i] = a[i] + 1;
     b[i] = a[i];
  }

CSE makes us see

  for (;;)
    {
       tem = a[i];
       tem2 = tem + 1;
       a[i] = tem;
       b[i] = tem;
    }

and loop distribution cannot re-materialize tem3 from a[i] thus most of the
time it ends up pulling redundant computations into each partition (sometimes
that can reduce memory bandwith if one less stream is used but sometimes
not, like in the above case).

Then of course the cost model is purely modeled for STREAM (reduce the number
of memory streams).  So loop distribution is expected to pessimize code for
the CSE case in case you are not memory bound and improve things if you
are memory bound.

> OK?

Ok.

Thanks for the improvement!
Richard.

> Jim


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]