This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Re[2]: [GSoC: DDG export][RFC] Current status


[Sorry for the extra-long letter; Zdenek, I have one question for you,
please see below near your name :) ; Ayal, I got unexpected results
when testing my patch with modulo scheduling, please see near the end]

Hello,

As before, I would like to provide an update on the work done as part
of my GSoC project.  The whole patch (attached) is subdivided into
smaller parts (also attached) for easier viewing and reference:

I. Misc. fixes:

export-ddg-01-delete-strange-ia64-bypass.patch
  This deletes a line from the Itanium2 .md file which lists a special case
  for Itanium fused multiply-add insn.  I have not found this special
  case in Intel's manuals, and my performance tests suggest that this is
  safe to remove.  FWIW, it is nice to present more accurate information
  to schedulers (including SMS, which may be quite fooled by this bypass
  on small loops).

export-ddg-02-misc-fix-testsuite.patch
  This changes some { scan-tree-dump-times "a.e" 0 "optimized" } to use
  "a\\.e" instead.  This does not matter for current trunk, but fixes a
  "regression" I saw while reg-testing the whole patch.

export-ddg-03-fix-dataref-issue.patch
  This was sent to me by Sebastian Pop; the purpose of this patch is to
  improve data dependency analysis for references in nested loops.

Patches 04-06 are modulo-scheduling improvements and are discussed in
another thread.  I include them here to show which tree I based my
work on.

II. Other stuff:

export-ddg-10-mem-orig-exprs.patch
  This provides a mapping from RTL MEMs to their original trees.  It is
  a slightly changed part of alias-export project.

export-ddg-11-main.patch
  This is mostly what the project was about.  It provides a pass
  (scheduled before iv_opts) that saves data references and relations
  obtained from compute_dependencies_for_loop for each loop, skipping
  those that define a relation with respect to a loop other than the
  innermost one containing the references, so that for
  for (...)
    {
      a[i] = b[i];
      for (...)
        c[i][j] = d[i][j];
    }
  datarefs and ddrs saved for a[i], b[i] will define their relation
  according to iterations of the outermost loop, and those saved for
  c[i][j], d[i][j] will define their relation in the innermost loop (but
  datarefs and ddrs defining the relation of c[i][j], d[i][j] according
  to iterations of the outermost loop will not be saved).
  This also provides a simple verification routine that is fired up
  after almost every pass (both GIMPLE and RTL); it visits memory
  references in loops and gathers statistics on how much of the saved
  information is available for them.

export-ddg-12-passes.patch
  This just registers passes.

export-ddg-14-ivopts.patch
  A small change was needed to notice TARGET_MEM_REF/TMR_ORIGINAL
  transition in iv-opts.  An alternative approach would be to make
  MEM_ORIG_EXPR provide binding to TARGET_MEM_REF trees, not their
  TMR_ORIGINAL parts.

export-ddg-15-passes-fixups.patch
  One delete_unreachable_blocks (); call in
  rest_of_handle_thread_prologue_and_epilogue and disabling the
  verification routine for pass_expand were needed to let the verifier
  run without ICEs on bootstrap, regtest, SPEC CPU 2k, and tramp3d.

export-ddg-16-use-in-rtl-aliasing.patch
  This uses saved data dependency information for the purposes of
  RTL-level disambiguation.  Memory references are considered
  independent if the corresponding DDR signifies independence, or if it
  has recorded distance vectors and one of them is non-zero.  Basing
  memory disambiguation on distance vectors would be incorrect for
  passes that perform cross-iteration code motion, so there is a way to
  turn off such behaviour for SMS.

export-ddg-17-use-in-modulo-sched.patch
  This uses exported information in DDG construction for modulo
  scheduling.  For now, only dependence is tested, and distance obtained
  from high-level analysis is not used.
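
The disambiguation rule described for export-ddg-16 can be sketched in
plain C roughly as follows.  This is only an illustration, not code from
the patch: struct saved_ddr, its fields, and both function names are
hypothetical stand-ins, and "one of them is non-zero" is read literally
as "at least one recorded distance vector has a non-zero component".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a saved DDR; this is NOT the
   GCC data structure, just enough state to express the rule.  */
struct saved_ddr
{
  bool known_independent;  /* analysis proved there is no dependence */
  size_t num_dist_vects;   /* number of recorded distance vectors */
  size_t vect_len;         /* components per vector */
  const int *dist_vects;   /* num_dist_vects * vect_len entries */
};

/* Return true if at least one component of V is non-zero.  */
static bool
dist_vect_nonzero_p (const int *v, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (v[i] != 0)
      return true;
  return false;
}

/* Return true if DDR lets us treat the two references as independent.
   USE_DISTANCES is meant to be false for passes that move code across
   iterations (SMS in the text above), so that only proven independence
   counts for them.  */
bool
saved_ddr_independent_p (const struct saved_ddr *ddr, bool use_distances)
{
  if (ddr == NULL)
    return false;               /* nothing saved: stay conservative */
  if (ddr->known_independent)
    return true;
  if (!use_distances)
    return false;
  for (size_t i = 0; i < ddr->num_dist_vects; i++)
    if (dist_vect_nonzero_p (ddr->dist_vects + i * ddr->vect_len,
                             ddr->vect_len))
      return true;
  return false;
}
```

With use_distances set to false the function falls back to proven
independence only, which is the behaviour wanted for SMS.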

The whole patch survives bootstrap (ia64, x86_64, c,c++,fortran,
--disable-multilib) and regtest (x86_64), and compiles SPEC CPU 2k and
tramp3d.  An important missing piece is correcting the exported
information after loop unrolling.  As far as I can tell, for a loop
unrolled by a factor of N we need to clone MEM_ORIG_EXPRs and datarefs
for the newly created MEMs, create no-dependence DDRs for those pairs
whose original DDR was no-dependence, and for DDRs with recorded
distance vectors we will be able to recompute new distances from the
old distance (D) and N (but only if N % D == 0 || D % N == 0).  Is that
right, Zdenek?  Where should I start to implement this?
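
The index arithmetic behind this question can be sketched as below.
This is an assumption for illustration only, not something confirmed in
this thread and not code from the patch: struct unrolled_dep and
remap_distance are hypothetical names, the distance D is taken to be a
non-negative scalar, and the unrolled loop is assumed to run whole
groups of N iterations.  The sketch does not settle whether the
divisibility restriction above is needed; it only shows that when
D % N == 0 every copy depends on the same copy at distance D / N.

```c
/* Hypothetical result: where a dependence lands after unrolling.  */
struct unrolled_dep
{
  int sink_copy;  /* which of the N body copies holds the sink */
  int distance;   /* distance in iterations of the unrolled loop */
};

/* Original iteration i is copy k of unrolled iteration j, i = n*j + k.
   A dependence of distance d ends at original iteration
   i + d = n * (j + (k + d) / n) + (k + d) % n.
   Requires n > 0, d >= 0 and 0 <= src_copy < n.  */
struct unrolled_dep
remap_distance (int n, int d, int src_copy)
{
  struct unrolled_dep r;
  r.sink_copy = (src_copy + d) % n;
  r.distance = (src_copy + d) / n;
  return r;
}
```

For example, with N = 4 and D = 2, copy 1 reaches copy 3 within the same
unrolled iteration, while copy 3 reaches copy 1 of the next one.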

On x86_64, this patch produces no noticeable performance changes on
SPEC CPU2000 and no code generation changes for a set of small numerical
benchmarks.  On ia64 there are some improvements, probably because RTL
disambiguation is weaker there for lack of a base+offset addressing
mode.  There is a 1.5x speedup for a Linpack-like benchmark, where the
exported info helps to disambiguate references in the following loop:

  for (i = m; i < n; i = i + 4) {
    dy[i] = dy[i] + da*dx[i];
    dy[i+1] = dy[i+1] + da*dx[i+1];
    dy[i+2] = dy[i+2] + da*dx[i+2];
    dy[i+3] = dy[i+3] + da*dx[i+3];
  }

It also provides ~2.5% speedup for fma3d, ~1.5% for gap and ~11% for
bzip2 of SPEC2K.  I have not investigated causes of speedup, but my wild
guess for bzip2 would be better scheduling of its hot memmove-like loop.

So, results of better code generation due to useful knowledge from
high-level analysis are yet to be seen.

The results of testing this patch with modulo scheduling are quite
strange.  I used loops from the tree-vectorizer testsuite to gauge the
gains from export, and I saw that using exported information frequently
makes SMS fail (on the tree-vect. testsuite the number of SMS'ed loops
drops from 129 out of 150 to 59 out of 150).  The frequent failure
scenario looks like this: fewer dependencies mean lower min_ii and
max_ii estimates, so max_ii can become lower than the ii for which SMS
succeeded without using exported info; when max_ii is still high
enough, SMS makes different decisions and loses.  I will be looking into
these failures in the coming days.
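
The first failure scenario can be illustrated with a toy model of the
ii search.  It assumes that candidate ii values are tried from min_ii
up to, but not including, max_ii; find_ii, schedule_fits_fn and
fits_at_or_above are hypothetical names made up for this sketch, not
functions from modulo-sched.c.

```c
#include <stdbool.h>

/* Hypothetical predicate: can the loop be scheduled at this ii?  */
typedef bool (*schedule_fits_fn) (int ii, void *ctx);

/* Try each candidate ii in [min_ii, max_ii); return the first one that
   fits, or -1 if the whole range is exhausted.  */
int
find_ii (int min_ii, int max_ii, schedule_fits_fn fits, void *ctx)
{
  for (int ii = min_ii; ii < max_ii; ii++)
    if (fits (ii, ctx))
      return ii;
  return -1;
}

/* Toy predicate: scheduling succeeds only at or above *(int *) ctx.  */
bool
fits_at_or_above (int ii, void *ctx)
{
  return ii >= *(int *) ctx;
}
```

If a loop only fits at ii = 8, the range [5, 12) finds it, but a
tighter estimate such as [3, 7) gives up before reaching 8.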

Thanks.

--
Alexander Monakov

Attachment: export-ddg-00-all.patch.txt
Description: Text document

Attachment: export-ddg-01-delete-strange-ia64-bypass.patch.txt
Description: Text document

Attachment: export-ddg-02-misc-fix-testsuite.patch.txt
Description: Text document

Attachment: export-ddg-03-fix-dataref-issue.patch.txt
Description: Text document

Attachment: export-ddg-04-sms-use-expand_simple_binop.patch.txt
Description: Text document

Attachment: export-ddg-05-sms-use-max_asap-for-maxii-estimation.patch.txt
Description: Text document

Attachment: export-ddg-06-sms-fix-antideps.patch.txt
Description: Text document

Attachment: export-ddg-10-mem-orig-exprs.patch.txt
Description: Text document

Attachment: export-ddg-11-main.patch.txt
Description: Text document

Attachment: export-ddg-12-passes.patch.txt
Description: Text document

Attachment: export-ddg-13-make-free_data_ref-extern.patch.txt
Description: Text document

Attachment: export-ddg-14-ivopts.patch.txt
Description: Text document

Attachment: export-ddg-15-passes-fixups.patch.txt
Description: Text document

Attachment: export-ddg-16-use-in-rtl-aliasing.patch.txt
Description: Text document

Attachment: export-ddg-17-use-in-modulo-sched.patch.txt
Description: Text document

