This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
[Sorry for the extra-long letter. Zdenek, I have one question for you; please see below around your name :) Ayal, I received unexpected results when testing my patch with modulo scheduling; please see near the end.]

Hello,

As before, I would like to provide an update on the work done as part of the GSoC project. The whole patch (attached) is subdivided into smaller parts (also attached) for easier viewing and reference:

I. Misc. fixes:

export-ddg-01-delete-strange-ia64-bypass.patch
  This deletes a line from the Itanium2 .md file which lists a special case for the Itanium fused multiply-add insn. I have not found this special case in Intel's manuals, and my performance tests suggest that it is safe to remove. FWIW, it is nice to present more accurate information to the schedulers (including SMS, which can be badly misled by this bypass on small loops).

export-ddg-02-misc-fix-testsuite.patch
  This changes some { scan-tree-dump-times "a.e" 0 "optimized" } to use "a\\.e" instead. This does not matter for current trunk, but fixes a "regression" I saw while reg-testing the whole patch.

export-ddg-03-fix-dataref-issue.patch
  This was sent to me by Sebastian Pop; the purpose of this patch is to improve data dependency analysis for references in nested loops.

Patches 04-06 are modulo-scheduling improvements and are discussed in another thread. I include them here to show the tree on which I conducted my work.

II. Other stuff:

export-ddg-10-mem-orig-exprs.patch
  This provides a mapping from RTL MEMs to their original trees. It is a slightly changed part of the alias-export project.

export-ddg-11-main.patch
  This is mostly what the project was about. It provides a pass (scheduled before iv_opts) that saves the data references and relations obtained from compute_dependencies_for_loop for each loop, skipping those that define a relation with respect to a loop other than the innermost one containing the references, so that for

    for (...)
      {
        a[i] = b[i];
        for (...)
          c[i][j] = d[i][j];
      }

  the datarefs and ddrs saved for a[i], b[i] will define their relation according to iterations of the outermost loop, and those saved for c[i][j], d[i][j] will define their relation in the innermost loop (but datarefs and ddrs defining the relation of c[i][j], d[i][j] according to iterations of the outermost loop will not be saved). This also provides a simple verification routine that is fired up after almost every pass (both GIMPLE and RTL); it visits memory references in loops and gathers statistics on how much saved information is available for them.

export-ddg-12-passes.patch
  This just registers the passes.

export-ddg-14-ivopts.patch
  A small change was needed to notice the TARGET_MEM_REF/TMR_ORIGINAL transition in iv-opts. An alternative approach would be to make MEM_ORIG_EXPR provide a binding to TARGET_MEM_REF trees, not their TMR_ORIGINAL parts.

export-ddg-15-passes-fixups.patch
  One delete_unreachable_blocks (); in rest_of_handle_thread_prologue_and_epilogue and disabling the verification routine for pass_expand were needed to allow the verifier to run without ICEs on bootstrap, regtest, spec cpu 2k, tramp3d.

export-ddg-16-use-in-rtl-aliasing.patch
  This uses the saved data dependency information for the purposes of RTL-level disambiguation. Memory references are considered independent if the corresponding DDR signifies independence, or if it has recorded distance vectors and one of them is non-zero. Basing memory disambiguation on distance vectors would be incorrect for passes that perform cross-iteration code motion, so there is a way to turn off such behaviour for SMS.

export-ddg-17-use-in-modulo-sched.patch
  This uses the exported information in DDG construction for modulo scheduling. For now, only dependence is tested, and the distance obtained from high-level analysis is not used.

The whole patch survives bootstrap (ia64, x86_64, c,c++,fortran, --disable-multilib) and regtest (x86_64), and compiles spec cpu 2k and tramp3d.
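
To make the disambiguation rule described for export-ddg-16 above concrete, here is a minimal sketch in plain C. It is not code from the patch: struct saved_ddr, enum ddr_kind and refs_independent_p are hypothetical, simplified stand-ins and not GCC's real data structures. The sketch takes "one of them is non-zero" literally (at least one recorded distance vector has a non-zero entry), and the use_distances flag models the way of turning the distance-based part off for passes such as SMS.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a saved data dependence relation.
   These are NOT GCC's real data structures.  */
enum ddr_kind { DDR_INDEPENDENT, DDR_UNKNOWN, DDR_HAS_DISTANCES };

struct saved_ddr
{
  enum ddr_kind kind;
  size_t n_dist;     /* number of recorded distance vectors */
  size_t depth;      /* entries per vector (loop nest depth) */
  const int *dist;   /* n_dist * depth entries, one vector after another */
};

/* Return true if distance vector V of length DEPTH has a non-zero entry.  */
static bool
dist_vector_nonzero (const int *v, size_t depth)
{
  for (size_t i = 0; i < depth; i++)
    if (v[i] != 0)
      return true;
  return false;
}

/* Return true if the two references described by DDR may be treated as
   independent.  USE_DISTANCES is false for passes that move code across
   iterations, where a non-zero distance does not imply independence.  */
bool
refs_independent_p (const struct saved_ddr *ddr, bool use_distances)
{
  if (ddr == NULL)
    return false;    /* nothing was saved: conservatively assume dependence */
  if (ddr->kind == DDR_INDEPENDENT)
    return true;
  if (!use_distances || ddr->kind != DDR_HAS_DISTANCES)
    return false;
  /* Literal reading of the rule: one non-zero vector is enough.  */
  for (size_t k = 0; k < ddr->n_dist; k++)
    if (dist_vector_nonzero (ddr->dist + k * ddr->depth, ddr->depth))
      return true;
  return false;
}
```

A caller that must stay correct under cross-iteration motion would pass use_distances as false and so rely only on DDRs that signify independence outright.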

An important missing piece is correction of the exported information for loop unrolling. As far as I can tell, for a loop unrolled by factor N we need to clone MEM_ORIG_EXPRs and datarefs for the newly created MEMs, create no-dependence DDRs for those pairs for which the original DDR was no-dependence, and for DDRs with recorded distance vectors we will be able to recompute new distances from the old distance (D) and N (but only if N % D == 0 || D % N == 0). Is that right, Zdenek? Where should I start to implement this?

On x86_64, this patch produces no noticeable performance changes on SPEC CPU2000 and no code generation changes for a set of small numerical benchmarks. On ia64 there are some improvements, probably because RTL disambiguation is weaker there owing to the lack of a base+offset addressing mode. There is a 1.5x speedup for a Linpack-like benchmark, where the exported info helps to disambiguate references in the following loop:

    for (i = m; i < n; i = i + 4)
      {
        dy[i] = dy[i] + da*dx[i];
        dy[i+1] = dy[i+1] + da*dx[i+1];
        dy[i+2] = dy[i+2] + da*dx[i+2];
        dy[i+3] = dy[i+3] + da*dx[i+3];
      }

It also provides a ~2.5% speedup for fma3d, ~1.5% for gap and ~11% for bzip2 of SPEC2K. I have not investigated the causes of the speedup, but my wild guess for bzip2 would be better scheduling of its hot memmove-like loop. So, results of better code generation due to useful knowledge from high-level analysis are yet to be seen.

The results of testing this patch with modulo scheduling are quite strange. I have used loops from the tree-vectorizer testsuite to gauge gains from the export, and I saw that using the exported information frequently makes SMS fail (on the tree-vect. testsuite the number of SMS'ed loops drops from 129 out of 150 to 59 out of 150). The frequent failure scenario would be like this: fewer dependencies mean lower min_ii and max_ii estimates, so max_ii can become lower than the ii for which SMS succeeded without using the exported info; in case max_ii is still high enough, SMS takes different decisions and loses.
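
The first failure mode described above can be illustrated with a small sketch. This is not the modulo scheduler's code: sms_find_ii, try_ii_fn and the toy predicate are hypothetical names, and treating max_ii as an exclusive upper bound is an assumption made for the illustration. The point is only that when the search window for ii shrinks, an ii at which scheduling used to succeed can fall outside it.

```c
#include <stdbool.h>

/* Hypothetical callback: returns true if a schedule can be found at
   initiation interval II.  */
typedef bool (*try_ii_fn) (int ii);

/* Try each ii in [min_ii, max_ii) in increasing order.  Return the first
   ii that can be scheduled, or -1 if the whole window fails.  */
int
sms_find_ii (int min_ii, int max_ii, try_ii_fn try_ii)
{
  for (int ii = min_ii; ii < max_ii; ii++)
    if (try_ii (ii))
      return ii;
  return -1;
}

/* Toy stand-in for a loop that can only be scheduled at ii >= 8.  */
static bool
schedulable_from_8 (int ii)
{
  return ii >= 8;
}
```

With these toy numbers, sms_find_ii (5, 12, schedulable_from_8) returns 8, while the tighter window of sms_find_ii (3, 7, schedulable_from_8) returns -1 even though nothing about the loop itself got harder to schedule.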

I will be looking into these failures in the coming days.

Thanks.
--
Alexander Monakov

Attachments (text documents):
export-ddg-00-all.patch.txt
export-ddg-01-delete-strange-ia64-bypass.patch.txt
export-ddg-02-misc-fix-testsuite.patch.txt
export-ddg-03-fix-dataref-issue.patch.txt
export-ddg-04-sms-use-expand_simple_binop.patch.txt
export-ddg-05-sms-use-max_asap-for-maxii-estimation.patch.txt
export-ddg-06-sms-fix-antideps.patch.txt
export-ddg-10-mem-orig-exprs.patch.txt
export-ddg-11-main.patch.txt
export-ddg-12-passes.patch.txt
export-ddg-13-make-free_data_ref-extern.patch.txt
export-ddg-14-ivopts.patch.txt
export-ddg-15-passes-fixups.patch.txt
export-ddg-16-use-in-rtl-aliasing.patch.txt
export-ddg-17-use-in-modulo-sched.patch.txt