Differences between revisions 35 and 36
Revision 35 as of 2009-03-28 10:27:32
Size: 9565
Comment: Merge aib-mail content to Wiki to have all information in one place
Revision 36 as of 2009-03-28 10:41:34
Size: 8792
Comment:
Deletions are marked like this. Additions are marked like this.
Line 11: Line 11:

I propose to merge the branch at the beginning of stage1 for GCC 4.5.
Bugfixes put on the branch can be merged independently. Likewise it
should be possible to merge the points-to changes and some of the
call-clobber changes before the rest.
Other changes though are difficult or impossible to separate without
intermediate regressions.
Line 40: Line 33:
 any symbols for renaming (instead this is now useless and costly,
 code auditing is still necessary here
).  As there cannot be magically
 any symbols for renaming (instead this is now useless and costly).
As there cannot be magically
Line 45: Line 38:
 updates virtual SSA form manually in the simple cases).  updates virtual SSA form manually in the simple cases of removing
 virtual operands and otherwise marks .MEM for renaming
).
Line 63: Line 57:
 the bitmap of addressable variables and the escape type per variable
 and pointer.
 the bitmap of addressable variables, the escape type per variable
 and pointer and the bitmaps of loaded and stored symbols on statements.
Line 68: Line 62:
 existing call-clobber related code has been removed.  existing call-clobber related code has been removed.  Low-level
 operations on points-to information are implemented in tree-ssa-structalias.c.
Line 73: Line 68:
 use points-to alias information.  use points-to alias information and it has been split up to be more
 easily available in different contexts. The implementation resides
 in tree-ssa-alias.c, interfaces are exported from tree-ssa-alias.h.
Line 80: Line 77:
 done by the tree DCE pass on the trunk (but see below).  on aliased symbols done by the tree DCE pass on the trunk.

 * The tree DCE pass has been improved to do DSE of non-aliased symbols
 and thus fully replaces the SDSE (simple DSE) early optimization pass.
Line 91: Line 91:
other runs on the frescobaldi machine.  Performance comparisons on other
architectures are welcome.
other runs on the frescobaldi machine.
Line 109: Line 108:

 * The pass-pipeline needs re-tuning. We re-compute points-to information
 too often and at non-optimal places (is no longer necessary for correctness
 to "re-compute" alias information).

 * Special "early" passes can get removed, merged or be made use of
 generally in the optimization pipeline.

 * The DCE / DSE situation needs to be resolved.
 On the trunk DCE does remove far more dead stores than DSE does. On
 the branch dead store removal from DCE is severely pessimized (because
 it still relies on the virtual operands). DSE is improved on the branch,
 but it runs too late and not often enough and is also very simple.
 The best solution is to remove the DSE pass and do all (and proper)
 dead store removal from within DCE, but this is quite an effort and
 as can be seen from benchmark numbers not a blocker for merging the
 branch.

Alias Improvements Branch

(see http://gcc.gnu.org/ml/gcc/2009-01/msg00286.html)

The primary objective of the branch is to transition away from the use of the virtual operand use-def chains as data-flow representation for memory optimizations. Memory optimizers need to transition to the use of the alias-oracle which is a query based system. For the foreseeable future they can still rely on the safety-net the virtual operand use-def chain provides.

Overview of infrastructure changes done on the branch:

  • There is a single memory symbol (.MEM) per function used to build the virtual operand use-def chain. In trunk terms you can view this as equivalent to having a single memory partition containing all symbols. Due to the lack of precision computing and keeping this virtual operand use-def chain is trivial, all code supporting more complex operations has been removed. In particular it can be assumed that at most a single VUSE and VDEF is available per statement. Most code has been simplified according to that assumption.
  • This single-symbol virtual operand use-def chain safety-net is also available during early optimizations (in fact, always if the IL is in SSA form). This removes the need for special-casing memory statements during that compilation phase and enables to use common infrastructure during early (and IPA) optimization phases. It also allows to schedule regular memory optimization passes during early optimization.
  • The virtual operand use-def chain is kept up-to-date by the operand-scanner on the branch. There is no need to manually mark any symbols for renaming (instead this is now useless and costly). As there cannot be magically more, less or different memory symbols required (there's only one) optimization passes usually know if they are removing a load or a store and so can act properly (or rely on the operand scanner which updates virtual SSA form manually in the simple cases of removing virtual operands and otherwise marks .MEM for renaming).
  • The at most two virtual operands (a VUSE and a VDEF) are now real operands (there is a vuse and a vdef member in the gimple structure for memory statements). This means they can share the infrastructure present for regular operands and immediate uses. All infrastructure dealing with the old virtual operands has been removed. The operand structures are linked in the normal use and defs lists, the special lists for virtual uses and defs has been removed. This means that operand iterators have been simplified a lot. The possibility to iterate over only virtual operands has been removed (there is only at most a single one, directly accessible via gimple_v{use,def}_op). Most bulk changes in optimizers had to be done because of this simplification.
  • Global data that was not kept in sync and is either unused on the branch or very hard to keep up-to-date has been removed. This includes the bitmap of addressable variables, the escape type per variable and pointer and the bitmaps of loaded and stored symbols on statements.
  • Points-to information has been abstracted and is directly used for pointer information as well as call-clobber related information. All existing call-clobber related code has been removed. Low-level operations on points-to information are implemented in tree-ssa-structalias.c.
  • Numerous fixes to the points-to analysis code have been done.
  • The alias-oracle present on the trunk has been extended to also use points-to alias information and it has been split up to be more easily available in different contexts. The implementation resides in tree-ssa-alias.c, interfaces are exported from tree-ssa-alias.h.
  • The FRE and PRE passes using the common SCCVN infrastructure have been improved by improving the memory statement walking infrastructure of the alias-oracle and its precision.
  • The tree DSE pass has been improved to compensate the loss of DSE on aliased symbols done by the tree DCE pass on the trunk.
  • The tree DCE pass has been improved to do DSE of non-aliased symbols and thus fully replaces the SDSE (simple DSE) early optimization pass.

The alias-oracle and its interface is contained in the tree-ssa-alias.[ch] files. Common with the trunk is the refs_may_alias_p type of query which has been accompanied by statement based clobber / use queries.

Preliminary performance measurements show comparable performance for SPEC 2000, Polyhedron and our usual set of C++ benchmarks. SPEC 2006 measurements are on the way, all on x86_64. Daily testing can be monitored beyond http://gcc.opensuse.org/, look for "alias-improvements" branch annotation and compare with the other runs on the frescobaldi machine.

In general the trade-off with the new scheme is that walking statements via the virtual use-def links will be slower because possibly (a lot) more statements are visited. This is offsetted by leaner operand iterators, cheaper operand scanning and updating and less memory use for maintaining the virtual operands. And most important, a lot less code to maintain.

Pre-Merge TODO

Followup Improvements

There are a number of tasks left to do, either on the branch or (prefered) on trunk after merging the branch.

  • push/pop_stmt_changes can be cleaned up, simplified, or even removed.
  • The inliner can rewrite the virtual operands itself just like regular operands. The only difficulties are that new pure/const functions are exposed after IPA-pure-const and that new PHI-nodes need to be inserted to merge VDEFs in the inlined function to the VDEF of

    the call statement. Partial patch available.

  • The single pre-function gimple_vop (cfun) DECL can be made per CU. The disadvantage is that cross-function inconsistencies are harder to detect.
  • Most passes that mark symbols for renaming should not need to. This is mostly fixed.

  • PTA should keep both an unpruned points-to set and a TBAA-pruned points-to set. This allows per alias-oracle query decisions on whether TBAA should be applied or not. This also allows (a fixed - see its comments) merge_alias_info to do its job using the unpruned sets.
  • into-SSA could go into virtual SSA form directly, without doing an ssa update

    to rename the virtual operands after into-SSA. into-SSA expert needed.

  • In PTA we can generate a CALLUSED variable and its constraints for each pure call (just like we do with consttmp for const calls).
  • Simple mod/ref analysis can be done in PTA handle_rhs_call for builtins. call-used queries have to be extended to builtin function calls then (or finally a stmt local call-used points-to solution reference can be introduced).
  • PRE should compute a per-BB maximal set, unioning from the exit block upward the CFG. This should make the validity checks for the VUSEs unnecessary and speed up phi-translating the maximal set.
  • Add params for the magic number(s) in the DCE/DSE oracle query limits.
  • Add oracle query limits to SCCVN.

Testsuite status on x86_64, i?86

FAILs that appear also on trunk and XPASSes are omitted.

gcc tests

FAIL: gcc.dg/uninit-B.c uninit i warning (test for warnings, line 12)
FAIL: gcc.dg/uninit-pr19430.c  (test for warnings, line 32)
FAIL: gcc.dg/uninit-pr19430.c uninitialized (test for warnings, line 41)

Uninitialized memory warnings will no longer work, or rather, the current implementation will be even more fragile. Action: uninit-B.c to be XFAILed again, the rest to be XFAILed.

FAIL: gcc.dg/tree-ssa/ssa-fre-10.c scan-tree-dump pre "Insertions: 2"

Weird testcase, it is not clear what it was supposed to test. Action: Move to the compile torture. Actually PPRE would perform insertion here. See PR38401 for a patch unleashing it (at -O3). There is also missing DSE in this testcase (because DSE doesn't look through PHIs) and missing "FRE" (the loads are all from uninitialized memory which we could detect and substitute an uninitialized register instead). A really weird testcase.

FAIL: gcc.dg/tree-ssa/ssa-pre-24.c scan-tree-dump-not pre "prephitmp"

A new testcase on the branch. We shouldn't insert a prephitmp here. The issue is that PHI-translating VUSEs should return the same VUSE if the expression does not die on the other edges of the PHI (thus is loop invariant for example). In general PHI-translation shouldn't change the expression if its value doesn't change. This may conflict with the purpose of the VUSE being a stmt location, so further phi-translating a "non"-translated VUSE may confuse the machinery.

None: Alias_Improvements (last edited 2010-02-14 11:03:09 by RichardGuenther)