This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[wwwdocs] add a web page for the tree-profiling branch


This adds a page for the plans we have with the tree profiling
branch.  The primary reason for writing this was to just let
people know who's working on what and where.

There are so many ideas for things we can do with tree-ssa, I
think it would be great to make a web page for them, too.  Last
week, Diego gave me a list of things people planned to work on:

Various propagations in CCP (range, null ptr, string attrs) (dnovillo)
Reorganize DOM (dnovillo)
Jump threading without going out of SSA (law)
Enhancements to SRA (rth)
Emit GENERIC directly from the C and C++ FEs (rth/jason)
Match incoming PHI arguments and in-edges to a block (bje)
Expand RTL directly from SSA form
Export tree level aliasing information into RTL

Perhaps the people working on these ideas can add a send me a
few lines with a short description, so I can turn it into a web
Also, there are ideas missing from that list.  Let'm be known! :-)

An earlier revision of this new web page was already approved by
Gerald, but he asked for something else to be included, so I'll
wait with commiting this until he's seen the revised page, and 
until y'all have given your comments/suggestions.


Index: htdocs/cvs.html
RCS file: /cvs/gcc/wwwdocs/htdocs/cvs.html,v
retrieving revision 1.141
diff -c -3 -p -r1.141 cvs.html
*** htdocs/cvs.html	9 Jun 2004 17:33:26 -0000	1.141
--- htdocs/cvs.html	18 Jun 2004 15:06:59 -0000
*************** particular releases or snapshots or the 
*** 162,168 ****
    <dd>Work on implementing SSA analysis and optimization for trees
    is done in this branch.</dd>
!   <dt>tree-profiling-branch</dt>
    <dd>A sub-branch of tree-ssa for the development of profiling heuristics
    and profile based optimizations for trees, such as profile driven inline
    heuristics.  Another goal of this branch is to demonstrate that maintaining
--- 162,168 ----
    <dd>Work on implementing SSA analysis and optimization for trees
    is done in this branch.</dd>
!   <dt><a href="projects/tree-profiling.html">tree-profiling-branch</a></dt>
    <dd>A sub-branch of tree-ssa for the development of profiling heuristics
    and profile based optimizations for trees, such as profile driven inline
    heuristics.  Another goal of this branch is to demonstrate that maintaining
Index: htdocs/projects/tree-profiling.html
RCS file: htdocs/projects/tree-profiling.html
diff -N htdocs/projects/tree-profiling.html
*** /dev/null	1 Jan 1970 00:00:00 -0000
--- htdocs/projects/tree-profiling.html	18 Jun 2004 15:07:00 -0000
*** 0 ****
--- 1,221 ----
+ <html>
+ <head>
+ <title>Improving GCC's Interprocedural Optimization Infrastructure</title>
+ </head>
+ <body>
+ <h1>Improving GCC's Interprocedural Optimizaion Infrastructure</h1>
+ <p>This page describes ongoing work to improve GCC's infrastructure
+ for tree-based interprocedural optimizers. The work is done on a
+ branch in GCC's CVS repository called <code>tree-profiling-branch</code>.</p>
+ <p>The short-term goals of the branch are to bring profiling heuristics
+ and profile feedback information to the tree level, to add support for
+ managing multiple control flow graphs, and to implement profile guided
+ function inlining.  At the heart of this project is the call graph code
+ added for GCC 3.4, and the effort to maintain the control flow graph and
+ profile information over expanding from GIMPLE, which is the tree-based
+ intermediate representation, to RTL, the intermediate representation of
+ GCC's back-ends.</p>
+ <p>Longer-term goals include adding support for doing some optimizations
+ (such as CCP and DCE) before inlining, and implementing a framework for
+ other interprocedural optimizations, such as interprocedural constant
+ propagation and side-effects analysis.</p>
+ <p>As the patches stabilize, they will be submitted for review and
+ acceptance into the mainline development tree. Please contact <a
+ href="";>Jan Hubicka &lt;;</a> if you
+ would like to contribute.</p>
+ <h2>Bringing profile information to the tree optimization framework</h2>
+ <h3>Rationale</h3>
+ <p>When the compiler knows where optimizations are beneficial, it can
+ do a better job at generating good code for an application.  GCC has a
+ framework for profile-guided optimizations, but the profile is not
+ available early enough for several optimizations that could benefit
+ from it.  Specifically, function inlining, code generation for multiway
+ branches, various loop transformations, and value histogram based
+ optimizations are done, or could be done better, in the new Tree SSA
+ framework where profile information is not available.  Making it
+ available to these passes would allow GCC to generate better code.</p>
+ <h3>Current situation in GCC</h3>
+ <p>As of June 2004, GCC does support profile feedback and branch
+ prediction and several algorithms use the profile to improve performance.
+ The compiler can obtain profile information by trying to predict all
+ branches appearing in the intermediate representation for the application
+ during the compilation process, or from profile feedback collected during
+ test runs of the application.</p>
+ <p>Unfortunately, the profile is not available until very late in the
+ compilation process.  To understand why, you need to understand what
+ profile information is to the compiler, and how it is represented.</p>
+ <p>While the true execution profile is a program property, for the
+ compiler it is a control flow graph annotation.  Therefore, the profile
+ fed back from test runs is in reality the output of the application
+ instrumented with code on control flow graph edges and basic blocks to
+ write out execution counts.  This means that the compiler must read the
+ fed back profile at the same point in the compilation process where the
+ control flow graph was previously instrumented for the test runs,
+ because otherwise the profile data may not match with the control flow
+ graph for the intermediate representation at that point in the
+ compilation process.</p>
+ <p>After instrumenting or reading, the compiler has to update the profile
+ for each modification it does to the control flow graph.  So it has to
+ instrument the application for profiling somewhere between the first
+ point in the compilation process from where the CFG will be kept up
+ to date, and the passes for which having profile information available
+ is the most profitable.</p>
+ <p>In GCC, this means that profile information cannot be made available
+ to passes that run early in the compilation process.  The old loop passes
+ do not preserve the control flow graph, and when the CFG is destroyed,
+ any profile information available at that point is also destroyed.
+ The passes which benefit the most from profile information available,
+ such as scheduling and block reordering,all have to run after the loop
+ optimizer.  So in GCC we have to instrument the control flow graph for
+ for profiling between running the old loop optimizer and the final
+ scheduling pass.</p>
+ <p>To make things worse, the control flow graph is also destroyed before
+ the GIMPLE tree is expanded to RTL, and a profiling infrastructure for
+ trees is not available.  So simply replacing the old loop is not enough
+ to bring profile information to the Tree SSA passes.  GCC needs a way to
+ preserve profile information when expanding from trees to RTL.</p>
+ <h3>Status on the branch</h3>
+ <p>The old loop optimizer has been disabled in anticipation of its
+ replacement by the rewritten loop optimizers from the
+ <code>cfg-branch</code> and the <code>lno-branch</code>.
+ This means that all RTL passes now maintain the CFG from the start of
+ <code>rest_of_compilation</code> all the way through the reorg pass.
+ Also, GCC now knows how to maintain the control flow graph when expanding
+ from GIMPLE to RTL.  The CFG-aware expand code can be found in the new
+ file <code>cfgexpand.c</code>.  Code to instrument the GIMPLE tree for
+ profiling is being worked on also.</p>
+ <h2>Profile guided inlining</h2>
+ <h3>Rationale</h3>
+ <p>Judicious inlining of functions is important for good performance.
+ Inlining avoids function call overhead (register saves and restores,
+ parameter passing ABI (using specific registers, setting up the stack),
+ the call itself) and exposes larger functions to the optimizers for
+ for deeper enhancements.  However, inlining may cause exponential code
+ growth.  Therefore, inlining of nontrivial functions called from rarely
+ executed basic blocks is almost always a loss.</p>
+ <p>This implies that the compiler should have some means to estimate the
+ profitability of inlining using information extracted from an execution
+ profile.  The profile may be estimated from branch prediction heuristics,
+ or obtained using trial runs and fed back to the compiler.  Profiling can
+ reveal details about the actual usage of functions in the application to
+ direct the compiler to inline better and more effectively.</p>
+ <h3>Current situation in GCC</h3>
+ <p>Even if a profile was available, the current inliner would destroy it
+ for inlined functions because profile information is inherently associated
+ with the control flow graph, which is also not built until after inlining.
+ So a new inliner is needed that can inline function control flow graphs
+ instead of functions as trees.  The compiler will have to construct and
+ maintain a CFG for every function, instead of only for the function that
+ is currently being compiled.<p>
+ <h3>Status on the branch</h3>
+ <p>Support for multiple CFGs has recently been committed, and work is under
+ way on a new inliner that inlines a function's CFG instead of a GIMPLE
+ tree.  Our goal is to have this work ready in time for inclusion in the
+ next release of GCC.</p>
+ <h2>Framework for Interprocedural Analysis and Optimizations</h2>
+ <h3>Rationale</h3>
+ <p>Function calls hide information about side effects from the optimizers.
+ By inlining a function into a caller, all such information is made
+ available to the caller, but at the cost of increased code size and
+ degraded instruction locality.  Also it is not always possible to inline
+ a function, for example when it has unpredictable side effects (e.g. it
+ calls setjmp or it has non-local gotos).</p>
+ <p>Interprocedural data flow analyses (IPA) attempt to provide a different
+ means to overcome the information-hiding problem of function calls.  They
+ typically analyse the intraprocedural properties of a function,
+ and propagate that information across the call graph.  The compiler can
+ then use that information to improve intraprocedural information, such as
+ alias and dependency information, and generate better code for any single
+ function.  It can also choose to specialize a function, to remove paths
+ from a function that will never be taken, or to inline functions for which
+ intraprocedural analysis can not prove that inlining is profitable.</p>
+ <p>Experience from other compilers shows that interprocedural constant
+ propagation and interprocedural side-effects analysis may help
+ disambiguate memory dependencies, and that function specialization by
+ cloning can significantly reduce the cost of function calls in highly
+ abstracted applications (such as C++ applications using the STL).  For
+ GCC we believe that implementing these interprocedural analyses would
+ make it possible to greatly improve the generated code in many
+ applications.</p>
+ <h3>Current situation in GCC</h3>
+ <p>At present, function inlining is the only interprocedural optimization
+ implemented in GCC.  Until recently, a framework for IPA was lacking, and
+ the intermediate representation was too low-level to allow interprocedural
+ analyses to be effective.</p>
+ <p>With the contribution of code to construct and analyze the call graph,
+ and with the new optimization framework and higher level intermediate
+ representations from the Tree SSA project, an IPA framework is within
+ reach.  The major remaining technical obstacles for IPA are inefficient
+ data structures for the <code>tree</code> intermediate representations,
+ which may cause memory consumption to rise excessively when used as-is in
+ an IPA framework.  Other obstacles of non-technical nature also make it
+ it impossible at present to implement across-file interprocedural
+ optimizations other than those provided by the intermodule framework,
+ which also suffers from the inefficient structure of the
+ <code>tree</code> representations.</p>
+ <h3>Status on the branch</h3>
+ <p>At present, no interprocedural work is being done on the branch, but
+ a patch for interprocedural constant propagation is available and will
+ probably be included on the branch soonish.  Otherwise, plans exist for
+ reorganisation of the compiler passes as follows:</p>
+ <ol>
+ <li>For each SCC in the call graph in reverse topological order<br />
+     <ul>
+     <li>do simple intraprocedural optimizations in SSA form (basically,
+ 	run anything that is cheap and can not create overlapping live
+ 	ranges, but will improve the results of the next step);</li>
+     <li>collect intraprocedural information necessary for
+ 	interprocedural data flow analyses;</li>
+     <li>store the function's intermediate representation somewhere.</li>
+     </ul></li>
+ <li>Perform interprocedural analyses and optimizations (at least constant
+     propagation, side-effects analysis, and function cloning and
+     specialization).</li>
+ <li>For each SCC in the call graph in reverse topological order
+     <ul>
+     <li>do remaining intraprocedural optimizations;</li>
+     <li>hand over the optimized function to the RTL expanders for
+ 	finalization and code generation.</li>
+     </ul></li>
+ </ol>
+ </body>
+ </html>

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]