This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
[wwwdocs] cfg branch homepage update

From: Jan Hubicka <jh at suse dot cz>
To: gcc-patches at gcc dot gnu dot org
Date: Thu, 24 Jan 2002 22:00:21 +0100
Subject: [wwwdocs] cfg branch homepage update
Hi,
I am installing the attached patch to the branch to make it match current
status of cfg branch sources.

Honza

Index: cfg.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/projects/cfg.html,v
retrieving revision 1.2
diff -c -3 -p -r1.2 cfg.html
*** cfg.html	2001/11/21 13:35:49	1.2
--- cfg.html	2002/01/24 20:57:26
***************
*** 1,10 ****
! <html>
  <head>
  <title>Improving GCC's Infrastructure (Central Flow Graph)</title>
  </head>
  <body>
! <h1 align="center">Improving GCC's Infrastructure
! (Central Flow Graph)</h1>
  
  <p>This page describes ongoing work to improve GCC's infrastructure
  for RTL based optimizers. The work is done on a branch in GCC's CVS
--- 1,9 ----
! 
  <head>
  <title>Improving GCC's Infrastructure (Central Flow Graph)</title>
  </head>
  <body>
! <h1>Improving GCC's Infrastructure (Central Flow Graph)</h1>
  
  <p>This page describes ongoing work to improve GCC's infrastructure
  for RTL based optimizers. The work is done on a branch in GCC's CVS
*************** as described in <a href="#3">[3]</a>.</p
*** 64,76 ****
  
  <h4>Status</h4>
  
- <p>The code is on the branch.</p>
- 
  <p>The pass uses tail duplication to form superblocks for easy
! optimization by other passes. A first version of the code gives a
! speed up of about 1.7% on non feedback driven compilation at the
! expense of 5% code size increase.</p>
  
  <h3>Loop Peeling</h3>
  
  <p>This work is done by <a href="mailto:jh@suse.cz";>Jan Hubicka</a>.
--- 63,75 ----
  
  <h4>Status</h4>
  
  <p>The pass uses tail duplication to form superblocks for easy
! optimization by other passes. It brings more than 1% speed increase
! on specint2000 with profile feedback.  Using static profile estimation
! it is pretty much hit or miss.</p>
  
+ <p>The code is on the branch, it needs to be tunned.</p>
+ 
  <h3>Loop Peeling</h3>
  
  <p>This work is done by <a href="mailto:jh@suse.cz";>Jan Hubicka</a>.
*************** the loop.</p>
*** 87,93 ****
  
  <p>The loop peeling optimization will be done as part of the
  superblock formation.  The tracer should peel as many iterations as
! can be predicted.
  </p>
  
  <h4>Status</h4>
--- 86,93 ----
  
  <p>The loop peeling optimization will be done as part of the
  superblock formation.  The tracer should peel as many iterations as
! can be predicted.  Zdenek has also implemented simple loop peeling
! as part of new loop optimizer.
  </p>
  
  <h4>Status</h4>
*************** can be predicted.
*** 97,102 ****
--- 97,133 ----
  <p>The tracer peels loops with one predicted iteration. We should try
  to peel more than one iteration.</p>
  
+ <h3>Loop optimizer re-implementation</h3>
+ 
+ <p>This work is done by <a href=
+ "mailto:rakdver@atrey.karlin.mff.cuni.cz";>Zdenek Dvorak</a>.</p>
+ 
+ <h4>Theory</h4>
+ 
+ <p>Current loop optimizer do use information passed by frontend
+ to discover loop construct to simplify flow analysis.
+ It is dificult to keep the information up-to-date and nowday
+ it is easy to implement the loop discovery code on CFG.
+ </p>
+ 
+ <h4>Implementation in GCC</h4>
+ 
+ <p>
+ We want to use the Michael Hayes loop discovery code and slowly
+ replace existing features of loop optimizer by new one.
+ So far Zdenek has modified the datastructure to allow easier
+ updating and implemented loop unrolling, peeling and unswitching
+ on the top of it.
+ </p>
+ 
+ <h4>Status</h4>
+ 
+ <p>The main changed are on the branch, the optimizations itself
+ are to come later.</p>
+ 
+ <p>The tracer peels loops with one predicted iteration. We should try
+ to peel more than one iteration.</p>
+ 
  <h3>Software Trace Cache</h3>
  
  <p>This work is done by <a
*************** detailed description.</p>
*** 120,129 ****
  interprocedural optimizations are out of reach of current GCC.</p>
  
  <h4>Status</h4>
- 
- <p>A first version has been implemented.  The patch needs to be tested
- and benchmarked.</p>
  
  
  <h3>Web Construction Pass</h3>
  
--- 151,159 ----
  interprocedural optimizations are out of reach of current GCC.</p>
  
  <h4>Status</h4>
  
+ <p>The code is in the branch. The exact benefits needs to be measured
+ but on non-PDO runs it brings 5% to eon benchamrk.</p>
  
  <h3>Web Construction Pass</h3>
  
*************** need to unshare temporaries, such as tra
*** 143,149 ****
  <p>The implemention of the web pass uses Michael Hayes's dataflow
  module to construct du-chains and unionfind to produce webs.</p>
  
- 
  <h4>Status</h4>
  
  <p>The code is on the branch.</p>
--- 173,178 ----
*************** module to construct du-chains and unionf
*** 151,156 ****
--- 180,207 ----
  <p>This pass improves SPECint2000 performance by roughly 1%. We need
  to work on updating debug information in this pass.</p>
  
+ <h3>Register coalescing Pass</h3>
+ 
+ <p>This pass coalesces multiple registers into single in order to
+ avoid register to register copies that our register allocator is not
+ able to deal with very well.  It is a kind of temporary solution until
+ the new register allocator is in place.  The benefits after than are
+ questionable, but still it is more effective (and probably less
+ expensive) than the current copy propagation implementation.
+ 
+ <h4>Implementation in GCC</h4>
+ 
+ <p>It is designed as stand alone pass constructing conflict graph and
+ coalescing register run after GCSE.</p>
+ 
+ <h4>Status</h4>
+ 
+ <p>The code is on the branch.</p>
+ 
+ <p>This pass improves some of SPECint2000 tests and degrades others basically
+ because of lack of live range splitting and increased register pressure in
+ final program.  It helps by about 1 point on peak SPECint runs.<p>
+ 
  <h3>Jump Combining Pass</h3>
  
  <p>This work is done by <a href="mailto:jh@suse.cz";>Jan Hubicka</a>.
*************** Profile data is used to control the amou
*** 179,186 ****
  
  <h4>Status</h4>
  
! <p>A first version has been written and will be installed on the
! cfg-branch soon (as of 2001-11-21).</p>
  
  <h3>High Level Branch Prediction</h3>
  
--- 230,238 ----
  
  <h4>Status</h4>
  
! <p>The code is on the branch. Benefits for Athlon CPU are not emasurable.
! This can be saved by adding more simplifications. Benefit for wider issue
! CPUs needs to be evaulated.</p>
  
  <h3>High Level Branch Prediction</h3>
  
*************** that the internal (continue) loop is not
*** 210,217 ****
  
  <h4>Status</h4>
  
! <p>A first version has been written and will be installed on the
! cfg-branch soon (as of 2001-11-21).</p>
  
  <h3>Thread Safe Profiling</h3>
  
--- 262,270 ----
  
  <h4>Status</h4>
  
! <p>The code is on the branch. Constant/negative/nil return heuristics
! all have over 90% hitrate covering about 1% branches together. Continue
! heuristics increases hitrate of loop branch/exit heuristic a bit.</p>
  
  <h3>Thread Safe Profiling</h3>
  
*************** environment will be called that returns 
*** 231,240 ****
  
  <h4>Status</h4>
  
! <p>A first version has been written and will be installed on the
! cfg-branch soon (as of 2001-11-21).</p>
  
- 
  <h3>Flexible and Safe GCOV Format</h3>
  
  <p>This work is done by <a
--- 284,291 ----
  
  <h4>Status</h4>
  
! <p>A first version has been written and is installed to the branch.</p>
  
  <h3>Flexible and Safe GCOV Format</h3>
  
  <p>This work is done by <a
*************** look like a real file format.</p>
*** 263,272 ****
  in <a href="#3">[3]</a>.</p>
  
  <h4>Status</h4>
  
! <p>This task has not been started yet.</p>
  
  
  <h3>Miscellaneous Patches</h3>
  
  <p>Achievements that have been merged into mainline already:</p>
--- 314,438 ----
  in <a href="#3">[3]</a>.</p>
  
  <h4>Status</h4>
+ 
+ <p>First version is included in the branch. Due to ability to compute
+ overall summary of counters it allows detection of hot spots in program
+ and improves to detect hot spots better reducing code size and improving
+ performance occasionally.</p>
+ 
+ <h4>Procedure splitting</h4>
+ 
+ <p>To shorten branches and avoid pollution it is desirable to put
+ frequently executed procedures together and split their infrequently
+ executed parts.</p>
+ 
+ <h4>Status</h4>
  
! <p>We don't split the function body, but we do categorize functions
! into different parts.
! A first version has been written and has been installed on the branch.
! It adds about 1% to spec2000 runs with profile feedback.</p>
  
+ <h3>Flexible and Safe GCOV Format</h3>
+ 
+ <p>This work is done by <a
+ href="mailto:jh@suse.cz";>Jan Hubicka</a>.</p>
  
+ <h4>Theory</h4>
+ 
+ <p>Currently the profile is fragile, since there is no verification
+ that the compiled program matches the profiled data.  Since the
+ profiler eliminates the redundancy in data, the mismatch is often not
+ discovered at all thereby making results to be completely
+ nonsense.</p>
+ 
+ <h4>Implementation in GCC</h4>
+ 
+ <p>We plan to calculate a checksum (CRC) of each graph when it is
+ constructed.  Also the format will be extended so that more
+ information can be recorded, such as histograms for the number of
+ iterations for loops that can then be used for loop peeling and loop
+ unrolling.</p>
+ 
+ <p>We also want to add versioning and further information to make it
+ look like a real file format.</p>
+ 
+ <p>There are a few references to much more advanced profiling systems
+ in <a href="#3">[3]</a>.</p>
+ 
+ <h4>Status</h4>
+ 
+ <p>First version is included in the branch. Due to ability to compute
+ overall summary of counters it allows detection of hot spots in program
+ and improves to detect hot spots better reducing code size and improving
+ performance occasionally.</p>
+ 
+ <h3>MIDlevel RTL</h3>
+ 
+ <p>This work is done by <a
+ href="mailto:jh@suse.cz";>Jan Hubicka</a>.</p>
+ 
+ <h4>Theory</h4>
+ 
+ <p>The current RTL representation is too low level for a number of
+ optimizations that we want to implement.  This makes the
+ implementation difficult and even ineffective in some cases.  MidRTL
+ will be essentially a RTL representation of a program lacking the tons
+ of architectural details making the transformations of instruction
+ chain much easier and therefore being the basis for new optimization
+ passes.</p>
+ 
+ <h4>Implementation in GCC</h4>
+ 
+ <p>The following outline has been writen by Jeff Law:
+ <PRE>
+ Particularly for the task of building the machine description for the generic
+ RTL and the translation from generic RTL into target specific RTL.  Those two
+ (closely related) tasks are the biggest hunks of infrastructure we need.
+ 
+ Conceptually what we want is a md file which describes the set of generic
+ RTL patterns.  We're not precisely sure on what predicates to use for
+ operands, so invent something like "generic_operand" which we will more
+ precisely define later.  Make all operands use "generic_operand".  At
+ least initially accept MEMs, REGs and all constants.
+ 
+ The only "tricky" part of "generic_operand" is memory addresses; at least
+ at this stage, generic_operand should accept a MEM with a "generic_address".
+ A "generic address" should be any "generic operand", except another MEM.
+ (ie we don't allow (MEM (MEM)) at this stage.  We will likely refine this
+ later.
+ 
+ Once you've got a rough framework in place, you have to build out some
+ infrastructure for lowering.  Specifically we'll need the ability to have
+ two recognizers.  Early in compilation we'll only recognize the generic
+ patterns.  When we start lowering we want to recognize the target specific
+ expanders, patterns, splitters, etc.  So you'll probably have to hack up
+ genrecog to build two insn-recog files with different prefixes for anything
+ with external scope and some infrastructure to switch between them based
+ on a state variable which indicates we've started the lowering process.
+ 
+ For the actual lowering process, we want to avoid twiddling the existing
+ backends as much as possible.  If I recall, the general idea Richard and
+ I came up with was:
+ 
+ For each insn:
+ 
+   Try to see if it matches a pattern in the target's md file.  If it does,
+   then you're done.  Similarly, if it matches a splitter, then run the
+   splitter, match the resulting insns in the target md file and your done.
+ 
+   Else generic.md should call into optabs.c to expand operations -- this will
+   in large part allow target specific lowering using existing mechanisms.
+   (ie, lowering of operands, running target expanders, etc etc).  It's likely
+   we'll need to refine this some, but this is the general idea.
+ 
+ </PRE>
+ 
+ <h4>Status</h4>
+ 
+ <p>A first version bootstraps and will be hopefully included in the branch
+ soon.  It is far from being complete thought.</p>
+ 
  <h3>Miscellaneous Patches</h3>
  
  <p>Achievements that have been merged into mainline already:</p>
*************** duplication.</li>
*** 303,308 ****
--- 469,477 ----
  <li>Avoid rebuilding the  CFG for <code>toplev.c</code>.</li>
  </ul>
  
+ <li>Opcode heuristics tweeks.</li>
+ <p>Avoid predicting equivality comparisons to 0 and floating point comparisons
+ to increase hitrate of opcode heursitic.</p>
  
  <h2>Branch Status</h2>
  
*************** Ramakrishna Rau; ACM SIGPLAN Notices, 19
*** 394,396 ****
--- 563,566 ----
  
  </body>
  </html>
+
Follow-Ups:
- Re: [wwwdocs] cfg branch homepage update
  - From: Gerald Pfeifer
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]