This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: [tree-ssa, RFC] CFG transparent RTL expansion


> On Thu, 2003-12-18 at 19:44, Jan Hubicka wrote:
> > Hi,
> > I am attaching a patch implementing the much-discussed idea of expanding
> > RTL out of the GIMPLE CFG transparently.  The patch still has some rough
> > edges and can be decomposed into about 5 incremental steps, so it is
> > intended less for detailed review than as a demonstration/proof of
> > concept, in the hope that it will make the decision easier (the patch
> > does, however, bootstrap and work well).
> > 
> > The problem I am trying to solve is how to preserve profile information
> > across the SSA-based passes.  It is clear that the most effective
> > consumers of the profile include inlining, loop optimization, register
> > allocation and basic block reordering, so the profile must be available
> > through almost the whole compilation.  Measuring the profile at multiple
> > stages of compilation is possible, but earlier optimizations invalidate a
> > profile measured later, so it would require the user to do multiple
> > training runs, which is unfortunate.  I therefore think it is a must to
> > have machinery that preserves the profile as well as possible from before
> > inlining up to final.
> > 
> > From the work on RTL, experience shows that the CFG is the most natural
> > place to store this information and about the only place where updating
> > it after control flow transformations is manageable (still not easy).
> > The alternatives, like remembering execution counts for each instruction
> > or branch probabilities for conditional branches, do not work well for
> > nontrivial updates.
> 
> I need more details :-). 
> 
> This implements something which isn't actually used yet (i.e., no
> collection emitted/done). If I read it correctly, it is just a proof of
> concept of a persistent CFG. So let's skip to how it would actually be
> used. And excuse my lack of knowledge about how GCC collects profile info
> :-)
I already have patches to produce profile estimates early (no profile
feedback, but it is the same thing as far as updating is concerned).
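
(To make the idea concrete, here is a minimal sketch of the kind of
per-edge and per-block annotations a persistent CFG could carry; the types
and field names are made up for illustration and are not GCC's real
basic_block/edge structures.)

  /* Hypothetical, simplified CFG profile annotations.  */
  struct bb_info;

  struct edge_info
  {
    struct bb_info *src, *dest;
    int probability;          /* Branch probability, scaled 0..10000.  */
    long long count;          /* Measured execution count, if available.  */
    int abnormal;             /* EH, setjmp, computed-jump destination...  */
  };

  struct bb_info
  {
    int frequency;            /* Estimated relative execution frequency.  */
    long long count;          /* Measured execution count, if available.  */
    struct edge_info **succs, **preds;
    int n_succs, n_preds;
  };

  /* An early estimator (no feedback) only fills in probabilities and
     frequencies from heuristics; a real profile run fills in the counts.
     Either way, later passes update these CFG annotations, never the
     statements themselves.  */
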
> 
> 1 - Where do you propose to emit the measurement code, and where do you
> plan to read it in? Presumably they have to be the same location...  I
> assume the measurement code is currently emitted in RTL land, probably
> early in the RTL cycle? Would we be doing that in the front end, in
> SSA land, or just before SSA, after the CFG is created?

Currently profiling is run just after the loop unroller in the RTL
compilation path.  This is because the old loop unroller messes up the
profile; otherwise it would be done before that.

I plan to move it very early in the SSA compilation path, just after the
early cleanups (ideally DOM and DCE) are done, so we don't end up
profiling an unnecessarily complicated CFG.
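
(In practice, edge profiling boils down to attaching a counter increment to
every edge we can instrument.  A minimal sketch, reusing the hypothetical
types from the sketch above and a made-up helper; the real instrumentation
is smarter, e.g. it only needs counters on edges off a spanning tree:)

  /* Made-up helper: emits "(*counter)++" on edge E, splitting E into a
     new block when necessary.  */
  extern void insert_increment_on_edge (struct edge_info *e,
                                        long long *counter);

  /* Hypothetical instrumentation pass, run right after the DOM/DCE
     cleanups.  */
  static void
  instrument_edges (struct edge_info **edges, int n_edges,
                    long long *counters)
  {
    int i;
    for (i = 0; i < n_edges; i++)
      {
        struct edge_info *e = edges[i];
        /* A critical abnormal edge cannot be split to hold the counter;
           its count has to be guessed from the surrounding edges.  */
        if (e->abnormal && e->src->n_succs > 1 && e->dest->n_preds > 1)
          continue;
        insert_increment_on_edge (e, &counters[i]);
      }
  }
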
> 
> 2 - Does profiling work on abnormal edges?  I.e., are throw edges
> annotated, or are they guessed/ignored/calculated when possible?

It works on noncritical abnormal edges; for critical abnormal edges we
guess.  But of course the idea is that once we make EH edges redirectable
(and thus not abnormal), abnormal edges will be a real anomaly and very
few of them will be critical (setjmp edges are not critical, and computed
jump edges also won't create critical ones as long as we keep the
one-computed-jump-per-function trick we do in SSA now).
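
(For readers without the definition handy, a sketch of the criticality
test, again in the hypothetical types from above:)

  /* E is critical iff its source has several successors and its
     destination has several predecessors, so the only way to put code
     "on" E is to split it - which we cannot do for abnormal edges, hence
     the guessing mentioned above.  */
  static int
  edge_critical_p (const struct edge_info *e)
  {
    return e->src->n_succs > 1 && e->dest->n_preds > 1;
  }
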
> 
> 3-  So, assume the profile information on trees is going to be kept in
> the CFG only right through into RTL. When I inline function 'a' into
> function b:
> 
>   int a(int y)
>   {
>     if (y < 20)   
>       return 1; 
>     else
>       return 0;  /* Profiling says this branch is taken 90% of the time.  */
>   }
> 
>   void b()
>   {
>     <...>
>     a(z)
>     <...>
>   }
> 
> 
> simplistic example, but let's say profiling indicates that we ought to
> inline a(z) when we are processing b. How does that profile information
> from a() get propagated into the code which is inlined into b? There is
> no CFG for a(), just the trees. Do you have to read the info for a() in
> from the profile source each time it's inlined, or how does that work?

The inlining problem is not dealt with by this particular patch; it solves
just the SSA<->RTL interface.  You know what my longer-term plan is, but
let's try not to go into very deep detail on the inlining interface right
now.

This is a common problem with updating the profile (not only after
inlining but after any code specialization - unrolling, tracing or
whatever).

The idea is that when such a conflict is noticed, the profile gets updated
in a way that is partly incorrect (for instance, all basic blocks directly
dominated by the edge being removed get their frequencies reduced
accordingly).  The optimizations expect such partially invalidated
profiles and must behave sanely in their presence.
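
(A minimal sketch of that update, with the hypothetical types from above
and one made-up helper; the point is only that the adjustment is local,
cheap, and deliberately allowed to be imprecise:)

  /* Made-up helper: returns the blocks directly dominated by BB and
     stores their number in *N.  */
  extern struct bb_info **blocks_dominated_by (struct bb_info *bb, int *n);

  /* When edge E goes away (the path was specialized or proven dead),
     subtract the frequency that used to flow through it from the blocks
     directly dominated by its destination, clamping at zero.  */
  static void
  update_profile_for_removed_edge (struct edge_info *e)
  {
    int lost = e->src->frequency * e->probability / 10000;
    int i, n;
    struct bb_info **doms = blocks_dominated_by (e->dest, &n);

    for (i = 0; i < n; i++)
      doms[i]->frequency = doms[i]->frequency > lost
                           ? doms[i]->frequency - lost : 0;
  }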

The profile becomes less exact during the optimization process, but
overall most compilers (ORC and IMPACT, for instance) manage to keep it in
good enough shape by such simplistic methods to still get benefits from BB
reordering.  Most questions asked of the profile are easy, like "is this
loop executed insanely many times?" or "is this basic block hot?", and
such questions are relatively stable with respect to slight degradation.

Research compilers usually provide a way to re-profile at later
compilation stages, but the benefits are very low.  Another, more
research-oriented topic is path profiling, which allows such situations to
be profiled accurately.
> 
> 4- Does anyone other than me find the idea of inlining optimized trees
> appealing? I understand that's not on the table right now because there
> are difficulties with EH, at the very least. I would think we could
> encapsulate that info somehow, but clearly that's a problem that would
> need to be solved.

Sure it is appealing.  In order to do something sane, you need analyses
that are not doable without the CFG/early cleanups.  As pointed out
already, the benefit of inlining is estimated by the execution time of the
callee, while the cost is its size.  At the moment we have only the second
piece of information, and our strategy is "do as much inlining as the size
constraints allow and hope that the very fast functions get inlined too".
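
(As a toy illustration of how the two numbers could be combined once a
time estimate exists - the names and the formula are made up, this is not
the current heuristic:)

  /* Hypothetical per-call-site data: cost, benefit and hotness.  */
  struct call_site_info
  {
    int callee_size;        /* Cost: code growth from inlining.  */
    int callee_time;        /* Benefit: estimated execution time of callee.  */
    int call_frequency;     /* How hot the call site is per the profile.  */
  };

  /* Smaller badness = inline first.  A callee that is cheap to run
     relative to its size, at a hot call site, wins; "+ 1.0" avoids
     dividing by zero for never-executed call sites.  */
  static double
  inline_badness (const struct call_site_info *cs)
  {
    return (double) cs->callee_size
           / ((double) cs->call_frequency * cs->callee_time + 1.0);
  }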

For instance, my current recursive inlining code would do a much better
job if we were able to discover tail calls.  Other examples include
partial inlining, where a function like
if (test)
   common fast path
else
   something large

can be inlined as

if (test)
  fast path
else
  do_something_large ()
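
(In concrete C terms, and purely as an illustration with made-up names:)

  struct cache { int last_key, last_value; };

  /* The large, cold part; stays out of line.  */
  extern int recompute_slowly (struct cache *c, int key);

  /* Before partial inlining: every call is an out-of-line call, even
     though the fast path is only a compare and a load.  */
  int
  get_value (struct cache *c, int key)
  {
    if (c->last_key == key)
      return c->last_value;                 /* common fast path */
    else
      return recompute_slowly (c, key);     /* something large */
  }

  /* After partial inlining, a caller gets the fast path inline and only
     the cold half remains an out-of-line call.  */
  int
  caller (struct cache *c, int k)
  {
    int v;
    if (c->last_key == k)
      v = c->last_value;
    else
      v = recompute_slowly (c, k);          /* outlined "something large" */
    return v;
  }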

Honza
> 
> I'm sure more questions will occur to me, but that's it for now :-)
> 
> Andrew
> 

