This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: FDO and LTO on ARM


> >
> > In a way I like the current scheme since it is simple and extending it
> > should IMO have some good reason. We could refine -Os behaviour without
> > changing the current predicates to optimize for speed in
> > a) functions declared as "hot" by the user, and BBs in them that are not
> > proved cold;
> > b) based on profile feedback, i.e. we could have two thresholds: BBs with
> > very large counts will probably be hot, BBs in between will be maybe
> > hot/normal, and BBs with low counts will be cold.
> > This would probably motivate the introduction of a probably_hot predicate
> > that summarizes the above.
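To make proposal b) concrete, here is a minimal C sketch of a two-threshold classification plus a probably_hot-style predicate that combines a) and b). It is not GCC's implementation: the enum, the hypothetical `hotness_params` struct, the use of the hottest block's count as the reference point, and the threshold values are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hotness classes for a basic block (not GCC's enum). */
enum bb_hotness { BB_COLD, BB_NORMAL, BB_HOT };

/* Hypothetical thresholds, expressed as fractions of the hottest
   block's count in the profile. */
struct hotness_params {
  int64_t hot_fraction;   /* count > max_count / hot_fraction  => hot  */
  int64_t cold_fraction;  /* count < max_count / cold_fraction => cold */
};

/* Two-threshold classification: hot above the upper threshold, cold
   below the lower one, normal in between. */
static enum bb_hotness
classify_bb (int64_t count, int64_t max_count, const struct hotness_params *p)
{
  /* cold_fraction >= hot_fraction keeps the cold threshold <= hot threshold. */
  assert (p->hot_fraction > 0 && p->cold_fraction >= p->hot_fraction);
  if (count > max_count / p->hot_fraction)
    return BB_HOT;
  if (count < max_count / p->cold_fraction)
    return BB_COLD;
  return BB_NORMAL;
}

/* probably_hot-style predicate: in a function the user declared "hot",
   every block not proved cold counts as hot (case a); otherwise the
   profile classification decides (case b). */
static int
probably_hot_bb (int64_t count, int64_t max_count, int function_declared_hot,
                 int proved_cold, const struct hotness_params *p)
{
  if (function_declared_hot)
    return !proved_cold;
  return classify_bb (count, max_count, p) == BB_HOT;
}
```

With hot_fraction = 10 and cold_fraction = 1000, in a profile whose hottest count is 1000000 a block is hot above 100000, cold below 1000, and normal in between; under -Os only the blocks for which probably_hot_bb returns true would then be optimized for speed.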
> 
> Introducing a new 'probably_hot' will be very confusing, unless you
> also rename 'maybe_hot'; but this leads to finer-grained control
> (very_hot, hot, normal, cold, unlikely), which can be hard to use.  The
> three-state partition (not counting exec_once) seems OK, but

OK, I also prefer fewer stages to more ;)
> 
> 1) the unlikely state does not have a controllable parameter

Well, it is defined as something that is not likely to be executed, so requiring
the count to be less than 1/(number_of_test_runs*2) is very natural and does not seem
to need tuning.
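The formula above is terse. One plausible reading, assumed here and not taken from GCC's source, is that a block is unlikely when its count averaged over the training runs is below 1/2, i.e. it ran less than once per two runs. The hypothetical helper below encodes that reading and, as argued above, needs no tunable parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "unlikely executed" test: true when count / runs < 1/2,
   written as 2 * count < runs to stay in integer arithmetic.  The
   interpretation of the criterion is an assumption. */
static int
unlikely_executed_p (int64_t count, int64_t runs)
{
  assert (runs > 0 && count >= 0);
  return 2 * count < runs;
}
```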

> 2) the hot_bb_count_fraction parameter, which is used to determine
> maybe_hotness, is shared by all FDO-related passes. It is much more
> flexible (in terms of tuning) to allow each pass (such as inlining) to
> define its own thresholds.

Some people push for fewer parameters, others for more; it is always a
matter of compromise.  So before forking the notion of hotness for individual
passes, we would need a good argument for why this is very important.
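The trade-off can be sketched in C as one shared fraction (the current scheme) plus an optional per-pass override (the proposal). Only the name hot_bb_count_fraction comes from the discussion; the pass identifiers, the override table, the value 10000, and the comparison of a count against max_count / fraction are illustrative assumptions, not GCC's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pass identifiers. */
enum pass_id { PASS_INLINE, PASS_UNROLL, PASS_OTHER, PASS_COUNT };

/* Stand-in for the shared parameter; the value is illustrative. */
static int64_t hot_bb_count_fraction = 10000;

/* Hypothetical per-pass overrides; 0 means "use the shared fraction". */
static int64_t pass_hot_fraction[PASS_COUNT];

/* A count is "maybe hot" for a pass when it exceeds max_count divided
   by that pass's fraction (its override, or the shared one). */
static int
maybe_hot_count_for_pass (enum pass_id pass, int64_t count, int64_t max_count)
{
  int64_t fraction = pass_hot_fraction[pass] ? pass_hot_fraction[pass]
                                             : hot_bb_count_fraction;
  assert (fraction > 0);
  return count > max_count / fraction;
}
```

Leaving every override at 0 reproduces the single shared notion of hotness; each non-zero entry is one more knob to tune, which is the cost weighed above.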
> >
> > If we want to refine things, we could also reconsider how we want to treat
> > BBs with 0 coverage, i.e. whether we want to
> > a) consider them "normal" and let the presence of -Os/-O123 decide
> > whether they are optimized for size or speed,
> > b) consider them "cold" since they are not executed at all, or
> > c) consider them "cold" in functions that are otherwise covered by the test
> > run and "normal" when the function is not covered at all (e.g. training the
> > X server on a particular set of hardware should not make GCC optimize for
> > size all the other drivers not covered by the training run).
> >
> > We currently implement b) and it sort of works well, since users usually
> > train on what matters to them and are happy to see smaller binaries.
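The three options for zero-coverage blocks can be written as a small C decision function. The enum names, and the use of the enclosing function's entry count to decide whether the function was covered at all, are hypothetical; this sketches the policies as described, not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* The three options for blocks whose profile count is zero. */
enum zero_count_policy {
  ZERO_IS_NORMAL,       /* a) let -Os / -O123 decide               */
  ZERO_IS_COLD,         /* b) never executed, so cold (current)     */
  ZERO_COLD_IF_COVERED  /* c) cold only if the function ran at all  */
};

enum bb_class { CLASS_COLD, CLASS_NORMAL };

/* Classify a zero-count block; function_entry_count is the profiled
   count of the enclosing function's entry block (0 = not covered). */
static enum bb_class
classify_zero_count_bb (enum zero_count_policy policy,
                        int64_t function_entry_count)
{
  assert (function_entry_count >= 0);
  switch (policy)
    {
    case ZERO_IS_NORMAL:
      return CLASS_NORMAL;
    case ZERO_IS_COLD:
      return CLASS_COLD;
    case ZERO_COLD_IF_COVERED:
      return function_entry_count > 0 ? CLASS_COLD : CLASS_NORMAL;
    }
  return CLASS_NORMAL;
}

/* Cold blocks are always optimized for size; normal blocks follow the
   command-line choice (opt_size is nonzero under -Os). */
static int
optimize_zero_count_bb_for_size (enum zero_count_policy policy,
                                 int64_t function_entry_count, int opt_size)
{
  return classify_zero_count_bb (policy, function_entry_count) == CLASS_COLD
         || opt_size;
}
```

Under option c), a driver whose functions never ran during training keeps the normal -O2 or -Os treatment instead of being forced to size.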
> 
> Yes, we assume the user will do his best to find representative training
> data to avoid bad optimizations, so b) should be fine.

I think so too.  One notable exception, however, is hardware drivers, where it is inherently
hard to test all the combinations in common use.  I guess one should avoid
FDO-compiling those for this reason.

Honza

