This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Replace function specific cold attribute handling by profile


> 
> I was just finishing up my next round of patches for the function specific
> options.  One of the things the patches did was remove cold/hot from being
> default behavior on the i386/x86_64, and move it to a new switch that is not
> turned on by default.  I like your approach about tying this into the

That would be nice too.  In particular I am concerned about the side effect
that disables inlining across hot/cold boundaries.  There are
definitely sane cases where cold functions should be inlined for size (or
for the branch prediction hint itself, as in the kernel), and hot functions
also might benefit from being inlined into their callers.

So for the next release I would like to have cold/hot functions inlinable,
as this seems important for existing practices.
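
To make the inlining concern concrete, here is a minimal sketch using GCC's
existing hot and cold function attributes.  The function names are
hypothetical and chosen only for illustration.  The cold helper is tiny, so
inlining it into its hot caller would cost almost nothing in size while
still telling the compiler that the error branch is unlikely.

```c
#include <assert.h>

/* Hypothetical error helper: rarely executed, so it is marked cold.
   It is small enough that inlining it into callers is cheap. */
__attribute__((cold)) static int report_error(int code)
{
    return -code;
}

/* Hypothetical hot caller.  The call to a cold function also acts as a
   branch prediction hint: the b == 0 path is treated as unlikely. */
__attribute__((hot)) static int checked_div(int a, int b)
{
    if (b == 0)
        return report_error(1);
    return a / b;
}
```

Whether report_error actually gets inlined into checked_div is up to the
inliner; the sketch only shows the kind of hot/cold pair under discussion.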

(For the next stage1 I would like to improve the code proving coldness of
functions, since the whole scheme is a bit useless given that we can hardly
prove significant portions of a program cold with our function-level
predictor and fixed ratios.  With profile feedback we can find where the
program spends 99% of its time and mark everything else cold; without
profile feedback we can at least propagate cold/noreturn attributes to
functions that always call cold/noreturn functions, and track what is called
just once in the program so that main/constructors/destructors are not
performance optimized except for loops.)
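
The propagation idea can be sketched on a toy call graph.  This is only an
illustration under stated assumptions: struct toy_func and propagate_cold
are made-up names, not GCC data structures or passes, and "always calls" is
assumed to be known already for each function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLEES 4

/* Toy call-graph node (hypothetical, not a GCC structure). */
struct toy_func {
    bool cold;                        /* declared or inferred cold */
    size_t n_always;                  /* number of unconditional callees */
    size_t always_calls[MAX_CALLEES]; /* callees reached on every path */
};

/* Mark a function cold if it always calls a cold function, iterating
   until nothing changes.  Returns how many functions were newly marked. */
static size_t propagate_cold(struct toy_func *funcs, size_t n)
{
    size_t newly = 0;
    bool changed = true;

    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; i++) {
            if (funcs[i].cold)
                continue;
            for (size_t j = 0; j < funcs[i].n_always; j++) {
                if (funcs[funcs[i].always_calls[j]].cold) {
                    funcs[i].cold = true;
                    newly++;
                    changed = true;
                    break;
                }
            }
        }
    }
    return newly;
}
```

The same fixed-point loop would work for noreturn; only the initial marking
differs.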

> profiler, which I had wanted to do, but hadn't gotten to that point.
> 
> How did you want to proceed?  Should I commit the larger patch, and then have
> you tweak the attribute parts?  Should I remove the new switch altogether?  I

I guess removing the new switch and using the profile bits would make more
sense now.
As I've mentioned in a previous mail, the optimize_size = 1 setting should
now be completely subsumed by the compiler, except for the cost tables I
plan to work on now.  The hot situation is a bit more subtle, but I am not
convinced the current scheme of enabling -O3 function-specific optimizations
(that is postreload gcse, predictive commoning, unswitching and vectorizing)
is a very good solution, since -O3 mixes performance improvements with more
risky transformations...

Honza

