This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: gcc compile-time performance

> > Here's another oddity.
> > 
> > Why is predict.c using the target floating-point emulation routines to do 
> > its branch probability calculations?  There must be a faster way of doing 
> > this that is good enough for the level of estimation needed here -- the 
> > probabilities are at best approximate.
> > 
> > When profiling a compilation of combine.c (a function with no floating 
> > point code), I was amazed to find that we spend 2.5% of the total 
> > compilation time in earith() and its children.
> This is curious, I was benchmarking it on similar testcase before sending
> the patch and it was about 0.5% of total compilation time spent in
> branch probability pass...

OK, I've re-run the same code but this time with a compiler built with 
-O2; it makes a little difference, but not much:

                0.01    0.06   22284/63991       estimate_bb_frequencies 
                0.03    0.11   41707/63991       propagate_freq [64]
[68]     1.8    0.04    0.17   63991         earith [68]
                0.01    0.11   28608/28608       ediv [107]
                0.01    0.02   14613/32590       emul [185]
                0.00    0.01   20770/38747       eadd1 [332]
                0.01    0.00  127982/779628      eisnan [244]
                0.00    0.00   14642/32619       eadd [635]
                0.00    0.00    6128/6128        esub [812]

That 1.8 means 1.8% of total run time, the only callers being the 
basic-block code.  Ie, ~2% of the entire compilation time is spent just 
estimating probabilities.

> > 
> > Surely either native floating-point code, or even some simple fixed-point 
> Native floating point code is problem, unfortunately since for i386 you get
> different results in optimized and non-optimized builds breaking bootstrap.
> Fixed point code is problem, as we are interested in comparisons relative
> to the highest fequency in the program.  This may be the entry block for
> tree-structured function, but it may be the internal loop and at high loop
> nests there is more than 2^30 differences between these two we can affort
> in integral arithmetics.

I'm a little surprised that we really need to preserve such huge levels of 
ordering.  Would it really matter (in terms of final code execution 
performance) if those smaller numbers underflowed to zero?


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]