This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: std::pow implementation

	Sorry, I'm certainly replying a bit late...
> | > It suffices to point out that (defunct) KCC did outperform GCC on most
> | > real world code. 
> |
> | But surely not due to honouring the inline keyword. 

> Its honouring the inline keyword was most certainly part of that. 

Well, to be honest, at the time the inline limit was put into gcc, gcc 
was much better than KCC (in terms of inlining, not in terms of the 
overall efficiency of the compiled code). A simple program like:

template <int N>
struct A {
   void f() { A<N-1>().f(); }
};

template <>
struct A<0> {
   void f() { }
};

int main()
{
   A<32>().f();
}
was causing a real memory hog with gcc (which inlined everything 
without realizing that the functions were empty!), 
while KCC stopped after a few inlining levels. At that time, 
inlining was done on RTL, was quite fast, and the inlining limit was 
very high...

Then a lot of things changed: garbage collection was introduced, tree-based 
inlining was introduced, then function-at-a-time and now even a 
unit-at-a-time strategy, a new standard C++ library came in, ... 
(this list is certainly not intended to be used as a list of guilty 
changes) ... and inlining (presumably among other things) started to 
be really slow.

Then someone noticed that one way to reduce the time the compiler 
was spending on inlining was simply to reduce the inlining 
limit (and even to introduce new ones). And there were some 
convincing examples showing that the overall performance of the 
compiled programs did not decrease (at least not by too much). 
Unfortunately, with more testing, this has proved to be not entirely true.
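For reference, these limits are user-tunable, so one can experiment with how much inlining affects compile time and code speed. A minimal sketch (the flag values and the file name `test.cc` are purely illustrative, not recommendations):

```shell
# Cap the size (in pseudo-instructions) of functions eligible for inlining.
g++ -O2 -finline-limit=600 test.cc -o test

# Or tune an individual limit directly via --param.
g++ -O2 --param max-inline-insns-single=300 test.cc -o test
```

Raising or lowering these values on a representative translation unit is a quick way to reproduce the compile-time/performance trade-off discussed above.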

So saying that KCC was better than gcc because it inlined more was 
simply not true (at the time when this story begins): KCC was still 
often better than gcc even though gcc was inlining everything it could 
(i.e. everything marked inline, whether explicitly or implicitly).

The main conclusions this leads to are (IMHO):

- The simple example above clearly shows that some limit has to exist 
and has to be set by the compiler (of course, here the function is empty 
and the compiler could have figured that out, but imagine if it were not).

- At some point (egcs times, around 1998), even while inlining much, 
much more than today, gcc was faster and consumed less memory (and 
certainly also had difficult memory-management-related bugs...). So 
the scheme that Gaby describes worked (with some very high limits) at 
some point in the past and people were not complaining...

- The current limiting strategy comes from a very practical point of 
view (restricting the compilation time of the released 
compiler) and should not be considered the definitive answer to the 
problem, all the more so as users have reported that the consequences 
of the inlining limits vary a lot depending on the language in use, 
the type of code, etc.

- The inlining strategy plays an important role. In the example 
above, gcc, which (at that time) did top-down inlining (i.e. from 
A<32> down to A<0>), had to inline everything before being able to 
realize that the function was empty. I think Nathan or Zack proposed 
an experimental bottom-up patch around that time. From what I read here, 
it seems that the ideal inlining strategy still has to be found (or, 
more likely, approximated).

- Context (in conjunction with the optimization and inlining strategies) is 
also certainly important (imagine if you could prove that f() is empty 
in the example above). I hope that tree-based optimization will allow 
partial optimization of functions after inlining, so that the cost 
metrics will be more meaningful...

Final point: it has been reported here (and it is also my 
experience) that for some C++ code, -Os (which I believe 
restricts the amount of inlining) sometimes results in more efficient 
code than all other -O[0123] choices, with timings going from 
about 12s at -O0 down to 1.2s at -Os and about 2.5s at -O2 (numbers 
are approximate and pre-date the unit-at-a-time patch, but I'm not 
sure how that interferes; all functions were small). This tends to 
show that:

- it is the metrics that are used to determine the efficiency 
of the code that need some work.

- certainly, at some point the compiler will know better what to 
inline (sorry Gaby) and what to keep as a function. And I also buy 
all the portability arguments that have been raised in this thread.

It really seems that this inlining problem is much more difficult to 
cope with than expected... I wish I had a good idea...

I just hope that SSA-based optimizations and the work Jan just did on 
metrics and unit-at-a-time will really make effective some of the 
infrastructure work (like function-at-a-time and tree-based inlining) 
that may need tree-based optimization. Here I certainly show how 
little I know about how inlining interferes with the other optimizations 
(in the current and the past RTL-based schemes).

Theodore Papadopoulo
Email: Tel: (33) 04 92 38 76 01
