This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Crucial C++ inlining broken under -Os


On 07/01/2010 02:45 PM, Richard Guenther wrote:
On Thu, Jul 1, 2010 at 11:36 PM, Taras Glek<tglek@mozilla.com> wrote:
On 07/01/2010 02:27 PM, Richard Guenther wrote:
On Thu, Jul 1, 2010 at 10:29 PM, Taras Glek<tglek@mozilla.com> wrote:
On 06/30/2010 03:06 PM, Jan Hubicka wrote:
If you can find actual simple examples where -Os is losing size and
speed, we can try to do something about them.

According to our code size reports, inlining is completely screwed for
C++ wrapper classes like the ones often used for smart pointers, arrays,
etc. See http://people.mozilla.com/~tglek/codesize45.txt

It would be really nice if this could be fixed in 4.5. It's tricky for us
to switch to 4.5 otherwise.

The following code inlines as expected under -Os in 4.4. It also inlines
properly with -O1 and above in 4.5. But it generates giant, bloated code
under -Os in 4.5.

class Container {
public:
  Container() {
    member = new int;
  }

  void cleanup() {
    delete member;
    member = 0;
  }

  int value() {
    return *member;
  }

  ~Container() {
    cleanup();
  }
private:
  int *member;
};



int gimme() {
  Container m;
  return m.value();
}
Without looking, I bet the issue here is call_cost at -Os (which is 1).
In the above example, the only call we don't inline is the constructor,
whose body is estimated at size 2 (a function call with one constant
parameter). Inlining it enlarges the caller, because we'd replace a call
without an argument with a call that has an argument.

So the inlining decision isn't too bad for -Os here (which means
your testcase isn't a good representative of the actual issue).
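The size arithmetic in the reply above can be made concrete with a small C++ sketch. It assumes a toy cost model: each call costs call_cost (1 at -Os) plus 1 per argument, and -Os only inlines when the caller does not grow. The helper names are hypothetical, and this is not GCC's actual size estimator, only the numbers quoted in the thread.

```cpp
// Toy model of the -Os size accounting described in the reply above.
// ASSUMPTION: a call costs call_cost (1 at -Os) plus 1 per argument,
// and -Os rejects any inlining that grows the caller. Illustration only,
// not GCC's real estimator; all names here are hypothetical.
#include <cassert>

constexpr int kCallCostOs = 1;  // call_cost at -Os, as quoted in the thread

// Estimated size of one call with num_args arguments.
constexpr int call_size(int num_args) { return kCallCostOs + num_args; }

// Change in caller size when one call site is replaced by the callee body.
constexpr int inline_growth(int call_site_size, int callee_body_size) {
    return callee_body_size - call_site_size;
}

// Toy -Os rule: accept inlining only if the caller does not grow.
constexpr bool inline_at_Os(int growth) { return growth <= 0; }

// Container::Container(): one call (operator new) with one constant argument.
constexpr int kCtorBody = call_size(1);  // 2
// The call site in gimme(), counted as a call without an argument.
constexpr int kCtorCall = call_size(0);  // 1

static_assert(inline_growth(kCtorCall, kCtorBody) == 1,
              "inlining the constructor grows gimme() by 1");
static_assert(!inline_at_Os(inline_growth(kCtorCall, kCtorBody)),
              "so the toy -Os rule keeps the call");
```

Under these assumptions the constructor body (2) is larger than the call it would replace (1), so the net growth is +1 and the toy rule declines to inline, matching the reasoning in the reply.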
You are right. I tried the -finline-limit=50 flag that we used with gcc 4.1
and 4.2, and it appears to bring -Os performance slightly above 4.3
levels.

However, this testcase was the most obvious regression I found when
reading the code size report above. It seems like a pretty serious
regression, given that "size inline.o" returns 158 with -Os and 93
with -O1. This doesn't get "fixed" by -finline-limit=50.
That's because we eliminate the out-of-line copy of the constructor at -O1.
But that hardly counts, as it is in a comdat section and will be shared
with other uses in different units.
If you use -ffunction-sections and look at the size of the gimme()
function, you'll see that gimme is 2 bytes smaller with -Os (on i?86)
compared to -O1. As there may be unknown calls to the non-inlined
constructor, it is not fair to complain about its size when
not using -fwhole-program.

That is, in 4.5 we no longer optimistically assume that comdat functions
can be eliminated if there are no callers in the local TU
(but we did in previous releases).
Instead, gcc now optimistically assumes that comdat functions will be shared :)
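A minimal sketch of why the comdat assumption flips the decision, reusing the toy numbers from the thread (growth of 1 per call site, constructor body of size 2). The enum and function names are hypothetical, and the model ignores everything else GCC's inliner weighs.

```cpp
// Toy model: how the comdat assumption changes the estimated size change.
// ASSUMPTION: names and numbers are illustrative, not GCC internals.
#include <cassert>

enum class ComdatAssumption {
    RemovableIfNoLocalCallers,  // pre-4.5 behavior, as described above
    SharedWithOtherUnits        // 4.5 behavior, as described above
};

// Estimated change in unit size from inlining every call site of a
// comdat function that has no other callers in this unit.
constexpr int unit_size_change(int call_sites, int growth_per_site,
                               int body_size, ComdatAssumption a) {
    int change = call_sites * growth_per_site;
    // Only the optimistic assumption credits dropping the out-of-line copy.
    if (a == ComdatAssumption::RemovableIfNoLocalCallers) {
        change -= body_size;
    }
    return change;
}

static_assert(unit_size_change(1, 1, 2,
                  ComdatAssumption::RemovableIfNoLocalCallers) == -1,
              "old assumption: inlining shrinks the unit");
static_assert(unit_size_change(1, 1, 2,
                  ComdatAssumption::SharedWithOtherUnits) == 1,
              "new assumption: inlining grows the unit");
```

With one call site, the pre-4.5 assumption yields a net change of -1 (inlining pays off because the out-of-line copy is expected to disappear), while the 4.5 assumption yields +1, so the call stays out of line.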

You made me realize that my testcase is incomplete: it should use a class with a constructor/destructor instead of "int", which also causes gcc not to inline the destructors.
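One way the extended testcase described above might look, as a hedged sketch: the wrapper is templated, like the wrapper classes mentioned earlier in the thread, and owns an object of a class with a user-defined constructor and destructor instead of an int. Payload, the templated Container and gimme2() are hypothetical names; the claim that 4.5 at -Os leaves the destructors out of line comes from the message, not from checking this exact code.

```cpp
// Hypothetical extension of the testcase: the member is a class with a
// user-defined constructor and destructor instead of an int.
// Payload, Container<T> and gimme2() are illustrative names only.
#include <cassert>

struct Payload {
    int v;
    Payload() : v(42) {}
    ~Payload() { v = 0; }  // non-trivial destructor
    int get() const { return v; }
};

template <typename T>
class Container {
public:
    Container() : member(new T) {}
    ~Container() { cleanup(); }
    Container(const Container&) = delete;             // owning pointer: no copies
    Container& operator=(const Container&) = delete;

    void cleanup() {
        delete member;  // runs T's destructor, then frees the storage
        member = nullptr;
    }
    T& value() { return *member; }

private:
    T* member;
};

int gimme2() {
    Container<Payload> m;
    return m.value().get();
}
```

Compiling this file with -Os and with -O1 and comparing "size" output, as done earlier in the thread, would be the natural way to check whether the destructor calls survive.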

Is it easy to restore the old behavior for comparison purposes? This pattern occurs a lot in templated code, where there is some likelihood that a particular instantiation is unique. It seems like a good candidate for PGO to take the optimistic assumptions out.


Taras


