This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: GCC 3.0.3 produces large code



On Thu, 31 Jan 2002, Zack Weinberg wrote:

> In *my* opinion, GCC should generate equally good code for all three
> functions, rather than registering a preference for one style or
> other.  Also, GCC should care more about code size than it currently
> does.  It's true that most people use -O2, but with modern computers
> code size has direct effects on performance.

   I agree. As I said, small functions should really be optimised for
space, and large functions with big loops for speed. Most of my inner
loops are in assembly language anyway, so for the C++ code I actually
prefer size optimisation, so that as much as possible stays in cache.

> Let's look at this in a bit more depth.  Here's what we get with
> Nicholas' switches and the current mainline.  (Warning, long lines.
> The numbers in parens are size in bytes as reported by nm
> --size-sort.)

> (cut out code)

   That actually seems quite a bit better than what 3.0.3 is generating. I
might try getting the latest snapshot/CVS of GCC and seeing if I get an
overall improvement.

> First, you will notice that the code generated for DoThing2 and
> DoThing3 is identical except for the position of one xorl instruction.
> That's good.  We ought to have hoisted the xor operation in DoThing2,
> but the global optimizer isn't up to it yet.

   That seems like an improvement.

> This code _looks_ smaller, but it produces bigger object code.  That's
> probably because the instructions being used take more bytes in
> machine language.  Other than that, it's the same thing.

   Yes, memory operands with offsets can create quite large instructions.
It's obvious why -Os should use push/pop and -O2 should use mov. Once
again, I didn't notice this happening so much in 3.0.3.

> Okay, so why does the Visual C++ compiler do so much better on this?
> Well, let's look at (part of) the source code...
>
> struct b
> {
>         b() { m_pa = new a; }
>         ~b() { delete m_pa; }
>         virtual int DoThing()
>         {
>                 AutoRWL Lock(&m_RWL, 1);
>                 if( m_pa )
>                         return m_pa->DoOtherThing();
>                 else
>                         return 0;
>         }
> };
>
> You can see that gcc has inlined the calls to AutoRWL's constructor
> and destructor, and to a::DoOtherThing.  Now suppose we were going to
> write assembly language for DoThing by hand.  The first thing we'd
> probably notice is that DoThing can only be called on a validly
> constructed object of class b, which means that m_pa cannot possibly
> be NULL, and therefore we could throw away the else branch entirely:

   Also, 3.0.3 didn't seem to be inlining as much stuff when I asked it
to... maybe this was improved as well.

   Don't forget, operator new() can return NULL, so it still has to do
this check. Besides, in the "real" code (this is a simplified example),
the m_pa being NULL case is quite real.
or has been closed).
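
   To make the NULL question concrete, here is a small sketch of the two
ways a new-expression can report failure. In standard C++ the plain form
throws std::bad_alloc, and only the std::nothrow form returns a null
pointer (some older or non-conforming implementations return NULL from
plain new as well). AlwaysFails and the two helper functions are
hypothetical names invented for this illustration; the type simulates
allocation failure so that nothing depends on actually running out of
memory.

```cpp
#include <cstddef>
#include <new>

// Hypothetical type whose allocation always fails, so both failure modes
// can be observed without exhausting real memory.
struct AlwaysFails {
    // Throwing form: reports failure with std::bad_alloc.
    static void* operator new(std::size_t) { throw std::bad_alloc(); }
    // Non-throwing form: reports failure by returning a null pointer.
    static void* operator new(std::size_t, const std::nothrow_t&) noexcept {
        return nullptr;
    }
    // Matching deallocation functions; nothing is ever allocated here.
    static void operator delete(void*) noexcept {}
    static void operator delete(void*, const std::nothrow_t&) noexcept {}
};

// Returns true if the plain new-expression signalled failure by throwing.
bool PlainNewThrows() {
    try {
        AlwaysFails* p = new AlwaysFails;
        delete p;
        return false;
    } catch (const std::bad_alloc&) {
        return true;
    }
}

// Returns true if the nothrow new-expression yielded a null pointer.
bool NothrowNewReturnsNull() {
    AlwaysFails* p = new (std::nothrow) AlwaysFails;
    const bool failed = (p == nullptr);
    delete p;  // deleting a null pointer is a no-op
    return failed;
}
```

   Either way, a member such as m_pa that can legitimately be reset to
NULL elsewhere in the program still needs the check, whatever the
allocator does.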

> We would then notice that there is no point in doing a complete
> construction of the AutoRWL object, since that data is never used
> again.  That in turn means we don't ever use the this pointer.
> Furthermore, the program cannot tell whether the second printf call
> occurs before or after the call to rand, and if we swap them we can
> use a sibling call.  Finally, we've managed to eliminate all need for
> a stack frame.

   Yes, the whole AutoRWL thing is a great case for potential
optimisation. Really, its only purpose is to make sure that the lock,
which is acquired when the object is created, will always be released.
Since the object must be destroyed when the scope/function is left, this
is guaranteed. But the object itself need not really exist. I'd love to
see optimisation good enough to realise this.
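
   A minimal sketch of that kind of scope guard, for illustration only.
RWLock, ScopedRWL and GuardedValue are hypothetical names, not the real
AutoRWL or lock class from the code under discussion; the constructor
arguments (a lock pointer and a write flag) are assumed from the call
AutoRWL Lock(&m_RWL, 1) quoted earlier. The stand-in lock only counts
calls so that the release on every exit path can be observed.

```cpp
#include <cassert>

// Hypothetical stand-in for the real read/write lock: it only counts
// acquire/release calls.
struct RWLock {
    int held = 0;      // acquisitions currently outstanding
    int releases = 0;  // total number of releases so far
    void Acquire(int /*write*/) { ++held; }
    void Release() { --held; ++releases; }
};

// Scope guard in the spirit of AutoRWL: acquire in the constructor,
// release in the destructor.
class ScopedRWL {
public:
    ScopedRWL(RWLock* lock, int write) : m_lock(lock) {
        assert(m_lock != nullptr);
        m_lock->Acquire(write);
    }
    ~ScopedRWL() { m_lock->Release(); }

    // Copying a guard would release the lock twice, so forbid it.
    ScopedRWL(const ScopedRWL&) = delete;
    ScopedRWL& operator=(const ScopedRWL&) = delete;

private:
    RWLock* m_lock;
};

// Both return paths release the lock, because the guard's destructor runs
// whenever the scope is left.
int GuardedValue(RWLock& lock, const int* p) {
    ScopedRWL guard(&lock, 1);
    if (p)
        return *p;
    return 0;
}
```

   The guard holds nothing but a pointer, and its constructor and
destructor are small enough to inline, so in principle a compiler can
reduce it to the two lock calls without ever materialising the object.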

> _ZN1b7DoThingEv:
> 	pushl	$.LC0
> 	call	printf
> 	popl	%eax
> 	pushl	$.LC1
> 	call	printf
> 	popl	%eax
> 	jmp	rand
>
> You didn't post the code generated by Visual C++ but I bet it's
> capable of one or more of those optimizations.  GCC has basically no
> framework for whole-program analysis, but we're working on it.

   OK, well, I'm glad to hear... and yes, other than the check for NULL
which is necessary in my code, the function doesn't need to be much more
complicated than that.


         Nicholas


