This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



abysmal code generated by gcc 3.2


My application is the implementation of a virtual machine for an
emulated programming language.  Switching from gcc 2.95.x to 3.2
brought a few expected pains due to the change in data layout, but the
major issue is that gcc 3.2 produces extremely poor code for my
application on x86 (also on others, but I have not measured those
personally).

Measuring just the impact on the main emulator loop (which uses the
classical threaded code technique, i.e. jumps to first-class labels), I
found that the emulator was slowed down by a FACTOR of 8.27.
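
For readers unfamiliar with the technique, here is a minimal sketch of
threaded code using GCC's "labels as values" extension (&&label and
goto *ptr). It is not the actual emulator source: the function name
run_movexx, the opcode enum, and the HALT instruction are made up for
illustration, and only the MOVEXX name comes from the listings in this
message. Each instruction is stored as a label address followed by its
operands, and each handler ends by jumping through the next label.

```c
#include <assert.h>

/* Hypothetical opcodes for illustration; only MOVEXX appears in the post. */
enum { OP_MOVEXX, OP_HALT };

/* Run a tiny threaded program that copies *src into *dst, then halts.
   Relies on the GCC "labels as values" extension. */
void run_movexx(int *src, int *dst)
{
    /* Table mapping each opcode to the address of its handler label. */
    static void *const labels[] = { &&MOVEXX, &&HALT };
    void *code[4];
    void **pc;

    assert(src != 0 && dst != 0);

    /* Threaded code: a label address, then that instruction's operands. */
    code[0] = labels[OP_MOVEXX];
    code[1] = src;
    code[2] = dst;
    code[3] = labels[OP_HALT];

    pc = code;
    goto **pc;                        /* dispatch the first instruction */

MOVEXX:
    *(int *)pc[2] = *(int *)pc[1];    /* load via operand 1, store via operand 2 */
    pc += 3;                          /* skip the label word and two operands */
    goto **pc;                        /* jump straight to the next handler */

HALT:
    return;
}
```

Calling run_movexx(&a, &b) copies a into b. The MOVEXX handler in this
sketch has the same shape as the 2.95.x listing further down: two
operand loads, a copy, a bump of the program counter by three words,
and an indirect jump.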

Looking at the generated assembly code, it is clear that the 3.2
compiler expends a lot of effort trying to keep a certain set of
values in registers.  On x86, this is a horrible policy (especially in
a threaded code interpretation loop).

Part of the problem comes from an interaction with inlining.  I turned
inlining off for a couple of non-critical functions which were
exposing values that the compiler ended up trying to keep in
registers, and I declared one variable volatile (much better results
than trying to switch off gcse).
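
As a rough illustration of those two workarounds, here is a sketch
assuming the inlining is suppressed with GCC's noinline function
attribute (the message does not say which mechanism was actually
used). The names status_reg, heap_cur, handle_status and heap_alloc
are hypothetical stand-ins, not identifiers from the emulator.

```c
#include <assert.h>

/* Hypothetical interpreter state; these names are not from the real code. */
static int status_reg = 0;
volatile int heap_cur = 0;   /* volatile: every read and write goes to memory */

/* noinline: the body is never merged into the caller, so the values it
   touches are not exposed to the caller's register allocation. */
__attribute__((noinline))
int handle_status(int delta)
{
    status_reg += delta;
    return status_reg;
}

/* Each access to heap_cur below is a separate load or store, because
   the compiler may not cache a volatile object in a register. */
int heap_alloc(int nbytes)
{
    int old;
    assert(nbytes >= 0);
    old = heap_cur;
    heap_cur = old + nbytes;
    return old;
}
```

Whether either change helps depends on the compiler version and the
surrounding code, so the generated assembly still has to be checked by
hand.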

This got me down to a slowdown of only a factor of 1.37 :-) ... measured
on basically pure emulated recursion (i.e. the speed of looping while
doing nothing else).

Which of course still sucks majorly since this is the MAIN emulator
loop (and since _every_ part of the implementation has been sizeably
slowed down... aargh!)

Here is an example of what I still cannot get rid of.  This is the
code produced by gcc 2.95.x for the MOVEXX instruction:

#APP
         MOVEXX:
#NO_APP
        movl 4(%ebp),%edx
        movl 8(%ebp),%eax
        addl $12,%ebp
        movl (%edx),%edx
        movl %edx,(%eax)
        jmp *(%ebp)

Here is the code produced by gcc 3.2:

#APP
         MOVEXX:
#NO_APP
        movl    4(%ebp), %esi
        movl    8(%ebp), %eax
        addl    $12, %ebp               #  PC
        movl    (%esi), %ebx
        movl    _oz_heap_end, %esi      #  _oz_heap_end
        movl    %ebx, (%eax)
        movl    _oz_heap_cur, %ebx      #  _oz_heap_cur,  sPointer
        movl    480(%esp), %eax         #  CAP
        movl    am+52, %ecx             #  <variable>._currentOptVar, <anonymous>
        movl    am+28, %edx             #  <variable>.statusReg,  <anonymous>
        leal    12(%eax), %edi          #  <anonymous>
        jmp     *(%ebp)                 # * PC

To my uneducated eye, it looks like gcc is now trying very hard to
keep a bunch of values in registers.  Every emulated instruction is
like that, thus resulting in considerable overhead.  I tried to
declare _oz_heap_end and _oz_heap_cur volatile, but, curiously, that
had no effect on this particular code generation.

I am at my wits' end. Can anyone help?  (I realize that my application
is atypical).

Cheers,

PS: the compiler options used for the emulator file are:
-fno-exceptions -O3 -pipe -fstrict-aliasing -march=pentium -mcpu=pentiumpro -fomit-frame-pointer

-- 
Dr. Denys Duchier			Denys.Duchier@ps.uni-sb.de
Forschungsbereich Programmiersysteme	(Programming Systems Lab)
Universitaet des Saarlandes, Geb. 45	http://www.ps.uni-sb.de/~duchier
Postfach 15 11 50			Phone: +49 681 302 5618
66041 Saarbruecken, Germany		Fax:   +49 681 302 5615

