This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
abysmal code generated by gcc 3.2
- From: Denys Duchier <Denys dot Duchier at ps dot uni-sb dot de>
- To: gcc at gcc dot gnu dot org
- Date: Mon, 21 Oct 2002 00:02:20 +0200
- Subject: abysmal code generated by gcc 3.2
My application is the implementation of a virtual machine for an
emulated programming language. Switching from gcc 2.95.x to 3.2
brought a few expected pains due to the change in data layout, but the
major issue is that gcc 3.2 produces extremely poor code for my
application on x86 (also on others, but I have not measured those
personally).
Measuring just the impact on the main emulator loop (which uses the
classical threaded code technique, i.e. jumps to first class labels) I
found that the emulator was slowed down by a FACTOR of 8.27.
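For readers unfamiliar with the technique, here is a minimal sketch of a direct-threaded loop using the GNU C "labels as values" extension. It is not the actual emulator code: the function run_demo, the INC and HALT instructions, and the operand layout are assumptions made up for illustration. Only the MOVEXX name and its shape (two operands after the handler address, then a jump through the next word) follow this message.

```c
#include <stddef.h>

/* Hypothetical demo, not the real emulator: runs a three-instruction
   threaded program and returns the final value of dst. */
long run_demo(long src_value)
{
    long src = src_value;
    long dst = 0;

    /* Direct threaded code: each instruction is the address of its
       handler (a first-class label) followed by its operands. */
    void *code[] = {
        &&MOVEXX, &src, &dst,   /* dst = src  */
        &&INC,    &dst,         /* dst += 1   */
        &&HALT
    };
    void **pc = code;

    /* Dispatch: jump to the handler stored at the current PC. */
    goto **pc;

MOVEXX:
    /* Copy the word at operand 1 to the location given by operand 2. */
    *(long *)pc[2] = *(long *)pc[1];
    pc += 3;                    /* skip the handler address and 2 operands */
    goto **pc;

INC:
    *(long *)pc[1] += 1;
    pc += 2;
    goto **pc;

HALT:
    return dst;
}
```

Each handler ends with its own indirect jump, so there is no central dispatch switch. Any extra loads the compiler places before that jump are paid on every emulated instruction.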
Looking at the generated assembly code, it is clear that the 3.2
compiler expends a lot of effort trying to keep a certain set of
values in registers. On x86, this is a horrible policy (especially in
a threaded code interpretation loop).
Part of the problem comes from an interaction with inlining. I turned
inlining off for a couple of non-critical functions which were
exposing values that the compiler ended up trying to keep in
registers, and I declared one variable volatile (much better results
than trying to switch off gcse).
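The two workarounds can be sketched roughly as follows. This is an illustration under assumptions, not the emulator source: status_flag, handle_rare_case and hot_loop are hypothetical names, and the noinline attribute is just one way to turn inlining off for a single function.

```c
#include <stddef.h>

/* Hypothetical global polled by the hot loop. Declaring it volatile
   forces a fresh load at every use, so the compiler cannot keep its
   value cached in a register across iterations. */
volatile int status_flag = 0;

/* Hypothetical non-critical helper. The noinline attribute keeps its
   body, and the values it touches, out of the caller. */
__attribute__((noinline))
long handle_rare_case(long x)
{
    return x * 2 + 1;
}

/* Hypothetical hot loop: the common path is a plain addition, the rare
   path goes through an out-of-line call. */
long hot_loop(long n)
{
    long acc = 0;
    long i;

    for (i = 0; i < n; i++) {
        if (status_flag)        /* reloaded from memory each iteration */
            acc = handle_rare_case(acc);
        else
            acc += i;
    }
    return acc;
}
```

Whether either change helps depends on the compiler version and the surrounding code, so the effect should be checked in the generated assembly.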
This got me down to a slowdown of only a factor of 1.37 :-) ... measured on
basically pure emulated recursion (i.e. the speed of looping while
doing nothing else).
Which of course still sucks majorly since this is the MAIN emulator
loop (and since _every_ part of the implementation has been sizeably
slowed down... aargh!)
Here is an example of what I still cannot get rid of. Here is the
code produced by gcc 2.95.x for the MOVEXX instruction:
#APP
MOVEXX:
#NO_APP
	movl 4(%ebp),%edx
	movl 8(%ebp),%eax
	addl $12,%ebp
	movl (%edx),%edx
	movl %edx,(%eax)
	jmp *(%ebp)
Here is the code produced by gcc 3.2:
#APP
MOVEXX:
#NO_APP
	movl 4(%ebp), %esi
	movl 8(%ebp), %eax
	addl $12, %ebp			# PC
	movl (%esi), %ebx
	movl _oz_heap_end, %esi		# _oz_heap_end
	movl %ebx, (%eax)
	movl _oz_heap_cur, %ebx		# _oz_heap_cur, sPointer
	movl 480(%esp), %eax		# CAP
	movl am+52, %ecx		# <variable>._currentOptVar, <anonymous>
	movl am+28, %edx		# <variable>.statusReg, <anonymous>
	leal 12(%eax), %edi		# <anonymous>
	jmp *(%ebp)			# * PC
To my uneducated eye, it looks like gcc is now trying very hard to
keep a bunch of values in registers. Every emulated instruction is
like that, thus resulting in considerable overhead. I tried to
declare _oz_heap_end and _oz_heap_cur volatile, but, curiously, that
had no effect on this particular code generation.
I am at my wits' end. Can anyone help? (I realize that my application
is atypical).
Cheers,
PS: the compiler options used for the emulator file are:
-fno-exceptions -O3 -pipe -fstrict-aliasing -march=pentium -mcpu=pentiumpro -fomit-frame-pointer
--
Dr. Denys Duchier Denys.Duchier@ps.uni-sb.de
Forschungsbereich Programmiersysteme (Programming Systems Lab)
Universitaet des Saarlandes, Geb. 45 http://www.ps.uni-sb.de/~duchier
Postfach 15 11 50 Phone: +49 681 302 5618
66041 Saarbruecken, Germany Fax: +49 681 302 5615