This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
GCC 2.95.3 vs top-of-tree: pessimization
- To: gcc at gcc dot gnu dot org
- Subject: GCC 2.95.3 vs top-of-tree: pessimization
- From: Iain McClatchie <iainmcc at ix dot netcom dot com>
- Date: Mon, 02 Apr 2001 16:37:39 -0700
Every so often I look at how GCC compiles my Binary Decision DAG
library. I thought I'd compare the top-of-trunk with GCC 2.95.3 to
see how the new basic block reordering and if-conversion optimizations
were doing. Now I hate it when someone whines to me about how the
stuff I'm working on is crappy, without giving examples I can try
myself. I'm happy to mail the code I've looked at below to anyone
who asks.
Good news: basic block reordering does just the right thing with
this code. Hooray!
Bad news: register allocation looks worse. Lots of extra loads.
Also, the if-converter finds fewer of the many opportunities, so
we have more badly-predicted branches.
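As a rough illustration of what if-conversion buys, here is a minimal sketch that is not taken from the BDD library. The names select_branchy, select_masked and check_select_equivalence are hypothetical. The first function is the branchy form a compiler might if-convert into a conditional move; the second is a hand-written branch-free equivalent. Whether a given GCC version actually emits a cmov for the first one depends on the target flags and the optimizer, so treat this as a shape to look for in the -S output rather than a guarantee.

```c
#include <assert.h>

/* Branchy form: without if-conversion this becomes a compare and a
   conditional jump, which mispredicts when cond is hard to guess. */
static unsigned select_branchy(unsigned cond, unsigned a, unsigned b)
{
    unsigned r = b;
    if (cond)
        r = a;
    return r;
}

/* Hand-written branch-free equivalent (illustration only). */
static unsigned select_masked(unsigned cond, unsigned a, unsigned b)
{
    unsigned mask = 0u - (unsigned)(cond != 0);  /* all ones if cond, else 0 */
    return (a & mask) | (b & ~mask);
}

/* Both forms must agree for either value of cond. */
static void check_select_equivalence(void)
{
    assert(select_branchy(1u, 10u, 20u) == select_masked(1u, 10u, 20u));
    assert(select_branchy(0u, 10u, 20u) == select_masked(0u, 10u, 20u));
}
```

Compiling this with -S under both compilers and checking whether select_branchy still contains a jump is one quick way to see if the if-converter fired.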
Both compilers were run as:
gcc -c -S -mcpu=i686 -march=i686 -O2
Let's just look at the very first bit of the function. I've expanded
the macros to make it easier to read, and I've omitted the type
conversions. This excerpt doesn't show the missed if-conversions.
DdNode *
local_bddAndorRecur( DdManager *dd, DdNode *f, DdNode *g, DdNode *h )
{
DdNode *one = DD_ONE( dd );
DdNode *F = f & -2;
DdNode *G = g & -2;
DdNode *H = h & -2;
if( F == G ) ...
if( F == H ) ...
if( G == H ) ...
if( F == one ) ...
if( G == one ) ...
if( H == one ) ...
The top-of-tree compiler turned this into:
subl $60, %esp
movl 8(%ebp), %eax
movl %eax, -16(%ebp)
movl 12(%ebp), %eax
movl -16(%ebp), %edx
movl %eax, -20(%ebp)
movl 16(%ebp), %eax
movl -20(%ebp), %esi ; %esi has f
movl %eax, -24(%ebp)
movl 20(%ebp), %eax
movl -24(%ebp), %ecx ; load g
andl $-2, %esi ; now %esi has F
movl %eax, -28(%ebp)
movl (%edx), %eax ; load one
movl -28(%ebp), %ebx ; load h
movl %ecx, -60(%ebp) ; spill g to the stack
andl $-2, -60(%ebp) ; compute G in place. extra load
movl %ebx, -64(%ebp) ; spill h to the stack
andl $-2, -64(%ebp) ; compute H in place. extra load
cmpl -60(%ebp), %esi ; %esi has F, extra load
movl $0, -52(%ebp)
je .L416
cmpl -64(%ebp), %esi ; extra load
je .L417
movl -64(%ebp), %edx ; extra load
cmpl %edx, -60(%ebp) ; extra load
je .L418
cmpl %eax, %esi ; %eax has one, good
je .L419
cmpl %eax, -60(%ebp) ; extra load
je .L420
cmpl %eax, -64(%ebp) ; why not cmpl %eax, %edx? H is already in %edx
je .L421
Compare with GCC 2.95.3:
movl 8(%ebp),%edx
movl 12(%ebp),%esi ; f
movl 16(%ebp),%edi ; g
movl 20(%ebp),%ecx ; h
andl $-2,%esi ; F, good
andl $-2,%edi ; G, good
movl (%edx),%eax ; one, good
movl $0,-24(%ebp)
movl %ecx,-32(%ebp) ; H, doh, just leave it in %ecx!
andb $254,-32(%ebp) ; computing H in place, extra load
; probably a pipeline stall from the
; mixed-size accesses to -32(%ebp)
cmpl %edi,%esi ; good
jne .L547 ; fixed in top-of-trunk
...etc...
.L547:
cmpl -32(%ebp),%esi ; extra load
jne .L552
...etc...
.L552:
cmpl -32(%ebp),%edi ; extra load
jne .L557
...etc...
.L557:
cmpl %eax,%esi ; %eax has one, %esi has F
jne .L562
...etc...
.L562:
cmpl %eax,%edi
jne .L567
...etc...
.L567:
cmpl %eax,-32(%ebp) ; extra load
jne .L572
...etc...
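For anyone who wants something to feed both compilers right away, here is a minimal sketch of the function prologue shown earlier, with the type conversions restored. It is not the real library source. The struct layouts, the DD_REGULAR macro and the first_match function are hypothetical stand-ins, and placing "one" in the first word of the manager is an assumption based on the movl (%edx),%eax in the listings. uintptr_t is C99; with an older compiler, unsigned long would be the likely substitute on i686.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real layouts are not shown in the post. */
typedef struct DdNode { int payload; } DdNode;
typedef struct DdManager { DdNode *one; } DdManager;

/* Assumed: "one" is the first word of the manager. */
#define DD_ONE(dd) ((dd)->one)

/* Clear the low tag bit; the same operation as the "& -2" above. */
#define DD_REGULAR(p) ((DdNode *)((uintptr_t)(p) & ~(uintptr_t)1))

/* Returns which of the six equality tests fires first (1..6), or 0. */
int first_match(DdManager *dd, DdNode *f, DdNode *g, DdNode *h)
{
    assert(dd != NULL);
    DdNode *one = DD_ONE(dd);
    DdNode *F = DD_REGULAR(f);
    DdNode *G = DD_REGULAR(g);
    DdNode *H = DD_REGULAR(h);

    if (F == G) return 1;
    if (F == H) return 2;
    if (G == H) return 3;
    if (F == one) return 4;
    if (G == one) return 5;
    if (H == one) return 6;
    return 0;
}
```

A function this small may well allocate registers differently from the full recursive routine, so it is only a starting point for comparing the two compilers' -S output, not a substitute for the real test case.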
If anyone is interested in tracking this crud down, you'll need a
test case. I'm happy to ship you my source code; it's a few KB.
I left it out of this post to save bandwidth. If there is
a better way for me to report this, please tell me.
-Iain McClatchie
iain@mcclatchie.com