This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
Re: Serious code size regression from 3.0.2 to now
- From: tm <tm at mail dot kloo dot net>
- To: Joern Rennecke <joern dot rennecke at superh dot com>
- Cc: gcc-bugs at gcc dot gnu dot org, stephen dot clarke at superh dot com, shumpei dot kawasaki at hsa dot hitachi dot com
- Date: Thu, 18 Jul 2002 15:15:00 -0700 (PDT)
- Subject: Re: Serious code size regression from 3.0.2 to now
On Thu, 18 Jul 2002, Joern Rennecke wrote:
> tm wrote:
> > Basically, GCC is now generating HUGE groups of jump instructions which
> > are aligned to 32-byte boundaries.
>
> Was it not generating these lone jump instructions before,
> or did it align them less?
Okay, I've done a bit more investigation, and hopefully I'm understanding
this better.
Before, GCC was generating this sequence (ins stands for whatever instruction fills the delay slot):

        mov.l   L_label,r0
        jmp     @r0
        ins
Now, GCC is generating this sequence:

        bra     L_label
        ins
        .align  5
L_label:
        bra     L_label2
        ins
L_label2:
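A rough byte count makes the trade-off between the two sequences concrete. This is a sketch under stated assumptions, not GCC's own length computation: every SH instruction is taken as 2 bytes, the old sequence is charged one 4-byte literal-pool word (ignoring any padding needed to align it), and `.align 5` is modelled as padding up to the next 32-byte boundary. The function names are hypothetical.

```python
# Rough size model for the two SH sequences quoted above (a sketch, not GCC's
# own length computation). Assumptions: every SH instruction is 2 bytes, the
# old sequence needs one 4-byte literal-pool word for the jump address (its
# alignment is ignored), and ".align 5" pads up to the next 32-byte boundary.

INSN_BYTES = 2
LITERAL_BYTES = 4
ALIGN_BYTES = 1 << 5  # .align 5 -> 2**5 = 32 bytes


def old_sequence_bytes() -> int:
    """mov.l L_label,r0 / jmp @r0 / ins, plus the literal word it loads."""
    return 3 * INSN_BYTES + LITERAL_BYTES


def new_sequence_bytes(start: int) -> int:
    """bra / ins, .align 5, bra / ins, with the first bra at address `start`."""
    first_pair_end = start + 2 * INSN_BYTES
    padding = -first_pair_end % ALIGN_BYTES  # bytes needed to reach the boundary
    return 2 * INSN_BYTES + padding + 2 * INSN_BYTES


print(old_sequence_bytes())          # -> 10
print(new_sequence_bytes(0x7BA0))    # -> 36 (28 bytes of padding)
print(new_sequence_bytes(0x7BDC))    # -> 8 (first pair ends on a boundary)
```

Under these assumptions the new form only comes out smaller when the first pair happens to end exactly on a 32-byte boundary; otherwise the padding alone can reach 30 bytes.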
This is fine for isolated cases. However, a large function ends up with
many of these stacked relative branches, and you get:
        .align  5
L_label:
        bra     L_label2
        ins
        .align  5
L_label3:
        bra     L_label4
        ins
and each of these branch pairs ends up in its own cache line, as this
assembler listing shows:
15756 .L2282:
15757 7ba0 AF19 bra .L2275
15758 7ba2 6013 mov r1,r0
15759 7ba4 00090009 .align 5
15759 00090009
15759 00090009
15759 00090009
15759 00090009
15760 .L2279:
15761 7bc0 AEFF bra .L2887
15762 7bc2 4011 cmp/pz r0
15763 7bc4 00090009 .align 5
15763 00090009
15763 00090009
15763 00090009
15763 00090009
15764 .L2276:
15765 7be0 AEE5 bra .L2888
15766 7be2 4811 cmp/pz r8
15767 7be4 00090009 .align 5
15767 00090009
15767 00090009
15767 00090009
15767 00090009
...
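The addresses and encodings in the listing can be checked by hand. The sketch below assumes the standard SH `bra` encoding (opcode nibble `1010` followed by a 12-bit signed displacement counted in 16-bit words, relative to the branch address plus 4) and treats `.align 5` as rounding up to a multiple of 32, which matches the addresses shown. The helper names are illustrative only.

```python
def bra_target(addr: int, insn: int) -> int:
    """Return the target of an SH 'bra' at `addr` with 16-bit encoding `insn`."""
    if insn >> 12 != 0xA:
        raise ValueError("not a bra instruction")
    disp = insn & 0xFFF
    if disp & 0x800:          # sign-extend the 12-bit displacement
        disp -= 0x1000
    return addr + 4 + 2 * disp


def align_up(addr: int, log2: int) -> int:
    """Round `addr` up to the next multiple of 2**log2 (what '.align 5' does here)."""
    mask = (1 << log2) - 1
    return (addr + mask) & ~mask


# (address, encoding) of the three branches in the listing.
listing = [(0x7BA0, 0xAF19), (0x7BC0, 0xAEFF), (0x7BE0, 0xAEE5)]

for addr, insn in listing:
    pair_end = addr + 4                  # bra + delay-slot instruction
    next_label = align_up(pair_end, 5)
    print(f"{addr:#06x}: bra -> {bra_target(addr, insn):#06x}, "
          f"padding {next_label - pair_end} bytes, next label {next_label:#06x}")
```

This gives targets 0x79d6, 0x79c2 and 0x79ae, and 28 bytes of padding after each 4-byte pair. The `0009` fill words are SH `nop`s; the listing prints only 20 of those 28 bytes per run, but the jump from 0x7ba4 to 0x7bc0 accounts for all of them.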
In map_fog.i/VideoDraw32OnlyFog32Alpha() alone I counted 94 (!) cache
lines that contain nothing but either:
1. a branch + delay slot, or
2. a literal load + branch far + delay slot instruction.
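To put the count in perspective: 94 full 32-byte lines occupy 94 × 32 = 3008 bytes. The sketch below bounds how much of that is actual code, assuming 2-byte SH instructions, counting case 1 as 2 instructions and case 2 as 3, and ignoring any literal-pool word that might share the line. The message does not say how the 94 lines split between the two cases, so the split is a parameter, and `footprint` is a hypothetical helper.

```python
LINE_BYTES = 32   # one .align 5 cache line
INSN_BYTES = 2    # SH instructions are 16 bits


def footprint(branch_only_lines: int, far_branch_lines: int) -> tuple[int, int]:
    """Return (total bytes, instruction bytes) for the sparse cache lines.

    branch_only_lines: lines holding bra + delay slot (2 instructions).
    far_branch_lines:  lines holding literal load + branch far + delay slot
                       (3 instructions; any literal word is ignored).
    """
    lines = branch_only_lines + far_branch_lines
    useful = INSN_BYTES * (2 * branch_only_lines + 3 * far_branch_lines)
    return lines * LINE_BYTES, useful


# The two extremes for the 94 lines, since the actual split is unknown.
for case1 in (94, 0):
    total, useful = footprint(case1, 94 - case1)
    print(f"{case1} branch-only lines: {useful} of {total} bytes hold instructions")
```

Under these assumptions between 376 and 564 of the 3008 bytes hold instructions, so roughly 81% to 88% of those lines is alignment padding.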
So there are basically two factors that create this situation:
1) GCC is now, for some reason, generating two relative branches instead
of one absolute jump, and
2) branch targets are cache-line aligned, so the second branch of each
pair ends up occupying an entire cache line.
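The second factor is what turns a few hundred bytes of branches into dozens of cache lines. A sketch comparing the two layouts, assuming 4-byte branch pairs (bra plus delay slot), 32-byte lines, and, as an idealisation, that unaligned pairs would simply be packed back to back; it treats all 94 lines as plain branch pairs. `lines_touched` is a hypothetical helper.

```python
import math

LINE_BYTES = 32
PAIR_BYTES = 4  # bra + delay-slot instruction


def lines_touched(pairs: int, aligned: bool) -> int:
    """Cache lines occupied by `pairs` branch pairs laid out one after another."""
    if aligned:
        return pairs  # each 32-byte-aligned target starts a fresh line
    return math.ceil(pairs * PAIR_BYTES / LINE_BYTES)


print(lines_touched(94, aligned=True), lines_touched(94, aligned=False))
```

With alignment the 94 pairs touch 94 lines; packed, the same 376 bytes would fit in 12.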
I don't really like this idea of generating two relative branches.
It's bad for instruction prefetching, and it obviously causes other
problems, such as the code-size growth from all that alignment padding.
Toshi