This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
CVS head nasty codegen for sh-elf
- From: tm at kloo dot net (tm)
- To: gcc-bugs at gcc dot gnu dot org
- Cc: shumpei dot kawasaki at hsa dot hitachi dot com, david dot alessio at hsa dot hitachi dot com,joern dot rennecke at superh dot com, stephen dot clarke at superh dot com
- Date: Wed, 14 Aug 2002 14:26:16 -0700 (PDT)
- Subject: CVS head nasty codegen for sh-elf
I'm looking at some nasty code generated by CVS head for target
sh-elf, and would like some feedback.
The problem I'm noticing occurs in mpg123, a very popular MP3 decoder
utility, specifically in its dct36() function, which has been copied
into most popular MP3 players.
The comments for the dct36() function indicate it is an "optimized DCT"
from Jeff Tsay's maplay 1.2.+ package. Unfortunately, many programmers
seem to think optimization consists entirely of blindly unrolling loops
and inlining dependent functions, and that's exactly what someone has
done here - the function is one basic block of *471 lines* with no function
calls and no loops. (Maybe I should have posted this on Halloween...)
As you can imagine, gcc has problems generating decent code for this
function. The assembly generated for this function is *2,488 lines* long.
The biggest problem with the code generated for this function seems to
be that registers are loaded with values long before those values are
needed, so the registers stay tied up for extremely long stretches of code.
The problem starts at the beginning of the function, and the code quality
goes downhill from there. We have this code sequence:
_dct36:
mov.l r8,@-r15
mov r4,r1
add #112,r1 <- line 9722, isn't used till line 9759 - unused for 37 instructions
mov.l r9,@-r15
mov r4,r9
add #72,r9 <- line 9725, isn't used till line 9840 - unused for 115 instructions
mov.l r10,@-r15
mov r4,r10
add #40,r10 <- line 9728, isn't used till line 9885 - unused for 157 instructions
mov.l r11,@-r15
mov r4,r11
add #88,r11 <- line 9731, isn't used till line 9817 - unused for 86 instructions
mov.l r12,@-r15
mov r4,r12
add #24,r12 <- line 9734, isn't used till line 9905 - unused for 171 instructions
mov.l r13,@-r15
mov r4,r13
add #120,r13 <- line 9737, isn't used till line 9768 - unused for 31 instructions
mov.l r14,@-r15
fmov.s fr12,@-r15
fmov.s fr13,@-r15
...
mov.l r1,@(8,r14) <- line 9759. This value has been hogging a register for 37 instructions...
then we save it into a stack slot till line 10113 (354 lines)
...
fmov.s @r13+,fr4 <- line 9768
fmov.s @r3+,fr2
fmov.s @r13,fr5
...
fmov.s @r11+,fr4 <- line 9817
fmov.s @r3+,fr2
fmov.s @r11,fr5
...
fmov.s @r9+,fr4 <- line 9840
fmov.s @r1+,fr2
fmov.s @r9,fr5
...
fmov.s @r10+,fr4 <- line 9885
fmov.s @r2+,fr2
fmov.s @r10,fr5
...
fmov.s @r12+,fr4 <- line 9905
fmov.s @r3+,fr2
fmov.s @r12,fr5
...
mov.l @(8,r14),r1 <- line 10113
mov.l .L1603,r3
Here is another example. At line 9762 of the generated assembly
for the file, we have this sequence:
mov.l .L1596,r8
add #64,r8
...
(386 lines removed for brevity)
...
fmov.s @r8+,fr4
fmov.s @r8,fr5
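Another one. On SH, r0 is the only register in its register class (it is,
for example, the only register usable as the index in @(r0,Rn) addressing),
so tying it up is especially costly. We hog it for a while: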
mov.w .L1595,r0
mov.l @(r0,r14),r0
...
(485 lines removed for brevity)
...
add r0,r2
add #4,r2
There are other examples in the 2,488 lines, but hopefully this conveys
the gist of the problem...
So, what's the best solution to this? Should new-regalloc's
live-range splitting/rematerialization mitigate these problems?
Toshi