This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


CVS head nasty codegen for sh-elf

From: tm at kloo dot net (tm)
To: gcc-bugs at gcc dot gnu dot org
Cc: shumpei dot kawasaki at hsa dot hitachi dot com, david dot alessio at hsa dot hitachi dot com, joern dot rennecke at superh dot com, stephen dot clarke at superh dot com
Date: Wed, 14 Aug 2002 14:26:16 -0700 (PDT)
Subject: CVS head nasty codegen for sh-elf

I'm looking at some nasty code generated by CVS head for target
sh-elf, and would like some feedback.

The problem I'm noticing occurs in mpg123, which is a very popular
MP3 decoder utility. Specifically, it occurs in the dct36() function,
which has been copied into most popular MP3 players.

The comments for the dct36() function indicate it is an "optimized DCT"
from Jeff Tsay's maplay 1.2.+ package. Unfortunately, many programmers
seem to think optimization consists entirely of blindly unrolling loops
and inlining dependent functions, and that's exactly what someone has
done here - the function is one basic block of *471 lines* with no function
calls and no loops. (Maybe I should have posted this on Halloween...)

As you can imagine, gcc has problems generating decent code for this
function. The assembly generated for this function is *2,488 lines* long.

The biggest problem with the code generated for this function seems to
be that registers are left holding unused values for extremely long
stretches of code.

The problem starts at the beginning of the function, and the code quality
goes downhill from there. We have this code sequence:

_dct36:
        mov.l   r8,@-r15
        mov     r4,r1
        add     #112,r1         <- line 9722, isn't used till line 9759 - unused for 37 instructions
        mov.l   r9,@-r15
        mov     r4,r9
        add     #72,r9          <- line 9725, isn't used till line 9840 - unused for 115 instructions
        mov.l   r10,@-r15
        mov     r4,r10
        add     #40,r10         <- line 9728, isn't used till line 9885 - unused for 157 instructions
        mov.l   r11,@-r15
        mov     r4,r11
        add     #88,r11         <- line 9731, isn't used till line 9817 - unused for 86 instructions
        mov.l   r12,@-r15
        mov     r4,r12
        add     #24,r12         <- line 9734, isn't used till line 9905 - unused for 171 instructions
        mov.l   r13,@-r15
        mov     r4,r13
        add     #120,r13        <- line 9737, isn't used till line 9768 - unused for 31 instructions
        mov.l   r14,@-r15
        fmov.s  fr12,@-r15
        fmov.s  fr13,@-r15
...
        mov.l   r1,@(8,r14)     <- line 9759. This value has been hogging a register for 37 instructions...
                                   then we save it into a stack slot till line 10113 (354 lines)
...
        fmov.s  @r13+,fr4       <- line 9768
        fmov.s  @r3+,fr2
        fmov.s  @r13,fr5
...
        fmov.s  @r11+,fr4       <- line 9817
        fmov.s  @r3+,fr2
        fmov.s  @r11,fr5
...
        fmov.s  @r9+,fr4        <- line 9840
        fmov.s  @r1+,fr2
        fmov.s  @r9,fr5
...
        fmov.s  @r10+,fr4       <- line 9885
        fmov.s  @r2+,fr2
        fmov.s  @r10,fr5
...
        fmov.s  @r12+,fr4       <- line 9905
        fmov.s  @r3+,fr2
        fmov.s  @r12,fr5
...
        mov.l   @(8,r14),r1     <- line 10113
        mov.l   .L1603,r3
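
The idle distances in the annotations above can be re-derived from the
quoted line numbers. A minimal sketch in C, assuming only the figures in
the listing; the struct and function names are made up for illustration
and do not come from GCC or mpg123:

```c
#include <assert.h>
#include <stddef.h>

/* One annotated register: where its value is set up and where it is
   first read. Line numbers are those of the generated assembly file
   quoted in the listing above. */
struct reg_gap {
    const char *reg;
    int def_line;   /* line of the "add #imm,rN" that produces the value */
    int use_line;   /* line of the first instruction that reads it */
};

static const struct reg_gap dct36_gaps[] = {
    { "r1",  9722, 9759 },
    { "r9",  9725, 9840 },
    { "r10", 9728, 9885 },
    { "r11", 9731, 9817 },
    { "r12", 9734, 9905 },
    { "r13", 9737, 9768 },
};

/* Number of assembly lines the value sits in its register before the
   first use. */
static int idle_lines(const struct reg_gap *g)
{
    assert(g->use_line >= g->def_line);
    return g->use_line - g->def_line;
}

/* Longest idle stretch among the first n entries (0 if n == 0). */
static int longest_idle(const struct reg_gap *gaps, size_t n)
{
    int best = 0;
    for (size_t i = 0; i < n; i++) {
        int d = idle_lines(&gaps[i]);
        if (d > best)
            best = d;
    }
    return best;
}
```

Note that for r1 the "use" at line 9759 is only the store into the stack
slot; the value is not reloaded until line 10113.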


This is another good example. In line 9762 of the generated assembly
for the file, we have this sequence:

        mov.l   .L1596,r8
        add     #64,r8
...
        (386 lines removed for brevity)
...
        fmov.s  @r8+,fr4
        fmov.s  @r8,fr5

Another one. r0 is the only register in a small register class.
We hog it for a while:

        mov.w   .L1595,r0
        mov.l   @(r0,r14),r0
...
        (485 lines removed for brevity)
...
        add     r0,r2
        add     #4,r2

There are other samples in the 2,488 lines, but hopefully you understand
the gist of the problem...

So, what's the best solution to this? Should new-regalloc's
live-range splitting/rematerialization mitigate these problems?
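
As a source-level analogy for what rematerialization would buy here, the
sketch below contrasts two variants. It is only an illustration: the
allocator works on RTL, not on C source, and the function names and the
arithmetic are hypothetical, not taken from dct36(). The first variant
sets up both addresses at entry and keeps them live across unrelated
work, like the "add #112,r1" / "add #72,r9" pair in the listing. The
second rebuilds each address from the base pointer right before its use.
The element offsets 28 and 18 are the byte offsets 112 and 72 divided by
sizeof(float), assuming 4-byte floats.

```c
#include <assert.h>
#include <stddef.h>

/* Both functions require `in` to point at 30 or more floats. */

/* Variant A: addresses computed at entry and used much later, so both
   pointers stay live across the whole middle section. */
static float sum_hoisted(const float *in)
{
    assert(in != NULL);
    const float *p112 = in + 28;   /* byte offset 112, as in "add #112,r1" */
    const float *p72  = in + 18;   /* byte offset 72,  as in "add #72,r9"  */
    float acc = 0.0f;

    /* Stand-in for the long stretch of unrelated work. */
    for (size_t i = 0; i < 18; i++)
        acc += in[i];

    acc += p112[0] + p112[1];
    acc += p72[0] + p72[1];
    return acc;
}

/* Variant B: same arithmetic in the same order, but each address is
   rebuilt from `in` right before its use, so only the base pointer is
   live across the middle section. */
static float sum_remat(const float *in)
{
    assert(in != NULL);
    float acc = 0.0f;

    for (size_t i = 0; i < 18; i++)
        acc += in[i];

    acc += (in + 28)[0] + (in + 28)[1];
    acc += (in + 18)[0] + (in + 18)[1];
    return acc;
}
```

An optimizing compiler is free to turn both variants into the same code;
the point is only to show the trade of one cheap add at the use site for
a register (or stack slot) held across hundreds of instructions.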

Toshi
