This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFA] expand from SSA form (1/2)


Hi,

On Tue, 2009-04-28 at 02:48 +0200, Michael Matz wrote:
> Hi,
> 
> On Mon, 27 Apr 2009, Luis Machado wrote:
> 
> > Speaking about powerpc, i've tracked down a 19% degradation on cpu2000's 
> > 32-bit sixtrack and found that revision 146817 caused/revealed it.
> > 
> > I'll have more details on it soon.
> 
> It seems also x86_64 is affected, so anything you find is very welcome.  
> 
> If I may speculate it could be related to the half TER we're now doing.  
> As in, we're not feeding large trees to expand anymore, so there're no 
> opportunities to cleverly expand them to short insn sequences.  For 
> cross-checking try to build with -fno-tree-ter (before the patch) and see 
> if it's resulting in the same slowdown.

I've tracked down the cause of the degradation on sixtrack.

The hot spot in sixtrack is a loop in a function called thin6d.

The old (pre-146817) gcc generates that loop as a single BB, so the
only way into the loop is to fall through into it from the preceding
code.

The post-146817 gcc splits that loop into two BBs, so that the first
iteration can branch into the middle of the loop; after that, the loop
runs just as it does in pre-146817.
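
To illustrate the control-flow shape only, here is a minimal C sketch.
It is a hypothetical analogue, not the thin6d source: the kernel and the
names dot_single_entry and dot_mid_entry are made up for this example.
The first function has a single-entry loop body; the second is unrolled
by two and lets an odd trip count enter at the midpoint, so the label
becomes a second entry and the body is split into two basic blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: the loop body is entered only from the top,
 * so it can form a single basic block. */
double dot_single_entry(const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Same computation, unrolled by two. When n is odd, the first trip
 * jumps to second_half, so the loop body has two entry points and
 * the label starts a new basic block. */
double dot_mid_entry(const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    double acc = 0.0;
    size_t i = 0;
    size_t trips = (n + 1) / 2;   /* number of passes through the body */

    if (n == 0)
        return acc;
    if (n & 1)
        goto second_half;         /* odd count: enter mid-loop */
    do {
        acc += a[i] * b[i];
        i++;
second_half:
        acc += a[i] * b[i];
        i++;
    } while (--trips != 0);
    return acc;
}
```

Both functions return the same result for any n. The difference is only
in the control-flow graph: a scheduler that reorders instructions within
one basic block at a time cannot hoist the loads of the second half above
the second_half label in dot_mid_entry.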

The degradation comes from splitting that single loop into two BBs,
which prevents good scheduling of the instructions inside it, as shown
here:

Good code: All the fp load instructions are grouped in the upper portion
of the code.

fmul    f22,f11,f13
fmul    f23,f11,f0
addis   r12,r6,-27
lfd     f3,0(r6)
addi    r4,r6,8
lfd     f1,9472(r12)
addis   r12,r4,-27
fmadd   f8,f12,f0,f22
fmsub   f4,f12,f13,f23
lfd     f22,9472(r12)
lfd     f23,8(r6)
addi    r6,r4,8
fmul    f11,f8,f13
fmul    f24,f8,f1
fmul    f25,f8,f3
fmul    f5,f8,f0
fmadd   f11,f4,f0,f11
fmadd   f21,f4,f3,f24
fmsub   f2,f4,f1,f25
fmsub   f12,f4,f13,f5
fmul    f1,f11,f23
fmul    f8,f11,f22
fadd    f9,f9,f21
fadd    f10,f10,f2
fmsub   f24,f12,f22,f1
fmadd   f25,f12,f23,f8
fadd    f10,f10,f24
fadd    f9,f9,f25
bdnz    100ca878 <thin6d_+0x1018>

Bad code: The second pair of loads is pushed down into the second BB,
causing slowdowns.

fmul    f5,f8,f0
addis   r3,r4,-27
lfd     f22,8(r7)
addi    r7,r4,8
lfd     f6,9472(r3)
fmadd   f10,f9,f0,f10
fmsub   f23,f9,f13,f5
fmul    f2,f10,f22
fmul    f9,f10,f6
fmr     f7,f23
fmsub   f25,f23,f6,f2
fmadd   f26,f23,f22,f9
fadd    f12,f12,f25
fadd    f11,f11,f26
fmul    f8,f10,f13
>> BB mark
fmul    f22,f10,f0
addis   r3,r7,-27
lfd     f21,0(r7)
addi    r4,r7,8
lfd     f25,9472(r3)
fmadd   f8,f7,f0,f8
fmsub   f9,f7,f13,f22
fmul    f23,f8,f21
fmul    f26,f8,f25
fmsub   f24,f9,f25,f23
fmadd   f7,f9,f21,f26
fadd    f12,f12,f24
fadd    f11,f11,f7
bdnz    100c9fe0 <thin6d_+0xfd0>

I've opened bugzilla http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39976
for this.

Best regards,
Luis

