This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [RFA] expand from SSA form (1/2)
Hi,
On Tue, 2009-04-28 at 02:48 +0200, Michael Matz wrote:
> Hi,
>
> On Mon, 27 Apr 2009, Luis Machado wrote:
>
> > Speaking about powerpc, i've tracked down a 19% degradation on cpu2000's
> > 32-bit sixtrack and found that revision 146817 caused/revealed it.
> >
> > I'll have more details on it soon.
>
> It seems also x86_64 is affected, so anything you find is very welcome.
>
> If I may speculate it could be related to the half TER we're now doing.
> As in, we're not feeding large trees to expand anymore, so there're no
> opportunities to cleverly expand them to short insn sequences. For
> cross-checking try to build with -fno-tree-ter (before the patch) and see
> if it's resulting in the same slowdown.
I've tracked down the cause of the degradation on sixtrack.
The hot spot in sixtrack is a loop in a function called thin6d.
The old (pre-146817) gcc generates that loop as a single BB, so the
only way into the loop is to fall through from the code that
precedes it.
The post-146817 gcc splits that loop into two BBs, so that on the first
iteration we can actually branch into the middle of the loop; after
that, the loop runs just like in pre-146817.
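As a rough illustration of why a branch into the middle of the loop forces the split, here is a minimal Python sketch of the textbook "leader" rule for forming basic blocks: any instruction that is a branch target starts a new block. This is not gcc code; the function name split_basic_blocks, the labels "top" and "mid", and the instruction operands are all made up for the example.

```python
# Toy illustration only, not gcc code: split a straight-line instruction
# list into basic blocks with the textbook "leader" rule.
# All labels, operands and names here are hypothetical.

def split_basic_blocks(insns, branch_targets):
    """insns: list of (label_or_None, text) pairs.
    branch_targets: set of labels that some branch jumps to.
    Returns a list of blocks, each a list of instruction texts."""
    blocks, current = [], []
    for label, text in insns:
        # A branch target is a leader: control can enter here, so the
        # block collected so far must end before it.
        if label in branch_targets and current:
            blocks.append(current)
            current = []
        current.append(text)
        # A branch instruction ends the current block.
        if text.split()[0] in {"b", "bdnz"}:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

loop = [
    ("top", "fmul f1,f2,f3"),
    (None,  "lfd f4,0(r6)"),
    ("mid", "fmul f5,f1,f4"),
    (None,  "lfd f6,8(r6)"),
    (None,  "bdnz top"),
]

# Only the loop head is a branch target: the loop stays one block.
old = split_basic_blocks(loop, {"top"})
# Something also branches to "mid": the loop body is cut in two there.
new = split_basic_blocks(loop, {"top", "mid"})

print(len(old), len(new))  # 1 2
```

With only the back edge to "top" the body is one block; as soon as anything else jumps to "mid", the same instructions become two blocks, and a scheduler that works one block at a time can no longer move instructions across that point.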
The degradation comes from the fact that splitting that single loop into
two BBs prevents good scheduling of the instructions inside it, like
this:
Good code: All the fp load instructions are grouped in the upper portion
of the code.
fmul f22,f11,f13
fmul f23,f11,f0
addis r12,r6,-27
lfd f3,0(r6)
addi r4,r6,8
lfd f1,9472(r12)
addis r12,r4,-27
fmadd f8,f12,f0,f22
fmsub f4,f12,f13,f23
lfd f22,9472(r12)
lfd f23,8(r6)
addi r6,r4,8
fmul f11,f8,f13
fmul f24,f8,f1
fmul f25,f8,f3
fmul f5,f8,f0
fmadd f11,f4,f0,f11
fmadd f21,f4,f3,f24
fmsub f2,f4,f1,f25
fmsub f12,f4,f13,f5
fmul f1,f11,f23
fmul f8,f11,f22
fadd f9,f9,f21
fadd f10,f10,f2
fmsub f24,f12,f22,f1
fmadd f25,f12,f23,f8
fadd f10,f10,f24
fadd f9,f9,f25
bdnz 100ca878 <thin6d_+0x1018>
Bad code: The second pair of loads is pushed down into the second BB,
causing slowdowns.
fmul f5,f8,f0
addis r3,r4,-27
lfd f22,8(r7)
addi r7,r4,8
lfd f6,9472(r3)
fmadd f10,f9,f0,f10
fmsub f23,f9,f13,f5
fmul f2,f10,f22
fmul f9,f10,f6
fmr f7,f23
fmsub f25,f23,f6,f2
fmadd f26,f23,f22,f9
fadd f12,f12,f25
fadd f11,f11,f26
fmul f8,f10,f13
>> BB mark
fmul f22,f10,f0
addis r3,r7,-27
lfd f21,0(r7)
addi r4,r7,8
lfd f25,9472(r3)
fmadd f8,f7,f0,f8
fmsub f9,f7,f13,f22
fmul f23,f8,f21
fmul f26,f8,f25
fmsub f24,f9,f25,f23
fmadd f7,f9,f21,f26
fadd f12,f12,f24
fadd f11,f11,f7
bdnz 100c9fe0 <thin6d_+0xfd0>
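To make the difference between the two listings easier to see, here is a small Python sketch that reports where the lfd loads land in each basic block. The opcode sequences are copied from the two listings above (operands dropped, "|" standing in for the BB mark); the helper name load_positions is just something made up for this example.

```python
# Sketch only: report the position of each lfd within its basic block.
# Opcodes are taken from the two listings above; "|" stands for the
# ">> BB mark" line. The helper name is hypothetical.

def load_positions(opcodes):
    """Return, per basic block, the 0-based indices of the lfd opcodes."""
    blocks = [[]]
    for op in opcodes.split():
        if op == "|":
            blocks.append([])       # start a new basic block
        else:
            blocks[-1].append(op)
    return [[i for i, op in enumerate(b) if op == "lfd"] for b in blocks]

GOOD = ("fmul fmul addis lfd addi lfd addis fmadd fmsub lfd lfd addi "
        "fmul fmul fmul fmul fmadd fmadd fmsub fmsub fmul fmul fadd fadd "
        "fmsub fmadd fadd fadd bdnz")

BAD = ("fmul addis lfd addi lfd fmadd fmsub fmul fmul fmr fmsub fmadd "
       "fadd fadd fmul | "
       "fmul addis lfd addi lfd fmadd fmsub fmul fmul fmsub fmadd "
       "fadd fadd bdnz")

print(load_positions(GOOD))  # [[3, 5, 9, 10]]
print(load_positions(BAD))   # [[2, 4], [2, 4]]
```

In the good code all four loads sit within the first 11 of the 29 instructions in the loop body. In the bad code the second pair cannot start before the 15 instructions of the first block have been issued.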
I've opened a bugzilla entry for this:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39976
Best regards,
Luis