This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFA] expand from SSA form (1/2)


On Thu, Apr 30, 2009 at 6:26 PM, Luis Machado
<luisgpm@linux.vnet.ibm.com> wrote:
> Hi,
>
> On Tue, 2009-04-28 at 02:48 +0200, Michael Matz wrote:
>> Hi,
>>
>> On Mon, 27 Apr 2009, Luis Machado wrote:
>>
>> > Speaking about powerpc, I've tracked down a 19% degradation on cpu2000's
>> > 32-bit sixtrack and found that revision 146817 caused/revealed it.
>> >
>> > I'll have more details on it soon.
>>
>> It seems also x86_64 is affected, so anything you find is very welcome.
>>
>> If I may speculate, it could be related to the half TER we're now doing.
>> As in, we're not feeding large trees to expand anymore, so there're no
>> opportunities to cleverly expand them to short insn sequences.  For
>> cross-checking try to build with -fno-tree-ter (before the patch) and see
>> if it's resulting in the same slowdown.
>
> I've tracked down the cause of the degradation on sixtrack.
>
> The hot spot in sixtrack is a loop in a function called thin6d.
>
> The old (pre-146817) gcc generates that loop as a single BB, so the
> only way into the loop is to fall through into it from the preceding
> code.
>
> The post-146817 gcc splits that loop into two BBs, so we can actually
> branch into the middle of the loop on the first iteration; after that
> the loop runs just as in pre-146817.
>
> The degradation comes from the fact that splitting that single loop
> into two BBs prevents good scheduling of the instructions inside it,
> like this:
>
> Good code: all the FP load instructions are grouped in the upper portion
> of the code.
>
> fmul    f22,f11,f13
> fmul    f23,f11,f0
> addis   r12,r6,-27
> lfd     f3,0(r6)
> addi    r4,r6,8
> lfd     f1,9472(r12)
> addis   r12,r4,-27
> fmadd   f8,f12,f0,f22
> fmsub   f4,f12,f13,f23
> lfd     f22,9472(r12)
> lfd     f23,8(r6)
> addi    r6,r4,8
> fmul    f11,f8,f13
> fmul    f24,f8,f1
> fmul    f25,f8,f3
> fmul    f5,f8,f0
> fmadd   f11,f4,f0,f11
> fmadd   f21,f4,f3,f24
> fmsub   f2,f4,f1,f25
> fmsub   f12,f4,f13,f5
> fmul    f1,f11,f23
> fmul    f8,f11,f22
> fadd    f9,f9,f21
> fadd    f10,f10,f2
> fmsub   f24,f12,f22,f1
> fmadd   f25,f12,f23,f8
> fadd    f10,f10,f24
> fadd    f9,f9,f25
> bdnz    100ca878 <thin6d_+0x1018>
>
> Bad code: the second pair of loads is pushed down into the second BB,
> causing slowdowns.
>
> fmul    f5,f8,f0
> addis   r3,r4,-27
> lfd     f22,8(r7)
> addi    r7,r4,8
> lfd     f6,9472(r3)
> fmadd   f10,f9,f0,f10
> fmsub   f23,f9,f13,f5
> fmul    f2,f10,f22
> fmul    f9,f10,f6
> fmr     f7,f23
> fmsub   f25,f23,f6,f2
> fmadd   f26,f23,f22,f9
> fadd    f12,f12,f25
> fadd    f11,f11,f26
> fmul    f8,f10,f13
>>> BB mark
> fmul    f22,f10,f0
> addis   r3,r7,-27
> lfd     f21,0(r7)
> addi    r4,r7,8
> lfd     f25,9472(r3)
> fmadd   f8,f7,f0,f8
> fmsub   f9,f7,f13,f22
> fmul    f23,f8,f21
> fmul    f26,f8,f25
> fmsub   f24,f9,f25,f23
> fmadd   f7,f9,f21,f26
> fadd    f12,f12,f24
> fadd    f11,f11,f7
> bdnz    100c9fe0 <thin6d_+0xfd0>
>
> I've opened bugzilla http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39976
> for this.

Does enabling the selective scheduler work around this problem?

Richard.
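
One rough way to make the load placement in the quoted listings concrete
is to tally where the lfd instructions fall in each version of the loop.
The sketch below is only an illustration. The mnemonic lists are
transcribed by hand from the two listings above, the names GOOD, BAD_BB1,
BAD_BB2 and load_positions are made up for this example, and it assumes
that listing order stands in for issue order and that the "BB mark" line
is the block boundary. It is not a pipeline model and says nothing about
actual latencies.

```python
# Mnemonics transcribed, in order, from the quoted "good" and "bad" listings.
# These lists are hand-copied illustration data, not compiler output.
GOOD = (
    "fmul fmul addis lfd addi lfd addis fmadd fmsub lfd lfd addi "
    "fmul fmul fmul fmul fmadd fmadd fmsub fmsub fmul fmul "
    "fadd fadd fmsub fmadd fadd fadd bdnz"
).split()

# The "bad" loop, split at the "BB mark" line (assumed block boundary).
BAD_BB1 = (
    "fmul addis lfd addi lfd fmadd fmsub fmul fmul fmr "
    "fmsub fmadd fadd fadd fmul"
).split()
BAD_BB2 = (
    "fmul addis lfd addi lfd fmadd fmsub fmul fmul "
    "fmsub fmadd fadd fadd bdnz"
).split()


def load_positions(insns, load="lfd"):
    """Return the 0-based positions of the load mnemonic in a listing."""
    return [i for i, mnemonic in enumerate(insns) if mnemonic == load]


bad = BAD_BB1 + BAD_BB2          # whole loop body in listing order
good_loads = load_positions(GOOD)
bad_loads = load_positions(bad)

print("good loop:", len(GOOD), "insns, lfd at", good_loads)
print("bad loop: ", len(bad), "insns, lfd at", bad_loads)
print("bad loop BB sizes:", len(BAD_BB1), len(BAD_BB2))
```

Both versions are 29 instructions long. In the good loop all four loads
sit within the first 11 instructions, while in the split loop the second
pair of loads only appears at 0-based positions 17 and 19, i.e. after the
block boundary.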

