RFC: LRA for x86/x86-64 [0/9]

Tue Oct 2 09:34:00 GMT 2012

On 02/10/2012 10:49, Steven Bosscher wrote:
> On Tue, Oct 2, 2012 at 10:29 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
>> On 02/10/2012 09:28, Steven Bosscher wrote:
>>>>> My experience shows that these lists usually have 1-2 elements. In
>>>>> this case, though, there are pseudos with a huge number of elements
>>>>> (hundreds).  I tried -fweb for this test because it can decrease the
>>>>> number of elements, but GCC (I don't know which pass) scales even worse:
>>>>> after 20 minutes of waiting, when virtual memory reached 20GB, I stopped it.
>>> Ouch :-)
>>>
>>> The webizer itself never even runs, the compiler blows up somewhere
>>> during the df_analyze call from web_main. The issue here is probably
>>> in the DF_UD_CHAIN problem or in the DF_RD problem.
>>
>> /me is glad to have fixed fwprop when his GCC contribution time was more
>> than 1-2 days per year...
> 
> I thought you spent more time on GCC nowadays, working for Red Hat?

No, I work on QEMU most of the time. :)  Knowing myself, if I had
GCC-related assignments you'd see me _a lot_ on upstream mailing lists!

>> Unfortunately, the fwprop solution (actually a rewrite) was very
>> specific to the problem and cannot be reused in other parts of the compiler.
> 
> That'd be too bad... But is this really true? I thought you had
> something done that builds chains only for USEs reached by multiple
> DEFs? That's the only interesting kind for web, too.

No, it's the other way round.  I have a dataflow problem that recognizes
USEs reached by multiple DEFs, so that I can use a dominator walk to
build singleton def-use chains.  It's very similar to how you build SSA,
but punting instead of inserting phis.
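
To make the "punt instead of inserting a phi" idea concrete, here is a
minimal sketch in Python. It is not the fwprop code: the CFG encoding, the
names (single_reaching_defs, MULTIPLE) and the restriction to a single
register are assumptions made for illustration, and it folds the dataflow
problem and the chain-building walk into one forward pass instead of using
a separate dominator walk.

```python
# Hypothetical sketch, not GCC code: one register, a tiny hand-built CFG.
MULTIPLE = "multiple"  # lattice top: more than one DEF reaches this point


def merge(a, b):
    """Join two reaching-def states; None means 'no def seen yet'."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else MULTIPLE  # an SSA builder would insert a phi here


def block_in(block, preds, out):
    """State at block entry: the join of all predecessor exit states."""
    state = None
    for p in preds[block]:
        state = merge(state, out[p])
    return state


def single_reaching_defs(blocks, preds, order):
    """blocks: name -> list of ("def", id) or ("use", id) in program order.
    preds: name -> list of predecessor block names.
    order: block names in reverse postorder.
    Returns {use id: def id} only for uses reached by exactly one def."""
    out = {b: None for b in blocks}
    changed = True
    while changed:  # iterate to a fixed point so loops are handled too
        changed = False
        for b in order:
            state = block_in(b, preds, out)
            for kind, ident in blocks[b]:
                if kind == "def":
                    state = ident  # a full def kills whatever reached here
            if state != out[b]:
                out[b] = state
                changed = True

    chains = {}
    for b in order:
        state = block_in(b, preds, out)
        for kind, ident in blocks[b]:
            if kind == "def":
                state = ident
            elif state is not None and state != MULTIPLE:
                chains[ident] = state  # singleton chain: exactly one DEF
            # otherwise punt: the use is left without a chain
    return chains
```

On a diamond where one arm redefines the register, the use at the join is
reached by two DEFs and simply gets no chain, while uses in the entry block
and in the redefining arm each get their single DEF.  Note that the sketch
treats "no def yet" as neutral when joining, so a path with no DEF at all
is ignored rather than counted.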

Another solution is to build factored use-def chains for web, and use
them instead of RD.  In the end it's not very different from regional
live range splitting, since the phi functions factor out the state of
the pass at loop (that is, region) boundaries.  I thought you had looked
at FUD chains years ago?

> FWIW: part of the problem for this particular test case is that there
> are many registers with partial defs (vector registers) and the RD
> problem doesn't (and probably cannot) keep track of one partial
> def/use killing another partial def/use.

So they are subregs of regs?  Perhaps they could be represented with
VEC_MERGE to break the live range:

 (set (reg:V4SI 94) (vec_merge:V4SI (reg:V4SI 94)
                                    (const_vector:V4SI [(const_int 0)
                                                        (const_int 0)
                                                        (const_int 0)
                                                        (reg:SI 95)])
                                    (const_int 7)))

And then reload, or something after reload, would know how to split
these when spilling V4SI to memory.
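
For readers less familiar with vec_merge, the following Python sketch
models what the RTL above computes.  It assumes the documented RTL
semantics, where a set bit i in the mask takes element i from the first
operand and a clear bit takes it from the second; the function names and
the use of plain lists for V4SI values are made up for illustration.

```python
def vec_merge(vec1, vec2, mask):
    """Element i comes from vec1 if bit i of mask is set, else from vec2."""
    assert len(vec1) == len(vec2)
    return [a if (mask >> i) & 1 else b
            for i, (a, b) in enumerate(zip(vec1, vec2))]


def insert_lane3(reg94, reg95):
    """Model of the set above: mask 7 (0b0111) keeps lanes 0-2 of reg 94
    and takes lane 3 from the const_vector, whose last element is reg 95."""
    return vec_merge(reg94, [0, 0, 0, reg95], 7)
```

Written this way the partial update reads as a full definition of reg 94
that also uses the old value of reg 94, which is what lets it break the
live range.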

Paolo