Multi-Threading GCC Continuation
Thu Dec 12 15:21:00 GMT 2019
On 12/12/19 4:11 AM, Segher Boessenkool wrote:
> Hi Nick,
> On Sun, Dec 08, 2019 at 03:03:56PM -0500, Nicholas Krause wrote:
>> The first questions are:
>> 1. What current heuristics do we have as it seems none for figuring out
>> what state is shared
>> as it seems none? If I correct the first thing to do is discuss what
>> bits/bitmasks we want
>> for figuring out shared state or other ways.
> Shared between what and what?
Between the passes in gcc. The idea is to launch certain passes in gcc on
their own thread and join them up with the later passes that depend on the
state they use. For example, if a loop pass does not touch anything outside
a function in GIMPLE or RTL, we could launch it on another thread and then
join it before the next pass that requires that state. Not all passes seem
to touch everything; many only touch parts of GIMPLE or RTL, so this may be
worth considering. The question was what internal compiler data can currently
be used to find out when this is the case. I don't see anything, so figuring
out how to detect it is going to be part of the challenge.
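As a rough illustration of the launch-and-join idea (not GCC code), here is a minimal C++ sketch. It assumes each pass could declare, as bitmasks, which pieces of state it reads and writes. The `StateBit` values, `PassDesc`, `plan_waves` and `run_waves` are all hypothetical names invented for this sketch; GCC has no such table today, which is exactly the missing heuristic discussed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical state bits, invented for this sketch only.
enum StateBit : std::uint32_t {
  STATE_CFG = 1u << 0,
  STATE_LOOPS = 1u << 1,
  STATE_DOMINATORS = 1u << 2,
  STATE_INSNS = 1u << 3,
};

// Hypothetical pass descriptor: what the pass reads, what it writes,
// and the work it performs.
struct PassDesc {
  std::string name;
  std::uint32_t reads;
  std::uint32_t writes;
  std::function<void()> run;
};

// Two passes conflict if either one writes state the other reads or writes.
inline bool conflicts(const PassDesc &a, const PassDesc &b) {
  return (a.writes & (b.reads | b.writes)) != 0 ||
         (b.writes & (a.reads | a.writes)) != 0;
}

// Keep the original pass order, but group consecutive passes into "waves"
// whose members do not conflict with each other. This is deliberately
// conservative: a pass never moves ahead of an earlier wave.
inline std::vector<std::vector<std::size_t>>
plan_waves(const std::vector<PassDesc> &passes) {
  std::vector<std::vector<std::size_t>> waves;
  for (std::size_t i = 0; i < passes.size(); ++i) {
    bool fits = !waves.empty();
    if (fits) {
      for (std::size_t j : waves.back()) {
        if (conflicts(passes[j], passes[i])) {
          fits = false;
          break;
        }
      }
    }
    if (!fits)
      waves.emplace_back();
    waves.back().push_back(i);
  }
  return waves;
}

// Launch every pass of a wave on its own thread, then join them all
// before starting the next wave.
inline void run_waves(const std::vector<PassDesc> &passes) {
  for (const auto &wave : plan_waves(passes)) {
    std::vector<std::future<void>> inflight;
    for (std::size_t i : wave)
      inflight.push_back(std::async(std::launch::async, passes[i].run));
    for (auto &f : inflight)
      f.get();  // join point: the next wave may depend on this state
  }
}
```

With two read-only loop passes followed by a pass that writes the CFG, the first two would land in one wave and the writer in the next. The hard part remains producing trustworthy read/write masks in the first place.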
I'm going to write up an article on the GCC wiki explaining this; the above
is only a very brief version of the idea. It will sit alongside some other
ideas, like figuring out whether dominator trees should or can be lockless,
or very close to it.
>> 2. MD files seem to be a major source of shared state or reading them.
>> Is it possible
>> to read from them async? Doesn't seem to be a problem but the current
>> docs don't
>> mention it nor does it seem easy to do.
> MD files are not read *at all* by the compiler itself; they aren't
> installed, even. They are read by the gen* programs when the compiler
> itself is built, to create the insn-*.c files and the like.
>> 3. There are two ways to write this for RTL either one class for all the
>> state or a core
>> class will each major part being a subclass like delayed branch
>> scheduling e.t.c.Not sure
>> which is better so thought I would ask.
> RTL as it is is pretty efficient. Please keep it that way. It also is
> a dumb (and very "open") data structure, by design. See how "XEXP" and
> similar work.
> That could be changed of course, for non-trivial cost, but what for?
I'm not talking about changing RTL itself in terms of its optimizations,
but about rewriting it so that work queues can be processed in parallel on
state that is not shared with the currently running pass, and then joined
back up before the next pass that requires that state.
For example, why not run parts of register allocation on separate
work queues if possible? I was asking Peter at Cauldron about the
register allocation part, and he seemed to like doing something like this
for the cost of allocating registers, if I recall correctly.
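To make the work-queue idea a bit more concrete, here is a small C++ sketch under heavy assumptions. `PseudoReg`, `toy_cost` and `compute_costs_parallel` are hypothetical names, and the cost formula is made up purely for illustration; it is not the cost model of GCC's register allocator. The point is only the shape: workers claim items from a shared queue, each item touches only its own state, and everything is joined before the next step runs.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-pseudo-register record, not a GCC data structure.
struct PseudoReg {
  int uses;        // number of uses of the pseudo
  int loop_depth;  // nesting depth of its hottest use
  long cost;       // output slot, written by exactly one worker
};

// Made-up cost formula, for illustration only.
inline long toy_cost(const PseudoReg &r) {
  return static_cast<long>(r.uses) * (1 + 10 * r.loop_depth);
}

// Workers pull indices from a shared atomic counter (the "queue").
// Each index is claimed once, so each element is written by one thread
// only and no locking of the register data is needed.
inline void compute_costs_parallel(std::vector<PseudoReg> &regs,
                                   unsigned workers) {
  assert(workers > 0);
  std::atomic<std::size_t> next{0};
  auto worker = [&] {
    for (;;) {
      std::size_t i = next.fetch_add(1);
      if (i >= regs.size())
        return;
      regs[i].cost = toy_cost(regs[i]);
    }
  };
  std::vector<std::thread> pool;
  for (unsigned w = 0; w < workers; ++w)
    pool.emplace_back(worker);
  for (auto &t : pool)
    t.join();  // join before anything consumes the costs
}
```

Whether real allocation costs are independent enough per pseudo to be split this way is an open question; the sketch simply assumes they are.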
Hopefully that explains it a little better,