[00/23] Make fwprop use an on-the-side RTL SSA representation

Richard Sandiford richard.sandiford@arm.com
Thu Nov 26 16:03:18 GMT 2020


Thanks for the reviews.

Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On 11/13/20 1:10 AM, Richard Sandiford via Gcc-patches wrote:
>> Just after GCC 10 stage 1 closed (oops), I posted a patch to add a new
>> combine pass.  One of its main aims was to allow instructions to move
>> around where necessary in order to make a combination possible.
>> It also tried to parallelise instructions that use the same resource.
>>
>> That pass contained its own code for maintaining limited def-use chains.
>> When I posted the patch, Segher asked why we wanted yet another piece
>> of pass-specific code to do that.  Although I had specific reasons
>> (which I explained at the time) I've gradually come round to agreeing
>> that that was a flaw.
>>
>> This series of patches is the result of a Covid-time project to add
>> a more general, pass-agnostic framework.  There are two parts:
>> adding the framework itself, and using it to make fwprop.c faster.
>>
>> The framework part
>> ------------------
>>
>> The framework provides an optional, on-the-side SSA view of existing
>> RTL instructions.  Each instruction gets a list of definitions and a
>> list of uses, with each use having a single definition.  Phi nodes
>> handle cases in which there are multiple possible definitions of a
>> register on entry to a basic block.  There are also routines for
>> updating instructions while keeping the SSA representation intact.
>>
>> The aim is only to provide a different view of existing RTL instructions.
>> Unlike gimple, and unlike (IIRC) the old RTL SSA project from way back,
>> the new framework isn't a “native” SSA representation.  This means that
>> all inputs to a phi node for a register R are also definitions of
>> register R; no move operation is “hidden” in the phi node.
> Hmm, I'm trying to parse what the last phrase means.  Does it mean that
> the "hidden copy" problem for out-of-ssa is avoided?  And if so, how is
> that maintained over time.  Things like copy-prop will tend to introduce
> those issues even if they didn't originally exist.

Yeah, the phi nodes simply say which definition of register R provides
the value of R on a particular incoming edge.  That definition will
itself be a phi node for R, an artificial definition of R created by DF
(e.g. for incoming function arguments or for EH data registers), or an
actual instruction that sets R.

In other words, the SSA form is a purely on-the-side thing and the
underlying RTL instructions are maintained in the same way as normal.
The SSA form can be deleted at any time without performing a separate
out-of-ssa step.  In that respect it's different from cfglayout,
for example.

One of the goals was to allow the SSA form to be used even after RA,
where invisible copies would be more problematic.

>> Like gimple, the framework treats memory as a single unified resource.
>>
>> A more in-depth summary is contained in the doc patch, but some
>> other random notes:
>>
>> * At the moment, the SSA information is local to one pass, but it might
>>   be good to maintain it between passes in future.
> Right.  I think we can look at the passes near fwprop as good targets
> for extending the lifetime over which we have an SSA framework.   I note
> CSE is just before the first fwprop and CSE is a hell of a lot easier in
> an SSA world :-)

Yeah, agree this would be a good approach: start from fwprop and spread
out in both directions.

> It's unfortunately that there's no DCE passes abutting
> fwprop as DCE is really easy in an SSA world.

fwprop.c calls delete_trivially_dead_insns, so it does some light DCE.
One thing I wanted to do (but ran out of time to do) was get the main
SSA insn-change routine (rtl_ssa::change_insns) to record when an
instruction becomes dead, and then perform DCE as part of the later
rtl_ssa::perform_pending_updates step.  This would be much cheaper
than doing another full scan of the instruction stream (which is what
delete_trivially_dead_insns needs needs to do).

Unfortunately, I suspect we're relying on this delete_trivially_dead_insns
call to delete instructions that became dead during earlier passes, not just
those that become dead during fwprop.c.  So I guess we would need a full
DCE at some point: making fwprop.c clean up its own mess might not be
enough.

>> * The SSA code groups blocks into extended basic blocks, with the
>>   EBBs rather than individual blocks having phi nodes.  
> So I haven't looked at the patch, but the usual place to put PHIs is at
> the dominance frontier.  But extra PHIs just increase time/memory and
> shouldn't affect correctness.

Yeah, the phis still are at dominance frontiers (except for certain
cases where we use degenerate phis to maintain a linear RPO view;
see the doc patch for more details about that).  It wasn't clear from
the description above, but I was really talking about a pure data
structure choice: once we have both BBs and EBBs, the phis naturally
attach to the EBB data structure rather than the BB data structure,
since second and subsequent BBs in an EBB have a single predecessor
and so never need phi nodes.

>> * The framework also provides live range information for registers
>>   within an extended basic block and allows instructions to move within
>>   their EBB.  It might be useful to allow further movement in future;
>>   I just don't have a use case for it yet.
> Yup.   You could do something like Click's algorithm to schedule the
> instructions in a block to maximize CSE opportunities on top of this.

Yeah.

>> * One advantage of the new infrastructure is that it gives
>>   recog_for_combine-like behaviour: if recog wants to add clobbers
>>   of things like the flags register, the SSA code will make sure
>>   that the flags register is free.
> I look more at the intersection between combine and SSA as an
> opportunity to combine on extended blocks, simplify the "does dataflow
> allow this combination" logic, drop the need to build/maintain LOG_LINKS
> and more generally simplify note distribution.

Yeah, my ultimate goal (for GCC12, I hope this time for real) is still
to provide an SSA version of combine.  Initially it could sit alongside
the existing combine pass, and perhaps be run only after RA by default.
(Testing locally, that seems to give nice results, and reduces the
pressure to reimplement everything in combine.c in one go.)

But the point above was instead that, at the moment, combine is the
only pass that can add (say) new clobbers of the flags register as
part of a recog.  I think ideally *all* passes should be able to do that.
But passes would then need to track the live ranges of the flags
register in order to tell when the flags register is free.  One of the
side-benefits of the SSA stuff is that it can do this in amortised
sublinear complexity.  So RTL SSA provides its own interface to recog
that can do the same things as recog_for_combine does.

>> * Debug instructions get SSA information too, on a best-effort basis.
>>   Providing complete information would be significantly more expensive.
>>
>> * I wasn't sure for new C++ code whether to stick to the old C /* … */
>>   comments, or whether to switch to //.  In the end I went for //,
>>   on the basis that:
>>
>>   - The ranger code already does this.
>>
>>   - // is certainly more idiomatic in C++.
>>
>>   - // is in the lisp tradition of per-line comments and it matches the
>>     ;; used in .md files.  I feel sure that GCC would have been written
>>     using // from the outset if that had been possible.
> I think we're allowing both and realistically /* */ vs // shouldn't be
> something we spend a lot of time arguing about :-)

Agreed :-)

Richard


More information about the Gcc-patches mailing list