This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation
- From: Jeff Law <law at redhat dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: Ajit Kumar Agarwal <ajit dot kumar dot agarwal at xilinx dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, Vinod Kathail <vinodk at xilinx dot com>, Shail Aditya Gupta <shailadi at xilinx dot com>, Vidhumouli Hunsigida <vidhum at xilinx dot com>, Nagaraju Mekala <nmekala at xilinx dot com>
- Date: Thu, 10 Dec 2015 13:08:45 -0700
- Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation
- Authentication-results: sourceware.org; auth=none
- References: <37378DC5BCD0EE48BA4B082E0B55DFAA41F3F56C at XAP-PVEXMBX02 dot xlnx dot xilinx dot com> <37378DC5BCD0EE48BA4B082E0B55DFAA41F42541 at XAP-PVEXMBX02 dot xlnx dot xilinx dot com> <CAFiYyc23ZRxFv4S=XKQUX2VgT113GDZVVjOYViSoYy4yP11cdg at mail dot gmail dot com> <37378DC5BCD0EE48BA4B082E0B55DFAA4295155D at XAP-PVEXMBX02 dot xlnx dot xilinx dot com> <37378DC5BCD0EE48BA4B082E0B55DFAA4295ADCB at XAP-PVEXMBX02 dot xlnx dot xilinx dot com> <55D4F921 dot 2020708 at redhat dot com> <37378DC5BCD0EE48BA4B082E0B55DFAA4297704C at XAP-PVEXMBX02 dot xlnx dot xilinx dot com> <5643A732 dot 4040707 at redhat dot com> <CAFiYyc07fxKy=pxo4t81M8WVOx9PLnzmMthkbBm52wub93dErQ at mail dot gmail dot com> <5644C6CC dot 90203 at redhat dot com> <5644DB59 dot 9040809 at redhat dot com> <56450B62 dot 4090404 at redhat dot com> <CAFiYyc06=_w4+B0C1d8y0rgCmNCy7vvsFfdvCqstnh6t5zsqbA at mail dot gmail dot com> <56460F19 dot 5010009 at redhat dot com> <0B62FFB6-DF7A-4080-A655-3E51070E1DEE at gmail dot com> <564646AA dot 5030300 at redhat dot com> <564673DA dot 3020403 at redhat dot com> <CAFiYyc2WXOS5kJFeKPa0aqaWbHV-CnDeWJ98zRa5X=FbAZ2THw at mail dot gmail dot com>
On 12/03/2015 07:38 AM, Richard Biener wrote:
This pass is now enabled by default with -Os but has no limits on the amount of
stmts it copies.
The more statements it copies, the more likely it is that the path
spitting will turn out to be useful! It's counter-intuitive.
The primary benefit AFAICT with path splitting is that it exposes
additional CSE, DCE, etc opportunities.
IIRC Ajit posited that it could help with live/conflict analysis, I
never saw that, and with the changes to push splitting deeper into the
pipeline I'd further life/conflict analysis since that work also
involved preserving the single latch property.
It also will make all loops with this shape have at least two
exits (if the resulting loop will be disambiguated the inner loop will
have two exits).
Having more than one exit will disable almost all loop optimizations after it.
Hmmm, the updated code keeps the single latch property, but I'm pretty
sure it won't keep a single exit policy.
To keep a single exit policy would require keeping an additional block
around. Each of the split paths would unconditionally transfer to this
new block. The new block would then either transfer to the latch block
or out of the loop.
The pass itself documents the transform it does but does zero to motivate it.
What's the benefit of this pass (apart from disrupting further optimizations)?
It's essentially building superblocks in a special case to enable
additional CSE, DCE and the like.
Unfortunately what is is missing is heuristics and de-duplication. The
former to drive cases when it's not useful and the latter to reduce
codesize for any statements that did not participate in optimizations
when they were duplicated.
The de-duplication is the "sink-statements-through-phi" problems, cross
jumping, tail merging and the like class of problems.
It was only after I approved this code after twiddling it for Ajit that
I came across Honza's tracer implementation, which may in fact be
retargettable to these loops and do a better job. I haven't
experimented with that.
I can see a _single_ case where duplicating the latch will allow threading
one of the paths through the loop header to eliminate the original exit. Then
disambiguation may create a nice nested loop out of this. Of course that
is only profitable again if you know the remaining single exit of the inner
loop (exiting to the outer one) is executed infrequently (thus the inner loop
actually loops).
It wasn't ever about threading.
But no checks other than on the CFG shape exist (oh, it checks it will
at _least_ copy two stmts!).
Again, the more statements it copies the more likely it is to be
profitable. Think superblocks to expose CSE, DCE and the like.
Given the profitability constraints above (well, correct me if I am
wrong on these)
it looks like the whole transform should be done within the FSM threading
code which might be able to compute whether there will be an inner loop
with a single exit only.
While it shares some concepts with jump threading, I don't think the
transformation belongs in jump threading.
I'm inclined to request the pass to be removed again or at least disabled by
default.
I wouldn't lose any sleep if we disabled by default or removed,
particularly if we can repurpose Honza's code. In fact, I might
strongly support the former until we hear back from Ajit on performance
data.
I also keep coming back to Click's paper on code motion -- in that
context, copying statements would be a way to break dependencies and
give the global code motion algorithm more freedom. The advantage of
doing it in a framework like Click's is it's got a built-in sinking step.
What closed source benchmark was this transform invented for?
I think it was EEMBC or Coremark. Ajit should know for sure. I was
actually still hoping to see benchmark results from Ajit to confirm the
new placement in the pipeline didn't negate all the benefits he saw.
jeff