This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][combine] PR rtl-optimization/68651 Try changing rtx from (r + r) to (r << 1) to aid recognition



On 16/12/15 17:28, Jeff Law wrote:
On 12/16/2015 10:00 AM, Kyrill Tkachov wrote:

On 16/12/15 12:18, Bernd Schmidt wrote:
On 12/15/2015 05:21 PM, Kyrill Tkachov wrote:
Then for the shift pattern in the MD file we'd have to
dynamically select the scheduling type depending on whether or
not the shift amount is 1 and the costs line up?

Yes. This isn't unusual, take a look at i386.md where you have a
lot of switches on attr type to decide which string to print.


I'm just worried that if we take this idea to its logical conclusion,
we have to add a new canonicalisation rule: "all (plus x x)
expressions shall be expressed as (ashift x 1)". Such a rule seems
too specific to me and all targets would have to special-case it in
their MD patterns and costs if they ever wanted to treat an add and a
shift differently. In this particular case we'd have to
conditionalise the scheduling string selection on a particular CPU
tuning and the shift amount, which will make the pattern much harder
to read. To implement this properly we'd also have to
That's not terribly unusual.  And we've done those kind of canonicalization rules before -- most recently to deal with issues in combine we settled on canonicalization rules for ashift vs mult.  While there was fallout, it's manageable.



The price we pay when trying these substitutions is an iteration
over the rtx with FOR_EACH_SUBRTX_PTR. recog gets called only if
that iteration actually performed a substitution of x + x into x
<< 1. Is that too high a price to pay? (I'm not familiar with the
performance characteristics of the FOR_EACH_SUBRTX machinery)

It depends on how many of these transforms we are going to try; it
 also feels very hackish, trying to work around the core design of
the combiner. IMO it would be better for machine descriptions to
work with the pass rather than against it.


Perhaps I'm lacking the historical context, but what is the core
design of the combiner? Why should the backend have to jump through
these hoops if it already communicates to the midend (through correct
rtx costs) that a shift is more expensive than a plus? I'd be more
inclined to agree that this is perhaps a limitation in recog rather
than combine, but still not a backend problem.
The historical design of combine is pretty simple.

Use data dependence to substitute the definition of an operand in a use
of the operand. Essentially create bigger blobs of RTL. Canonicalize and
simplify that larger blob of RTL, then try to match it with a pattern in
the backend.

Note that costing didn't enter the picture. The assumption was that if
the combination succeeds, then it's profitable (fewer insns).  We haven't generally encouraged trying to match multiple forms of equivalent expressions, instead we declare a canonical form and make sure combine uses it.



Thanks for the explanation.


If you can somehow arrange for the (plus x x) to be turned into a
shift while substituting that might be yet another approach to
try.


I did investigate where else we could make this transformation. For
the zero_extend+shift case (the ubfiz instruction from the testcase
in my original submission) we could fix this by modifying
make_extraction to convert its argument to a shift from (plus x x)
as, in that context, shifts are undoubtedly more likely to simplify
with the various extraction operations that it's trying to perform.
Note that canonicalizing (plus x x) to (ashift x 1) is consistent with the canonicalization we do for (mult x C) to (ashift x log2 (C)) where C is an exact power of two.

When we made that change consistently (there were cases where we instead preferred MULT in the past), we had to fix some backends, but the fallout wasn't terrible.

I would think from a representational standpoint canonicalizing (plus x x) to (ashift x 1) would be generally a good thing.


Ok. Gathering from the above it's combines' job to canonicalise, so implementing this approach would be simply a matter
of adding the transformation to combine_simplify_rtx. At least that does the trick for my testcases and the backend
can decide whether to emit the add instruction for the shift-by-one case.

If there's consensus on this approach I'll propose a patch for that.

Thanks for all your inputs,
Kyrill



jeff


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]