This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)
- From: Bernhard Reutner-Fischer <rep dot dot dot nop at gmail dot com>
- To: gcc-patches at gcc dot gnu dot org,Jeff Law <law at redhat dot com>,Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>,Jackson Woodruff <jackson dot woodruff at foss dot arm dot com>,Richard Biener <richard dot guenther at gmail dot com>
- Cc: "kyrylo dot tkachov at foss dot arm dot com" <kyrylo dot tkachov at foss dot arm dot com>,"Joseph S. Myers" <joseph at codesourcery dot com>,GCC Patches <gcc-patches at gcc dot gnu dot org>,nd <nd at arm dot com>
- Date: Sat, 16 Sep 2017 23:39:12 +0200
- Subject: Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)
- Authentication-results: sourceware.org; auth=none
- References: <375649d4-3c43-0c37-3e4d-3913c7213993@foss.arm.com> <CAFiYyc3W59mDXn-pEKNO1S3Uk=2XpowRtT65ZgJb_E8AGnCK-Q@mail.gmail.com> <DB6PR0801MB205392EDB2B58B757167CC26838D0@DB6PR0801MB2053.eurprd08.prod.outlook.com> <CAFiYyc0wGaBuujfTUSVrsAOUoOkoO6SCYB9Mh+kPJ7DP68pSUA@mail.gmail.com> <DB6PR0801MB2053EA8A9D0D899A21DF5CF183830@DB6PR0801MB2053.eurprd08.prod.outlook.com> <41575a02-f31a-3944-9efa-c4db14e1606d@redhat.com> <2c8f1237-fdb2-b7de-0f7d-6ba15c1368fe@foss.arm.com> <CAFiYyc3ggiuEoPKG5e_KNU0aMzT4TS+N4RkXcnynt+dUS6ZXtQ@mail.gmail.com> <dbeb5c74-3035-71d6-e2d8-49e8b0c46b7f@foss.arm.com> <CAFiYyc0Su3E_qu=pucVnPfH2O+9pz+=AJAtGdkgFvgiwLA=7mQ@mail.gmail.com> <e4218e1c-5c2f-56d3-4682-1eab065fdd5c@foss.arm.com> <d278713e-3075-51fb-af1a-27ca35825876@redhat.com> <DB6PR0801MB2053A5F95768973B9D5421CF836E0@DB6PR0801MB2053.eurprd08.prod.outlook.com> <a1e736d4-180c-184b-8bdc-c0659df359b1@redhat.com>
On 15 September 2017 18:50:26 CEST, Jeff Law <law@redhat.com> wrote:
>On 09/13/2017 03:20 PM, Wilco Dijkstra wrote:
>> Jeff Law wrote:
>>> On 09/06/2017 03:55 AM, Jackson Woodruff wrote:
>>>> On 08/30/2017 01:46 PM, Richard Biener wrote:
>>
>>>>> rdivtmp = 1 / (y*C);
>>>>> tem = x *rdivtmp;
>>>>> tem2= z * rdivtmp;
>>>>>
>>>>> instead of
>>>>>
>>>>> rdivtmp = 1/y;
>>>>> tem = x * 1/C * rdivtmp;
>>>>> tem2 = z * 1/C * rdivtmp;
>>>>
>>>> Ideally we would be able to CSE that into
>>>>
>>>> rdivtmp = 1/y * 1/C;
>>>> tem = x * rdivtmp;
>>>> tem2 = z * rdivtmp;
>>> So why is your sequence significantly better than Richi's desired
>>> seqeuence? They both seem to need 3 mults and a division (which in
>both
>>> cases might be a reciprocal estimation). In Richi's sequence we
>have
>>> to mult and feed the result as an operand into the reciprocal insn.
>In
>>> yours we feed the result of the reciprocal into the multiply.
>>
>> Basically this stuff happens a lot in real code, which is exactly why
>I proposed it.
>> I even provided counts of how many divisions each transformation
>avoids:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026.
>I don't doubt that it happens in real code. There's a reason why MIPS
>IV added recip and rsqrt 20 years ago. Our ability to exploit them has
>always been fairly limited though.
>
>What I'm trying to avoid is a transformation where the two forms are
>roughly equal in terms of what they expose vs what they hide.
>
>If pulling the 1/c out is consistently better, then that's obviously
>good. If it's essentially a toss-up because of the other interactions
>with CSE, then we need to think about it more deeply.
>
>I _suspect_ that pulling the 1/c out is generally better, but something
>more concrete than my intuition would be helpful.
>
>
>
>
>>
>> Note this transformation is a canonicalization - if you can't merge a
>> constant somehow, moving it out to the LHS can expose more
>> opportunities, like in (C1 * x) / (C2 * y) -> (C1 * x * C2) / y ->
>(C3 * x) / y.
>> Same for negation as it behaves like * -1.
>FWIW, I generally agree.
>
>>
>> The key here is that it is at least an order of magnitude worse if
>you have
>> to execute an extra division than if you have an extra multiply.
>No doubt. I'll trade a division for a multiply any day. Similarly 1/C
>is just a constant, so I consider it essentially free.
>
>
>>
>>> ISTM in the end if y*C or 1/(y*C) is a CSE, then Richi's sequence
>wins.
>>> Similarly if 1/y is a CSE, then yours wins. Is there some reason
>to
>>> believe that one is a more likely CSE than the other?
>>
>> The idea is that 1/y is much more likely a CSE than 1/(y*C).
>Do we have anything other than intuition to back that up?
>>
>> We could make the pattern only fire in single use cases and see
>whether
>> that makes a difference. It would be easy to test old vs new vs
>single-use
>> new and count how many divisions we end up with. We could add 1/ (y *
>C)
>> to the reciprocal phase if it is unacceptable as a canonicalization,
>but then
>> you won't be able to optimize (C1 * x * C2) / y.
>We could. I think the question would then become does restricting to
>the single-use case kill too many opportunities.
>
>Sigh. I think the 4 of us could go round and round on this forever in
>the pursuit of the perfect code. Though in reality I'm happy with a
>clear improvement.
Btw reminds me a little bit of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28417
which IIRC never was applied. Maybe someone could talk Denys (a colleague of yours nowadays, Jeff) into submitting a real patch? ;)
Cheers,
>
>>
>>> I think there's a fundamental phase ordering problem here. You want
>to
>>> CSE stuff as much as possible, then expose reciprocals, then CSE
>again
>>> because exposing reciprocals can expose new CSE opportunities.
>>
>> I agree there are phase ordering issues and various problems in
>> reassociation, CSE and division optimizations not being able to find
>and
>> optimize complex cases that are worthwhile.
>>
>> However I don't agree doing CSE before reciprocals is a good idea. We
>> want to expose reciprocals early on, even if that means we find fewer
>> CSEs as a result - again because division is so much more expensive
>than
>> any other operation. CSE is generally not smart enough to CSE a * x
>in
>> a * b * x and a * c * x, something which is likely to happen quite
>frequently -
>> unlike the made up division examples here.
>We have much stronger reassociation capabilities for multiplication --
>it's a well understood problem and if we fail to pick up a * x across
>those two kind of expressions, I'd consider our reassociation code as
>failing pretty badly, particularly for integer types.
>
>BUt yes, division is expensive. And I'm all for a tranformation that
>turns a division into a multiplication. That's almost always a win.
>
>>
>> The first question is whether you see it as a canonicalization. If
>so, then
>> match.pd should be fine.
>Pulling the constant part out of the denominator and turning it into a
>multiplication by the recip constant should likely be considered
>canonicalization. I think Richi largely agreed to this in the thread
>as
>well and asked for that hunk of the patch to be extracted and submitted
>independent of the other changes so that it could go ahead and move
>forward while we figure out the rest.
>
>Note that tree-ssa-reassoc.c has a fair amount of this kind of
>canonicalization. In fact, I think that's where we handle things like
>pulling out negations. You may actually do better handling it there.
>
>
>
>Jeff