This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)

From: Richard Biener <richard dot guenther at gmail dot com>
To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
Cc: Jeff Law <law at redhat dot com>, Jackson Woodruff <jackson dot woodruff at foss dot arm dot com>, "kyrylo dot tkachov at foss dot arm dot com" <kyrylo dot tkachov at foss dot arm dot com>, "Joseph S. Myers" <joseph at codesourcery dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>
Date: Fri, 15 Sep 2017 14:07:53 +0200
Subject: Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)
Authentication-results: sourceware.org; auth=none
References: <375649d4-3c43-0c37-3e4d-3913c7213993@foss.arm.com> <CAFiYyc3W59mDXn-pEKNO1S3Uk=2XpowRtT65ZgJb_E8AGnCK-Q@mail.gmail.com> <DB6PR0801MB205392EDB2B58B757167CC26838D0@DB6PR0801MB2053.eurprd08.prod.outlook.com> <CAFiYyc0wGaBuujfTUSVrsAOUoOkoO6SCYB9Mh+kPJ7DP68pSUA@mail.gmail.com> <DB6PR0801MB2053EA8A9D0D899A21DF5CF183830@DB6PR0801MB2053.eurprd08.prod.outlook.com> <41575a02-f31a-3944-9efa-c4db14e1606d@redhat.com> <2c8f1237-fdb2-b7de-0f7d-6ba15c1368fe@foss.arm.com> <CAFiYyc3ggiuEoPKG5e_KNU0aMzT4TS+N4RkXcnynt+dUS6ZXtQ@mail.gmail.com> <dbeb5c74-3035-71d6-e2d8-49e8b0c46b7f@foss.arm.com> <CAFiYyc0Su3E_qu=pucVnPfH2O+9pz+=AJAtGdkgFvgiwLA=7mQ@mail.gmail.com> <e4218e1c-5c2f-56d3-4682-1eab065fdd5c@foss.arm.com> <d278713e-3075-51fb-af1a-27ca35825876@redhat.com> <DB6PR0801MB2053A5F95768973B9D5421CF836E0@DB6PR0801MB2053.eurprd08.prod.outlook.com>

On Wed, Sep 13, 2017 at 11:20 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Jeff Law wrote:
>> On 09/06/2017 03:55 AM, Jackson Woodruff wrote:
>> > On 08/30/2017 01:46 PM, Richard Biener wrote:
>
>>>>   rdivtmp = 1 / (y*C);
>>>>   tem = x *rdivtmp;
>>>>   tem2= z * rdivtmp;
>>>>
>>>> instead of
>>>>
>>>>   rdivtmp = 1/y;
>>>>   tem = x * 1/C * rdivtmp;
>>>>   tem2 = z * 1/C * rdivtmp;
>>>
>>> Ideally we would be able to CSE that into
>>>
>>> rdivtmp = 1/y * 1/C;
>>> tem = x * rdivtmp;
>>> tem2 = z * rdivtmp;
>> So why is your sequence significantly better than Richi's desired
>> seqeuence?  They both seem to need 3 mults and a division (which in both
>> cases might be a reciprocal estimation).    In Richi's sequence we have
>> to mult and feed the result as an operand into the reciprocal insn.  In
>> yours we feed the result of the reciprocal into the multiply.
>
> Basically this stuff happens a lot in real code, which is exactly why I proposed it.
> I even provided counts of how many divisions each transformation avoids:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026.
>
> Note this transformation is a canonicalization - if you can't merge a
> constant somehow, moving it out to the LHS can expose more
> opportunities, like in (C1 * x) / (C2 * y) -> (C1 * x * C2) / y -> (C3 * x) / y.
> Same for negation as it behaves like * -1.
>
> The key here is that it is at least an order of magnitude worse if you have
> to execute an extra division than if you have an extra multiply.
>
>> ISTM in the end if  y*C or 1/(y*C) is a CSE, then Richi's sequence wins.
>> Similarly if 1/y is a CSE, then yours wins.  Is there some reason to
>> believe that one is a more likely CSE than the other?
>
> The idea is that 1/y is much more likely a CSE than 1/(y*C).
>
> We could make the pattern only fire in single use cases and see whether
> that makes a difference. It would be easy to test old vs new vs single-use
> new and count how many divisions we end up with. We could add 1/ (y * C)
> to the reciprocal phase if it is unacceptable as a canonicalization, but then
> you won't be able to optimize (C1 * x * C2) / y.
>
>> I think there's a fundamental phase ordering problem here.  You want to
>> CSE stuff as much as possible, then expose reciprocals, then CSE again
>> because exposing reciprocals can expose new CSE opportunities.
>
> I agree there are phase ordering issues and various problems in
> reassociation, CSE and division optimizations not being able to find and
> optimize complex cases that are worthwhile.
>
> However I don't agree doing CSE before reciprocals is a good idea. We
> want to expose reciprocals early on, even if that means we find fewer
> CSEs as a result - again because division is so much more expensive than
> any other operation. CSE is generally not smart enough to CSE a * x in
> a * b * x and a * c * x, something which is likely to happen quite frequently -
> unlike the made up division examples here.
>
>> And I suspect that no matter how hard we try, there's going to be cases
>> that get exposed by various transformations in the pipeline such that to
>> fully optimize the cse - reciprocal - cse sequence would need to be
>> repeated to fully optimize.  We may have to live with not being
>> particularly good at picking up the those second order effects.
>>
>> I do think that the need for cse - reciprocal - cse  sequencing suggests
>> that match.pd may not be the right place for these transformations.  I
>> think Richi has pointed this out a couple times already.
>
> I don't think you can ever expect to find the optimal case by repeating
> optimizations. It's quite easy to construct examples where first doing CSE
> makes things significantly worse. Ultimately to get something optimal you'd
> need to try lots of permutations and count for each possible permutation
> how many multiplies and divisons you end up after full optimization.
> Quite impractical...
>
>> I'm not going to accept or reject at this time.  I think we need to make
>> a higher level decision.  Are these transformations better suited for
>> match.pd or the reciprocal transformation pass?  I realize that some
>> patterns are already in match.pd, but let's try to settle the higher
>> level issue before we add more.
>
> The first question is whether you see it as a canonicalization. If so, then
> match.pd should be fine.

A canonicalization to more divisions is not fine.  That is, if we think
that x / (C * y) is non-canonical because constant parts should be
in the denominator then fine, canonicalize it as (x * C') / y with
C' = 1/C.  But then implement it so, not in a pattern that suggests
you'll end up with two divisions.

Let me repeat though that this looks like a job for re-association.

>From that perspective the part of the patch doing

+ /* Simplify x / (C * y) to (x * (1 / C)) / y where C is a constant.  */
+ (if (optimize)
+  (simplify
+   (rdiv @0
+    (mult @1 REAL_CST@2))
+   (if (!real_zerop (@1))
+    (with
+     { tree tem = const_binop (RDIV_EXPR, type, build_one_cst (type), @2); }
+     (if (tem)
+      (rdiv (mult @0 { tem; } ) @1))))))

looks ok apart from the if (optimize) check and the real_zerop (@1) check
which should test @2.

_please_ split the patch up.

+/* Simplify x / (- y) to -x / y.  */
+(simplify
+ (rdiv @0 (negate @1))
+ (rdiv (negate @0) @1))

I think the comments on both patterns should say 'Canonicalize' instead of
'Simplify' as those are not simplifications, they are even Canonicalizations
that change the value in one case (which usually makes canonicalizations
questionable IMHO given nothign guarantees the followup optimziation they
should enable).

Richard.

> Wilco

References:
- Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)
  - From: Jackson Woodruff
- Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)
  - From: Jeff Law
- Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)
  - From: Wilco Dijkstra

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]