This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)


On 09/06/2017 03:55 AM, Jackson Woodruff wrote:
> On 08/30/2017 01:46 PM, Richard Biener wrote:
>> On Wed, Aug 30, 2017 at 11:46 AM, Jackson Woodruff
>> <jackson.woodruff@foss.arm.com> wrote:
>>> On 08/29/2017 01:13 PM, Richard Biener wrote:
>>>>
>>>> On Tue, Aug 29, 2017 at 1:35 PM, Jackson Woodruff
>>>> <jackson.woodruff@foss.arm.com> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Apologies again to those CC'ed, who (again) received this twice.
>>>>>
>>>>> Joseph: Yes you are correct. I misread the original thread, now fixed.
>>>>>
>>>>> Richard: I've moved the optimizations out of fold-const.c. One has been
>>>>> replicated in match.pd, and the other (x / C +- y / C -> (x +- y) / C)
>>>>> I've deleted as it only introduced a new optimization when running with
>>>>> the flags '-O0 -funsafe-math-optimizations'.
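>>>>>
>>>>> For concreteness (illustration only, not taken from the patch), that
>>>>> transform rewrites something like
>>>>>
>>>>>   /* compiled with -funsafe-math-optimizations */
>>>>>   double f (double x, double y) { return x / 3.0 + y / 3.0; }
>>>>>
>>>>> into the equivalent of
>>>>>
>>>>>   double f (double x, double y) { return (x + y) / 3.0; }
>>>>>
>>>>> trading two divisions for one, at the cost of a possibly different
>>>>> rounding of the result.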
>>>>
>>>>
>>>> Hmm, how did you verify that it only adds something with -O0
>>>> -funsafe-math-optimizations?
>>>
>>>
>>> By checking with various flags, although not exhaustively. I looked for
>>> reasons for the behavior in match.pd (explained below).
>>>
>>> I have also since discovered that the combinations of
>>> '-funsafe-math-optimizations -frounding-math' (at all levels) and
>>> '-fno-reciprocal-math -funsafe-math-optimizations' mean this pattern
>>> adds something.
>>>
>>>> Is that because in GIMPLE the reassoc pass should do this transform?
>>>
>>> It's because the pattern that changes (X / C) -> X * (1 / C) is gated
>>> with -O1:
>>>
>>>    (for cst (REAL_CST COMPLEX_CST VECTOR_CST)
>>>     (simplify
>>>      (rdiv @0 cst@1)
>>> ->    (if (optimize)
>>> ->     (if (flag_reciprocal_math
>>>            && !real_zerop (@1))
>>>        (with
>>>         { tree tem = const_binop (RDIV_EXPR, type, build_one_cst (type), @1); }
>>>         (if (tem)
>>>          (mult @0 { tem; } )))
>>>        (if (cst != COMPLEX_CST)
>>>         (with { tree inverse = exact_inverse (type, @1); }
>>>          (if (inverse)
>>>           (mult @0 { inverse; } ))))))))
>>>
>>>
>>> I've flagged the two lines that are particularly relevant to this.
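>>>
>>> As an illustration (not part of the patch), with -O1 -freciprocal-math
>>> that pattern turns
>>>
>>>   double f (double x) { return x / 5.0; }
>>>
>>> into the equivalent of
>>>
>>>   double f (double x) { return x * 0.2; }
>>>
>>> because const_binop folds 1 / 5.0 at compile time.  Since 1/5.0 is not
>>> exactly representable, adding -frounding-math makes const_binop return
>>> null and the pattern no longer fires.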
>>
>> So this means we go x / (C * y) -> (x / C) / y -> (x * (1/C)) / y
>> why's that in any way preferable?  I suppose this is again to enable
>> the recip pass to detect / y (as opposed to / (C * y))?  What's the
>> reason to believe that / y is more "frequent"?
>>
>>> Removing this pattern, as I would expect, means that the divisions in the
>>> above optimization (and the one further down) are not removed.
>>>
>>> So then there is the question of edge cases. This pattern is (ignoring
>>> the second case) going to fail when const_binop returns null. Looking
>>> through that function shows that it will fail (for reals) when:
>>>
>>> - Either argument is null (not the case)
>>>
>>> - The operation is not in the list (PLUS_EXPR,
>>>    MINUS_EXPR, MULT_EXPR, RDIV_EXPR, MIN_EXPR, MAX_EXPR)
>>>    (again not the case)
>>>
>>> - We honor Signalling NaNs and one of the operands is a sNaN.
>>>
>>> - The operation is a division, and the second argument is zero
>>>    and dividing by 0.0 raises an exception.
>>>
>>> - The result is infinity and neither of the operands were infinity
>>>    and flag_trapping_math is set.
>>>
>>> - The result isn't exact and flag_rounding_math is set.
>>>
>>>
>>> For (x / (y * C) -> (x / C) / y), I will add some gating for each of
>>> these so that the pattern is never executed if one of these would be
>>> the case.
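>>>
>>> Roughly, the extra gating would have to amount to something like the
>>> following (a sketch only, with 'cst' standing for the constant capture,
>>> not the final patch):
>>>
>>>   !HONOR_SNANS (type)       /* sNaN operands must be preserved */
>>>   && !real_zerop (cst)      /* 1 / 0.0 may raise an exception */
>>>   && !flag_trapping_math    /* 1/C overflowing to inf must not trap */
>>>   && !flag_rounding_math    /* 1/C may not be exact */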
>>
>> Why not transform this directly to (x * (1/C)) / y then (and only then)?
>> That way it is obvious that we do not end up with two divisions.
> 
> Done.
> 
>>
>> That said, I'm questioning this canonicalization.  I can come up with a
>> testcase where it makes things worse:
>>
>>   tem = x / (y * C);
>>   tem2 = z / (y * C);
>>
>> should generate
>>
>>   rdivtmp = 1 / (y*C);
>>   tem = x * rdivtmp;
>>   tem2 = z * rdivtmp;
>>
>> instead of
>>
>>   rdivtmp = 1/y;
>>   tem = x * 1/C * rdivtmp;
>>   tem2 = z * 1/C * rdivtmp;
> 
> Ideally we would be able to CSE that into
> 
> rdivtmp = 1/y * 1/C;
> tem = x * rdivtmp;
> tem2 = z * rdivtmp;
So why is your sequence significantly better than Richi's desired
sequence?  They both seem to need 3 mults and a division (which in both
cases might be a reciprocal estimation).  In Richi's sequence we have to
multiply and feed the result as an operand into the reciprocal insn.  In
yours we feed the result of the reciprocal into the multiply.

ISTM in the end if y*C or 1/(y*C) is a CSE, then Richi's sequence wins.
Similarly if 1/y is a CSE, then yours wins.  Is there some reason to
believe that one is a more likely CSE than the other?
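
Just counting operations on the example above (assuming 1/C is folded to a
constant and nothing else is CSEd):

  /* Richi's sequence: 1 division, 3 multiplies.  */
  rdivtmp = 1 / (y * C);        /* 1 mult, 1 div */
  tem     = x * rdivtmp;        /* 1 mult */
  tem2    = z * rdivtmp;        /* 1 mult */

  /* Jackson's ideal sequence: 1 division, 3 multiplies.  */
  rdivtmp = (1 / y) * (1 / C);  /* 1 div, 1 mult; 1/C is a constant */
  tem     = x * rdivtmp;        /* 1 mult */
  tem2    = z * rdivtmp;        /* 1 mult */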

> Although we currently do not. An equally (perhaps more?) problematic
> case is something like:
> 
> tem = x / (y * C)
> tem2 = y * C
> 
> Which becomes:
> 
> tem = x * (1 / C) / y
> tem2 = y * C
> 
> Instead of
> 
> K = y * C
> tem = x / K
> tem2 = K
> 
> Which ultimately requires context awareness to avoid. This does seem to
> be a general problem with a large number of match.pd patterns rather
> than anything specific to this one. A similar example can be constructed
> for (say) (A / B) / C -> A / (B * C).
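>
> For instance (illustration only), if A / B is also needed on its own:
>
> t1 = A / B
> t2 = (A / B) / C
>
> then rewriting t2 to A / (B * C) means t2 can no longer reuse t1, so both
> divisions remain and a multiply for B * C is added.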
I think there's a fundamental phase ordering problem here.  You want to
CSE stuff as much as possible, then expose reciprocals, then CSE again
because exposing reciprocals can expose new CSE opportunities.

And I suspect that no matter how hard we try, there are going to be cases
exposed by various transformations in the pipeline such that the cse -
reciprocal - cse sequence would need to be repeated to fully optimize
them.  We may have to live with not being particularly good at picking up
those second order effects.

I do think that the need for cse - reciprocal - cse sequencing suggests
that match.pd may not be the right place for these transformations.  I
think Richi has pointed this out a couple times already.

I'm not going to accept or reject at this time.  I think we need to make
a higher level decision.  Are these transformations better suited for
match.pd or the reciprocal transformation pass?  I realize that some
patterns are already in match.pd, but let's try to settle the higher
level issue before we add more.

jeff


