[EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
Victor Tong
vitong@microsoft.com
Wed Jun 16 18:49:46 GMT 2021
Hi Richard,
Thanks for the feedback. From what you said, I can think of two possible solutions (though I'm not sure if either is feasible/fully correct):
Option 1: Have the new X * (Y / X) --> Y - (Y % X) optimization only run in scenarios that don't interfere with the existing X - (X / Y) * Y --> X % Y optimization.
This would involve checking the expression one level up to see if there's a subtraction that would trigger the existing optimization. I looked through the match.pd file and couldn't find a bail condition like this. It doesn't seem like there's a link from an expression to its parent expression one level up. This also feels a bit counter-intuitive since it would be doing the opposite of the bottom-up expression matching where the compiler would like to match a larger expression rather than a smaller one.
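For reference, both rewrites follow from the truncating-division identity Y == (Y / X) * X + (Y % X), which C guarantees for nonzero X. Below is a minimal sketch in plain C (not match.pd) that checks the two rewrites on concrete values. The function names are hypothetical and only serve the illustration, and the operands are assumed to be nonzero and small enough that nothing overflows.

```c
#include <assert.h>

/* Left-hand side of the new pattern: X * (Y / X). */
int mul_of_div(int x, int y) { return x * (y / x); }

/* Right-hand side of the new pattern: Y - (Y % X). */
int sub_of_mod(int x, int y) { return y - (y % x); }

/* Left-hand side of the existing pattern: X - (X / Y) * Y. */
int sub_of_div_mul(int x, int y) { return x - (x / y) * y; }

/* Check both rewrites for one pair of nonzero operands. */
void check_identities(int a, int b)
{
  assert(mul_of_div(a, b) == sub_of_mod(a, b));
  assert(sub_of_div_mul(a, b) == a % b);
}
```

Calling check_identities(7, 45) or check_identities(-7, 45) passes both assertions, since / and % truncate toward zero in C.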
Option 2: Add a new pattern to support scenarios that the existing nop_convert pattern bails out on.
Existing pattern:
(simplify
 (minus (nop_convert1? @0) (nop_convert2? (minus (nop_convert3? @@0) @1)))
 (view_convert @1))
New pattern to add:
/* X - (X - Y) --> Y */
(simplify
 (minus @0 (convert? (minus @@0 @1)))
 (if (INTEGRAL_TYPE_P (type)
      && TYPE_OVERFLOW_UNDEFINED (type)
      && INTEGRAL_TYPE_P (TREE_TYPE (@1))
      && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@1))
      && !TYPE_UNSIGNED (TREE_TYPE (@1))
      && !TYPE_UNSIGNED (type)
      && TYPE_PRECISION (TREE_TYPE (@1)) <= TYPE_PRECISION (type))
  (convert @1)))
I think the truncation concerns that you brought up should be covered if the precision of the outer expression type is greater than or equal to that of the inner expression type. There may be a sign extension (which is why the nop_convert check fails), but that doesn't change the value of the expression. And if the types involved are signed integers where overflow results in undefined behavior, the X - (X - Y) --> Y optimization should be legal.
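To illustrate the widening argument, here is a minimal C sketch. It assumes fixed-width types stand in for the tree types (a 32-bit signed inner type, a 64-bit signed outer type), and the function names are hypothetical. It only covers inputs where the inner subtraction does not overflow, which is the case the pattern relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Shape loosely mirroring the matched expression: the inner
   subtraction is done in a 32-bit signed type, then sign-extended
   to 64 bits before the outer subtraction.  */
int64_t widened_before(int32_t x, int32_t y)
{
  return (int64_t) x - (int64_t) (x - y);
}

/* The proposed replacement, (convert @1): just widen Y.  */
int64_t widened_after(int32_t y)
{
  return (int64_t) y;
}
```

For example, widened_before(42, 5) and widened_after(5) both evaluate to 5, because sign extension preserves the value of the inner result.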
Please correct me if I'm wrong with either one of these options, or if you can think of a better option to fix the regression.
Thanks,
Victor
From: Richard Biener <richard.guenther@gmail.com>
Sent: Monday, June 7, 2021 1:25 AM
To: Victor Tong <vitong@microsoft.com>
Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
On Wed, Jun 2, 2021 at 10:55 PM Victor Tong <vitong@microsoft.com> wrote:
>
> Hi Richard,
>
> Thanks for reviewing my patch. I did a search online and you're right -- there isn't a vector modulo instruction. I'll restrict both the new X * (Y / X) --> Y - (Y % X) pattern and the existing X - (X / Y) * Y --> X % Y pattern so that they no longer trigger on vector types.
>
> I looked into why the following pattern isn't triggering:
>
> (simplify
>  (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
>  (view_convert @1))
>
> The nop_converts expand into tree_nop_conversion_p checks. In fn2() of gcc/testsuite/gcc.dg/fold-minus-6.c, the expression during generic matching looks like:
>
> 42 - (long int) (42 - 42 % x)
>
> When looking at the right-hand side of the expression (the (long int) (42 - 42 % x)), the tree_nop_conversion_p check fails because of the type precision difference. The expression inside of the cast has a 32-bit precision and the outer expression has a 64-bit precision.
>
> I looked around at other patterns and it seems like nop_convert and view_convert are used because of underflow/overflow concerns. I'm not familiar with the two constructs. What's the difference between using them and checking TYPE_OVERFLOW_UNDEFINED? In the scenario above, since TYPE_OVERFLOW_UNDEFINED is true, the second pattern that I added (X - (X - Y) --> Y) gets triggered.
But TYPE_OVERFLOW_UNDEFINED is not a good condition here since the
conversion is the problematic one and conversions have implementation
defined behavior. Now, the above does not match because it wasn't
designed to, and for non-constant '42' it would have needed a
(convert ...) around the first @0 as well (matching of constants is by
value, not by value + type).
That said, your
+/* X - (X - Y) --> Y */
+(simplify
+ (minus (convert1? @0) (convert2? (minus @@0 @1)))
+ (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) && TYPE_OVERFLOW_UNDEFINED(type))
+  (convert @1)))
would match (int)x - (int)(x - y) where you assert the outer subtract
has undefined behavior on overflow but the inner subtract could wrap
and the (int) conversion can be truncating or widening. Is that really
always a valid transform then?
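
As a concrete instance of the widening case, here is a minimal C sketch in which the inner subtract is unsigned and wraps while the outer subtract is signed and wider. The fixed-width types are an assumption standing in for the tree types, and the function names are hypothetical. For inputs where the inner subtract wraps, the original expression and the proposed replacement disagree.

```c
#include <assert.h>
#include <stdint.h>

/* Inner subtraction in a 32-bit unsigned (wrapping) type, outer
   subtraction in a 64-bit signed type.  */
int64_t mixed_before(uint32_t x, uint32_t y)
{
  uint32_t inner = (uint32_t) (x - y);   /* wraps modulo 2^32 */
  return (int64_t) x - (int64_t) inner;
}

/* What (convert @1) would produce instead.  */
int64_t mixed_after(uint32_t y)
{
  return (int64_t) y;
}
```

With x = 0 and y = 1 the inner result wraps to 4294967295, so mixed_before(0, 1) is -4294967295 while mixed_after(1) is 1. When the inner subtract does not wrap, for example x = 5 and y = 3, the two agree.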
Richard.
> Thanks,
> Victor
>
>
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Tuesday, April 27, 2021 1:29 AM
> To: Victor Tong <vitong@microsoft.com>
> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> Subject: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
>
> On Thu, Apr 1, 2021 at 1:03 AM Victor Tong via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hello,
> >
> > This patch fixes PR tree-optimization/95176. A new pattern in match.pd was added to transform "a * (b / a)" --> "b - (b % a)". A new test case was also added to cover this scenario.
> >
> > The new pattern interfered with the existing pattern of "X - (X / Y) * Y". In some cases (such as in fn4() in gcc/testsuite/gcc.dg/fold-minus-6.c), the new pattern is applied causing the existing pattern to no longer apply. This results in worse code generation because the expression is left as "X - (X - Y)". An additional subtraction pattern of "X - (X - Y) --> Y" was added to this patch to avoid this regression.
> >
> > I also didn't remove the existing pattern because it triggered in more cases than the new pattern because of a tree_invariant_p check that's inserted by genmatch for the new pattern.
>
> Yes, we do not handle using Y multiple times when it might contain
> side-effects in GENERIC folding (comments in genmatch suggest we can
> use save_expr but we don't implement this [anymore]).
>
> On GIMPLE there's also the issue that your new pattern creates a
> complex expression, which prevents it from being used by
> value-numbering, for example, where the old pattern was OK
> (eventually, if no conversion was required).
>
> So indeed it looks OK to preserve both.
>
> I wonder why you needed the
>
> +/* X - (X - Y) --> Y */
> +(simplify
> + (minus (convert1? @0) (convert2? (minus @@0 @1)))
> + (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) && TYPE_OVERFLOW_UNDEFINED(type))
> +  (convert @1)))
>
> pattern since it should be handled by
>
> /* Match patterns that allow contracting a plus-minus pair
> irrespective of overflow issues. */
> /* (A +- B) - A -> +- B */
> /* (A +- B) -+ B -> A */
> /* A - (A +- B) -> -+ B */
> /* A +- (B -+ A) -> +- B */
>
> in particular
>
> (simplify
>  (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
>  (view_convert @1))
>
> If there are supported cases missing I'd rather extend this pattern
> than replicate it.
>
> +/* X * (Y / X) is the same as Y - (Y % X). */
> +(simplify
> + (mult:c (convert1? @0) (convert2? (trunc_div @1 @@0)))
> + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
> + (minus (convert @1) (convert (trunc_mod @1 @0)))))
>
> note that if you're allowing vector types you have to use
> (view_convert ...) in the transform and you also need to make sure
> that the target can expand the modulo - I suspect that's an issue
> with the existing pattern as well. I don't know of any vector ISA
> that supports modulo (or integer division, that is). Restricting the
> patterns to integer types is probably the most sensible solution.
>
> Thanks,
> Richard.
>
> > I verified that all "make -k check" tests pass when targeting x86_64-pc-linux-gnu.
> >
> > 2021-03-31 Victor Tong <vitong@microsoft.com>
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Two new patterns: One to optimize division followed by multiply and the other to avoid a regression as explained above
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/20030807-10.c: Update existing test to look for a subtraction because a shift is no longer emitted
> > * gcc.dg/pr95176.c: New test to cover optimizing division followed by multiply
> >
> > I don't have write access to the GCC repo but I've completed the FSF paperwork as I plan to make more contributions in the future. I'm looking for a sponsorship from an existing GCC maintainer before applying for write access.
> >
> > Thanks,
> > Victor