[EXTERNAL] Re: [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd

Navid Rahimi navidrahimi@microsoft.com
Tue Nov 9 16:25:09 GMT 2021


Hi Richard,

Thank you so much for your detailed feedback. I am attaching another version of the patch which does include all the changes you mentioned.

Bellow you can see my response to your feedbacks:

> the canonical order of the plus is (plus:s (trunc_div ...) integer_onep) as
> constants come last - you can then remove the 'c'
Fixed. I was not aware of the canonical order.

> you can use INTEGRAL_TYPE_P (type).
Fixed. Didn't know about "type" either.

> this test is not necessary
Fixed.

> But should we also optimize x * (2 + y/x) - y -> 2*x - y % x?  So
> it looks like a conflict with the x * (1 + b) <-> x + x * b transform
> (fold_plusminus_mult)?  That said, the special case of one
> definitely makes the result cheaper (one less operation).
For this special case, it does remove operation indeed. But I was not able to reproduce it for any other constant [1]. If it was possible to do it with other constants I would've changed the pattern to have be more general like "x * (C + y/x) - y -> C*x - y % x". Basically anything other than 1 wasn't a win. Regarding the "x * (1 + b) <-> x + x * b" transformation, it appears to me when there is a "- y" at the end "x * (1 + b)", there is opportunity to optimize. Without that "- y" I was not able to make any operation more performant. Either direction, looks like same amount of computation.

1) https://compiler-explorer.com/z/dWsq7Tzf4

> Please move the pattern next to the most relatest which I think is
Fixed.

> the return value of 1 is an unreliable way to fail, please instead call
> __builtin_abort ();
Fixed.

> do we really need -O3 for this?  I think that -O should be enough here?
We don't specifically need that. But I realized that the optimization can happen in two different level at the compiler. It seems if you spread the computation over multiple statement like:
  int c = a/b;
  int y = b * (1 + c);
  return y - a;

instead of :
  return b * (1 + a / b) - a;

Then you have to have at least -O1 to have it optimized. Granted, I am not doing that in the testcase. In the new patch I am changing it to -O. Let me know if you have any suggestions.

Best wishes,
Navid.

________________________________________
From: Richard Biener <richard.guenther@gmail.com>
Sent: Tuesday, November 9, 2021 02:36
To: Navid Rahimi
Cc: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Re: [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd

On Tue, Nov 9, 2021 at 5:12 AM Navid Rahimi via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi GCC community,
>
> This patch will add the missed pattern described in bug 102232 [1] to the match.pd. The testcase will test whether the multiplication and division has been removed from the code or not. The correctness proof for this pattern is here [2] in case anyone is curious.
>
> PR tree-optimization/102232
>       * match.pd (x * (1 + y / x) - y) -> (x - y % x): New optimization.

+/* x * (1 + y / x) - y -> x - y % x */
+(simplify
+ (minus (mult:cs @0 (plus:cs integer_onep (trunc_div:s @1 @0))) @1)

the canonical order of the plus is (plus:s (trunc_div ...) integer_onep) as
constants come last - you can then remove the 'c'

+ (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))

you can use INTEGRAL_TYPE_P (type).

+      && types_match (@0, @1))

this test is not necessary

+  (minus @0 (trunc_mod @1 @0))))

But should we also optimize x * (2 + y/x) - y -> 2*x - y % x?  So
it looks like a conflict with the x * (1 + b) <-> x + x * b transform
(fold_plusminus_mult)?  That said, the special case of one
definitely makes the result cheaper (one less operation).

Please move the pattern next to the most relatest which I think is

/* X - (X / Y) * Y is the same as X % Y.  */
(simplify
 (minus (convert1? @0) (convert2? (mult:c (trunc_div @@0 @@1) @1)))
 (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
  (convert (trunc_mod @0 @1))))

+int
+main (void)
+{
+  // few randomly generated test cases
+  if (foo (71856034, 238) != 212)
+    {
+      return 1;

the return value of 1 is an unreliable way to fail, please instead call
__builtin_abort ();

+/* { dg-options "-O3 -fdump-tree-optimized" } */

do we really need -O3 for this?  I think that -O should be enough here?

Thanks,
Richard.

>       * gcc.dg/tree-ssa/pr102232.c: testcase for this optimization.
>
>
> 1) https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fbugzilla%2Fshow_bug.cgi%3Fid%3D102232&data=04%7C01%7Cnavidrahimi%40microsoft.com%7Cbc89643c6a14411d11ac08d9a36ce282%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637720510334697749%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Br7B1ShCT4ynLwypyM29LqaNF9GBJF3%2BIR6sZrDRTKM%3D&reserved=0
> 2) https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Falive2.llvm.org%2Fce%2Fz%2F2VScjD&data=04%7C01%7Cnavidrahimi%40microsoft.com%7Cbc89643c6a14411d11ac08d9a36ce282%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637720510334707741%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=FH0%2BwEHmaGjyclNFD2HrKMXtXDscPAlc9uUs3p6UMkw%3D&reserved=0
>
> Best wishes,
> Navid.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-PR-tree-optimization-102232.patch
Type: application/octet-stream
Size: 2563 bytes
Desc: 0001-PR-tree-optimization-102232.patch
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20211109/3e6bbe9e/attachment-0001.obj>


More information about the Gcc-patches mailing list