[EXTERNAL] Re: [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd
Tue Nov 23 03:09:07 GMT 2021
On 11/10/2021 1:35 AM, Richard Biener via Gcc-patches wrote:
> On Tue, Nov 9, 2021 at 5:25 PM Navid Rahimi <email@example.com> wrote:
>> Hi Richard,
>> Thank you so much for your detailed feedback. I am attaching another version of the patch which does include all the changes you mentioned.
>> Bellow you can see my response to your feedbacks:
>>> the canonical order of the plus is (plus:s (trunc_div ...) integer_onep) as
>>> constants come last - you can then remove the 'c'
>> Fixed. I was not aware of the canonical order.
>>> you can use INTEGRAL_TYPE_P (type).
>> Fixed. Didn't know about "type" either.
>>> this test is not necessary
>>> But should we also optimize x * (2 + y/x) - y -> 2*x - y % x? So
>>> it looks like a conflict with the x * (1 + b) <-> x + x * b transform
>>> (fold_plusminus_mult)? That said, the special case of one
>>> definitely makes the result cheaper (one less operation).
>> For this special case, it does remove operation indeed. But I was not able to reproduce it for any other constant . If it was possible to do it with other constants I would've changed the pattern to have be more general like "x * (C + y/x) - y -> C*x - y % x". Basically anything other than 1 wasn't a win. Regarding the "x * (1 + b) <-> x + x * b" transformation, it appears to me when there is a "- y" at the end "x * (1 + b)", there is opportunity to optimize. Without that "- y" I was not able to make any operation more performant. Either direction, looks like same amount of computation.
>> 1) https://compiler-explorer.com/z/dWsq7Tzf4
>>> Please move the pattern next to the most relatest which I think is
>>> the return value of 1 is an unreliable way to fail, please instead call
>>> __builtin_abort ();
>>> do we really need -O3 for this? I think that -O should be enough here?
>> We don't specifically need that. But I realized that the optimization can happen in two different level at the compiler. It seems if you spread the computation over multiple statement like:
>> int c = a/b;
>> int y = b * (1 + c);
>> return y - a;
>> instead of :
>> return b * (1 + a / b) - a;
>> Then you have to have at least -O1 to have it optimized. Granted, I am not doing that in the testcase. In the new patch I am changing it to -O. Let me know if you have any suggestions.
> -O is fine, generally at -O0 we shouldn't expect such transforms to
> happen (but they still do, of course).
> The patch looks OK now.
I've pushed this to the trunk for Navid.
More information about the Gcc-patches