[PATCH][v2] tree-optimization: Fold (type)X / (type)Y [PR103855]

Jeff Law jeffreyalaw@gmail.com
Sat Jul 9 18:19:40 GMT 2022

On 3/16/2022 6:49 PM, Zhao Wei Liew via Gcc-patches wrote:
> Thanks for the detailed review. I have uploaded patch v2 based on the review.
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590604.html
> Changes since v1:
> 1. Add patterns for the cases CST / (T)x and (T)x / CST as well. Fix
> test regressions caused by those patterns.
> 2. Support signed integers except for the possible INT_MIN / -1 case.
> 3. Support cases where X and Y have different precisions.
> On Tue, 22 Feb 2022 at 15:54, Richard Biener <richard.guenther@gmail.com> wrote:
>> +/* (type)X / (type)Y -> (type)(X / Y)
>> +   when the resulting type is at least precise as the original types
>> +   and when all the types are unsigned integral types. */
>> +(simplify
>> + (trunc_div (convert @0) (convert @1))
>> + (if (INTEGRAL_TYPE_P (type)
>> +      && INTEGRAL_TYPE_P (TREE_TYPE (@0))
>> +      && INTEGRAL_TYPE_P (TREE_TYPE (@1))
>> +      && TYPE_UNSIGNED (type)
>> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
>> +      && TYPE_UNSIGNED (TREE_TYPE (@1))
>> +      && TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@0)))
>> +  (convert (trunc_div @0 @1))))
>> since you are requiring the types of @0 and @1 to match it's easier to write
>>       && types_match (TREE_TYPE(@0), TREE_TYPE (@1))
> Thanks. In the new patch, TREE_TYPE (@0) and TREE_TYPE (@1) can now
> have different precisions, so I believe that I can't use types_match()
> anymore.
>> that allows you to elide checks on either @0 or @1.  I suppose the transform
>> does not work for signed types because of the -INT_MIN / -1 overflow case?
>> It might be possible to use expr_not_equal_to (@0, -INT_MIN) ||
>> expr_not_equal_to (@1, -1)
>> (correctly written, lookup the existing examples in match.pd for the X
>> % -Y transform)
> That's right. I have made changes to support signed types except for
> the INT_MIN / -1 case.
>> I'll note that as written the transform will not catch CST / (T)x or
>> (T)x / CST since
>> you'll not see conversions around constants.  I'm not sure whether
>> using (convert[12]? ...)
>> or writing special patterns with INTEGER_CST operands is more convenient.
>> There is int_fits_type_p to check whether a constant will fit in a
>> type without truncation
>> or sign change.
> I see. I couldn't find an easy way to use (convert[12]? ...) in this
> case so I added 2 other patterns for the CST / (T)x and (T)x / CST
> cases.
>> When @0 and @1 do not have the same type there might still be a common type
>> that can be used and is smaller than 'type', it might be as simple as using
>> build_nonstandard_integer_type (MIN (@0-prec, @1-prec), 1 /*unsigned_p*/).
> Thanks for the tip. I've made a similar change, but using either
> TREE_TYPE (@0) or TREE_TYPE (@1) instead of
> build_nonstandard_integer_type().
>> In the past there have been attempts to more globally narrow operations using
>> a new pass rather than using individual patterns.  So for more complicated cases
>> that might be the way to go.  There's now also the ISEL pass which does
>> pre-RTL expansion transforms that need some global context and for example
>> can look at SSA uses.
> I had wanted to do something similar to handle all integral
> trunc_divs, but I'm not sure where/how to start. It seems out of my
> league at this moment. I'll gladly explore it after this change
> though!
Just a couple general notes in case you want to poke further in this space.

Earlier you mentioned you were trying to do some of this work in 
expand_divmod, but the operands had rtx types rather than tree types, 
thus you're unable to get at the range data.

In expand_expr_divmod there's treeop0, treeop1.  So if they are 
SSA_NAMEs, then you can query for their range.

Richi mentioned attempts to globally narrow during a new pass. That's a 
much broader chance, but has the potential to apply in a lot more 
places.    Narrowing early would potentially expose the narrowed 
statements to the vectorizer which could be a win. Narrowing late is 
likely a win too, but it likely only helps the expansion phase generate 
narrower operations.  This is obviously a larger project than just 
handling the division cases in match.pd. Tackling it is not (IMHO) a 
requirement to move forward.

Finally, narrowing isn't always a win.  There have been 
micro-architectures were sub-word accesses are slower than word 
accesses.  I'm not too worried about it for this patch, but it could 
become more important if you were to look into a more general solution.

As far as the patch is concerned.  At one point I was a bit worried that 
expr_not_equal_to wasn't right.  But after digging a bit deeper into its 
implementation, it should do the right thing here.  Note if you wanted 
to see how to get to underlying range data, you can find a simple 
example in that function.

Your new testcase needs to limit itself to targets where integers are at 
least 32 bits wide.    Something like this at the top of the new test:

/* { dg-require-effective-target int32plus } */

There are additional cases Andrew Pinski pointed out in BZ.  Does your 
new code handle all of them.  It's OK if it doesn't, but obviously not 
optimal.  Consider adding his additional testcases, potentially xfailed 
if they're not handled yet.

You changed gcc.dg/ipa/pr91088.c and the ChangeLog indicates "fix a test 
regression".   Can you say a bit more about what's going on there.  It's 
likely find, but we want to make sure that your change hasn't 
compromised the test.

Otherwise it looks pretty good.


More information about the Gcc-patches mailing list