[PATCH] Improve AB+-A -> A(B+-1) and A+-AB -> A(1+-B) match.pd optimization

Tue Dec 3 09:57:00 GMT 2019

On Tue, 3 Dec 2019, Richard Sandiford wrote:

> Jakub Jelinek <jakub@redhat.com> writes:
> > Hi!
> >
> > As discussed in the PR, we can't optimize e.g.
> >   int a = t - 1;
> >   int b = a * v;
> >   return b + v;
> > into return t * v; for signed non-wrapv arithmetics.  This can be done
> > by the match.pd (A * B) +- A -> (B +- 1) * A or
> > A +- (A * B) -> (1 +- B) * A canonicalizations.  Being a lazy guy,
> > I wrote attached brute force proglet to look for all the cases (for signed
> > char) where there is no UB in the original and the transformation would
> > introduce UB.  For the simple cases with just A and B, rather than A, B and
> > C, the problematic cases are for signed char only:
> > A*B+A -> (B+1)*A A==-1 && B==127
> > A*B+A -> (B+1)*A A==0 && B==127
> > A*B-A -> (B-1)*A A==0 && B==-128
> > A-A*B -> (1-B)*A A==-1 && B==-127
> > A-A*B -> (1-B)*A A==0 && B==-128
> > A-A*B -> (1-B)*A A==0 && B==-127
> > The current patterns already use VRP (tree_expr_nonzero_p and
> > expr_not_equal_to) to do the transformation only if A is known not to be 0
> > or -1.  But as the above problematic cases show, for A*B-A the -1
> > case is actually not problematic (transformation doesn't introduce UB;
> > if A is -1, -1*B+1 has UB only for minimum and minimum+1, and the
> > replacement (B-1)*-1 has also UB for those two cases only) and even when we
> > know nothing about A value range, if we know something about B value range,
> > we could still optimize.  So, for the
> >   int a = t - 1;
> >   int b = a * v;
> >   return b + v;
> > case, the a = t - 1 has value range that doesn't include maximum and
> > so we can conclude it is ok to transform it into return ((t - 1) + 1) * v
> > and thus t * v.
> >
> > Unfortunately, the patch "broke" a few loop_versioning_*.f90 tests (CCing
> > author), where there are small differences between the lversion pass,
> > e.g. in loop_versioning_1.f90 (f1):
> > -  # RANGE ~[0, 0]
> > -  _4 = iftmp.11_6 * S.4_19;
> > -  _11 = _4 - iftmp.11_6;
> > +  # RANGE [0, 9223372036854775806] NONZERO 9223372036854775807
> > +  _4 = S.4_19 + -1;
> > +  _11 = _4 * iftmp.11_6;
> > and the lversion pass then emits just one message instead of two, but in the
> > end assembly is identical.  In loop_versioning_6.f90 (though, with -m32
> > only), the code before lversion pass actually looks better in f1:
> > -  # i_35 = PHI <1(8), i_28(9)>
> > -  _9 = iftmp.33_15 * i_35;
> > -  _10 = _9 * 2;
> > -  _21 = _10 - iftmp.33_15;
> > -  (*x.0_23)[_21] = 1.0e+2;
> > -  _11 = i_35 * 2;
> > -  _12 = _11 + 1;
> > -  _13 = _12 * iftmp.33_15;
> > -  _22 = _13 - iftmp.33_15;
> > -  (*x.0_23)[_22] = 1.01e+2;
> > -  i_28 = i_35 + 1;
> > -  if (iftmp.36_25 < i_28)
> > +  # i_31 = PHI <1(8), i_26(9)>
> > +  _10 = iftmp.33_13 * i_31;
> > +  _11 = _10 * 2;
> > +  _19 = _11 - iftmp.33_13;
> > +  (*x.0_21)[_19] = 1.0e+2;
> > +  (*x.0_21)[_11] = 1.01e+2;
> > +  i_26 = i_31 + 1;
> > +  if (iftmp.36_23 < i_26)
> > where due to the new canonicalizations we managed to avoid some
> > multiplications.  One index was iftmp*i*2-iftmp and the other
> > was iftmp*(i*2+1)-iftmp and with the patch we managed to simplify
> > the latter into iftmp*i*2 and use for that the temporary used for
> > the first expression.  f1 is actually in assembly smaller because of this.
> > The lp64 vs. ! lp64 is just a wild guess, guess testing on further targets
> > will show what is the target property that matters.
> 
> We're supposed to version all 9 functions on all targets though,
> so I think we should xfail the test for ! lp64 rather than expect
> different output.
> 
> E.g. we longer vectorise f1 for -mabi=ilp32 on aarch64-linux-gnu,
> which is a regression of sorts,
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

> Richard
> 
> 
> 2019-12-03  Richard Sandiford  <richard.sandiford@arm.com>
> 
> gcc/testsuite/
> 	* gfortran.dg/loop_versioning_6.f90: XFAIL the scans for ! lp64.
> 
> Index: gcc/testsuite/gfortran.dg/loop_versioning_6.f90
> ===================================================================
> --- gcc/testsuite/gfortran.dg/loop_versioning_6.f90	2019-12-02 17:38:19.276428679 +0000
> +++ gcc/testsuite/gfortran.dg/loop_versioning_6.f90	2019-12-03 09:45:30.673414203 +0000
> @@ -89,7 +89,5 @@ subroutine f9(x, limit, step)
>    end do
>  end subroutine f9
>  
> -! { dg-final { scan-tree-dump-times {want to version containing loop} 9 "lversion" { target lp64 } } }
> -! { dg-final { scan-tree-dump-times {versioned this loop} 9 "lversion" { target lp64 } } }
> -! { dg-final { scan-tree-dump-times {want to version containing loop} 8 "lversion" { target { ! lp64 } } } }
> -! { dg-final { scan-tree-dump-times {versioned this loop} 8 "lversion" { target { ! lp64 } } } }
> +! { dg-final { scan-tree-dump-times {want to version containing loop} 9 "lversion" { xfail { ! lp64 } } } }
> +! { dg-final { scan-tree-dump-times {versioned this loop} 9 "lversion" { xfail { ! lp64 } } } }
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix ImendÃ¶rffer; HRB 36809 (AG Nuernberg)

[PATCH] Improve A*B+-A -> A*(B+-1) and A+-A*B -> A*(1+-B) match.pd optimization

[PATCH] Improve AB+-A -> A(B+-1) and A+-AB -> A(1+-B) match.pd optimization