This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] Fix PR44214
On Fri, 2012-04-20 at 10:04 +0200, Richard Guenther wrote:
> On Thu, 19 Apr 2012, William J. Schmidt wrote:
>
> > This enhances constant folding for division by complex and vector
> > constants. When -freciprocal-math is present, such divisions are
> > converted into multiplies by the constant reciprocal. When an exact
> > reciprocal is available, this is done for vector constants when
> > optimizing. I did not implement logic for exact reciprocals of complex
> > constants because either (a) the complexity doesn't justify the
> > likelihood of occurrence, or (b) I'm lazy. Your choice. ;)
> >
> > Bootstrapped with no new regressions on powerpc64-unknown-linux-gnu. Ok
> > for trunk?
>
> See below ...
>
> > Thanks,
> > Bill
> >
> >
> > gcc:
> >
> > 2012-04-19 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
> >
> > PR rtl-optimization/44214
> > * fold-const.c (exact_inverse): New function.
> > (fold_binary_loc): Fold vector and complex division by constant into
> > multiply by reciprocal with flag_reciprocal_math; fold vector division
> > by constant into multiply by reciprocal with exact inverse.
> >
> > gcc/testsuite:
> >
> > 2012-04-19 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
> >
> > PR rtl-optimization/44214
> > * gcc.target/powerpc/pr44214-1.c: New test.
> > * gcc.dg/pr44214-2.c: Likewise.
> > * gcc.target/powerpc/pr44214-3.c: Likewise.
> >
> >
> > Index: gcc/fold-const.c
> > ===================================================================
> > --- gcc/fold-const.c (revision 186573)
> > +++ gcc/fold-const.c (working copy)
> > @@ -9693,6 +9693,48 @@ fold_addr_of_array_ref_difference (location_t loc,
> > return NULL_TREE;
> > }
> >
> > +/* If the real or vector real constant CST of type TYPE has an exact
> > + inverse, return it, else return NULL. */
> > +
> > +static tree
> > +exact_inverse (tree type, tree cst)
> > +{
> > +  REAL_VALUE_TYPE r;
> > +  tree unit_type, *elts;
> > +  enum machine_mode mode;
> > +  unsigned vec_nelts, i;
> > +
> > +  switch (TREE_CODE (cst))
> > +    {
> > +    case REAL_CST:
> > +      r = TREE_REAL_CST (cst);
> > +
> > +      if (exact_real_inverse (TYPE_MODE (type), &r))
> > +        return build_real (type, r);
> > +
> > +      return NULL_TREE;
> > +
> > +    case VECTOR_CST:
> > +      vec_nelts = VECTOR_CST_NELTS (cst);
> > +      elts = XALLOCAVEC (tree, vec_nelts);
> > +      unit_type = TREE_TYPE (type);
> > +      mode = TYPE_MODE (unit_type);
> > +
> > +      for (i = 0; i < vec_nelts; i++)
> > +        {
> > +          r = TREE_REAL_CST (VECTOR_CST_ELT (cst, i));
> > +          if (!exact_real_inverse (mode, &r))
> > +            return NULL_TREE;
> > +          elts[i] = build_real (unit_type, r);
> > +        }
> > +
> > +      return build_vector (type, elts);
> > +
> > +    default:
> > +      return NULL_TREE;
> > +    }
> > +}
> > +
> > /* Fold a binary expression of code CODE and type TYPE with operands
> > OP0 and OP1. LOC is the location of the resulting expression.
> > Return the folded expression if folding is successful. Otherwise,
> > @@ -11734,23 +11776,25 @@ fold_binary_loc (location_t loc,
> > so only do this if -freciprocal-math. We can actually
> > always safely do it if ARG1 is a power of two, but it's hard to
> > tell if it is or not in a portable manner. */
> > - if (TREE_CODE (arg1) == REAL_CST)
> > + if (TREE_CODE (arg1) == REAL_CST
> > + || (TREE_CODE (arg1) == COMPLEX_CST
> > + && COMPLEX_FLOAT_TYPE_P (TREE_TYPE (arg1)))
> > + || (TREE_CODE (arg1) == VECTOR_CST
> > + && VECTOR_FLOAT_TYPE_P (TREE_TYPE (arg1))))
> > {
> > if (flag_reciprocal_math
> > - && 0 != (tem = const_binop (code, build_real (type, dconst1),
> > + && 0 != (tem = fold_binary (code, type, build_one_cst (type),
> > arg1)))
>
> Any reason for not using const_binop?
As it turns out, no. I (blindly) made this change based on your comment
in the PR...
"The fold code should probably simply use fold_binary to do the constant
folding (which already should handle 1/x for x vector and complex. There
is a build_one_cst to build the constant 1 for any type)."
...but now that I've looked at it again, that change was unnecessary, so I must
have misinterpreted your comment. I'll revert to using const_binop.
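Presumably that just means keeping build_one_cst (since build_real with
dconst1 only gives a scalar 1) but letting const_binop do the folding,
i.e. something like this untested sketch:
--
/* Sketch only: build a type-correct constant 1 and let const_binop
   try to compute 1/arg1; fall through if it cannot fold the division.  */
if (flag_reciprocal_math
    && 0 != (tem = const_binop (code, build_one_cst (type), arg1)))
  return fold_build2_loc (loc, MULT_EXPR, type, arg0, tem);
--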
>
> > return fold_build2_loc (loc, MULT_EXPR, type, arg0, tem);
> > - /* Find the reciprocal if optimizing and the result is exact. */
> > - if (optimize)
> > + /* Find the reciprocal if optimizing and the result is exact.
> > + TODO: Complex reciprocal not implemented. */
> > + if (optimize
> > + && TREE_CODE (arg1) != COMPLEX_CST)
>
> I know this is all pre-existing, but really the flag_reciprocal_math
> case should be under if (optimize), too. So, can you move this check
> to the toplevel covering both cases?
Sure.
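Just so I'm sure I follow, roughly this shape (a sketch of the control
flow only, names as in the posted patch; exact comments and formatting
to be fixed up in the revised version):
--
if (optimize)
  {
    /* With -freciprocal-math, fold the division into a multiplication
       by the reciprocal whenever const_binop can compute 1/arg1.  */
    if (flag_reciprocal_math
        && 0 != (tem = const_binop (code, build_one_cst (type), arg1)))
      return fold_build2_loc (loc, MULT_EXPR, type, arg0, tem);

    /* Otherwise only use the reciprocal when it is exact.
       TODO: Complex reciprocal not implemented.  */
    if (TREE_CODE (arg1) != COMPLEX_CST)
      {
        tree inverse = exact_inverse (TREE_TYPE (arg0), arg1);

        if (inverse)
          return fold_build2_loc (loc, MULT_EXPR, type, arg0, inverse);
      }
  }
--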
>
> The testcases should apply to generic vectors, too, and should scan
> the .original dump (where folding first applied). So they should
> not be target specific (and they should use -freciprocal-math).
OK. I was ignorant of the generic vector syntax using __attribute__.
If I change pr44214-1.c to use:
--
typedef double v2sd __attribute__ ((vector_size (16)));
void do_div (v2sd *a, v2sd *b)
{
  v2sd c = { 2.0, 3.0 };
  *a = *b / c;
}
--
it compiles fine, but the .original dump doesn't show the fold:
--
;; Function do_div (null)
;; enabled by -tree-original
{
  v2sd c = { 2.0e+0, 3.0e+0 };
  v2sd c = { 2.0e+0, 3.0e+0 };
  *a = *b / c;
}
--
However, the .optimized dump does:
--
;; Function do_div (do_div, funcdef_no=0, decl_uid=1998, cgraph_uid=0)
do_div (v2sd * a, v2sd * b)
{
  double D.2019;
  double D.2018;
  double D.2017;
  double D.2016;
  vector(2) double D.2009;
  vector(2) double D.2008;

<bb 2>:
  D.2008_2 = *b_1(D);
  D.2016_7 = BIT_FIELD_REF <D.2008_2, 64, 0>;
  D.2017_8 = D.2016_7 * 5.0e-1;
  D.2018_9 = BIT_FIELD_REF <D.2008_2, 64, 64>;
  D.2019_10 = D.2018_9 * 3.33333333333333314829616256247390992939472198486328125e-1;
  D.2009_3 = {D.2017_8, D.2019_10};
  *a_4(D) = D.2009_3;
  return;
}
--
For the complex case, the .original dump works fine. Is there something
different I should be doing for the vector case?
Thanks,
Bill
>
> Ok with those changes.
>
> Thanks,
> Richard.
>
> > {
> > - REAL_VALUE_TYPE r;
> > - r = TREE_REAL_CST (arg1);
> > - if (exact_real_inverse (TYPE_MODE(TREE_TYPE(arg0)), &r))
> > - {
> > - tem = build_real (type, r);
> > - return fold_build2_loc (loc, MULT_EXPR, type,
> > - fold_convert_loc (loc, type, arg0), tem);
> > - }
> > + tree inverse = exact_inverse (TREE_TYPE (arg0), arg1);
> > +
> > + if (inverse)
> > + return fold_build2_loc (loc, MULT_EXPR, type, arg0, inverse);
> > }
> > }
> > /* Convert A/B/C to A/(B*C). */
> > Index: gcc/testsuite/gcc.target/powerpc/pr44214-3.c
> > ===================================================================
> > --- gcc/testsuite/gcc.target/powerpc/pr44214-3.c (revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/pr44214-3.c (revision 0)
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mcpu=power7 -fdump-tree-optimized" } */
> > +
> > +void do_div (vector double *a, vector double *b)
> > +{
> > + *a = *b / (vector double) { 2.0, 2.0 };
> > +}
> > +
> > +/* Since 2.0 has an exact reciprocal, constant folding should multiply *b
> > + by the reciprocals of the vector elements. As a result there should be
> > + one vector multiply and zero divides in the optimized code. The string
> > + " * " occurs 3 times: one multiply and two indirect parameters. */
> > +
> > +/* { dg-final { scan-tree-dump-times " \\\* " 3 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times " / " 0 "optimized" } } */
> > +/* { dg-final { cleanup-tree-dump "optimized" } } */
> > Index: gcc/testsuite/gcc.target/powerpc/pr44214-1.c
> > ===================================================================
> > --- gcc/testsuite/gcc.target/powerpc/pr44214-1.c (revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/pr44214-1.c (revision 0)
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -ffast-math -mcpu=power7 -fdump-tree-optimized" } */
> > +
> > +void do_div (vector double *a, vector double *b)
> > +{
> > + *a = *b / (vector double) { 2.0, 3.0 };
> > +}
> > +
> > +/* Constant folding should multiply *b by the reciprocals of the
> > + vector elements. As a result there should be one vector multiply
> > + and zero divides in the optimized code. The string " * " occurs
> > + 3 times: one multiply and two indirect parameters. */
> > +
> > +/* { dg-final { scan-tree-dump-times " \\\* " 3 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times " / " 0 "optimized" } } */
> > +/* { dg-final { cleanup-tree-dump "optimized" } } */
> > Index: gcc/testsuite/gcc.dg/pr44214-2.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/pr44214-2.c (revision 0)
> > +++ gcc/testsuite/gcc.dg/pr44214-2.c (revision 0)
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -ffast-math -fdump-tree-optimized" } */
> > +
> > +void do_div (_Complex double *a, _Complex double *b)
> > +{
> > + *a = *b / (4.0 - 5.0fi);
> > +}
> > +
> > +/* Constant folding should multiply *b by the reciprocal of 4-5i
> > + = 4/41 + (5/41)i. As a result there should be 4 multiplies and
> > + zero divides in the optimized code. The string " * " occurs 6
> > + times: 4 multiplies and 2 indirect parameters. */
> > +
> > +/* { dg-final { scan-tree-dump-times " \\\* " 6 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times " / " 0 "optimized" } } */
> > +/* { dg-final { cleanup-tree-dump "optimized" } } */
>