This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Fix PR19431 a bit, make PRE remove loads
- From: Dorit Nuzman <DORIT at il dot ibm dot com>
- To: Richard Guenther <rguenther at suse dot de>
- Cc: Daniel Berlin <dberlin at dberlin dot org>, gcc-patches at gcc dot gnu dot org, Ira Rosen <IRAR at il dot ibm dot com>
- Date: Tue, 17 Apr 2007 21:42:00 +0300
- Subject: Re: [PATCH] Fix PR19431 a bit, make PRE remove loads
> On Tue, 17 Apr 2007, Richard Guenther wrote:
>
> >
> > This improves PR19431 by teaching PRE on how to remove trivial loads
like
> > in (from std::min):
> >
> > <bb 2>:
> > D.2477_6 = a;
> > D.2478_7 = b;
> > if (D.2477_6 < D.2478_7) goto <L1>; else goto <L2>;
> >
> > <L1>:;
> >
> > # a_8 = PHI <&b(2), &a(3)>
> > <L2>:;
> > D.2459_2 = *a_8;
> > return D.2459_2;
> >
> > so that phiopt can later turn this into a MIN_EXPR.
> >
> > Fortunately a more complete solution has been posted (but keeps being
> > ignored) here:
> > http://gcc.gnu.org/ml/gcc-patches/2007-02/msg01331.html
> >
> > So I'll post this variant again which improves PRE but that lacks in
> > several ways:
> >
> > - PRE is run too late, only the phiopt pass after loop (and
vectorizer)
> > recognizes the new opportunity
> >
> > - nothing marks no longer addressable vars as registers after PRE, so
> > the C testcase has to be XFAILed for now.
> >
> > - It doesn't work for chains like std::min(std::max(a, b), c)
> >
> >
> > Still, bootstrap and regtest in progress. Ok for mainline if it
passes?
>
> So the vectorizer doesn't like if we apply this to loads in loops, so
> I'm back to the original variant which requires all loads to be simple.
just curious: how does the load look like to the vectorizer, and why
doesn't it like it? (I wonder if it makes sense to extend the vectorizer to
handle such non-simple loads? are we losing a lot of PRE opportunities by
reverting to the original variant?)
thanks,
dorit
> Also there's a fortran FE problem (PR31608) which makes us ICE on
> gfortran.dg/achar_4.f90.
>
> Richard.
>
> For reference:
>
> 2007-04-17 Richard Guenther <rguenther@suse.de>
>
> PR tree-optimization/19431
> * tree-ssa-pre.c (do_regular_insertion): Do insertion if
> that can remove all loads.
>
> * gcc.dg/tree-ssa/ssa-pre-17.c: New testcase.
> * g++.dg/tree-ssa/ssa-pre-1.C: Likewise.
>
> Index: tree-ssa-pre.c
> ===================================================================
> *** tree-ssa-pre.c (revision 123910)
> --- tree-ssa-pre.c (working copy)
> *************** do_regular_insertion (basic_block block,
> *** 2596,2601 ****
> --- 2596,2602 ----
> bool by_some = false;
> bool cant_insert = false;
> bool all_same = true;
> + bool is_load, all_loads_simple;
> tree first_s = NULL;
> edge pred;
> basic_block bprime;
> *************** do_regular_insertion (basic_block block,
> *** 2612,2617 ****
> --- 2613,2621 ----
> continue;
> }
>
> + is_load = TREE_CODE (expr) == INDIRECT_REF;
> + all_loads_simple = true;
> +
> avail = XCNEWVEC (tree, last_basic_block);
> FOR_EACH_EDGE (pred, ei, block->preds)
> {
> *************** do_regular_insertion (basic_block block,
> *** 2645,2650 ****
> --- 2649,2671 ----
> break;
> }
>
> + /* See if this is a load and if so, if on this edge
> + the load can be trivially removed. */
> + if (is_load
> + && TREE_CODE (eprime) == INDIRECT_REF)
> + {
> + tree vhe = TREE_OPERAND (eprime, 0);
> + if (TREE_CODE (vhe) == VALUE_HANDLE)
> + {
> + bitmap_set_t exprset = VALUE_HANDLE_EXPR_SET (vhe);
> + unsigned int firstbit;
> + firstbit = bitmap_first_set_bit (exprset->expressions);
> + vhe = expression_for_id (firstbit);
> + }
> + if (TREE_CODE (vhe) != ADDR_EXPR)
> + all_loads_simple = false;
> + }
> +
> eprime = fully_constant_expression (eprime);
> vprime = get_value_handle (eprime);
> gcc_assert (vprime);
> *************** do_regular_insertion (basic_block block,
> *** 2670,2676 ****
> already existing along every predecessor, and
> it's defined by some predecessor, it is
> partially redundant. */
> ! if (!cant_insert && !all_same && by_some)
> {
> if (insert_into_preds_of_block (block, get_expression_id
(expr),
> avail))
> --- 2691,2698 ----
> already existing along every predecessor, and
> it's defined by some predecessor, it is
> partially redundant. */
> ! if (!cant_insert && !all_same
> ! && (by_some || (is_load && all_loads_simple)))
> {
> if (insert_into_preds_of_block (block, get_expression_id
(expr),
> avail))
> Index: testsuite/gcc.dg/tree-ssa/ssa-pre-17.c
> ===================================================================
> *** testsuite/gcc.dg/tree-ssa/ssa-pre-17.c (revision 0)
> --- testsuite/gcc.dg/tree-ssa/ssa-pre-17.c (revision 0)
> ***************
> *** 0 ****
> --- 1,28 ----
> + /* { dg-do compile } */
> + /* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> + /* We want this optimized as
> + <bb 2>:
> + if (k != 0) goto <L2>; else goto <L3>;
> +
> + <L3>:;
> + i1 = j1;
> +
> + <L2>:;
> + return i1;
> +
> + This requires that i1 and j1 are changed into registers after they
> + no longer have their address taken. */
> +
> + int f(int k, int i1, int j1)
> + {
> + int *f1;
> + if(k)
> + f1 = &i1;
> + else
> + f1 = &j1;
> + return *f1;
> + }
> +
> + /* { dg-final { scan-tree-dump-times "goto" 1 "optimized" { xfail
> *-*-* } } } */
> + /* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: testsuite/g++.dg/tree-ssa/ssa-pre-1.C
> ===================================================================
> *** testsuite/g++.dg/tree-ssa/ssa-pre-1.C (revision 0)
> --- testsuite/g++.dg/tree-ssa/ssa-pre-1.C (revision 0)
> ***************
> *** 0 ****
> --- 1,13 ----
> + /* { dg-do compile } */
> + /* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> + template<class T> static inline const T &ref_min(const T &a, const T
&b)
> + { return a<b ? a : b; }
> +
> + int foo(int a, int b)
> + {
> + return ref_min(a, b);
> + }
> +
> + /* { dg-final { scan-tree-dump "MIN_EXPR" "optimized" } } */
> + /* { dg-final { cleanup-tree-dump "optimized" } } */