This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [fix] PR39300: make PRE not disturb vectorizer


On Mon, Jul 20, 2009 at 5:57 PM, Michael Matz<matz@suse.de> wrote:
> Hi,
>
> so this solves a long standing interaction between PRE and the vectorizer.
> The latter is incapable of vectorizing this code:
>
> ?float tmp = data[0];
> ?for (i = 1; i < 1024; ++i)
> ? ?{
> ? ? ?float tmp2 = data[i];
> ? ? ?res[i] = tmp + tmp2;
> ? ? ?tmp = tmp2;
> ? ?}
>
> This loop-carried dependency of data[i-1] inhibits vectorization.
> Predictive commoning and PRE can create these situations even when the
> user didn't explicitely wrote them in this way. ?The predcom case was
> fixed earlier by moving that pass behind the vectorizer, leaving the PRE
> problem.
>
> Now, there are multiple options of fixing this, and I tried all of them in
> various degrees:
> 1) enhance the vectorizer to deal with this, that would help also code
> ? written explicitely in that form. ?That turned out to be rather
> ? non-trivial in a half-way optimal way, but in case we ever are able to
> ? do that we can easily rip out the code that follows.
> 2) move PRE behind the vectorizer much like predcom. ?After making PRE and
> ? sink_code not destroy loop-closed SSA form this somewhat works, but
> ? introduces quite some regression, because PRE is one of the few passes
> ? that moves out loop-independent accesses, which the vectorizer expects
> ? to have happened. ?I extended LIM do do that too, but limiting its
> ? heuristics of when this is profitable to all cases that the vectorizer
> ? needs introduces other regressions regarding speed.
> 3) add another PRE pass, one at the current place, but dumbed down a bit
> ? to not create the bad situations, and another one, not dumbed down,
> ? after the vectorizer. ?I ruled that out fairly early because it would
> ? add another not terribly fast pass for little gain.
> 4) dumb down PRE just the right amount to not create the bad situations
> ? but only when we can't prove that the vectorizer might be disturbed by
> ? some simple deductions.
>
> After having explored 1, 2 and 4 I'm most satisfied with option 4. ?It's
> easy to rip out, doesn't introduce regressions, fixed the problem and
> improves SPEC2006 like so:
>
> -O3 -ffast-math
> ? ? ? ? ? ? ? ?without patch ? ? ? ? ? ? with patch
> 458.sjeng ? ? ? 12100 ? ?999 ? ? 12.1 ? * 12100 ? ?973 ? ? ? ? 12.4 ? *
> 434.zeusmp ? ? ? 9100 ? ?988 ? ? ?9.21 ?* ?9100 ? ?913 ? ? ? ? ?9.97 ?*
> 436.cactusADM ? 11950 ? 1490 ? ? ?8.02 ?* 11950 ? 1210 ? ? ? ? ?9.84 ?*
>
> So, that's a 3%, 8% to 23% improvement.
>
> So, what it does is basically extend the predicates which are used already
> now to generate loop-carried dependencies (e.g. induction variables).
> First we also inhibit some REFERENCE expression, but only when its place
> looks like an induction variable. ?Otherwise it's either loop-independent
> and then something else will either move it out the loop or prevent the
> vectorizer from working, or it's loop-dependend but not in some easy way,
> which also prevents vectorization.
>
> In either way we're not going to inhibit vectorization more then it
> already was, so we can as well go out our way and generate this
> loop-carried dependency. ?We do the same if the vectorizer isn't even
> active, so we certainly don't regress with -O2 or the like.
>
> Regstrapped on x86_64-linux with Ada, no regressions. ?Okay for trunk?
>
>
> Ciao,
> Michael.
> --
> ? ? ? ?PR tree-optimization/39300
>
> ? ? ? ?* tree-ssa-pre.c (includes): Include tree-scalar-evolution.h.
> ? ? ? ?(inhibit_phi_insertion): New function.
> ? ? ? ?(insert_into_preds_of_block): Call it for REFERENCEs.
> ? ? ? ?(init_pre): Initialize and finalize scalar evolutions.
> ? ? ? ?* Makefile.in (tree-ssa-pre.o): Depend on tree-scalar-evolution.h .
>
> Index: Makefile.in
> ===================================================================
> --- Makefile.in (revision 149806)
> +++ Makefile.in (working copy)
> @@ -2248,7 +2248,7 @@ tree-ssa-pre.o : tree-ssa-pre.c $(TREE_F
> ? ?$(TM_H) coretypes.h $(TREE_DUMP_H) $(TREE_PASS_H) $(FLAGS_H) langhooks.h \
> ? ?$(CFGLOOP_H) alloc-pool.h $(BASIC_BLOCK_H) $(BITMAP_H) $(HASHTAB_H) \
> ? ?$(GIMPLE_H) $(TREE_INLINE_H) tree-iterator.h tree-ssa-sccvn.h $(PARAMS_H) \
> - ? $(DBGCNT_H)
> + ? $(DBGCNT_H) tree-scalar-evolution.h
> ?tree-ssa-sccvn.o : tree-ssa-sccvn.c $(TREE_FLOW_H) $(CONFIG_H) \
> ? ?$(SYSTEM_H) $(TREE_H) $(GGC_H) $(DIAGNOSTIC_H) $(TIMEVAR_H) $(FIBHEAP_H) \
> ? ?$(TM_H) coretypes.h $(TREE_DUMP_H) $(TREE_PASS_H) $(FLAGS_H) $(CFGLOOP_H) \
> Index: tree-ssa-pre.c
> ===================================================================
> --- tree-ssa-pre.c ? ? ?(revision 149806)
> +++ tree-ssa-pre.c ? ? ?(working copy)
> @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.
> ?#include "langhooks.h"
> ?#include "cfgloop.h"
> ?#include "tree-ssa-sccvn.h"
> +#include "tree-scalar-evolution.h"
> ?#include "params.h"
> ?#include "dbgcnt.h"
>
> @@ -3081,6 +3082,63 @@ create_expression_by_pieces (basic_block
> ?}
>
>
> +/* Returns true if we want to inhibit the insertions of PHI nodes
> + ? for the given EXPR for basic block BB (a member of a loop).
> + ? We want to do this, when we fear that the induction variable we
> + ? create might inhibit vectorization. ?*/
> +
> +static bool
> +inhibit_phi_insertion (basic_block bb, pre_expr expr)
> +{
> + ?vn_reference_t vr = PRE_EXPR_REFERENCE (expr);
> + ?VEC (vn_reference_op_s, heap) *ops = vr->operands;
> + ?vn_reference_op_t op;
> + ?unsigned i;
> +
> + ?/* If we aren't going to vectorize we don't inhibit anything. ?*/
> + ?if (!flag_tree_vectorize)
> + ? ?return false;
> +
> + ?/* Otherwise we inhibit the insertion when the address of the
> + ? ? memory reference is a simple induction variable. ?In other
> + ? ? cases the vectorizer won't do anything anyway (either it's
> + ? ? loop invariant or a complicated expression). ?*/
> + ?for (i = 0; VEC_iterate (vn_reference_op_s, ops, i, op); ++i)
> + ? ?{
> + ? ? ?switch (op->opcode)
> + ? ? ? {
> + ? ? ? case ALIGN_INDIRECT_REF:
> + ? ? ? case INDIRECT_REF:
> + ? ? ? case MISALIGNED_INDIRECT_REF:
> + ? ? ? case COMPONENT_REF:
> + ? ? ? case VAR_DECL:
> + ? ? ? ? break;

Rather than the above and

> +       default:
> +         return true;

just do

            default:
               break;

to be conservative in what you are preventing PHIs for.  In
particular you miss explicit handling of ARRAY_REFs.  Thus you
likely want to add

            case ARRAY_REF:
            case ARRAY_RANGE_REF:
                /* Check if the array index, op->op0 is an induction
variable.  */
                /* Fallthru.  */

> + ? ? ? case SSA_NAME:
> + ? ? ? ? {
> + ? ? ? ? ? basic_block defbb = gimple_bb (SSA_NAME_DEF_STMT (op->op0));
> + ? ? ? ? ? affine_iv iv;
> + ? ? ? ? ? /* Default defs are loop invariant. ?*/
> + ? ? ? ? ? if (!defbb)
> + ? ? ? ? ? ? return false;
> + ? ? ? ? ? /* Defined outside this loop, also loop invariant. ?*/
> + ? ? ? ? ? if (!flow_bb_inside_loop_p (bb->loop_father, defbb))
> + ? ? ? ? ? ? return false;
> + ? ? ? ? ? /* If it's a simple induction variable inhibit insertion,
> + ? ? ? ? ? ? ?the vectorizer might be interested in this one. ?*/
> + ? ? ? ? ? if (simple_iv (bb->loop_father, bb->loop_father,
> + ? ? ? ? ? ? ? ? ? ? ? ? ?op->op0, &iv, true))
> + ? ? ? ? ? ? return true;
> + ? ? ? ? ? /* No simple IV, vectorizer can't do anything, hence no
> + ? ? ? ? ? ? ?reason to inhibit the transformation. ?*/
> + ? ? ? ? ? return false;

and simply break here instead of returing false right away (further
array indices or an indirect-ref base may follow).

IMHO the performance numbers speak for themselves.  Can you
re-check with the above change (and also throw it onto the C++
tester once) and add a vectorization testcase as HJ suggests?

Ok with that changes.

Thanks,
Richard.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]