This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [patch] Predictive commoning, updated


> Hello,
>
> some progress report, since it took me a rather long time to get to
> incorporating your remarks.

this delay actually gives us a chance to incorporate into the patch the
decision (which hasn't been finalized yet) to move predictive commoning to
after vectorization. I think it was only pending this:

"
> So, is there a reason to schedule predictive-commoning specifically before
> vectorization?

so that the loop optimizations are also performed on the loops created by
loop unrolling in predictive commoning.  Nevertheless, I do not have a
strong preference about this.  If you come up with some performance
numbers/testcases that would support moving it elsewhere, it can be
done.
"

which I never got back to you on...
So, one benefit is that we no longer prevent vectorization, as in this
case:
      for (i=0; i<n; i++)
         c[i] += a[i] * a[i+2];
If predcom is moved to after vectorization, we can vectorize this loop and
gain the corresponding speedup.
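
To make this concrete, here is roughly what predictive commoning turns the
loop above into (a hand-written sketch with invented temporaries, not the
pass's literal output; the actual pass would also unroll the loop to get
rid of the copies at the bottom of the body):

      /* a[i+2] read in iteration i is the same value that a[i] reads two
         iterations later, so predcom keeps the last two loads in scalars
         and rotates them around the loop.  The loop-carried scalars
         t0/t1 are exactly what would block the vectorizer.  */
      t0 = a[0];
      t1 = a[1];
      for (i = 0; i < n; i++)
        {
          t2 = a[i + 2];
          c[i] += t0 * t2;
          t0 = t1;
          t1 = t2;
        }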

Also, there's the potential benefit of applying predictive commoning to
vectorized code, especially when vectorizing unaligned accesses. We can't
see this benefit right now because the vectorizer effectively does
predictive commoning by itself (it's really critical in order to get
reasonable performance with misaligned data, at least on PowerPC), but
moving predcom after vectorization opens the option of simplifying the
realignment code in the vectorizer and letting predcom do that job. Also,
I think the vectorizer may create more opportunities for predcom in the
future.
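
To illustrate the kind of reuse I mean, here is a hand-written AltiVec
version of the same loop using the realignment scheme (just a sketch, not
what the vectorizer actually emits; it assumes a and c are 16-byte
aligned, n is a multiple of 4, and a has a few elements of slack past
a[n+1] so the last aligned load stays in bounds):

  #include <altivec.h>

  /* c[i] += a[i] * a[i+2], vectorized by hand.  The misaligned stream
     a[i+2] is built from two aligned loads plus vec_perm, and the "high"
     aligned vector of one iteration is the "low" one of the next, so only
     one new load is needed per iteration -- in effect, predictive
     commoning done inside the vectorized loop.  */
  void
  foo (float *a, float *c, int n)
  {
    vector unsigned char mask = vec_lvsl (0, &a[2]);
    vector float vlo = vec_ld (0, &a[2]);
    vector float vhi, va0, va2, vc;
    int i;

    for (i = 0; i < n; i += 4)
      {
        vhi = vec_ld (15, &a[i + 2]);     /* next aligned 16 bytes      */
        va2 = vec_perm (vlo, vhi, mask);  /* = {a[i+2], ..., a[i+5]}    */
        va0 = vec_ld (0, &a[i]);          /* a[i..i+3], already aligned */
        vc  = vec_ld (0, &c[i]);
        vec_st (vec_madd (va0, va2, vc), 0, &c[i]);
        vlo = vhi;                        /* the predcom-style reuse    */
      }
  }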

On top of that I was going to check the effect of moving the predcom pass
on mgrid, to make sure that predcom still has the same impact, but the
patch (http://gcc.gnu.org/ml/gcc-patches/2007-02/msg01304.html) didn't
compile for me (problems with ../../gcc/gcc/tree-affine.c:508 ?). If you
think it's worth it, I'll be happy to do this experiment (compile mgrid
with predcom before and after vectorization) if/when there's an updated
version of the patch that you can give me.

> I have run into some problems with
> data reference analysis (and I do not see a simple clear solution;
> either I can hack over them, or rewrite half of tree-data-ref.c;
> slowly I am getting persuaded that the latter is a better choice).
>

I'm curious what problems you're seeing, especially if they're problems
that should concern the vectorizer.

thanks,
dorit

> > Would you be willing to be the maintainer of this pass?
>
> yes
>
> > > + /* Hash function for struct name_expansion.  */
> > > +
> > > + static hashval_t
> > > + name_expansion_hash (const void *e)
> > > + {
> > > +   return SSA_NAME_VERSION (((struct name_expansion *) e)->name);
> > > + }
> > > +
> > > + /* Equality function for struct name_expansion.  The second argument is an
> > > +    SSA name.  */
> > > +
> > > + static int
> > > + name_expansion_eq (const void *e, const void *n)
> > > + {
> > > +   return ((struct name_expansion *) e)->name == n;
> > > + }
> > > +
> >
> > Why not use a dense array indexed by SSA name?  Or a pointer map?
>
> regarding the array indexed by SSA versions -- this does not seem
> like a good choice,
> since only a few SSA names are usually inserted into the table, so
> allocating and deallocating it would consume too much time.  Pointer
> maps were not available when the patch was written; I have rewritten the
> patch to use them (although it does not simplify it significantly).
>
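
As an aside, a minimal sketch of what a pointer-map based cache for the
expansions might look like, using the pointer_map_* interface from
pointer-set.h (the names below are invented for illustration and are not
the code from the patch):

  #include "pointer-set.h"

  /* Sketch only: cache struct name_expansion entries keyed by SSA name.
     "expansions" and "get_expansion" are made-up names.  */
  static struct pointer_map_t *expansions;

  static struct name_expansion *
  get_expansion (tree name)
  {
    void **slot = pointer_map_insert (expansions, name);

    if (!*slot)
      *slot = XCNEW (struct name_expansion);
    return (struct name_expansion *) *slot;
  }

  /* expansions = pointer_map_create () before use,
     pointer_map_destroy (expansions) afterwards.  */
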
> > > + /* Similar to operand_equal_p, but handles the case that X and Y are
> > > +    NULL.  */
> > > +
> > > + static bool
> > > + operand_eq_p (tree x, tree y)
> > > + {
> > > +   if (!x)
> > > +     return y == NULL_TREE;
> > > +   if (!y)
> > > +     return false;
> > > +
> > > +   return operand_equal_p (x, y, 0);
> > > + }
> >
> > I would rather handle NULL_TREE in operand_equal_p.  Any reason not to?
>
> operand_equal_p is a heavily used function; introducing any new checks to
> it could have a negative impact on compilation speed (there are several
> other places in gcc that test for NULL_TREE before calling operand_equal_p,
> and those checks were not folded into operand_equal_p for this reason).
>
> > > +   while (1)
> > > +     {
> > > +       block_stmt_iterator bsi;
> > > +
> > > +       bsi = bsi_for_stmt (stmt);
> > > +
> > > +       name = GIMPLE_STMT_OPERAND (stmt, 0);
> > > +       gcc_assert (TREE_CODE (name) == SSA_NAME);
> > > +
> > > +       next = single_nonlooparound_use (name);
> > > +
> > > +       mark_virtual_ops_for_renaming (stmt);
> > > +       bsi_remove (&bsi, true);
> > > +
> > > +       if (!next
> > > +           || TREE_CODE (next) != GIMPLE_MODIFY_STMT
> > > +           || GIMPLE_STMT_OPERAND (next, 1) != name)
> > > +         return;
> > > +
> > > +       stmt = next;
> > > +     }
> >
> > Ouch.  Why not just store BSIs in the chains?
>
> Since bsi_for_stmt will be a constant-time operation once gimple tuples
> are implemented, I do not think it is worth complicating the patch with
> this.
>
> Zdenek

