[PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

Wed Nov 10 09:43:52 GMT 2021

On Sun, Oct 24, 2021 at 9:03 PM Di Zhao OS
<dizhao@os.amperecomputing.com> wrote:
>
> Hi,
>
> Attached is a new version of the patch, mainly for improving performance
> and simplifying the code.

The patch doesn't apply anymore, can you update it please?

I see the new ssa-fre-101.c test already passing without the patch.
Likewise ssa-fre-100.c and ssa-fre-102.c would PASS if you scan
the pass dump after fre1 which is evrp so it seems that evrp already
handles the equivalences (likely with the relation oracle) now?
I'm sure there are second order effects when eliminating conditions
in FRE but did you re-evaluate what made you improve VN to see
if the cases are handled as expected now without this change?

I will still look at and consider the change btw, but given the EVRP
improvements I'm also considering to remove the predication
support from VN alltogether.  At least in the non-iterating mode
it should be trivially easy to use rangers relation oracle to simplify
predicates.  For the iterating mode it might not be 100% effective
since I'm not sure we can make it use the current SSA values and
how it would behave with those eventually changing to worse.

Andrew, how would one ask the relation oracle to simplify a
condition?  Do I have to do any bookkeeping to register
predicates on edges for it?

Thanks,
Richard.

> First, regarding the comments:
>
> > -----Original Message-----
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: Friday, October 1, 2021 9:00 PM
> > To: Di Zhao OS <dizhao@os.amperecomputing.com>
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH v2] tree-optimization/101186 - extend FRE with
> > "equivalence map" for condition prediction
> >
> > On Thu, Sep 16, 2021 at 8:13 PM Di Zhao OS
> > <dizhao@os.amperecomputing.com> wrote:
> > >
> > > Sorry about updating on this after so long. It took me much time to work out a
> > > new plan and pass the tests.
> > >
> > > The new idea is to use one variable to represent a set of equal variables at
> > > some basic-block. This variable is called a "equivalence head" or "equiv-head"
> > > in the code. (There's no-longer a "equivalence map".)
> > >
> > > - Initially an SSA_NAME's "equivalence head" is its value number. Temporary
> > >   equivalence heads are recorded as unary NOP_EXPR results in the vn_nary_op_t
> > >   map. Besides, when inserting into vn_nary_op_t map, make the new result at
> > >   front of the vn_pval list, so that when searching for a variable's
> > >   equivalence head, the first result represents the largest equivalence set at
> > >   current location.
> > > - In vn_ssa_aux_t, maintain a list of references to valid_info->nary entry.
> > >   For recorded equivalences, the reference is result->entry; for normal N-ary
> > >   operations, the reference is operand->entry.
> > > - When recording equivalences, if one side A is constant or has more refs, make
> > >   it the new equivalence head of the other side B. Traverse B's ref-list, if a
> > >   variable C's previous equiv-head is B, update to A. And re-insert B's n-ary
> > >   operations by replacing B with A.
> > > - When inserting and looking for the results of n-ary operations, insert and
> > >   lookup by the operands' equiv-heads.
> > > ...
> > >
> > > Thanks,
> > > Di Zhao
> > >
> > > --------
> > > Extend FRE with temporary equivalences.
> >
> > Comments on the patch:
> >
> > +  /* nary_ref count.  */
> > +  unsigned num_nary_ref;
> > +
> >
> > I think a unsigned short should be enough and that would nicely
> > pack after value_id together with the bitfield (maybe change that
> > to unsigned short :1 then).
>
> Changed num_nary_ref to unsigned short and moved after value_id.
>
> > @@ -7307,17 +7839,23 @@ process_bb (rpo_elim &avail, basic_block bb,
> >             tree val = gimple_simplify (gimple_cond_code (last),
> >                                         boolean_type_node, lhs, rhs,
> >                                         NULL, vn_valueize);
> > +           vn_nary_op_t vnresult = NULL;
> >             /* If the condition didn't simplfy see if we have recorded
> >                an expression from sofar taken edges.  */
> >             if (! val || TREE_CODE (val) != INTEGER_CST)
> >               {
> > -               vn_nary_op_t vnresult;
> >
> > looks like you don't need vnresult outside of the if()?
>
> vnresult is reused later to record equivalences generated by PHI nodes.
>
> > +/* Find predicated value of vn_nary_op by the operands' equivalences.  Return
> > + * NULL_TREE if no known result is found.  */
> > +
> > +static tree
> > +find_predicated_value_by_equivs (vn_nary_op_t vno, basic_block bb,
> > +                                vn_nary_op_t *vnresult)
> > +{
> > +  lookup_equiv_heads (vno->length, vno->op, vno->op, bb);
> > +  tree result
> > +    = simplify_nary_op (vno->length, vno->opcode, vno->op, vno->type);
> >
> > why is it necessary to simplify here?  It looks like the caller
> > already does this.
>
> In the new patch, changed the code a little to remove redundant calculation.
>
> > I wonder whether it's valid to always perform find_predicated_value_by_equivs
> > from inside vn_nary_op_get_predicated_value instead of bolting it to only
> > a single place?
>
> Removed function find_predicated_value_by_equivs and inlined the code.
>
> Because lookup_equiv_head uses vn_nary_op_get_predicated_value, so I left
> vn_nary_op_get_predicated_value unchanged. Instead, operands are set to
> equiv-heads in init_vn_nary_op_from_stmt. So altogether, predicates are always
> inserted and searched by equiv-heads.
>
> > +
> > +static vn_nary_op_t
> > +val_equiv_insert (tree op1, tree op2, edge e)
> > +{
> >
> > +  if (is_gimple_min_invariant (lhs))
> > +    std::swap (lhs, rhs);
> > +  if (is_gimple_min_invariant (lhs) || TREE_CODE (lhs) != SSA_NAME)
> > +    /* Possible if invoked from record_equiv_from_previous_cond.  */
> > +    return NULL;
> >
> > Better formulate all of the above in terms of only SSA_NAME checks since...
> >
> > +  /* Make the hand-side with more recorded n-ary expressions new
> > +   * equivalence-head, to make fewer re-insertions.  */
> > +  if (TREE_CODE (rhs) == SSA_NAME
> > +      && VN_INFO (rhs)->num_nary_ref < VN_INFO (lhs)->num_nary_ref)
> > +    std::swap (lhs, rhs);
> >
> > here LHS needs to be an SSA_NAME.
>
> Tried to fix this in the new patch.
>
> > +  /* Record equivalence as unary NOP_EXPR.  */
> > +  vn_nary_op_t val
> > +    = vn_nary_op_insert_pieces_predicated_1 (1, NOP_EXPR, TREE_TYPE (lhs),
> > +                                            &lhs, rhs, 0, bb);
> >
> > so you are recording an equivalence of lhs to rhs as (NOP_EXPR)lhs
> > with value rhs?
> > Presumably you think that's good enough to not overlap with entries
> > for "real" IL?
>
> By checking related codes, it seems to me there's no overlap as long as
> equivalences are inserted and searched as predicated_values. It should be easy
> to replace this with other preferred opcode.
>
> > In particular why did you choose to _not_ use (2, EQ_EXPR, boolean_type_node,
> > &[lhs, rhs], boolean_true_node, ...) like I think what would be
> > inserted for by the
> > existing code in process_bb via vn_nary_op_insert_pieces_predicated?
> > It would probably felt more natural if the "new" code bolted on the case where
> > that is called with an equivalence?
>
> The "equivalence head"s actually implement Disjoint-sets. So by a hashtable
> search in the nary-map, the set of equivalences of a variable can be found.
> Also it's easy to find out whether two variables are equal. And the
> disjoint-sets are path-compressed by updating "equivalence head" and
> re-inserting predicates.
>
> In the new attached patch, predicates like "A==B is true" and "A!=B is false"
> are not inserted, as they can be computed from equivalences. Same with the
> predicates derived from them. This reduces insertions of predicates by a lot (
> counting the equivalences themselves and re-inserted predicates). Below are
> test results on SPEC intrate, comparing with trunk (f45610a4, Oct 19):
>
>               |more bb |more bb |more stmt|more stmt|less      |less
>               |removed |removed |removed  |removed  |predicates|predicates
>               |at fre1 |at fre1 |at fre1  |at fre1  |inserted  |inserted
> --------------------------------------------------------------------------
> 500.perlbench_r| 64    | 1.56%  | 102    | 0.18%   | 51928    | 53.44%
> 502.gcc_r      | 679   | 4.88%  | 290    | 0.21%   | 149968   | 65.5%
> 505.mcf_r      | 5     | 35.71% | 9      | 1.40%   | 349      | 27.48%
> 520.omnetpp    | 132   | 5.36%  | 48     | 0.14%   | 28192    | 58.38%
> 523.xalancbmk_r| 231   | 3.15%  | 308    | 0.35%   | 65131    | 58.95%
> 525.x264_r     | 4     | 1.36%  | 27     | 0.11%   | 10401    | 40.68%
> 531.deepsjeng_r| 1     | 3.45%  | 2      | 0.14%   | 1412     | 53.81%
> 541.leela_r    | 2     | 0.62%  | 5      | 0.06%   | 3874     | 48.56%
> 548.exchange2_r| 0     | 0.00%  | 3      | 0.04%   | 159      | 3.33%
> 557.xz_r       | 0     | 0.00%  | 3      | 0.07%   | 1894     | 52.39%
>
> Considering the copying and unwinding work saved, this patch could reduce
> compile time rather than increase it.
>
> > Btw, I didn't find any condition that prevents equivalences being used for
> > non-integer types?  Are you sure this is all valid for FP compares without
> > further restrictions?
>
> I think this approach follows sccvn's previous decisions on FP comparisons,
> for equivalences are inserted only when EQ_EXPRs were evaluated to be true.
> The patch is tested on regression tests and SPEC fprate cases.
>
> >
> > @@ -4224,36 +4404,345 @@ vn_nary_op_insert_pieces_predicated (unsigned
> > int length, enum tree_code code,
> >        fprintf (dump_file, " == %s\n",
> >                integer_zerop (result) ? "false" : "true");
> >      }
> > -  vn_nary_op_t vno1 = alloc_vn_nary_op (length, NULL_TREE, value_id);
> > -  init_vn_nary_op_from_pieces (vno1, length, code, type, ops);
> > -  vno1->predicated_values = 1;
> > -  vno1->u.values = (vn_pval *) obstack_alloc (&vn_tables_obstack,
> > -                                             sizeof (vn_pval));
> > -  vno1->u.values->next = NULL;
> > -  vno1->u.values->result = result;
> > -  vno1->u.values->n = 1;
> > -  vno1->u.values->valid_dominated_by_p[0] = pred_e->dest->index;
> > -  return vn_nary_op_insert_into (vno1, valid_info->nary, true);
> > +  /* Insert by operands' equiv-heads.  */
> > +  tree new_ops[3];
> > +  lookup_equiv_heads (length, ops, new_ops, pred_e->dest);
> > +  /* Skip if the result is known.  */
> > +  tree simplified = simplify_nary_op (length, code, new_ops, type);
> > +  if (simplified)
> > +    {
> > +      if (dump_file && (dump_flags & TDF_DETAILS))
> > +       {
> > +         fprintf (dump_file, "Result is known: ");
> > +         print_generic_expr (dump_file, simplified, TDF_NONE);
> > +         fprintf (dump_file, ", skipped.\n");
> > +       }
> > +      return NULL;
> > +    }
> >
> > soo - how do the existing predicated expressions change to be handled here?
> > Why would they ever be simplified?  Should this simplification not better
> > happen
> > upthread when visiting the conditional?  Maybe that needs to use the equivs
> > already?
>
> This happens when updating variable A's equiv_head from B to C, and
> re-inserting former predicates of B with C. For example, "x < 0" is recorded,
> then x is assigned with new equiv_head -1 at some location, then -1 < 0 can
> simplify. It shouldn't happen when inserting new predicates.
>
> >
> > You re-use the predicated value list which I think is great, can you explain
> > how "latest" is meant in the following?
> >
> > @@ -4145,7 +4264,12 @@ vn_nary_op_insert_into (vn_nary_op_t vno,
> > vn_nary_op_table_type *table,
> >               next = &(*next)->next;
> >             }
> >           if (!found)
> > -           *next = nval;
> > +           {
> > +             /* Insert new value at the front, so that lookup_equiv_head can
> > +              * find the latest result.  */
> > +             nval->next = vno->u.values;
> > +             vno->u.values = nval;
> > +           }
>
> It meant the "most general" result. (I picked the wrong word. The description
> is changed in the new patch file. )
>
> There can be multiple valid results at the same time. As we traverse in RPO and
> update equiv-heads (i.e. unite disjoint-sets), the last inserted result
> represents the most complete set of equivalences of the variable.
>
> (In the last patch version, when "Appending predicate to value", the new result
> is not moved to the front. That violates the property of equiv-heads and some
> opportunities will be lost. Fixed this in the new patch file.)
>
> >
> > OK, I have to stop now but will try to understand and review further next week.
> > Sorry for the delays but the patch is quite complex to understand :/
> >
> > Richard.
> >
> > > 2021-09-16  Di Zhao  <dizhao@os.amperecomputing.com>
> > >
>
> Other changes:
> - Fixed some errs in comments.
> - On recording temporary equivalences generated by PHI nodes, altered the
>   logic, hope it can be a bit more clear.
>   1. When a conditional expression's true/false edge E is assessed to be
>      executable, search for conflicting predicates for E to be taken. If a
>      conflicting predicate is found at L, and L's single predecessor P
>      dominates current BB, then we know P->L cannot be taken if E is taken.
>   2. Process phi_bb (immediately dominated by P, and dominates BB), rule out
>      PHI arguments (x2) that can only come from P->L. If there's a single
>      PHI argument (x1) left, record the equivalence of x1 and PHI result x
>      at edge E.
>           -----
>           | P |
>           -----
>            |  \
>            |   -----
>            |   | L |  <== conflicting predicate's location
>            |   -----
>            |    /
>   -------------------------
>   | x = PHI<x1(P), x2(L)> | phi_bb
>   -------------------------
>              |
>             ....
>              |
>            -----
>            | BB | <== current bb
>            -----
>           /     \ edge E
>         ...     -------
>                 | dest |
>                 -------
>
> - Fixed a correctness problem when recording equivalences from PHI in previous
>   version. Also test case ssa-fre-95.c:k in previous version was incorrect,
>   fixed that.
>
> Thanks,
> Di Zhao
>
> ---
> Extend FRE with temporary equivalences.
>
> 2021-10-24  Di Zhao  <dizhao@os.amperecomputing.com>
>
> gcc/ChangeLog:
>         PR tree-optimization/101186
>         * tree-ssa-sccvn.c (VN_INFO): remove assertions (there could be a
>         predicate already).
>         (dominated_by_p_w_unex): Moved upward.
>         (vn_nary_op_get_predicated_value): Moved upward.
>         (is_vn_valid_at_bb): Check if vn_pval is valid at BB.
>         (lookup_equiv_head): Lookup the "equivalence head" of given node.
>         (lookup_equiv_heads): Lookup the "equivalence head"s of given nodes.
>         (vn_tracking_edge): Extracted utility function.
>         (init_vn_nary_op_from_stmt): Insert and lookup by "equivalence head"s.
>         (vn_nary_op_insert_into): Insert new value at the front.
>         (vn_nary_op_insert_pieces_predicated_1): Insert as predicated values
>         from pieces.
>         (fold_const_from_equiv_heads): Fold N-ary expression of equiv-heads.
>         (push_new_nary_ref): Insert a back-reference to vn_nary_op_t.
>         (val_equiv_insert): Record temporary equivalence.
>         (vn_nary_op_insert_pieces_predicated): Record equivalences instead of
>         some predicates; insert back-refs.
>         (record_equiv_from_prev_phi_1): Record temporary equivalences generated
>         by PHI nodes.
>         (record_equiv_from_prev_phi): Given an outgoing edge of a conditional
>         expression taken, record equivalences generated by PHI nodes.
>         (visit_nary_op): Add lookup previous results of N-ary operations by
>         equivalences.
>         (insert_related_predicates_on_edge): Some predicates can be computed
>         from equivalences, no need to insert them.
>         (process_bb): Add lookup predicated values by equivalenc~/tmp/time-frees.
>         (struct unwind_state): Unwind state of back-refs to vn_nary_op_t.
>         (do_unwind): Unwind the back-refs to vn_nary_op_t.
>         (do_rpo_vn): Update back-reference unwind state.
>         * tree-ssa-sccvn.h (struct nary_ref): hold a lists of references to the
>         nary map entries.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/pr68619-2.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-1.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-2.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-3.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-5.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-7.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-8.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-9.c: Disable fre.
>         * gcc.dg/tree-ssa/vrp03.c: Disable fre.
>         * gcc.dg/tree-ssa/ssa-fre-100.c: New test.
>         * gcc.dg/tree-ssa/ssa-fre-101.c: New test.
>         * gcc.dg/tree-ssa/ssa-fre-102.c: New test.
>         * gcc.dg/tree-ssa/ssa-pre-34.c: New test.
>