[PING] [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

Di Zhao OS dizhao@os.amperecomputing.com
Mon Nov 8 03:21:18 GMT 2021


Hi,

Gentle ping on this.

Di Zhao

-----Original Message-----
From: Di Zhao OS 
Sent: Monday, October 25, 2021 3:03 AM
To: Richard Biener <richard.guenther@gmail.com>
Cc: gcc-patches@gcc.gnu.org
Subject: RE: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

Hi,

Attached is a new version of the patch, mainly for improving performance
and simplifying the code.

First, regarding the comments:

> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Friday, October 1, 2021 9:00 PM
> To: Di Zhao OS <dizhao@os.amperecomputing.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH v2] tree-optimization/101186 - extend FRE with
> "equivalence map" for condition prediction
> 
> On Thu, Sep 16, 2021 at 8:13 PM Di Zhao OS
> <dizhao@os.amperecomputing.com> wrote:
> >
> > Sorry about updating on this after so long. It took me much time to work out a
> > new plan and pass the tests.
> >
> > The new idea is to use one variable to represent a set of equal variables at
> > some basic-block. This variable is called a "equivalence head" or "equiv-head"
> > in the code. (There's no-longer a "equivalence map".)
> >
> > - Initially an SSA_NAME's "equivalence head" is its value number. Temporary
> >   equivalence heads are recorded as unary NOP_EXPR results in the vn_nary_op_t
> >   map. Besides, when inserting into vn_nary_op_t map, make the new result at
> >   front of the vn_pval list, so that when searching for a variable's
> >   equivalence head, the first result represents the largest equivalence set at
> >   current location.
> > - In vn_ssa_aux_t, maintain a list of references to valid_info->nary entry.
> >   For recorded equivalences, the reference is result->entry; for normal N-ary
> >   operations, the reference is operand->entry.
> > - When recording equivalences, if one side A is constant or has more refs, make
> >   it the new equivalence head of the other side B. Traverse B's ref-list, if a
> >   variable C's previous equiv-head is B, update to A. And re-insert B's n-ary
> >   operations by replacing B with A.
> > - When inserting and looking for the results of n-ary operations, insert and
> >   lookup by the operands' equiv-heads.
> > ...
> >
> > Thanks,
> > Di Zhao
> >
> > --------
> > Extend FRE with temporary equivalences.
> 
> Comments on the patch:
> 
> +  /* nary_ref count.  */
> +  unsigned num_nary_ref;
> +
> 
> I think a unsigned short should be enough and that would nicely
> pack after value_id together with the bitfield (maybe change that
> to unsigned short :1 then).

Changed num_nary_ref to unsigned short and moved after value_id.

> @@ -7307,17 +7839,23 @@ process_bb (rpo_elim &avail, basic_block bb,
>             tree val = gimple_simplify (gimple_cond_code (last),
>                                         boolean_type_node, lhs, rhs,
>                                         NULL, vn_valueize);
> +           vn_nary_op_t vnresult = NULL;
>             /* If the condition didn't simplfy see if we have recorded
>                an expression from sofar taken edges.  */
>             if (! val || TREE_CODE (val) != INTEGER_CST)
>               {
> -               vn_nary_op_t vnresult;
> 
> looks like you don't need vnresult outside of the if()?

vnresult is reused later to record equivalences generated by PHI nodes.

> +/* Find predicated value of vn_nary_op by the operands' equivalences.  Return
> + * NULL_TREE if no known result is found.  */
> +
> +static tree
> +find_predicated_value_by_equivs (vn_nary_op_t vno, basic_block bb,
> +                                vn_nary_op_t *vnresult)
> +{
> +  lookup_equiv_heads (vno->length, vno->op, vno->op, bb);
> +  tree result
> +    = simplify_nary_op (vno->length, vno->opcode, vno->op, vno->type);
> 
> why is it necessary to simplify here?  It looks like the caller
> already does this.

In the new patch, changed the code a little to remove redundant calculation.

> I wonder whether it's valid to always perform find_predicated_value_by_equivs
> from inside vn_nary_op_get_predicated_value instead of bolting it to only
> a single place?

Removed function find_predicated_value_by_equivs and inlined the code.

Because lookup_equiv_head uses vn_nary_op_get_predicated_value, so I left
vn_nary_op_get_predicated_value unchanged. Instead, operands are set to
equiv-heads in init_vn_nary_op_from_stmt. So altogether, predicates are always
inserted and searched by equiv-heads.

> +
> +static vn_nary_op_t
> +val_equiv_insert (tree op1, tree op2, edge e)
> +{
> 
> +  if (is_gimple_min_invariant (lhs))
> +    std::swap (lhs, rhs);
> +  if (is_gimple_min_invariant (lhs) || TREE_CODE (lhs) != SSA_NAME)
> +    /* Possible if invoked from record_equiv_from_previous_cond.  */
> +    return NULL;
> 
> Better formulate all of the above in terms of only SSA_NAME checks since...
> 
> +  /* Make the hand-side with more recorded n-ary expressions new
> +   * equivalence-head, to make fewer re-insertions.  */
> +  if (TREE_CODE (rhs) == SSA_NAME
> +      && VN_INFO (rhs)->num_nary_ref < VN_INFO (lhs)->num_nary_ref)
> +    std::swap (lhs, rhs);
> 
> here LHS needs to be an SSA_NAME.

Tried to fix this in the new patch.

> +  /* Record equivalence as unary NOP_EXPR.  */
> +  vn_nary_op_t val
> +    = vn_nary_op_insert_pieces_predicated_1 (1, NOP_EXPR, TREE_TYPE (lhs),
> +                                            &lhs, rhs, 0, bb);
> 
> so you are recording an equivalence of lhs to rhs as (NOP_EXPR)lhs
> with value rhs?
> Presumably you think that's good enough to not overlap with entries
> for "real" IL?

By checking related codes, it seems to me there's no overlap as long as
equivalences are inserted and searched as predicated_values. It should be easy
to replace this with other preferred opcode.

> In particular why did you choose to _not_ use (2, EQ_EXPR, boolean_type_node,
> &[lhs, rhs], boolean_true_node, ...) like I think what would be
> inserted for by the
> existing code in process_bb via vn_nary_op_insert_pieces_predicated?
> It would probably felt more natural if the "new" code bolted on the case where
> that is called with an equivalence?

The "equivalence head"s actually implement Disjoint-sets. So by a hashtable
search in the nary-map, the set of equivalences of a variable can be found.
Also it's easy to find out whether two variables are equal. And the 
disjoint-sets are path-compressed by updating "equivalence head" and
re-inserting predicates.

In the new attached patch, predicates like "A==B is true" and "A!=B is false"
are not inserted, as they can be computed from equivalences. Same with the
predicates derived from them. This reduces insertions of predicates by a lot (
counting the equivalences themselves and re-inserted predicates). Below are
test results on SPEC intrate, comparing with trunk (f45610a4, Oct 19):

              |more bb |more bb |more stmt|more stmt|less      |less
              |removed |removed |removed  |removed  |predicates|predicates
              |at fre1 |at fre1 |at fre1  |at fre1  |inserted  |inserted
--------------------------------------------------------------------------
500.perlbench_r| 64    | 1.56%  | 102    | 0.18%   | 51928    | 53.44%
502.gcc_r      | 679   | 4.88%  | 290    | 0.21%   | 149968   | 65.5%
505.mcf_r      | 5     | 35.71% | 9      | 1.40%   | 349      | 27.48%
520.omnetpp    | 132   | 5.36%  | 48     | 0.14%   | 28192    | 58.38%
523.xalancbmk_r| 231   | 3.15%  | 308    | 0.35%   | 65131    | 58.95%
525.x264_r     | 4     | 1.36%  | 27     | 0.11%   | 10401    | 40.68%
531.deepsjeng_r| 1     | 3.45%  | 2      | 0.14%   | 1412     | 53.81%
541.leela_r    | 2     | 0.62%  | 5      | 0.06%   | 3874     | 48.56%
548.exchange2_r| 0     | 0.00%  | 3      | 0.04%   | 159      | 3.33%
557.xz_r       | 0     | 0.00%  | 3      | 0.07%   | 1894     | 52.39%

Considering the copying and unwinding work saved, this patch could reduce
compile time rather than increase it.

> Btw, I didn't find any condition that prevents equivalences being used for
> non-integer types?  Are you sure this is all valid for FP compares without
> further restrictions?

I think this approach follows sccvn's previous decisions on FP comparisons,
for equivalences are inserted only when EQ_EXPRs were evaluated to be true.
The patch is tested on regression tests and SPEC fprate cases.

> 
> @@ -4224,36 +4404,345 @@ vn_nary_op_insert_pieces_predicated (unsigned
> int length, enum tree_code code,
>        fprintf (dump_file, " == %s\n",
>                integer_zerop (result) ? "false" : "true");
>      }
> -  vn_nary_op_t vno1 = alloc_vn_nary_op (length, NULL_TREE, value_id);
> -  init_vn_nary_op_from_pieces (vno1, length, code, type, ops);
> -  vno1->predicated_values = 1;
> -  vno1->u.values = (vn_pval *) obstack_alloc (&vn_tables_obstack,
> -                                             sizeof (vn_pval));
> -  vno1->u.values->next = NULL;
> -  vno1->u.values->result = result;
> -  vno1->u.values->n = 1;
> -  vno1->u.values->valid_dominated_by_p[0] = pred_e->dest->index;
> -  return vn_nary_op_insert_into (vno1, valid_info->nary, true);
> +  /* Insert by operands' equiv-heads.  */
> +  tree new_ops[3];
> +  lookup_equiv_heads (length, ops, new_ops, pred_e->dest);
> +  /* Skip if the result is known.  */
> +  tree simplified = simplify_nary_op (length, code, new_ops, type);
> +  if (simplified)
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       {
> +         fprintf (dump_file, "Result is known: ");
> +         print_generic_expr (dump_file, simplified, TDF_NONE);
> +         fprintf (dump_file, ", skipped.\n");
> +       }
> +      return NULL;
> +    }
> 
> soo - how do the existing predicated expressions change to be handled here?
> Why would they ever be simplified?  Should this simplification not better
> happen
> upthread when visiting the conditional?  Maybe that needs to use the equivs
> already?

This happens when updating variable A's equiv_head from B to C, and 
re-inserting former predicates of B with C. For example, "x < 0" is recorded,
then x is assigned with new equiv_head -1 at some location, then -1 < 0 can
simplify. It shouldn't happen when inserting new predicates.

> 
> You re-use the predicated value list which I think is great, can you explain
> how "latest" is meant in the following?
> 
> @@ -4145,7 +4264,12 @@ vn_nary_op_insert_into (vn_nary_op_t vno,
> vn_nary_op_table_type *table,
>               next = &(*next)->next;
>             }
>           if (!found)
> -           *next = nval;
> +           {
> +             /* Insert new value at the front, so that lookup_equiv_head can
> +              * find the latest result.  */
> +             nval->next = vno->u.values;
> +             vno->u.values = nval;
> +           }

It meant the "most general" result. (I picked the wrong word. The description
is changed in the new patch file. )

There can be multiple valid results at the same time. As we traverse in RPO and
update equiv-heads (i.e. unite disjoint-sets), the last inserted result
represents the most complete set of equivalences of the variable.

(In the last patch version, when "Appending predicate to value", the new result
is not moved to the front. That violates the property of equiv-heads and some
opportunities will be lost. Fixed this in the new patch file.)

> 
> OK, I have to stop now but will try to understand and review further next week.
> Sorry for the delays but the patch is quite complex to understand :/
> 
> Richard.
> 
> > 2021-09-16  Di Zhao  <dizhao@os.amperecomputing.com>
> >

Other changes:
- Fixed some errs in comments.
- On recording temporary equivalences generated by PHI nodes, altered the
  logic, hope it can be a bit more clear.
  1. When a conditional expression's true/false edge E is assessed to be 
     executable, search for conflicting predicates for E to be taken. If a
     conflicting predicate is found at L, and L's single predecessor P 
     dominates current BB, then we know P->L cannot be taken if E is taken.
  2. Process phi_bb (immediately dominated by P, and dominates BB), rule out
     PHI arguments (x2) that can only come from P->L. If there's a single 
     PHI argument (x1) left, record the equivalence of x1 and PHI result x
     at edge E.
          -----
          | P | 
          -----
           |  \
           |   -----
           |   | L |  <== conflicting predicate's location
           |   -----
           |    /
  -------------------------
  | x = PHI<x1(P), x2(L)> | phi_bb
  -------------------------
             |
            ....
             |
           -----
           | BB | <== current bb
           -----
          /     \ edge E
        ...     -------
                | dest |
                -------

- Fixed a correctness problem when recording equivalences from PHI in previous
  version. Also test case ssa-fre-95.c:k in previous version was incorrect,
  fixed that.

Thanks,
Di Zhao

---
Extend FRE with temporary equivalences.

2021-10-24  Di Zhao  <dizhao@os.amperecomputing.com>

gcc/ChangeLog:
        PR tree-optimization/101186
        * tree-ssa-sccvn.c (VN_INFO): remove assertions (there could be a
        predicate already).
        (dominated_by_p_w_unex): Moved upward.
        (vn_nary_op_get_predicated_value): Moved upward.
        (is_vn_valid_at_bb): Check if vn_pval is valid at BB.
        (lookup_equiv_head): Lookup the "equivalence head" of given node.
        (lookup_equiv_heads): Lookup the "equivalence head"s of given nodes.
        (vn_tracking_edge): Extracted utility function.
        (init_vn_nary_op_from_stmt): Insert and lookup by "equivalence head"s.
        (vn_nary_op_insert_into): Insert new value at the front.
        (vn_nary_op_insert_pieces_predicated_1): Insert as predicated values
        from pieces.
        (fold_const_from_equiv_heads): Fold N-ary expression of equiv-heads.
        (push_new_nary_ref): Insert a back-reference to vn_nary_op_t.
        (val_equiv_insert): Record temporary equivalence.
        (vn_nary_op_insert_pieces_predicated): Record equivalences instead of
        some predicates; insert back-refs.
        (record_equiv_from_prev_phi_1): Record temporary equivalences generated
        by PHI nodes.
        (record_equiv_from_prev_phi): Given an outgoing edge of a conditional
        expression taken, record equivalences generated by PHI nodes.
        (visit_nary_op): Add lookup previous results of N-ary operations by
        equivalences.
        (insert_related_predicates_on_edge): Some predicates can be computed
        from equivalences, no need to insert them.
        (process_bb): Add lookup predicated values by equivalenc~/tmp/time-frees.
        (struct unwind_state): Unwind state of back-refs to vn_nary_op_t.
        (do_unwind): Unwind the back-refs to vn_nary_op_t.
        (do_rpo_vn): Update back-reference unwind state.
        * tree-ssa-sccvn.h (struct nary_ref): hold a lists of references to the
        nary map entries.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/pr68619-2.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-1.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-2.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-3.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-5.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-7.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-8.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-9.c: Disable fre.
        * gcc.dg/tree-ssa/vrp03.c: Disable fre.
        * gcc.dg/tree-ssa/ssa-fre-100.c: New test.
        * gcc.dg/tree-ssa/ssa-fre-101.c: New test.
        * gcc.dg/tree-ssa/ssa-fre-102.c: New test.
        * gcc.dg/tree-ssa/ssa-pre-34.c: New test.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: v3-tree-optimization-101186.patch
Type: application/octet-stream
Size: 37568 bytes
Desc: v3-tree-optimization-101186.patch
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20211108/d340ae18/attachment-0001.obj>


More information about the Gcc-patches mailing list