[PATCH] generalized range_query class for multiple contexts

Martin Sebor msebor@gmail.com
Thu Sep 24 21:51:47 GMT 2020


On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote:
> As part of the ranger work, we have been trying to clean up and 
> generalize interfaces whenever possible.  This not only helps in 
> reducing the maintenance burden going forward, but provides mechanisms 
> for backwards compatibility between ranger and other providers/users of 
> ranges throughout the compiler like evrp and VRP.
> 
> One such interface is the range_query class in vr_values.h, which 
> provides a range query mechanism for use in the simplify_using_ranges 
> module.  With it, simplify_using_ranges can be used with the ranger, or 
> the VRP twins by providing a get_value_range() method.  This has helped 
> us in comparing apples to apples while doing our work, and has also 
> future proofed the interface so that asking for a range can be done 
> within the context in which it appeared.  For example, get_value_range 
> now takes a gimple statement which provides context.  We are no longer 
> tied to asking for a global SSA range, but can ask for the range of an 
> SSA within a statement.  Granted, this functionality is currently only 
> in the ranger, but evrp/vrp could be adapted to pass such context.
> 
> The range_query is a good first step, but what we really want is a 
> generic query mechanism that can ask for SSA ranges within an 
> expression, a statement, an edge, or anything else that may come up.  We 
> think that a generic mechanism can be used not only for range producers, 
> but consumers such as the substitute_and_fold_engine (see get_value 
> virtual) and possibly the gimple folder (see valueize).
> 
> The attached patchset provides such an interface.  It is meant to be a 
> replacement for range_query that can be used for vr_values, 
> substitute_and_fold, the subsitute_and_fold_engine, as well as the 
> ranger.  The general API is:
> 
> class value_query
> {
> public:
>    // Return the singleton expression for NAME at a gimple statement,
>    // or NULL if none found.
>    virtual tree value_of_expr (tree name, gimple * = NULL) = 0;
>    // Return the singleton expression for NAME at an edge, or NULL if
>    // none found.
>    virtual tree value_on_edge (edge, tree name);
>    // Return the singleton expression for the LHS of a gimple
>    // statement, assuming an (optional) initial value of NAME.  Returns
>    // NULL if none found.
>    //
>    // Note this method calculates the range the LHS would have *after*
>    // the statement has executed.
>    virtual tree value_of_stmt (gimple *, tree name = NULL);
> };
> 
> class range_query : public value_query
> {
> public:
>    range_query ();
>    virtual ~range_query ();
> 
>    virtual tree value_of_expr (tree name, gimple * = NULL) OVERRIDE;
>    virtual tree value_on_edge (edge, tree name) OVERRIDE;
>    virtual tree value_of_stmt (gimple *, tree name = NULL) OVERRIDE;
> 
>    // These are the range equivalents of the value_* methods.  Instead
>    // of returning a singleton, they calculate a range and return it in
>    // R.  TRUE is returned on success or FALSE if no range was found.
>    virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) = 0;
>    virtual bool range_on_edge (irange &r, edge, tree name);
>    virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL);
> 
>    // DEPRECATED: This method is used from vr-values.  The plan is to
>    // rewrite all uses of it to the above API.
>    virtual const class value_range_equiv *get_value_range (const_tree,
>                                gimple * = NULL);
> };
> 
> The duality of the API (value_of_* and range_on_*) is because some 
> passes are interested in a singleton value 
> (substitute_and_fold_enginge), while others are interested in ranges 
> (vr_values).  Passes that are only interested in singletons can take a 
> value_query, while passes that are interested in full ranges, can take a 
> range_query.  Of course, for future proofing, we would recommend taking 
> a range_query, since if you provide a default range_of_expr, sensible 
> defaults will be provided for the others in terms of range_of_expr.
> 
> Note, that the absolute bare minimum that must be provided is a 
> value_of_expr and a range_of_expr respectively.
> 
> One piece of the API which is missing is a method  to return the range 
> of an arbitrary SSA_NAME *after* a statement.  Currently range_of_expr 
> calculates the range of an expression upon entry to the statement, 
> whereas range_of_stmt calculates the range of *only* the LHS of a 
> statement AFTER the statement has executed.
> 
> This would allow for complete representation of the ranges/values in 
> something like:
> 
>      d_4 = *g_7;
> 
> Here the range of g_7 upon entry could be VARYING, but after the 
> dereference we know it must be non-zero.  Well for sane targets anyhow.
> 
> Choices would be to:
> 
>    1) add a 4th method such as "range_after_stmt", or
> 
>    2) merge that functionality with the existing range_of_stmt method to 
> provide "after" functionality for any ssa_name.  Currently the SSA_NAME 
> must be the same as the LHS if specified.  It also does not need to be 
> specified to allow evaluation of statements without a LHS, (if (x < 1) 
> has no LHS, but will return [0,0] [1,1] or [0,1] as there is a boolean 
> "result" to the statement.
> 
> Also, we've debated whether to use the two classes (value_query and 
> range_query) separately as above, or perhaps join them into one class. I 
> personally like the separation of powers.  Some users only care for 
> singletons, no sense in polluting their implementations with talk of 
> ranges.  But we're open to suggestions.
> 
> The patchset is divided into 3 parts:
> 
> 1. Generic implementation of class.
> 
> 2. Conversion of vr_values and substitute_and_fold_engine to class.
> 
> The conversion of the substitute_and_fold_engine class involved changing 
> all calls to get_value() to a corresponding method providing context. 
> For example, when the edge or statement is available, we pass these to 
> the appropriate method.
> 
> With this, we've been able to plug in the ranger into our hybrid evrp, 
> which can then get better ranges.  Our hybrid evrp has moved a lot of 
> optimizations that were happening much later into the early vrp stage.
> 
> 3. Conversion of sprintf/strlen pass to class.
> 
> This is a nonfunctional change to the sprintf/strlen passes.  That is, 
> no effort was made to change the passes to multi-ranges.  However, with 
> this patch, we are able to plug in a ranger or evrp with just a few 
> lines, since the range query mechanism share a common API.

Thanks for doing all this!  There isn't anything I don't understand
in the sprintf changes so no questions from me (well, almost none).
Just some comments:

The current call statement is available in all functions that take
a directive argument, as dir->info.callstmt.  There should be no need
to also add it as a new argument to the functions that now need it.

The change adds code along these lines in a bunch of places:

+	  value_range vr;
+	  if (!query->range_of_expr (vr, arg, stmt))
+	    vr.set_varying (TREE_TYPE (arg));

I thought under the new Ranger APIs when a range couldn't be
determined it would be automatically set to the maximum for
the type.  I like that and have been moving in that direction
with my code myself (rather than having an API fail, have it
set the max range and succeed).

Since that isn't so in this case, I think it would still be nice
if the added code could be written as if the range were set to
varying in this case and (ideally) reduced to just initialization:

   value_range vr = some-function (query, stmt, arg);

some-function could be an inline helper defined just for the sprintf
pass (and maybe also strlen which also seems to use the same pattern),
or it could be a value_range AKA irange ctor, or it could be a member
of range_query, whatever is the most appropriate.

(If assigning/copying a value_range is thought to be too expensive,
declaring it first and then passing it to that helper to set it
would work too).

In strlen, is the removed comment no longer relevant?  (I.e., does
the ranger solve the problem?)

-      /* The range below may be "inaccurate" if a constant has been
-	 substituted earlier for VAL by this pass that hasn't been
-	 propagated through the CFG.  This shoud be fixed by the new
-	 on-demand VRP if/when it becomes available (hopefully in
-	 GCC 11).  */

I'm wondering about the comment added to get_range_strlen_dynamic
and other places:

+	  // FIXME: Use range_query instead of global ranges.

Is that something you're planning to do in a followup or should
I remember to do it at some point?

Otherwise I have no concern with the changes.

Thanks
Martin

> 
> ---------------------
> 
> We think this interface is generic enough to be used in any place that 
> queries ranges or singletons, basically any place we have *valueize* in 
> the compiler.  It can also be used to replace range generators at will, 
> or even combine them to get the best of both worlds.  For example, 
> places that currently have a value_* interface because they work with 
> constants, like CCP, could have an alternate valueizer plugged in and if 
> a range folds to a constant, CCP could then propagate that constant.
> 
> Tested on x86-64 Linux.
> 
> Aldy



More information about the Gcc-patches mailing list