[PATCH] generalized range_query class for multiple contexts

Mon Sep 28 15:27:20 GMT 2020

On 9/25/20 11:41 AM, Andrew MacLeod wrote:
> On 9/23/20 7:53 PM, Martin Sebor via Gcc-patches wrote:
>> On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote:
>>> As part of the ranger work, we have been trying to clean up and 
>>> generalize interfaces whenever possible. This not only helps in 
>>> reducing the maintenance burden going forward, but provides 
>>> mechanisms for backwards compatibility between ranger and other 
>>> providers/users of ranges throughout the compiler like evrp and VRP.
>>>
>>> One such interface is the range_query class in vr_values.h, which 
>>> provides a range query mechanism for use in the simplify_using_ranges 
>>> module.  With it, simplify_using_ranges can be used with the ranger, 
>>> or the VRP twins by providing a get_value_range() method.  This has 
>>> helped us in comparing apples to apples while doing our work, and has 
>>> also future proofed the interface so that asking for a range can be 
>>> done within the context in which it appeared.  For example, 
>>> get_value_range now takes a gimple statement which provides context. 
>>> We are no longer tied to asking for a global SSA range, but can ask 
>>> for the range of an SSA within a statement. Granted, this 
>>> functionality is currently only in the ranger, but evrp/vrp could be 
>>> adapted to pass such context.
>>>
>>> The range_query is a good first step, but what we really want is a 
>>> generic query mechanism that can ask for SSA ranges within an 
>>> expression, a statement, an edge, or anything else that may come up. 
>>> We think that a generic mechanism can be used not only for range 
>>> producers, but consumers such as the substitute_and_fold_engine (see 
>>> get_value virtual) and possibly the gimple folder (see valueize).
>>>
>>> The attached patchset provides such an interface.  It is meant to be 
>>> a replacement for range_query that can be used for vr_values, 
>>> substitute_and_fold, the subsitute_and_fold_engine, as well as the 
>>> ranger.  The general API is:
>>>
>>> class value_query
>>> {
>>> public:
>>>    // Return the singleton expression for NAME at a gimple statement,
>>>    // or NULL if none found.
>>>    virtual tree value_of_expr (tree name, gimple * = NULL) = 0;
>>>    // Return the singleton expression for NAME at an edge, or NULL if
>>>    // none found.
>>>    virtual tree value_on_edge (edge, tree name);
>>>    // Return the singleton expression for the LHS of a gimple
>>>    // statement, assuming an (optional) initial value of NAME. Returns
>>>    // NULL if none found.
>>>    //
>>>    // Note this method calculates the range the LHS would have *after*
>>>    // the statement has executed.
>>>    virtual tree value_of_stmt (gimple *, tree name = NULL);
>>> };
>>>
>>> class range_query : public value_query
>>> {
>>> public:
>>>    range_query ();
>>>    virtual ~range_query ();
>>>
>>>    virtual tree value_of_expr (tree name, gimple * = NULL) OVERRIDE;
>>>    virtual tree value_on_edge (edge, tree name) OVERRIDE;
>>>    virtual tree value_of_stmt (gimple *, tree name = NULL) OVERRIDE;
>>>
>>>    // These are the range equivalents of the value_* methods. Instead
>>>    // of returning a singleton, they calculate a range and return it in
>>>    // R.  TRUE is returned on success or FALSE if no range was found.
>>>    virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) 
>>> = 0;
>>>    virtual bool range_on_edge (irange &r, edge, tree name);
>>>    virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL);
>>>
>>>    // DEPRECATED: This method is used from vr-values.  The plan is to
>>>    // rewrite all uses of it to the above API.
>>>    virtual const class value_range_equiv *get_value_range (const_tree,
>>>                                gimple * = NULL);
>>> };
>>>
>>> The duality of the API (value_of_* and range_on_*) is because some 
>>> passes are interested in a singleton value 
>>> (substitute_and_fold_enginge), while others are interested in ranges 
>>> (vr_values).  Passes that are only interested in singletons can take 
>>> a value_query, while passes that are interested in full ranges, can 
>>> take a range_query.  Of course, for future proofing, we would 
>>> recommend taking a range_query, since if you provide a default 
>>> range_of_expr, sensible defaults will be provided for the others in 
>>> terms of range_of_expr.
>>>
>>> Note, that the absolute bare minimum that must be provided is a 
>>> value_of_expr and a range_of_expr respectively.
>>>
>>> One piece of the API which is missing is a method  to return the 
>>> range of an arbitrary SSA_NAME *after* a statement.  Currently 
>>> range_of_expr calculates the range of an expression upon entry to the 
>>> statement, whereas range_of_stmt calculates the range of *only* the 
>>> LHS of a statement AFTER the statement has executed.
>>>
>>> This would allow for complete representation of the ranges/values in 
>>> something like:
>>>
>>>      d_4 = *g_7;
>>>
>>> Here the range of g_7 upon entry could be VARYING, but after the 
>>> dereference we know it must be non-zero.  Well for sane targets anyhow.
>>>
>>> Choices would be to:
>>>
>>>    1) add a 4th method such as "range_after_stmt", or
>>>
>>>    2) merge that functionality with the existing range_of_stmt method 
>>> to provide "after" functionality for any ssa_name. Currently the 
>>> SSA_NAME must be the same as the LHS if specified.  It also does not 
>>> need to be specified to allow evaluation of statements without a LHS, 
>>> (if (x < 1) has no LHS, but will return [0,0] [1,1] or [0,1] as there 
>>> is a boolean "result" to the statement.
>>>
>>> Also, we've debated whether to use the two classes (value_query and 
>>> range_query) separately as above, or perhaps join them into one 
>>> class. I personally like the separation of powers.  Some users only 
>>> care for singletons, no sense in polluting their implementations with 
>>> talk of ranges.  But we're open to suggestions.
>>>
>>> The patchset is divided into 3 parts:
>>>
>>> 1. Generic implementation of class.
>>
>> Just a few comments on the new APIs in part 1.
>>
>> Here I would again recommend preventing the copying and assigning
>> the new range_query class.  Ditto for equiv_allocator unless its
>> base takes care of that.  For value_query, since it has a pure
>> virtual function, I would suggest to declare its default ctor
>> protected (and = default).  That way, if someone tries to define
>> an object of it, they'll get a nice compilation error as opposed
>> to a confusing linker error.
>>
>> The range_query default ctor uses the new epression to allocate
>> and construct its equiv_allocator object but the range_query dtor
>> just calls release on the object without invoking the delete
>> expression on it.  Unless the release() function does that (I'd
>> hope not but I have seen it) that seems like a leak.
>>
> 
> looks like an oversight, definitely call delete.
> 
>> Also, defining virtual functions to take default arguments tends
>> to be tricky and is generally discouraged.  Usually the default
>> argument value from the base class is used, unless the overridden
>> virtual function is called directly (either via Derived::foo()
>> or via ptr_to_derived->foo()).  To avoid surprises all overridden
>> virtual functions need to have the same defaults as the base,
>> which is easy to get wrong without compiler warnings.
>>
> 
>> One way to get around this is to define a non-virtual public
>> function with defaults in the base class and have it call
>> a protected virtual without the defaults.
>>
>>
> I'm not overly concerned about this.. If there is a default, its always 
> NULL, and it is simply to bypass having multiple versions of the routine 
> which come with varies degreees of confusion of its own.  I experimented 
> with multiple approaches, including that one.. I found each thing was a 
> little awkward one way or another.
> 
> Given a NULL default is going to be pretty consistent, I'd leave it be 
> for now.   Its primarily for legacy code, and when we nuke those uses 
> down the road, we could consider removing the default, forcing the 
> client to explicitly provide the NULL indicating the know they are 
> looking for a global value, not a contextual one.

I agree that the risk with the null is lower than with other
default values.  I also don't have a sense of how common deriving
from these classes will be so if it's expected to be rare it's
probably not something to be concerned about.

> Since you have replied to this thread, whats your opinion whether there 
> should be an extra API entry point for range/value_after_stmt or whether 
> that should be rolled into  the range_of_stmt routine, and any ssa-name 
> specified would be the it's range after the stmt..   perhaps renaming it 
> to something like "range_post_stmt" and if no ssa-name is specified, it 
> applies to the stmt result.

I prefer API parameterization over distinct names.  It makes APIs
easier for me to reason about, and when implementing them, often
lets me avoid duplicating code between.  A good rule of thumb is
to ask if it might ever make sense to defer the decision to call
one or the other alternative until runtime.  If the answer isn't
an unequivocal "no" then a single entry point with a parameter is
probably a better choice.

By the way, your question about API design makes me think about
something else.  A range is a generalization of a singleton value
(it's a collection of singletons, which is particularly true with
the new multi-range design).  Conversely, a singleton value is
a specialization of a range.  It always makes sense to ask for
the range of an expression, even if it evaluates to a constant.
It only makes sense to ask for its value if it's know to evaluate
to a constant.  So if it's expected to be useful to make
a distinction between the two concepts in the type system I would
think it natural to model the relationship by having value_query
derive from range_query rather than the other way around.

Martin