irange-best-practices

Introduction

irange is a class for storing and manipulating multi-ranges of integers. It is meant to be a replacement for value_range's, and can currently inter-operate seamlessly with them.

Original value_range's can contain a range of integers, say from 10 to 20 inclusive, represented as [10, 20]. It also has a way of representing an anti-range, which is the inverse of a range. For example, everything except 10 to 20 is represented as an anti-range of ~[10, 20]. This is really, shorthand for the union of [MIN_INT, 9] U [21, MAX_INT].

The value_range representation has the limitation that higher granularity is not representable without losing precision. For example, you cannot specify the range of [10, 15] U [20, 30], instead it is represented ambiguously with with a range of [10, 30].

On the other hand, multi-ranges with the irange class, can represent an arbitrary number of sub-ranges. More formally, multi-ranges have 0 or more non-intersecting sub-ranges with integral bounds. For example, you can specify a range containing the numbers of [10, 15] and [20, 30] with an irange of [10, 15][20, 30]. With irange, instead of using anti-ranges for ~[10, 20] the underlying number of ranges can be represented accurately with [MIN_INT, 9][21, MAX_INT].

Multi-ranges are not limited to 1 or 2 sub-ranges. Instead, you can specify any number of sub-ranges. For example:

int_range<5> five_pairs;
int_range<2> two_pairs;
int_range<1> legacy_value_range;
int_range_max huge_irange;  // currently 255 sub-ranges.

The special case of int_range<1> provides legacy support for value_range. Currently, value_range is just a typedef for int_range<1>.

The specially named int_range_max is used for "unlimited" sub-ranges, and is meant to be used when calculating intermediate results. Currently it is a large number (255), but could be changed without prior notice. Note that "widest" does not have anything to do with the range of the underlying integers, but the maximum amount of sub-range pairs available for calculation.

Here are some examples of calculations with different sub-range granularity:

// Assume:
tree n10 = build_int_cst (integer_type_node, 10);
tree n20 = build_int_cst (integer_type_node, 20);
tree n30 = build_int_cst (integer_type_node, 30);
tree n40 = build_int_cst (integer_type_node, 40);
tree n50 = build_int_cst (integer_type_node, 50);
tree n60 = build_int_cst (integer_type_node, 60);

// int_range<1> (aka value_range):

int_range<1> r1 (n10, n20);
int_range<1> r2 (n30, n40);
r1.union_ (r2);
r1.dump ();

// Outputs: [10, 40]

// Note: Since result doesn't fit in an int_range<1>, it is
// conservatively truncated.

// int_range<2>:

int_range<2> r1 (n10, n20);
int_range<2> r2 (n30, n40);
r1.union_ (r2);
r1.dump ();

// Outputs: [10, 20][30, 40]

int_range<2> r3 (n50, n60);
r1.union_ (r3);
r1.dump ();

// Outputs: [10, 20][30, 60]

// Note: Unioning [10, 20][30, 40] and [50, 60] doesn't fit
// into an int_range<2>, so it is truncated.

int_range<1> small;
small = r1;  // r1 is [10,20][30,60] as above.
small.dump ();

// Outputs: [10, 60]

// Note: Copying a larger range into a smaller range will
// result in truncating the result.

// int_range_max:

int_range_max r;
for (i = 0; i <= 40; i += 10) {
        tree low = build_int_cst (integer_type_node, i);
        tree high = build_int_cst (integer_type_node, i + 1);
        int_range<2> tmp (low, high);
        r.union_ (tmp);
}
r.dump ();

// Outputs: [0, 1][10, 11][20, 21][30, 31][40, 41]

Write your code for infinite sub-ranges

Your code should be agnostic to the sub-ranges there-in. For example, if you wanted to check if the numbers 5 through 10 are in a range R, avoid looking at min/max/kind and having to special case VR_RANGE, VR_ANTI_RANGE, and VR_VARYING. Instead do something like this:

int_range_max my_range (build_int_cst (5, integer_type_node),
                        build_int_cst (10, integer_type_node));
my_range.intersect (R);
if (R == my_range)
        return goodness;

If on the other hand, if you want to check if there is ANY of the numbers from 5 through 10 are in the range, you could do:

my_range.intersect (R);
if (!R.undefined_p ())
        return goodness;

First, don't be afraid to build throw-away ranges. They're cheap, light-weight, and don't require memory allocations.

Also, you could also have declared my_range as containing a max of 3 sub-ranges with:

int_range<3> my_range;

However, the result of the intersect code above will be limited to 3 sub-ranges. Sometimes 3 is all you need, but we suggest that intermediate results be done in int_range_max (currently 255 sub-ranges) and then the result be copied to whatever range you wish to return.

For example:

void some_calculation(irange &result, const irange &a, const irange &b)
{
        int_range_max intermediate = a;
        do_stuff (intermediate, b);
        result.intersect (intermediate);
}

Note that operations on ranges are done in the granularity of the range itself, not its argument. For example, BIG.intersect(SMALL) is done in the granularity of BIG. On the other hand, if SMALL is an int_range<1> and BIG is an int_range<5>, SMALL.intersect(BIG) will be squished down to the granularity of SMALL.

Example of iterating through sub-ranges in a range:

        if (r.undefined_p ())
                return;
        for (unsigned i = 0; i < r.num_pairs (); ++i) {
                wide_int x = r.lower_bound (i);
                wide_int y = r.upper_bound (i);
                // do stuff with x/y
        }

        // Note: undefined_p() is syntactic sugar for num_pairs() == 0.

Functions should take in irange's and stick to the API

All functions taking in value_range's should be rewritten to take in an irange instead. Taking in the irange base class guarantees your code works with any range.

You should keep to the main API which is listed in value-range.h, while avoiding the methods marked with DEPRECATED, especially kind (), min (), max (), symbolic_p (), which are all slated to be removed.

For example, symbolic_p should be avoided, because 9 out of 10 times passes don't need symbolics. Exceptions to these are the VRP twins (evrp / VRP) which need it internally. But virtually all passes that currently use the evrp_range_analyzer, never do anything with a symbolic range returned by get_value_range. Also, get_range_info() returns global range information, and does not use symbolics.

The API normalizes symbolics into integers, so you can use it and assume the range contains only numbers. For example, a symbolic of [5,X] is normalized into [5,MAX] when accessed through the API. That is, the lower bound of the range is 5, and the upper bound is TYPE_MAX_VALUE for the type. Similarly, a symbolic of [A, B] is normalized to [MIN,MAX].

If your functions take in an irange and limit themselves to the non-deprecated API, your code will be guaranteed to work with either one or an infinite amount of sub-ranges when available.

Note that if your pass needs symbolics and you're not evrp/vrp (really, you don't), your ability to make use of multi-ranges is limited to operations involving 2 constant ranges, and most operations will be truncated down to a value_range, thus defeating the purpose. If your pass is like 90% of passes, just stick to the API, use iranges, and you'll be fine.

Keep in mind that passing a multi-range into the EVRP eco-system won't buy you anything, as it is written entirely with value_range's. You need to pass a multi-range to the ranger, or range-ops to get anything with multi-ranges back.

anti-ranges

Anti ranges don't really exist. Although they currently exist in the internal representation of the legacy code, you shouldn't be checking for them, and you shouldn't be treating them specially. You should be asking for the bounds of a range or sub-range, not for anti-range in particular.

For example, ~[5,10] is really [0..4][11,MAX], and should be treated as such.

Cumbersome operations that were previous done with anti-ranges are much easier now. For example, previously if you were to write a predicate to check if X is in range R you would have to:

check that R is not undefined_p() check that R is not varying_p() check that R is either a VR_RANGE or a VR_ANTI_RANGE check that R is a constant_p (that is, is not a symbolic) check that R.kind()==VR_RANGE && X >= R.min() && X <= R.max() or....that R.kind()==VR_ANTI_RANGE && X < R.min() && X > R.max()

With the new API you check membership with:

R.contains_p(X)

VARYING and UNDEFINED ranges

The old VARYING nomenclature is now just syntactic sugar for a range of [MIN,MAX]. You can safely assume that any range (with the exception of an undefined range) has bounds. So the lower_bound() of a range that has a varying is the minimum value for the type. You shouldn't have to special case handling varying ranges. They're just ranges that span the entire domain of the underlying type.

Undefined ranges are syntactic sugar for the empty set (that is R.num_pairs() == 0). Since you obviously can't ask for the bounds of an undefined_p() range, you may have to check that a range is not the empty set first before asking for its bounds.

Returning range results from a function

Note, that functions that return ranges should do so by passing a reference (or pointer, though we prefer references). The reason is that you want the caller to decide what sub-range granularity they desire. Otherwise you end up returning by value an int_range<5>, when all the user wanted was an int_range<2>. Or worse, you truncate the results when the user wants a higher precision.

There are various setters that are quicker than full-on copying. You should use them when possible. For example,

instead of:

result = int_range<1> (build_zero_cst
(integer_type_node), build_zero_cst (integer_type_node));

do:

result.set_zero (integer_type_node);

instead of:

result = int_range<1> (a, b);

do:

result.set (a, b);

value_range's are not evil, but are severely limited

Note that int_range<1> and value_range are the same thing. In fact, value_range is typedef'd to be an int_range<1>. However, we prefer int_range<1>, as value_range is deprecated because it implies legacy support. Do not that your functions should take in irange's, not int_range<> or value_range, as this this ensures that it will work with any passed range.

Persistent irange's

If you wish to persistently keep iranges beyond the scope of your function, we will provide an irange_pool class when the ranger is contributed. It will provide a mechanism for dynamically allocating ranges. More details will follow, but the general gist is:

irange_pool pool;

// Allocate an irange of 5 sub-ranges.
irange *p = pool.allocate (5);

// Allocate an irange of 3 sub-ranges.
irange *q = pool.allocate (3);

// Allocate an irange with as many sub-ranges as are currently
// used in "some_other_range".
irange *r = pool.allocate (some_other_range);

In the above snippets, the "pool" object will deallocate any ranges at destruction.