[range-ops] patch 01/04: types for VR_UNDEFINED and VR_VARYING

Wed Jul 24 17:06:00 GMT 2019

On July 24, 2019 6:00:04 PM GMT+02:00, Jeff Law <law@redhat.com> wrote:
>On 7/23/19 3:42 AM, Richard Biener wrote:
>> On Tue, Jul 23, 2019 at 1:59 AM Jeff Law <law@redhat.com> wrote:
>>>
>>> On 7/16/19 12:37 PM, Andrew MacLeod wrote:
>>>> On 7/9/19 5:56 AM, Richard Biener wrote:
>>>>> On Tue, Jul 9, 2019 at 9:28 AM Aldy Hernandez <aldyh@redhat.com>
>wrote:
>>>>>>
>>>>>>
>>>>>> On 7/4/19 6:33 AM, Richard Biener wrote:
>>>>>>> On Wed, Jul 3, 2019 at 2:17 PM Aldy Hernandez <aldyh@redhat.com>
>wrote:
>>>>>>>> On 7/3/19 7:08 AM, Richard Biener wrote:
>>>>>>>>> On Wed, Jul 3, 2019 at 11:19 AM Aldy Hernandez
><aldyh@redhat.com>
>>>>>>>>> wrote:
>>>>>>>> How about we keep VARYING and UNDEFINED typeless until right
>before we
>>>>>>>> call into the ranger.  At which point, we have can populate
>min/max
>>>>>>>> because we have the tree_code and the type handy.  So right
>before we
>>>>>>>> call into the ranger do:
>>>>>>>>
>>>>>>>>           if (varying_p ())
>>>>>>>>             foo->set_varying(TYPE);
>>>>>>>>
>>>>>>>> This would avoid the type cache, and keep the ranger happy.
>>>>>>> you cannot do set_varying on the static const range but instead
>>>>>>> you'd do
>>>>>>>
>>>>>>>     value_range tem (*foo);
>>>>>>>     if (varying_p ())
>>>>>>>      tem->set_full_range (TYPE);
>>>>>>>
>>>>>>> which I think we already do in some places.  Thus my question
>_where_
>>>>>>> you actually need this.
>>>>>> Basically, everywhere.  By having a type for varying/undefined,
>we don't
>>>>>> have to special case anything.  Sure, we could for example,
>special case
>>>>>> the invert operation for undefined / varying.  And we could
>special case
>>>>>> everything dealing with ranges to handle varying and undefined,
>but why?
>>>>>>    We could also pass a type argument everywhere, but that's just
>ugly.
>>>>>> However, I do understand your objection to the type cache.
>>>>>>
>>>>>> How about the attached approach?  Set the type for
>varying/undefined
>>>>>> when we know it, while avoiding touching the CONST varying.  Then
>right
>>>>>> before calling the ranger, pass down a new varying node with
>min/max for
>>>>>> any varyings that were still typeless until that point.
>>>>>>
>>>>>> I have taken care of never adding a set_varying() that was not
>already
>>>>>> there.  Would this keep the const happy?
>>>>>>
>>>>>> Technically we don't need to set varying/undef types for every
>instance
>>>>>> in VRP, but we need it at least for the code that will be shared
>with
>>>>>> range-ops (extract_range_from_multiplicative_op, union,
>intersect, etc).
>>>>>>    I just figured if we have the information, might as well set
>it for
>>>>>> consistency.
>>>>>>
>>>>>> If you like this approach, I can rebase the other patches that
>depend on
>>>>>> this one.
>>>>> OK, so I went ant checked what you do for class irange which has
>>>>> a type but no kind member (but constructors with a kind).  It also
>>>>> uses wide_int members for storage.  For a pure integer constant
>>>>> range representation this represents somewhat odd choices;  I'd
>>>>> have elided the m_type member completely here, it seems fully
>>>>> redundant.  Only range operations need to be carried out in a
>>>>> specific type (what I was suggesting above).  Even the precision
>>>>> encoded in the wide_int members is redundant then (I'd have
>>>>> expected widest_int here and trailing-wide-ints for optimizing
>>>>> storage).
>>>>
>>>> What irange contains internally is a bit arbitrary.  It's really an
>API
>>>> for managing a set of subranges.  We could have used trees
>internally
>>>> just as easily, then we wouldnt need a type field. Since we were
>doing
>>>> lots of operations, rather than going back and forth from trees all
>the
>>>> time, we just used the underlying wide_int directly.   we could
>have
>>>> fiddle farted around with HOST_WIDE_INT or whatever, but wide_int
>is
>>>> there, has all the operations, and it works fine. so thats what it
>>>> currently is on the branch.
>>>>
>>>> We are treating a range object as a unique self contained object.
>>>> Therefore, the range has a type so we know how to print it, and can
>>>> confirm before any operation that the ranges being operated on are
>>>> appropriately matched.  We found and opened bugzillas over the past
>>>> couple years for places where our code caught bugs because a range
>was
>>>> created and then operated on in a way that was not compatible with
>>>> another range.  I think there is a still an open one against ada(?)
>>>> where the switch and case  are different precision.
>>>>
>>>> From my point of view, a range object is similar to a tree node. A
>tree
>>>> node has the bits to indicate what the value is, but also
>associates a
>>>> type with those bits within the same object.  This is less error
>prone
>>>> than passing around the bits and the type separately.  As ranges
>are
>>>> starting to be used in many places outside of VRP, we should do the
>same
>>>> thing with ranges.  WIth value_range it would actually be free
>since
>>>> there is already a tree for the bounds already which contains the
>type.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> to fold_range/op_range?  The API also seems to be oddly
>>>>> constrained to binary ops.  Anyway, the way you build
>>>>> the operator table requires an awful lot of global C++ ctor
>>>>> invocations, sth we generally try to avoid.  But I'm getting
>>>>> into too many details here.
>>>>
>>>> Its "oddly constrained" because what you are looking at  is just
>the
>>>> standardized unary/binary ops code.. ie the equivalent of
>>>> extract_range_from_binary_expr() and
>extract_range_from_unary_expr().
>>>> The other ops we care about have specialized requirements, like
>PHIs
>>>> and the arbitrary numbers of parameters in a call, or anything less
>>>> common than one or two operands.    You are just not seeing those
>parts.
>>>>
>>>>>
>>>>> So - to answer your question above, I'd like you to pass down
>>>>> a type to operations.  Because that's what is fundamentally
>>>>> required - a range doesn't have a "type" and the current
>>>>> value_range_base doesn't fall into the trap of needing one.
>>>>>
>>>>> Richard.
>>>>>
>>>>
>>>> Why is having a type associated with the data a "trap"?   Perhaps
>the
>>>> old VRP lattice idea didn't need a type with the UNDEFINED and
>VARYING
>>>> lattice values, but we're moving past lattice values and into a
>realm
>>>> where we have ranges as useful things outside of VRP, and trying to
>>>> shoehorn lattice values does not seem appropriate anymore.
>>>>
>>>> I looked at implementing range-ops without a type in the range, and
>we
>>>> end up passing 2 parameters everywhere each time we do anything
>with a
>>>> range.  This doubled the number of parameters in most routines, and
>when
>>>> we had chains of calls, we were simply passing the type along with
>the
>>>> range.  It seems archaic to be constantly passing information
>around
>>>> instead of associating it with the range itself.  Its just another
>place
>>>> for an error to creep in.. Aldy found a place where we were
>creating
>>>> varying nodes for floats IIRC.. the type checking in the range
>>>> operations caught it precisely because the range returned wasn't
>the
>>>> type it was assumed to be.
>>>>
>>>> That said. I believe we can do away with the need for a type with
>an
>>>> 'UNDEFINED' range.  That is not too onerous, and there doesn't
>really
>>>> seem to be too many ripple effect from have a typeless undefined
>range.
>>>> I think we can contain that, and prevents us from having to add a
>hack
>>>> to value_range for that situation.
>>>>
>>>> VARYING  is another situation completely.  We adopted the term
>'varying'
>>>> for ease of compatibility with VRP, but varying is simply a range
>going
>>>> from MIN to MAX for a type.  Removing the type from that would then
>>>> require we always pass a type in with every range which gets back
>to
>>>> doubling the number of of parameters everywhere for no good reason.
>>>>
>>>> If we standardize value_range so that MIN and MAX are set for
>varying,
>>>> then everything works very smoothly, we can make value_range and
>irange
>>>> interchangeable and facilitate getting the range ops code into
>trunk.
>>>>
>>>> It seems like the only reason we cant do this right now is the
>CONST
>>>> varying nodes that are returned from get_value_range().
>>>> Looking at that routine, it seems there are only 2 cases where that
>can
>>>> be returned
>>>>  1) we ask for an ssa-name beyond the end of the local ssa-name
>vector
>>>>  2) once values_propagated is set  to true and an ssa-name has no
>entry.
>>>>
>>>> Both seem pretty straightforward to fix...
>>>> 1)  if we ask for an ssa-Name beyond the end, simply reallocate the
>>>> vector to be big enough.   I doubt this will trigger a lot, and if
>we
>>>> initially allocate it with room for an extra 10% names it'll
>probably
>>>> never trigger.  Or pick whatever number seems appropriate.
>>>> 2) if values_propagated is true, simply allocate a node for the
>>>> ssa-name,  set it to varying and return it. THIs accomplishes the
>same
>>>> thing.
>>>>
>>>> the number of additional nodes we allocate will be pretty minimal,
>and
>>>> it will never exceed the number of ssa-names. Then we never have to
>>>> worry about having CONST set for a varying node either. I see
>places
>>>> where there is special processing to avoid calling set_varying 
>because
>>>> we dont know if the vbalue_range in hand is in constant memory and
>would
>>>> cause the compiler to trap. This seems like a really bad situation,
>and
>>>> it would eliminate that issue.  We can also then eliminate the
>places
>>>> where VARYING is expanded to min/max for a given type.. instead we
>can
>>>> just pick up min/max directly.    It seems much cleaner overall.
>>>>
>>>> Something like the simple attached patch would resolve that issue,
>and
>>>> remove any lurking concerns/bugs with the CONST code.
>>>>
>>>> Then we can associate a type with varying, canonicalize it
>consistently,
>>>> and set MIN/MAX appropriately.   This will give us full
>>>> interchangeability between irange and value_range, establish a
>>>> solid/consistent API,  and we can move on to the next thing :-)
>>>>
>>>> Does this not seem reasonable?
>>> One might even claim that this patch in and of itself is a step
>forward
>>> independent of all the other work going on.
>>>
>>> THe first time I saw that code when I copied it into vr-values I
>thought
>>> it was ripe for fixing, but never got to it.  As it stands, it's a
>hack,
>>> no more, no less and it's a hack that we can easily remove and do
>>> something better.
>> 
>> It's not a hack - it's an optimization and a _sanitization_.  Iff you
>> want to remove it then
>> make it return NULL (no range) and make all callers deal with that.
>We can do better here.  Callers shouldn't have to cope with the fact
>that SSA_NAMEs and thus ranges can be created on the fly (gimple-fold)
>and they should be able to get a valid range object that is no
>different
>than any other range object.   NULL is no better than the constant
>varying node.
>
>If you really think we need some kind of special case here for a
>client,
>can you please elaborate which client and why?  Clearly it's something
>you feel strongly about and I'm open to the possibility there's a case
>I'm not aware of.
>
>
>
>> 
>> That we return a pointer to read-only storage makes sure callers do
>not
>> change the value-range.  That's better than the alternative below
>which
>> is the appropriate "fix" for the existing code if you want to get rid
>of the
>> statically allocated constant varying and the "hack" of CONST_CASTing
>it.
>But I'd claim that if callers are required not to change these ranges,
>then the callers are fundamentally broken.  I'm not sure what the
>"sanitization" is really buying you here.  Can you point to something
>specific?
>
>> 
>> But you lose the sanitizing that nobody can change it and the
>> changed info leaks to other SSA vars.
>> 
>> As said, fix all callers to deal with NULL.
>> 
>> But I argue the current code is exactly optimal and safe.
>ANd I'd argue that it's just plain broken and that the sanitization
>you're referring to points to something broken elsewhere,  higher up in
>the callers.

Another option is to make get_value_range return by value and the only way to change the lattice to call an appropriate set function. I think we already do the latter in all cases (but we use get_value_range in the setter) and returning by reference is just eliding the copy. 

Richard. 

>
>jeff