Re: On-Demand range technology [2/5] - Major Components : How it works
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Andrew MacLeod <amacleod at redhat dot com>
- Cc: GCC <gcc at gcc dot gnu dot org>, Jeff Law <law at redhat dot com>, Aldy Hernandez <aldyh at redhat dot com>
- Date: Tue, 4 Jun 2019 17:37:44 +0200
- Subject: Re: On-Demand range technology [2/5] - Major Components : How it works
- References: <908dbc60-b7b9-c609-f999-cc7264af0c6e@redhat.com> <CAFiYyc2a_h7AVa44G+wrAic=op+qbaGgLHbb+d8wGc3bFZdDaQ@mail.gmail.com> <a6a5c7ea-f4a1-c671-5ff4-cbd5183a7606@redhat.com> <CAFiYyc3AGjzJpYJR4FFrN-J1yfa++aQ_9MJYiiZhJ4MS4Nw2qw@mail.gmail.com> <91bfd08e-1431-d8ba-f9b5-e8822aaf62e2@redhat.com> <CAFiYyc2Qvk2NJ0Akwakm8CEUz0ZdOFc4BbK1_xdQYpvdej=hgA@mail.gmail.com> <dbdfa6db-ddcf-cad9-69ed-0e514a7aea68@redhat.com> <CAFiYyc3TEHVfvFfdTc5-GskyQsn2YdhmN8qYnU9g7Q6uvv-ZDg@mail.gmail.com>
On Tue, Jun 4, 2019 at 5:26 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, May 31, 2019 at 5:40 PM Andrew MacLeod <amacleod@redhat.com> wrote:
> >
> > On 5/29/19 7:15 AM, Richard Biener wrote:
> > > On Tue, May 28, 2019 at 4:17 PM Andrew MacLeod <amacleod@redhat.com> wrote:
> > >> On 5/27/19 9:02 AM, Richard Biener wrote:
> > >>> On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod <amacleod@redhat.com> wrote:
> > >>>>> The above suggests that iff this is done at all it is not in GORI because
> > >>>>> those are not conditional stmts or ranges from feeding those. The
> > >>> machinery doing the use-def walking from stmt context also cannot
> > >>> come along these, so I have the suspicion that Ranger cannot handle
> > >>> telling us, for the stmt following the above, for example
> > >>>>>
> > >>>>> if (_5 != 0)
> > >>>>>
> > >>>>> that _5 is not zero?
> > >>>>>
> > >>>>> Can you clarify?
> > >>>> So there are 2 aspects to this.  The range-ops code for DIV_EXPR, if
> > >>>> asked for the range of op2 (), would return ~[0,0] for _5.
> > >>>> But you are also correct that the backwards walk would not find this.
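> > >>>>
> > >>>> (Roughly, as a self-contained sketch - the type and function here
> > >>>> are made-up names for illustration, not the actual range-ops code:)
> > >>>>
> > >>>> // ~[lo,hi] when inverted is true, [lo,hi] otherwise.
> > >>>> struct range_sketch { long lo, hi; bool inverted; };
> > >>>>
> > >>>> // For lhs = op1 / op2: if the division executed, op2 cannot
> > >>>> // be zero, whatever the lhs and op1 ranges were.
> > >>>> range_sketch div_op2_range ()
> > >>>> {
> > >>>>   return range_sketch { 0, 0, true };   // ~[0,0]
> > >>>> }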
> > >>>>
> > >>>> This is similar functionality to how null_derefs are currently handled,
> > >>>> and in fact could probably be done simultaneously using the same code
> > >>>> base. I didn't bring null derefs up, but this is a good time :-)
> > >>>>
> > >>>> There is a separate class used by the gori-cache which tracks the
> > >>>> non-nullness property at the block level. It has a single API:
> > >>>> non_null_deref_p (name, bb), which determines whether there is a
> > >>>> dereference of NAME in basic block BB, i.e. whether NAME has an
> > >>>> implicit ~[0,0] range in that basic block or not.
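> > >>>>
> > >>>> (A rough sketch of the idea - everything except non_null_deref_p
> > >>>> is a made-up name, and the real class sits on GCC bitmaps rather
> > >>>> than these standard containers:)
> > >>>>
> > >>>> #include <bitset>
> > >>>> #include <map>
> > >>>>
> > >>>> struct non_null_tracker
> > >>>> {
> > >>>>   // ssa version -> one bit per BB: "dereferenced in this block".
> > >>>>   std::map<unsigned, std::bitset<1024>> deref_in_bb;
> > >>>>
> > >>>>   // True if NAME is dereferenced in block BB, i.e. NAME carries
> > >>>>   // an implicit ~[0,0] range there.
> > >>>>   bool non_null_deref_p (unsigned name_version, unsigned bb) const
> > >>>>   {
> > >>>>     auto it = deref_in_bb.find (name_version);
> > >>>>     return it != deref_in_bb.end () && it->second.test (bb);
> > >>>>   }
> > >>>> };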
> > >>> So when we then have
> > >>>
> > >>> _1 = *_2; // after this _2 is non-NULL
> > >>> _3 = _1 + 1; // _3 is non-NULL
> > >>> _4 = *_3;
> > >>> ...
> > >>>
> > >>> when an on-demand user asks whether _3 is non-NULL at the
> > >>> point of _4 = *_3 we don't have this information? Since the
> > >>> per-BB caching will only say _1 is non-NULL after the BB.
> > >>> I'm also not sure whether _3 ever gets non-NULL during
> > >>> non-NULL processing of the block since walking immediate uses
> > >>> doesn't really help here?
> > >> Presumably _3 is globally non-null due to the definition being (pointer
> > >> + x) ... i.e., _3 has a global range of ~[0,0] ?
> > > No, _3 is ~[0, 0] because it is derived from _1 which is ~[0, 0] and
> > > you cannot arrive at NULL by pointer arithmetic from a non-NULL pointer.
> >
> > I'm confused.
> >
> > _1 was loaded from _2 (thus asserting _2 is non-NULL), but we have no
> > idea what the range of _1 is, so how do you assert _1 is ~[0,0] ?
>
> Bug on my part - it should offset _2:
>
> _1 = *_2; // after this _2 is non-NULL
> _3 = _2 + 1; // _3 is non-NULL
> _4 = *_3;
>
> > The only way I see to determine _3 is non-NULL is through the _4 = *_3
> > statement.
>
> So how do you derive a ~[0,0] _2 and thus know that *_3 does not trap?
>
> >
> > >
> > >>> So this seems to be a fundamental limitation [to the caching scheme],
> > >>> not sure if it is bad in practice.
> > >>>
> > >>> Or am I missing something?
> > >>>
> > >> Not missing anything.  The non-nullness property is maintained globally
> > >> at the basic block level.  Both _1 and _3 are flagged as being non-null
> > >> in the block.  Upon exit, it's a bit check.  If the global information
> > >> does not have the non-nullness property, then when a request is made for
> > >> non-nullness and the def and the use are both within the same block,
> > >> and it's flagged as being non-null in that block, then the request is
> > >> forced back to a quick walk between the def and the use to see if any
> > >> non-nullness is introduced in between.  Yes, that makes it a linear
> > >> walk, but it's infrequent, and often short.  To the best of our
> > >> knowledge at this point, anyway :-)
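> > >>
> > >> (As a sketch of that fallback walk - the stmt type and the
> > >> derefs_name_p predicate are hypothetical stand-ins:)
> > >>
> > >> struct stmt { stmt *next; /* ... */ };
> > >> bool derefs_name_p (const stmt *, unsigned name);  // hypothetical
> > >>
> > >> // Def and use are in the same block, which is flagged as
> > >> // containing a dereference of NAME; walk the stmts strictly
> > >> // between them to see whether the dereference happens there.
> > >> bool non_null_between_p (const stmt *def, const stmt *use,
> > >>                          unsigned name)
> > >> {
> > >>   for (const stmt *s = def->next; s && s != use; s = s->next)
> > >>     if (derefs_name_p (s, name))
> > >>       return true;
> > >>   return false;
> > >> }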
> > > So with the clarification above do we ever see that _3 is non-NULL?
> > > I suppose the worker processing _3 = _1 + 1 would ask for
> > > _1 non-nullness but we do not record any non-NULL-ness of _1 in
> > > this basic-block (but only at its end). Consider stmts
> > >
> > > _4 = (uintptr_t) _2;
> > > _5 = _6 / _4;
> > > _1 = *_2;
> > > ...
> > >
> > > here at _1 we know _2 is not NULL.  But if we ask for non-NULLness
> > > of _2 at the definition of _4, we may not compute ~[0, 0], and thus
> > > may not conclude that _6 / _4 does not trap.
> > EVRP must look backwards to figure this out, since the forward walk will
> > process _5 = _6 / _4 before it sees the dereference of _2... so how does
> > it know that _4 is non-zero without looking backwards at things once it
> > sees the dereference?  Does it actually do this?
>
> It actually doesn't do this for divisions (but I have a patch lying
> somewhere).  During the forward walk, when coming along _5 = _6 / _4;
> it would, in evrp_range_analyzer::record_ranges_from_stmt via
> infer_value_range, push a new value-range for _4.  To infer the same
> for _2 it would need to look backward, which I think it doesn't do
> right now but easily could (I'm just trying to make up examples to
> understand ranger better).  When visiting _1 = *_2 we then know _2
> is not NULL.
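>
> (Schematically - every helper below is a hypothetical stand-in, not
> the actual evrp/vr-values API:)
>
> struct stmt;
> struct range_sketch { /* a range */ };
> bool is_division (const stmt *);        // _5 = _6 / _4 ?
> bool is_dereference (const stmt *);     // _1 = *_2 ?
> unsigned divisor_version (const stmt *);
> unsigned pointer_version (const stmt *);
> range_sketch nonzero ();                // ~[0,0]
> void push_range (unsigned version, range_sketch);
>
> // Forward walk: ranges inferred from a stmt's side effects are
> // pushed so that they hold for the stmts following it.
> void record_ranges_from_stmt_sketch (const stmt *s)
> {
>   if (is_division (s))
>     push_range (divisor_version (s), nonzero ());  // divisor != 0 after s
>   if (is_dereference (s))
>     push_range (pointer_version (s), nonzero ());  // pointer != 0 after s
> }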
>
> There may be more "useful" cases; for example, we could derive
> ranges for shift operands based on undefinedness, and their range
> may be useful for simplifying adjacent stmts.
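>
> (For instance, in C terms - a sketch of the kind of thing I mean,
> assuming 32-bit unsigned:)
>
> unsigned f (unsigned y, int s)
> {
>   unsigned x = y << s;  // if well-defined, s is in [0, 31] here
>   if (s < 32)           // ... so this test could be folded to true
>     return x;
>   return 0;
> }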
>
> I'm trying to discover how ranger, apparently recording ranges only
> at BB boundaries, interacts with ranges that become "live" at
> some point inside the BB.
>
> > >
> > > stmt-level tracking of ranges are sometimes important. This is
> > > something the machinery cannot provide - correct? At least not
> > > optimistically enough with ranges derived about uses.
> >
> > Maybe I'm the one missing something, but in the absence of statement-
> > level exception throwing via 'can_throw_non_call_exceptions' being true,
> > any assertion made anywhere in the block about an ssa_name applies to the
> > entire block, does it not?  I.e., it doesn't matter if the dereference
> > happens first thing in the block or last thing; it's not going to change
> > its value within the block... it's going to be non-null throughout the
> > entire block.
>
> Well, one complication I had with the division-by-zero patch is that
> for _2 = _1 / _3 you can derive that _3 is not zero _after_ the stmt,
> but you may of course not assume it is not zero during evaluation
> of the stmt itself, because that may lead you to conclude it cannot
> trap and thus move it to a place it was not executed previously.
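>
> (Concretely, in the same IL style as the other examples:)
>
>   _2 = _1 / _3;   // fine: _3 is ~[0,0] for the stmts after this,
>   _7 = _3 == 0;   // so this compare may be folded to 0
>
> but:
>
>   if (flag)
>     _2 = _1 / _3; // assuming _3 is ~[0,0] _at_ the division would
>                   // claim it cannot trap and permit hoisting it
>                   // above the guard - which is wrong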
>
> > So if one statement in the block asserts that references to _2 are
> > non-null, we can assert that all references to _2 in the block are
> > non-null.  Meaning we get all these cases by knowing that the specified
> > name is non-zero throughout the block.  This also means we could know
> > things earlier in the block than a forward walk would provide.
> >
> > So with the 2 examples:
> >
> > _1 = *_2; // after this _2 is non-NULL
> > _3 = _1 + 1;
> > _4 = *_3;
> >
> >
> >
> > both _2 and _3 are flagged as non-null in the block due to the
> > dereferences.  I'm not sure what we know about _1 from above, but we do
> > know ~[0,0] = _1 + 1, so whatever that tells us about _1, if anything,
> > we know.  It seems to me _1 is ~[-1,-1] based on that...
> >
> > Regardless, I think we know all the same things EVRP does.
> >
> > likewise
> >
> > _4 = (uintptr_t) _2;
> > _5 = _6 / _4;
> > _1 = *_2;
> >
> > _2 will be non-null in the entire block, so _4 must also be non-zero,
> > and we can conclude that the divide does not trap.
>
> And move the divide up like so?
>
> _4 = (uintptr_t) _2;
> _5 = _6 / _4;
> if (flag)
>   {
>     _1 = *_2;
>     ...
>   }
>
> most definitely not.
>
> >
> >
> > Now, when we set the 'can_throw_non_call_exceptions' flag, we'd have
> > to resort to statement walks, and we cannot determine that _5 does
> > not trap anyway.  EVRP is in the same boat... it doesn't know it's not
> > going to trap either, because we may never get to the *_2.
>
> The point is not that we know *_2 does not trap - we don't! We just know
> that if the program reaches the next stmt it did not and thus _2 was not NULL
> _for the stmts following the dereference_.
>
> Note I'm not sure it will make a difference in practice since any transform
> on a stmt after *_2 relying on the above creates constraints on stmt ordering
> which we cannot represent in the IL and thus the result is likely going to be
> very fragile.
>
> As said, I was just wondering if the non-NULL machinery you have
> integrates with the range machinery or is really independent
> (hence the case above of a pointer derived, via the dereference,
> from the one known to be non-NULL).
>
> > >>>> Yes, the compile-time complexity claims are from empirical speed
> > >>>> timings and theory-crafting about the algorithms, and from the fact
> > >>>> that the on-entry cache prevents multiple passes over things.
> > >>>>
> > >>>> We have not done a memory analysis yet, nor done anything to see if
> > >>>> we can improve it.
> > >>>> It makes very heavy use of bitmaps, which are typically quite sparse.
> > >>>> The on-entry cache is a vector of pointers to a range, initially 0, and
> > >>>> we allocate ranges as needed.  There will be an on-entry range entry
> > >>>> for any queried ssa-name in every block between the query point and
> > >>>> the definition.
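> > >>>>
> > >>>> (Per block, roughly - hypothetical names, with std::vector standing
> > >>>> in for the real allocator-backed vector:)
> > >>>>
> > >>>> #include <vector>
> > >>>>
> > >>>> struct range_sketch { /* a range */ };
> > >>>>
> > >>>> // One on-entry cache per BB: indexed by SSA version, slots start
> > >>>> // out null, and a range is allocated only when actually queried.
> > >>>> struct block_cache
> > >>>> {
> > >>>>   std::vector<range_sketch *> on_entry;
> > >>>>
> > >>>>   range_sketch *&slot (unsigned version)
> > >>>>   {
> > >>>>     if (version >= on_entry.size ())
> > >>>>       on_entry.resize (version + 1, nullptr);
> > >>>>     return on_entry[version];
> > >>>>   }
> > >>>> };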
> > >>> So that's similar to LIVE-on-entry, thus an SSA name with a range
> > >>> will have an on-entry range entry in each BB it dominates?
> > >>> That makes the storage requirement quadratic in the worst case.
> > >> Yes, assuming a request has been made for all ssa-names everywhere they
> > >> are live.
> > > You did propose to replace [E]VRP with a walk over the whole function
> > > querying ranges and simplifying stmts, didn't you?
> > Yes, for the case of EVRP.  But not all use cases query every range
> > everywhere.
> >
> > It's also a bit lower than that, since we cache certain types of ranges.
> > We cache one range for varying (or range-for-type, if you prefer) per
> > ssa-name type, so if the range is varying everywhere, we actually have
> > only one instance of the range rather than N of them.  So any name that
> > doesn't have a range reduction anywhere will only create a single range
> > instance for the entire CFG.
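> >
> > (The sharing is essentially this - made-up names, a sketch rather
> > than the actual cache code:)
> >
> > struct range_sketch { /* a range */ };
> >
> > // One static "varying" instance per type, shared by every block;
> > // only names with an actual range reduction get their own object.
> > static range_sketch varying_for_long;
> >
> > const range_sketch *range_on_entry (const range_sketch *cached)
> > {
> >   return cached ? cached : &varying_for_long;
> > }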
>
> But you still have a reference to the range in every BB dominated by the
> definition?
Btw, I was thinking of

unsigned char foo(long);
void baz(unsigned char c)
{
  try
    {
      long i = c;
      i += foo(i);
      i += foo(i);
      i += foo(i);
      i += foo(i);
      /* ... repeat N times ... */
      alloca(i); // query range of i
    }
  catch (...)
    {
    }
}
where the single(!) query of the range of i at the alloca call
will populate the caches of N BBs, with the Nth having ranges for
all SSA defs of i, running into quadraticness in N.
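(That is, block k's cache ends up holding on-entry ranges for roughly
k defs, so one query allocates on the order of
1 + 2 + ... + N = N * (N + 1) / 2 range entries, i.e. O(N^2).)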
If you actually try this, you will probably have to find a persisting
stmt that some user of the on-demand query looks for.
You converted some of the warning passes, so choose
a stmt that triggers one of those.
Note we've run into "interesting" testcases, mostly from
machine-generated code, which we still want to be able
to optimize at least at -O1 with reasonable use of
time and memory (and quadraticness is to be avoided
like the plague - not that we don't have some).
> > I think that's the
> > most common value, so that should reduce a large number of them.  I've
> > also considered caching ranges the way we cache tree constants... but I
> > haven't delved into that.  I figured if memory turns out to be a
> > problem, we'll look at it then.
> >
> > Andrew
> >