This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
- From: David Malcolm <dmalcolm at redhat dot com>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: Richard Biener <richard dot guenther at gmail dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 16 Sep 2015 16:21:52 -0400
- Subject: Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
- Authentication-results: sourceware.org; auth=none
- References: <1441916913-11547-1-git-send-email-dmalcolm at redhat dot com> <1441916913-11547-8-git-send-email-dmalcolm at redhat dot com> <CAFiYyc1_i55DkKTwxr5VF3gyEf89bAcMkA6h-kK_DLXo5DUePQ at mail dot gmail dot com> <20150915102045 dot GB1847 at tucnak dot redhat dot com> <CAFiYyc2c0W6ZQD0C+OZFN-onWG-GN_7sMO39r2LOkG8PH64EUg at mail dot gmail dot com> <20150915104844 dot GC1847 at tucnak dot redhat dot com>
On Tue, 2015-09-15 at 12:48 +0200, Jakub Jelinek wrote:
> On Tue, Sep 15, 2015 at 12:33:58PM +0200, Richard Biener wrote:
> > On Tue, Sep 15, 2015 at 12:20 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> > > On Tue, Sep 15, 2015 at 12:14:22PM +0200, Richard Biener wrote:
> > >> > diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
> > >> > index 760467c..c7558a0 100644
> > >> > --- a/gcc/cp/parser.h
> > >> > +++ b/gcc/cp/parser.h
> > >> > @@ -61,6 +61,8 @@ struct GTY (()) cp_token {
> > >> > BOOL_BITFIELD purged_p : 1;
> > >> > /* The location at which this token was found. */
> > >> > location_t location;
> > >> > + /* The source range at which this token was found. */
> > >> > + source_range range;
> > >>
> > >> Is it just me or does location now feel somewhat redundant with range? Can't we
> > >> compress that somehow?
> > >
> > > For a token I'd expect it is redundant, I don't see how it would be useful
> > > for a single preprocessing token to have more than start and end locations.
> > > But generally, for expressions, 3 locations make sense.
> > > If you have
> > > abc + def
> > > ~~~~^~~~~
> > > then having a range is useful. In any case, I'm surprised that the ranges aren't encoded in
> > > location_t (the data structures behind it, where we already stick also
> > > BLOCK pointer).
> >
> > Probably lack of encoding space ... I suppose upping location_t to
> > 64bits coud solve
> > some of that (with its own drawback on increasing size of core structures).
>
> What I had in mind was just add
> source_location start, end;
> to location_adhoc_data struct and use !IS_ADHOC_LOC locations to represent
> just plain locations without block and without range (including the cases
> where the range has both start and end equal to the locus) and IS_ADHOC_LOC
> locations for the cases where either we have non-NULL block, or we have
> some other range, or both. But I haven't spent any time on that, so just
> wondering if such an encoding has been considered.
I've been attempting to implement that.
Am I right in thinking that the ad-hoc locations never get purged? i.e.
that once we've registered an ad-hoc location, then is that slot within
location_adhoc_data_map is forever associated with that (locus, block)
pair? [or in the proposed model, the (locus, src_range, block)
triple?].
If so, it may make more sense to put the ranges into ad-hoc locations,
but only *after tokenization*: in this approach, the src_range would be
a field within the tokens (like in patch 07/22), in the hope that the
tokens are short-lived (which AIUI is the case for libcpp and C, though
not for C++), presumably also killing the "location" field within
tokens. We then stuff the range into the location_t when building trees
(maybe putting a src_range into c_expr to further delay populating
location_adhoc_data_map).
That way we avoid bloating the location_adhoc_data_map during lexing,
whilst preserving the range information, and we can stuff the ranges
into the 32-bit location_t within tree/gimple etc (albeit paying a cost
within the location_adhoc_data_map).
Thoughts? Hope this sounds sane.
Dave