This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Linemap and pph

From: Gabriel Charette <gchare at google dot com>
To: Gabriel Charette <gchare at google dot com>, Tom Tromey <tromey at redhat dot com>, Diego Novillo <dnovillo at google dot com>, Lawrence Crowl <crowl at google dot com>, gcc at gcc dot gnu dot org, Collin Winter <collinwinter at google dot com>, Cary Coutant <ccoutant at google dot com>
Date: Tue, 26 Jul 2011 11:10:58 -0700
Subject: Re: Linemap and pph
References: <CAJTZ7LJ7Pt=BC5Y-Tu2uL90Hvbj+vaqTh0Ovq3t=e9dfBD_yqg@mail.gmail.com> <m3ipquvylt.fsf@fleche.redhat.com> <CAJTZ7L+VPzfrLvnR6ZDmCd29hm2OFHo+taQfLZxDqMBgwqBg1w@mail.gmail.com> <m3ipqtj8ra.fsf@seketeli.org> <CAJTZ7LLqiOZnPX8nXHC358SjCDsWn2REH8oPkn3xTPY0+z0XfA@mail.gmail.com> <CAJTZ7LKkw5kpbYnAT=CTdhiK8BU7WZVjOOkFoWW-PcFX3GDvbg@mail.gmail.com> <m339htgu6x.fsf@seketeli.org>

On Tue, Jul 26, 2011 at 4:25 AM, Dodji Seketeli <dodji@seketeli.org> wrote:
> Gabriel Charette <gchare@google.com> a écrit:
>
>> Alright, so after looking even more at the linemap code, here is what
>> I'm thinking now:
>>
>> So the main problem is:
>> with the linemap logic is that linemap_line_start adds a new table
>> entry if line_delta < 0 (I don't understand why this is needed, my
>> assumption is that it is because the rest of the logic depends on
>> set->highest_location, thus we want to make sure we are higher than
>> the highest so far? can someone clarify why we need this?)
>
> Here is how I understand this. ?Sorry if it's too obvious.
>
> The linemap_line_start model follows that of a parser that parses tokens
> from left to right, and from top to bottom. ?Let's imagine a main CU
> called M which has this layout:
>
> <---- start, locus #0
>
> [... tokens ...]
>
> #include "A" ?<--- locus #1
>
> [... tokens ...]
>
> There are going to be at least three line maps (instances of struct
> line_map) created and added to the line map set (struct line_maps) of M:
> one for loci from #0 to right before #1, at least one for loci coming
> from the included header A (there can be several line maps here, if A
> itself includes other headers) and another one from loci coming right
> after #1.
>
> A hard invariant here is that source_locations yielded by a given line
> map must be less than the struct line_map::start_location of the next
> line map. ?This is (partly) so that line map lookup (when we want to get
> the line map that corresponds to a given source_location) can be done
> using a binary search, which is faster than just doing a linear search.
> A side effect of this is that source_locations handed out by
> linemap_line_start increase monotonically.
>
> With these assumptions in mind, when the parser code calls
> linemap_line_start to get the source_location corresponding to a new
> line at column 0, the only case where the new line is less than the
> preceding line (so that line_delta < 0) is if the parser is entering a
> new file. ?E.g, it is entering the header A, coming from M. ?Or, it is
> getting back to M, leaving A. ?In that case, it seems required for
> linemap_line_start to create the new line map for A or for M.
>

Thanks for this explanation, that's pretty much how I had understood
the linemap code, but you clarified some of the things I wasn't clear
about.

>> My solution:
>> I will add a boolean flag to linemap_line_start's parameters,
>> allowEarlierStartLineStart, which when true, will not create a new
>> entry in the line_table even if the line_delta < 0.
>
> If I understand correctly, you are acting like if you were parsing from
> right to left and from bottom to top, for the content of A.
>

Right.

> In that context, you need to keep the hard invariant I talked about
> above, so that line map lookup keeps working, at least.
>
> In other words, when you generate a source_location for a token T0 that
> comes topologically /before/ the token T1 you generated source_location
> for at the previous step, you need to make sure that source_location of
> T0 is less than source_location of T1, and that the struct
> line_map::start_location of the current map is less than T0. ?If T0
> comes from a different header than T1 (suppose both T1 and T0 comes from
> headers that are included by A) then a new line map (different from the
> line map for T1) needs to be created for T0. ?Now I am not sure if in
> your design, there is going to be only one pph for A, or if there is
> going to be one pph for each header included by A as well. ?I would have
> imagined the later case to be more appropriate.
>

Correct we are using the multiple pph approach (one for each included
header in the header).

This actually makes the problem harder than I had originally pictured
it yesterday. I'm now thinking using serialization is going to be a
better solution because of that.

>> Instead of relying on highest_location to find the current line's
>> source location, it will use the start_location and add to it the
>> delta between to_line and map->to_line).
>
> If the map->start_location was allocated correctly, I guess this could
> work assuming the constraints I raised above are respected.
>
> Note however that there are corner cases like having so long lines in
> the source code being parsed that we fill the column-space of the line
> map so quickly that we need to allocate more line maps for a given file.
> These are corner cases taken care of by linemap_line_start today that
> you could miss if you are trying to walk in reverse order like this.
> Those would have been addressed by serializing the original line map set
> of A that we knew was correct, barring the value of their
> map->start_location and map->to_line.
>

One more reason for serialization!

>> I will also add a new linemap function,
>> linemap_position_for_line_and_column, which will be just like
>> linemap_position_for_column, except that it won't assume the
>> highest_line in the set is the one we are currently on (this function
>> will thus allow us to get a source_location without depending on the
>> current assumptions on the order these functions are called in).
>
> Heh. ?I have added a function with that exact name in my patch set to
> track locations across macro expansions. ?Look at the source code at
> http://tinyurl.com/3egxp47, search for
> linemap_position_for_line_and_column (obviously). ?The implicit
> assumption made by that function is that the {line,column} pair which
> you are requesting a source_location for belongs to the current line
> map, though.
>

Good to know, I had made the same assumption in mine that we were
looking for {line, column} in the current line map.
----
So, I think serializing the linemap and tweaking it on the way in is
going to be easier, in particular because of nested pph (header
including headers).
I'll put down my strategy for serializing the linemap and send a
follow up email soon.

Thanks,
Gabriel

Follow-Ups:
- Re: Linemap and pph
  - From: Gabriel Charette

References:
- Linemap and pph
  - From: Gabriel Charette
- Re: Linemap and pph
  - From: Tom Tromey
- Re: Linemap and pph
  - From: Gabriel Charette
- Re: Linemap and pph
  - From: Dodji Seketeli
- Re: Linemap and pph
  - From: Gabriel Charette
- Re: Linemap and pph
  - From: Gabriel Charette
- Re: Linemap and pph
  - From: Dodji Seketeli

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]