This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Linemap and pph

From: Gabriel Charette <gchare at google dot com>
To: Gabriel Charette <gchare at google dot com>, Tom Tromey <tromey at redhat dot com>, Diego Novillo <dnovillo at google dot com>, Lawrence Crowl <crowl at google dot com>, gcc at gcc dot gnu dot org, Collin Winter <collinwinter at google dot com>, Cary Coutant <ccoutant at google dot com>
Date: Tue, 26 Jul 2011 14:21:10 -0700
Subject: Re: Linemap and pph
References: <CAJTZ7LJ7Pt=BC5Y-Tu2uL90Hvbj+vaqTh0Ovq3t=e9dfBD_yqg@mail.gmail.com> <m3ipquvylt.fsf@fleche.redhat.com> <CAJTZ7L+VPzfrLvnR6ZDmCd29hm2OFHo+taQfLZxDqMBgwqBg1w@mail.gmail.com> <m3ipqtj8ra.fsf@seketeli.org> <CAJTZ7LLqiOZnPX8nXHC358SjCDsWn2REH8oPkn3xTPY0+z0XfA@mail.gmail.com> <CAJTZ7LKkw5kpbYnAT=CTdhiK8BU7WZVjOOkFoWW-PcFX3GDvbg@mail.gmail.com> <m339htgu6x.fsf@seketeli.org> <CAJTZ7LKrf1TEtkn+KRSpHA-ZqS5X3CgjW-Kv7g-xdJViYXoURg@mail.gmail.com>

As mentionned in my previous email, I'm now thinking of serializing
the linemap from the header by pph'ed, here is how I think this could
work:

From what I understand, the source_locations allocated for everything
in a given set of headers (from the LC_ENTER for the header in the
line_table up to, and including everything in between, the
corresponding LC_LEAVE) is dependent on only one thing; the value of
line_table->highest_location when the header was inserted (i.e. if in
two different contexts we happen to have the same
line_table->highest_location when inserting header A.h, all of the
tokens for A.h in each context will have the same source_location).

If the previous statement is true, then we calculate an offset between
the line_table->highest_location as it was in the serialized version
of the header and as it is in the C file in which we are about to
de-serialize the line table.

We then need to update some things based on that offset:
  for each line_map entry in the line_table: start_location
  for each source_location field on tokens coming in from the header

Some other things need to be updated with simple logic:
  for each line_map entry in the line_table: included_from
  for the line_table itself: highest_location, highest_line, max_column_hint

Some things just stay as is when de-serialized:
  for each line_map entry in the line_table: reason, sysp, to_file,
to_line, column_bits

After doing this we should have the same line_map we would have had
when doing a regular compile (non-pph), and the tokens coming from the
header file should have the correct source_location, simply by adding
the calculated offset to the one they had previously (this also allows
us to only serialize the source_location for each token as opposed to
serializing it's expanded location, saving space in the pph).

Let me know what you think,
Gabriel

On Tue, Jul 26, 2011 at 11:10 AM, Gabriel Charette <gchare@google.com> wrote:
> On Tue, Jul 26, 2011 at 4:25 AM, Dodji Seketeli <dodji@seketeli.org> wrote:
>> Gabriel Charette <gchare@google.com> a écrit:
>>
>>> Alright, so after looking even more at the linemap code, here is what
>>> I'm thinking now:
>>>
>>> So the main problem is:
>>> with the linemap logic is that linemap_line_start adds a new table
>>> entry if line_delta < 0 (I don't understand why this is needed, my
>>> assumption is that it is because the rest of the logic depends on
>>> set->highest_location, thus we want to make sure we are higher than
>>> the highest so far? can someone clarify why we need this?)
>>
>> Here is how I understand this. ?Sorry if it's too obvious.
>>
>> The linemap_line_start model follows that of a parser that parses tokens
>> from left to right, and from top to bottom. ?Let's imagine a main CU
>> called M which has this layout:
>>
>> <---- start, locus #0
>>
>> [... tokens ...]
>>
>> #include "A" ?<--- locus #1
>>
>> [... tokens ...]
>>
>> There are going to be at least three line maps (instances of struct
>> line_map) created and added to the line map set (struct line_maps) of M:
>> one for loci from #0 to right before #1, at least one for loci coming
>> from the included header A (there can be several line maps here, if A
>> itself includes other headers) and another one from loci coming right
>> after #1.
>>
>> A hard invariant here is that source_locations yielded by a given line
>> map must be less than the struct line_map::start_location of the next
>> line map. ?This is (partly) so that line map lookup (when we want to get
>> the line map that corresponds to a given source_location) can be done
>> using a binary search, which is faster than just doing a linear search.
>> A side effect of this is that source_locations handed out by
>> linemap_line_start increase monotonically.
>>
>> With these assumptions in mind, when the parser code calls
>> linemap_line_start to get the source_location corresponding to a new
>> line at column 0, the only case where the new line is less than the
>> preceding line (so that line_delta < 0) is if the parser is entering a
>> new file. ?E.g, it is entering the header A, coming from M. ?Or, it is
>> getting back to M, leaving A. ?In that case, it seems required for
>> linemap_line_start to create the new line map for A or for M.
>>
>
> Thanks for this explanation, that's pretty much how I had understood
> the linemap code, but you clarified some of the things I wasn't clear
> about.
>
>>> My solution:
>>> I will add a boolean flag to linemap_line_start's parameters,
>>> allowEarlierStartLineStart, which when true, will not create a new
>>> entry in the line_table even if the line_delta < 0.
>>
>> If I understand correctly, you are acting like if you were parsing from
>> right to left and from bottom to top, for the content of A.
>>
>
> Right.
>
>> In that context, you need to keep the hard invariant I talked about
>> above, so that line map lookup keeps working, at least.
>>
>> In other words, when you generate a source_location for a token T0 that
>> comes topologically /before/ the token T1 you generated source_location
>> for at the previous step, you need to make sure that source_location of
>> T0 is less than source_location of T1, and that the struct
>> line_map::start_location of the current map is less than T0. ?If T0
>> comes from a different header than T1 (suppose both T1 and T0 comes from
>> headers that are included by A) then a new line map (different from the
>> line map for T1) needs to be created for T0. ?Now I am not sure if in
>> your design, there is going to be only one pph for A, or if there is
>> going to be one pph for each header included by A as well. ?I would have
>> imagined the later case to be more appropriate.
>>
>
> Correct we are using the multiple pph approach (one for each included
> header in the header).
>
> This actually makes the problem harder than I had originally pictured
> it yesterday. I'm now thinking using serialization is going to be a
> better solution because of that.
>
>>> Instead of relying on highest_location to find the current line's
>>> source location, it will use the start_location and add to it the
>>> delta between to_line and map->to_line).
>>
>> If the map->start_location was allocated correctly, I guess this could
>> work assuming the constraints I raised above are respected.
>>
>> Note however that there are corner cases like having so long lines in
>> the source code being parsed that we fill the column-space of the line
>> map so quickly that we need to allocate more line maps for a given file.
>> These are corner cases taken care of by linemap_line_start today that
>> you could miss if you are trying to walk in reverse order like this.
>> Those would have been addressed by serializing the original line map set
>> of A that we knew was correct, barring the value of their
>> map->start_location and map->to_line.
>>
>
> One more reason for serialization!
>
>>> I will also add a new linemap function,
>>> linemap_position_for_line_and_column, which will be just like
>>> linemap_position_for_column, except that it won't assume the
>>> highest_line in the set is the one we are currently on (this function
>>> will thus allow us to get a source_location without depending on the
>>> current assumptions on the order these functions are called in).
>>
>> Heh. ?I have added a function with that exact name in my patch set to
>> track locations across macro expansions. ?Look at the source code at
>> http://tinyurl.com/3egxp47, search for
>> linemap_position_for_line_and_column (obviously). ?The implicit
>> assumption made by that function is that the {line,column} pair which
>> you are requesting a source_location for belongs to the current line
>> map, though.
>>
>
> Good to know, I had made the same assumption in mine that we were
> looking for {line, column} in the current line map.
> ----
> So, I think serializing the linemap and tweaking it on the way in is
> going to be easier, in particular because of nested pph (header
> including headers).
> I'll put down my strategy for serializing the linemap and send a
> follow up email soon.
>
> Thanks,
> Gabriel
>

Follow-Ups:
- Re: Linemap and pph
  - From: Dodji Seketeli

References:
- Linemap and pph
  - From: Gabriel Charette
- Re: Linemap and pph
  - From: Tom Tromey
- Re: Linemap and pph
  - From: Gabriel Charette
- Re: Linemap and pph
  - From: Dodji Seketeli
- Re: Linemap and pph
  - From: Gabriel Charette
- Re: Linemap and pph
  - From: Gabriel Charette
- Re: Linemap and pph
  - From: Dodji Seketeli
- Re: Linemap and pph
  - From: Gabriel Charette

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]