This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Designs for better debug info in GCC


On Nov 12, 2007, Mark Mitchell <mark@codesourcery.com> wrote:

> (We may already have lost some information, though.  For example, given:

>   i = 3;
>   f(i);
>   i = 7;
>   i = 2;
>   g(i);

> we may well have lost the "i = 7" assignment, so "i" might appear to
> have the value "3" right before we assign "2" to it, if we were to
> generate debug information right then.)

Yup.  And even if we could somehow preserve that information, there
wouldn't be any code to attach that information to.  There might be
uses for empty-range locations in debug information, but I can't think
of any.  Can anyone?  It's something we could try to preserve, and
with my design it would be quite easy to do so, but unless it's useful
for some purpose, I think we could just do away with it.

> The reason I want to make that assumption is that the part of this where
> the representation is in question is once we reach RTL, right?

I'm not sure what is in question at all.  I've proposed a design to
preserve debug information throughout compilation.  Other designs on
the table differ both in tree and rtl levels, and in the potential
quality and correctness of the debug information they can produce.

> I guess I still don't really understand what you're doing at the RTL
> level.

It's no different, except that instead of a DEBUG_STMT it's a
DEBUG_INSN, with the TREE exprssion converted to an RTL expression.

/me mumbles something about the silliness of keeping two completely
different yet nearly-isomorphic internal representations for
statements/instructions.

> What I don't understand is how it's actually going to work.  What
> are the notes you're inserting?

They're always of the form

  DEBUG user-variable = expression

where DEBUG stands for a DEBUG_STMT or a DEBUG_INSN, user-variable is
a tree that represents the user variable, and expression is a TREE or
RTL (depending on which representation we're in) that evaluates to the
value the user-variable is expected to hold at that point in the
program.

> Do they just say "here is an RTL expression for computing the value of
> user-variable V at this point in the program"?

In RTL, yes.

> Why does it make sense to have that, rather than notes on
> instructions that say what affect the instruction has on user
> variables?

Few instructions need such notes, so the proposal of growing SET by
33% doesn't quite appeal to me.  And then, optimizations move
instructions around, but I don't think they should move the assignment
notes around, for they should reflect the structure of the source
program, rather than the mangled representation that the optimizers
turn it into.

That said, growing SET to add to it a list of variables (or components
thereof) that the variable is assigned to could be made to work, to
some extent.  But when you optimize away such a set, you'd still have
to keep the note around, so it's not clear to me that adding code all
over to maintain the notes in place when the SETs go away or are
juggled around would bring us any advantage.  It would be just a
redundant notation for what the note would already convey, so it just
brings complexity for no actual advantage.

To make it concrete, consider that your example above could have become:

(set (reg i) (const_int 3)) ;; assigns to i
(set (reg P1) (reg i))
(call (mem f))
(set (reg i) (const_int 7)) ;; assigns to i
(set (reg i) (const_int 2)) ;; assigns to i
(set (reg P1) (reg i))
(call (mem g))

could have been optimized to:

(set (reg P1) (const_int 3))
(call (mem f))
(set (reg P1) (const_int 2))
(call (mem g))

and then you wouldn't have any debug information left for variable i.

whereas with the notes I propose, you'd be left with:

(debug i (const_int 3))
(set (reg P1) (const_int 3))
(call (mem f))
(debug i (const_int 7)) ;; may be dropped, as discussed above
(debug i (const_int 2))
(set (reg P1) (const_int 2))
(call (mem g))

even if no register at all ends up allocated for i.  And if there were
uses of i that followed the assignment to 7, to which the constant
could be propagated, you'd still be left with the annotation to
indicate that i has a new value at the correct point.

> As a meta-question, have you or anyone else on the list looked at the
> literature (IEEE/ACM, etc.) or how other compilers handle these problems?

I couldn't find much information about other compilers, but I've see a
number of (mostly dated) articles and US patents.  In fact, I'm
particularly concerned that US Patent 6091896 covers the design
proposed by Richi, that involves annotating the instructions
themselves.  I believe the independent, stand-alone annotations I
propose escape the patent claims.

That said, if anyone knows of articles that could be of use, I'd love
to hear about them.  It's not like my research was exhaustive.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]