Designs for better debug info in GCC

Tue Dec 18 22:43:00 GMT 2007

On 12/18/07, Alexandre Oliva <aoliva@redhat.com> wrote:

> Then, we let tree optimizers do their jobs.  Whenever they rename,
> renumber, coalesce, combine or otherwise optimize a variable, they
> will automatically update debug statements that mention them as well.
>
Speaking only about the tree level, in this entire email
I make no representations about the RTL level ;)

This is much harder than you give it credit for, unless you plan on
throwing out all the info at elimination points.

Consider PRE alone, which makes new statements that are combinations
of old ones, and eliminate tons of variables in favor of it.

If your debug statement strategy is "move debug statements when we
insert code that is equivalent", it won't work, because our
equivalence is based on value equivalence, not location equivalence.
We only guarantee it has the same value as the whatever it is a copy
of at that point, not that it has the same location.

So you will lose info every time PRE makes an insertion, unless you
make serious modifications to PRE.

This is not to mention the data you lose if you just throw it away at
elimination points.

Let's take another problem.

How do i say debug info for some variable is now dead, we have no idea
what it is right now?
How do I figure out which debug statements need to be modified when
you introduce new memory operations?

When you pass something by address, you get vops.
The vops are not variables, and have no relation to the original
variable (they can be partitions containing more vairables).

If i have

DEBUG(x, x_3)
x_3 = x; // Read from global

y = x_3;
....

If i insert a new call
DEBUG(x, x_3): 1
x_3 = x

foo() // May modify x and *&x)

y = x_3

Now you have two problems.

It is no longer true that at the point of y = x_3, that DEBUG (x, x_3) is true
In act, x_3 may no longer have any relation to x.
You have three choices:
1. Either destroy the DEBUG(x, x_3) losing valuable and correct info
2. Add a new DEBUG (x, unknown)
3. Figure out which debug statement are reached by your call

#3 is a dataflow problem, and not something you want to do every time
you insert a call.

If your answer is #1 or #2, then what you are really doing is
computing roughly the same dataflow problem var-location does, except
on trees and with a different meet-operation.

var-location generates incorrect info not because it represents
something fundamentally different than you are (it doesn't), it falls
down because it uses union as the meet operation.

It says "oh, i don't know which of these locations is right, it must
be both of them".

If you changed the meet operation to "oh, i don't know which of these
locations is right, it must be none of them", and did a little more
work you would inference the same info as yours *at the tree level*

Nothing you have proposed is fundamentally going to give you better info.
All you have done is annotated the IR in some places to make explicit
some bits in the dataflow problem that you could inference anyway.  It
is provable you can inference them with a simple lattice and
associated value, *unless you are going to start guessing* (which you
have said you don't want to do because it can generate incorrect
info).

There is absolutely no reason what you are trying to do needs to
modify the tree IR at all to achieve exactly the same accuracy of
debug info as your design proposes at the tree level.  You could
simply compute the global dataflow problem.

The RTL level is harder, of course.