Designs for better debug info in GCC

Wed Dec 19 06:07:00 GMT 2007

On Dec 18, 2007, "Daniel Berlin" <dberlin@dberlin.org> wrote:

> Consider PRE alone,

> If your debug statement strategy is "move debug statements when we
> insert code that is equivalent"

Move?  Debug statements don't move, in general.  I'm not sure what you
have in mind, but I sense some disconnect here.

> because our equivalence is based on value equivalence, not location
> equivalence.  We only guarantee it has the same value as the
> whatever it is a copy of at that point, not that it has the same
> location.

This sounds perfect to me.  I'm concerned about values.  Locations are
an implementation detail.  The thing to keep in mind is that what was
originally a single user variable may end up mangloptimized into
multiple stack slots, registers, with multiple simultaneously-live
versions.  Trying to pretend that any of these represent the user
variable sounds like a recipe for madness to me.  So I focus on values
instead, and then on trying to recover locations based on binding and
sharing of values.

> How do i say debug info for some variable is now dead, we have no idea
> what it is right now?

For annotations, look for VAR_DEBUG_VALUE_NOVALUE in tree.h and
VAR_LOC_UNKNOWN_P in rtl.h, in the VTA branch.

For dwarf location lists, you just refrain from emitting locations for
a given range.
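
As a rough illustration of "refrain from emitting" (a sketch, not code
from GCC or the VTA branch): the tuple layout, the function name and the
location strings below are all hypothetical, and real DWARF location
expressions are encoded quite differently.  The only point is that a
range with no known location produces no entry at all, so the consumer
sees a gap.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (start_pc, end_pc, location), where
# location is None when the variable's value is unknown in that range.
Range = Tuple[int, int, Optional[str]]


def emit_location_list(ranges: List[Range]) -> List[Tuple[int, int, str]]:
    """Keep only the ranges that have a known location.

    Ranges without a location are skipped, which leaves a hole in the
    address coverage; a debugger would report the variable as
    unavailable for program counters inside that hole.
    """
    return [(lo, hi, loc) for (lo, hi, loc) in ranges if loc is not None]


ranges = [
    (0x00, 0x10, "reg0"),     # illustrative location names only
    (0x10, 0x18, None),       # value unknown here: emit nothing
    (0x18, 0x30, "fbreg-8"),
]
loclist = emit_location_list(ranges)
```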

> How do I figure out which debug statements need to be modified when
> you introduce new memory operations?

None.  By definition, debug annotations are only about variables that
are not addressable.  Those that are addressable live at a single
fixed location, so there's no reason to track them in a fancy way.

> If i insert a new call
> DEBUG(x, x_3): 1
> x_3 = x

> foo() // May modify x and *&x)

> y = x_3

> Now you have two problems.

You're talking about a real problem, but your example is misguided.
Let me give you a real problem scenario.

(set (reg <T>) (<whatever>))
(var_location x (reg <T>))
(set (mem <addr>) (reg <T>))
(set (reg <T>) (<somethingelse>))
(call (mem (symbol_ref foo)))

So, at the var_location debug_insn, we know that x is in reg <T>.
That value is then stored at *addr, so now we might be able to use
*addr as an additional location for x.  Then, when reg <T> is
modified, we remove T from the equivalence class, and the only
location still holding the value of x is *addr.  Then comes a function
call, which might modify *addr.
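
The bookkeeping described above can be sketched as a toy simulation.
This is an assumption-laden model, not var-tracking itself: the
operation names "copy" and "clobber", the run() helper and the string
location names are invented for illustration.  It stops just before the
call, since what to do there is the open question.

```python
def run(insns):
    """Track, per variable, the set of locations holding its value."""
    holds = {}    # variable name -> set of locations holding its value
    trace = []    # snapshot of `holds` after each instruction
    for op, *args in insns:
        if op == "var_location":      # bind the variable to loc's value
            var, loc = args
            holds[var] = {loc}
        elif op == "copy":            # dst now holds the same value as src
            dst, src = args
            for locs in holds.values():
                had_src = src in locs
                locs.discard(dst)     # whatever dst held before is gone
                if had_src:
                    locs.add(dst)
        elif op == "clobber":         # loc receives an unrelated value
            (loc,) = args
            for locs in holds.values():
                locs.discard(loc)
        trace.append({v: sorted(l) for v, l in holds.items()})
    return trace


insns = [
    ("clobber", "T"),               # (set (reg <T>) (<whatever>))
    ("var_location", "x", "T"),     # (var_location x (reg <T>))
    ("copy", "*addr", "T"),         # (set (mem <addr>) (reg <T>))
    ("clobber", "T"),               # (set (reg <T>) (<somethingelse>))
]
trace = run(insns)
```

After the third instruction x has two locations, T and *addr; after the
fourth only *addr remains.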

So, do we decide that x is no longer available after the call, or do
we hope *addr still represents it?

The thing to remember is that the annotations are only about gimple
regs.  This means calls don't modify them, ever.  But we still have to
decide whether *addr represents x or not.

My thoughts are leaning towards looking at the memory address or other
memory attributes to tell whether it's an addressable stack slot or
not.  If it's addressable, remove it from the equivalence class at the
call, so the equivalence class becomes empty, and the variable is
regarded as dead.  If it's not addressable (a pseudo assigned to
memory), then we can keep it, even if x is actually dead past the
call.

What we'll see is that, if x is not dead after the call, the compiler
will arrange to preserve its value in one such local non-addressable
stack slot, and it will probably extend the equivalence class again
after the call, as the pseudo is restored.  Or the pseudo will be
temporarily assigned to a call-saved register, which, for being
call-saved, won't be removed from equivalence classes at call
instructions.  Whereas, if x is dead and its value was just copied to
some random memory location, then we may as well flag it as dead at
the call site, where the memory location may be modified.
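
One way to picture the policy at call sites is the sketch below.  It
is only a model of the idea under discussion, not an implementation:
the Loc class, its fields and the helper names are hypothetical, and
dropping call-clobbered registers at calls is an assumption implied by,
rather than stated in, the reasoning above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loc:
    name: str
    kind: str                    # "reg" or "mem"
    call_saved: bool = False     # only meaningful for registers
    addressable: bool = False    # only meaningful for memory


def survives_call(loc: Loc) -> bool:
    """Decide whether loc may stay in an equivalence class across a call."""
    if loc.kind == "reg":
        # Call-saved registers keep their value; others are assumed lost.
        return loc.call_saved
    # Addressable memory may be written by the callee; a non-addressable
    # slot (a pseudo assigned to memory) cannot be.
    return not loc.addressable


def at_call(equiv: set) -> set:
    """Return the equivalence class as it stands after a call."""
    return {loc for loc in equiv if survives_call(loc)}


addr = Loc("*addr", "mem", addressable=True)
spill = Loc("spill_slot", "mem", addressable=False)
saved = Loc("call_saved_reg", "reg", call_saved=True)
scratch = Loc("scratch_reg", "reg", call_saved=False)

dead_case = at_call({addr})                          # x only in *addr
live_case = at_call({addr, spill, saved, scratch})   # x also preserved
```

In the first case the class becomes empty and x is regarded as dead at
the call; in the second, the spill slot and the call-saved register
carry x across it.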

So, it all works out nicely, because we know we're only dealing with
gimple regs.

Volatile asms make this slightly trickier, because they're totally
unpredictable.  I'm thinking the conservative choice is to simply
remove addressable memory locations from equivalence classes at them,
but I don't have it completely figured out.

> #3 is a dataflow problem, and not something you want to do every time
> you insert a call.

I'm not sure what you mean by "inserting calls".  We don't do that.
Calls are present in the source code (even when implied by stuff like
TLS, OpenMP or builtins such as memcpy), and they're either kept
around, eliminated or inlined.

(digression intended to be funny: this "inserting a call" discussion
reminds me of those impossible initial conditions in electromagnetism
textbook exercises, such as uniform magnetic fields in which charged
particles suddenly appear ;-)

> If your answer is #1 or #2, then what you are really doing is
> computing roughly the same dataflow problem var-location does, except
> on trees and with a different meet-operation.

I am actually computing the same dataflow problem of var-tracking.
That's the whole point.  But I'm giving it more information, to enable
it to track more variables.  And it needs to deal with multiple
concurrent locations for the same variable, and multiple variables in
the same locations, which are "slight" complications.  But you're
right, in the end it's the same problem.

But I'm not computing that in trees.  I'm just collecting and
maintaining data points for var-tracking, all the way from the tree
level.

> var-location generates incorrect info not because it represents
> something fundamentally different than you are (it doesn't), it falls
> down because it uses union as the meet operation.

> It says "oh, i don't know which of these locations is right, it must
> be both of them".

However, it can't deal with parallel locations, so this is at odds
with your statement.  I haven't got 'round to studying the exact
dataflow algorithm var-tracking uses, I just figured I needed to do
something along these lines.  Maybe it does need tweaking, if I end up
using it.  I'm not sure yet it's going to make sense to use it for the
more detailed tracking of copying that I'm going to have to do.

> If you changed the meet operation to "oh, i don't know which of these
> locations is right, it must be none of them", and did a little more
> work you would inference the same info as yours *at the tree level*

Intersection sounds like the right approach to me.  I assumed
var-tracking did this, except for unknowns.  It's a bit trickier than
this because var-tracking has to deal with a lot of incomplete
information.  But at least for VTA values, we are going to have a
complete picture, so we can be stricter when it comes to gimple reg
variables.
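
The difference between the two meet operations can be shown with a
small sketch.  The state layout (a dict from variable name to a set of
location names) and both function names are assumptions made for this
example; this is not how var-tracking represents its data.

```python
def meet_union(preds):
    """'It must be all of them': keep any location seen on any edge."""
    out = {}
    for state in preds:
        for var, locs in state.items():
            out.setdefault(var, set()).update(locs)
    return out


def meet_intersection(preds):
    """'It must be none of them' unless every incoming edge agrees."""
    if not preds:
        return {}
    out = {}
    all_vars = set().union(*(s.keys() for s in preds))
    for var in all_vars:
        # A variable missing on some edge contributes the empty set,
        # so it ends up with no known location after the join.
        out[var] = set.intersection(*(set(s.get(var, ())) for s in preds))
    return out


# Two predecessors of a join point, with hypothetical location names.
left = {"x": {"r1", "slot"}, "y": {"r3"}}
right = {"x": {"r2", "slot"}}

joined_union = meet_union([left, right])
joined_inter = meet_intersection([left, right])
```

With union, x is claimed to be in r1, r2 and slot, although only slot
is right on both paths.  With intersection, x is in slot alone and y
has no location at all.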

Now, whether the fact that we could infer the very same values at the
tree level is relevant, I don't know.  The tree level is neither
source level nor the final executable code, so unless we can establish
useful mappings from the tree level to both source level and final
executable code, this information is of little use, no matter how true
it is.

> Nothing you have proposed is fundamentally going to give you better info.

Except for what tree transformations currently discard, such as the
points of the program in which variables are bound to values.  This is
indeed one of the elements that the annotations are trying to
preserve, and that the compiler has not cared about preserving.  (The
other being expressions that end up not computed at run time, but that
could still be computed by a debugger based on state available
elsewhere.)

> All you have done is annotated the IR in some places to make explicit
> some bits in the dataflow problem that you could inference anyway.

Now, this is not true.  I could infer values, yes, but I couldn't
infer the variables they relate to, nor the point of binding.  And
debug information is not just about the values, it's about mapping
variables to values and locations.  So, we can't infer all the
information we need.

> There is absolutely no reason what you are trying to do needs to
> modify the tree IR at all to achieve exactly the same accuracy of
> debug info as your design proposes at the tree level.

So far these claims have been unconvincing.  I still get the feeling
that you're missing some aspects of the problem, but I invite you to
show me how the information available in the current IR could be used
to generate accurate debug information for the two examples in the
design document.  Even if we leave the RTL aspect of it aside for a
moment.  I certainly wouldn't mind having to generate annotations only
when we move from Trees to RTL, but I can't imagine how we'd
reintroduce bindings at points that are not marked in the tree level,
for variables that are (partially or entirely) gone from the tree IR.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member         http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}