This is the mail archive of the
mailing list for the GCC project.
Re: PR 23551: why should we coalesce inlined variables?
On Sun, 2007-07-08 at 22:24 -0300, Alexandre Oliva wrote:
> On Jul 8, 2007, Jan Hubicka <email@example.com> wrote:
> > I think it could work well enough if we were just collecting for each
> > variable a list of other variables that have been coalesced into it.
> So you (hopefully) see using variable names as part of SSA names is
> just a distraction that hampers optimization. There's no codegen
The only optimization I know of that cares about the base name is out
of ssa when it builds conflict graphs for coalescing, and it manages to
keep that to a reasonable size by only attempting to coalesce things
with the same name. (i.e., all the a_xxx ssa_names are in a graph of
their own, all the b_ are in a graph of their own, etc.) The experiments
which put ALL ssa names in the same conflict graph used a horrendous
amount of memory on a large program, and bought us very, very little.
Since a_x and b_x will never be coalesced, adding conflicts for them is
a waste of memory. So using the same anonymous base name for variables
DOES remove useful optimization information.
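
To put rough numbers on the memory argument, here is a small Python sketch. It is not GCC code: the helper names (`base_name`, `pair_slots`, `partition_by_base`) and the sample ssa names are made up for illustration, and it only counts worst-case candidate pairs rather than building real conflict graphs.

```python
from collections import defaultdict

def base_name(ssa_name):
    # Assumed naming convention for this sketch: "a_3" has base name "a".
    return ssa_name.rsplit("_", 1)[0]

def pair_slots(names):
    # Worst-case number of conflict pairs among n names: n * (n - 1) / 2.
    n = len(names)
    return n * (n - 1) // 2

def partition_by_base(names):
    # One group (and so one conflict graph) per base name.
    groups = defaultdict(list)
    for name in names:
        groups[base_name(name)].append(name)
    return dict(groups)

ssa_names = ["a_1", "a_2", "a_3", "b_1", "b_2", "c_1"]

# Everything in one graph versus one graph per base name.
single_graph = pair_slots(ssa_names)
per_base = sum(pair_slots(group) for group in partition_by_base(ssa_names).values())

print(single_graph, per_base)  # prints: 15 4
```

The single graph grows quadratically with the total number of ssa names, while the partitioned version grows only with the size of each group, which is why the per-base-name split stays manageable on large functions.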
copyrename is used to avoid this hampering when it sees opportunities
for a coalesce that might be useful, and it is handcuffed currently for
user variables because we have not segregated out debug info. The
existence of copyrename DOES serve a purpose. If we remove the
restriction on user variables because we can provide debug info outside
of the base name, then we'll get (should anyway :-) exactly the same
results as your anonymous naming, but I can keep the conflict graph
sizes to something rational.
It is a completely orthogonal issue and should be taken up separately
once the debug issue is resolved and we have info we might be able to
utilize in out of ssa and when printing out our listings. Right now, if
I was looking at a function and the ssa_names were all value_X instead
of their symbolic values, I'd get very frustrated very quickly trying to
figure out what is what. And I'm not interested in adding and maintaining
a bunch of annotations to help out, especially when I don't think it is
> reason to retain any resemblance whatsoever between SSA names and user
> variables, just like there's no codegen reason to retain any
> resemblance between user variables and registers or stack slots
> they're assigned to. These are only simplifying conventions that
> enable us to emit some useful debug information while still generating
> reasonable code.
> I think we may have to revisit this decision if we want to get better
> optimization and also to emit better debug info, and enabling the
> disentanglement between SSA names and user variables would be a step
> in this direction.
The problem we have is that we don't separate out the debug info name
from the ssa_variable name. That is the first step. I think we should
follow something like the following:
1 - Add a link for each ssa name to the debug symbol. Either in a side
table, or in the ssa-name itself. Doesn't matter. It should only be
set/examined via access routines/macros so the underlying implementation
is initially unimportant.
This step would also involve adding the machinery in out-of-ssa to
emit debug info for an ssa_name range instead of letting it fall to the
symbol like it does now. This is important.
Right now, one of the biggest contributors to loss of debug is out of
ssa when it does live range splitting.
a = X
a = Y
causes 'a' to become 2 different symbols: 'a' and 'a.0' or some such
thing. This would trivially solve that problem.
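
As a rough illustration of step 1 applied to this example, here is a Python sketch of a side table that is only touched through access routines. Everything in it is hypothetical: `DebugTable`, `set_debug_symbol`, `get_debug_symbol` and the `storage_for` mapping are invented for the sketch and are not existing GCC interfaces.

```python
class DebugTable:
    """Hypothetical side table linking each ssa name to a debug symbol."""

    def __init__(self):
        self._symbol_for = {}

    def set_debug_symbol(self, ssa_name, symbol):
        # Access routine: callers never touch the underlying dict directly.
        self._symbol_for[ssa_name] = symbol

    def get_debug_symbol(self, ssa_name):
        # Returns None for names with no user-visible symbol.
        return self._symbol_for.get(ssa_name)

table = DebugTable()

# a = X becomes a_1, a = Y becomes a_2; both belong to user variable 'a'.
table.set_debug_symbol("a_1", "a")
table.set_debug_symbol("a_2", "a")

# Assume out of ssa splits the live ranges into two storage names.
storage_for = {"a_1": "a", "a_2": "a.0"}

# Debug info is emitted per ssa name, so both storage names report 'a'.
debug_name_of_storage = {
    storage: table.get_debug_symbol(ssa) for ssa, storage in storage_for.items()
}
print(debug_name_of_storage)
```

Because the debug symbol is looked up from the ssa name rather than from whatever storage name out of ssa invents, 'a.0' still shows up as 'a' to the debugger in this sketch.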
It would also solve the problems that copyrename introduces when the
base variable is changed to help with coalescing. It would no longer
affect debug info (which is the only reason we avoid copyrenaming 2 user
variables right now). All those restrictions could be removed.
This first stage should improve our debug info beyond what we have now
quite a bit, and is in fact a bit of useful infrastructure for the day
when out of ssa subsumes expand to produce RTL directly. I envisioned
the ssa_names themselves becoming the symbols in RTL rather than the
names that get made up for them, like 'a.0' or 'tmp.6'. It would make
looking at the rtl listing map closer to the ssa tree listings. Its been
on a to-do list for some time.
2 - Once that is working, start looking at when coalesces are actually
performed. Anytime a coalesce is performed, a routine needs to be
called to take care of updating info. The only case that adds
complication is when the 2 pieces of debug info being coalesced
represent 2 user variables. The rest of the time, it seems like a
pretty straightforward 'change the debug info for the other ssa-name to
the user variable' rule.
For 2 user variables, we now have to deal with overlaps. Generally
speaking, the coalesce happens because the 2 variables contain the same
value. This may require an addition to the table that allows extra debug
names to be attached to an ssa_name, so instead of a single debug entry,
there are 2 or more.
This will in turn lead to complications where we next try to coalesce
2 names, where one has a single debug name and the other has 2 debug names.
For me, at this point, we are getting into the research area of
debugging optimized code, and I'm sure we can figure something out to
handle this case when it happens if it turns out to be incredibly
important. That remains to be seen.
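
One possible shape for the update routine in step 2, sketched in Python. This is only an assumption about how it could look: the `coalesce` function, the use of sets of debug names, and the sample ssa names are all invented here, and the sketch ignores the harder question of where in the code each debug name is actually valid.

```python
def coalesce(debug_names, kept, merged):
    """Hypothetical update hook: fold `merged` into `kept`.

    Each ssa name carries a set of debug names, so a compiler temporary
    has an empty set and a user variable has a one-element set.
    """
    combined = debug_names.get(kept, frozenset()) | debug_names.pop(merged, frozenset())
    debug_names[kept] = combined
    return combined

debug_names = {
    "tmp_6": frozenset(),        # compiler temporary, no user variable
    "a_1": frozenset({"a"}),
    "b_2": frozenset({"b"}),
    "c_4": frozenset({"c"}),
}

# Simple case: the temporary just takes over the user variable's debug info.
first = coalesce(debug_names, "tmp_6", "a_1")

# Two user variables: the surviving name now carries both debug names.
second = coalesce(debug_names, "b_2", "c_4")

# One debug name against two: the union still covers it.
third = coalesce(debug_names, "tmp_6", "b_2")

print(sorted(first), sorted(second), sorted(third))
```

Taking the union handles the single-name and multi-name cases uniformly, but it says nothing about overlapping live ranges, which is the part described above as research territory.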
I think starting with the first step will make more of an
improvement in debug info without affecting anything else than the path
you are currently proceeding down. It will pick up a lot of the common
cases we currently blow. Once that is done, we can look at other
non-intrusive approaches to improving the overlap situations.