This is the mail archive of the
mailing list for the GCC project.
Re: Fix RTL sharing found during PPC bootstrap
> I'd be interested to see if you could investigate this further as
> I really think we'll generate better code. Imagine if this line
> wasn't in expand_mult_const, we'd re-use a memory again and again,
> or if this was a SUBREG to re-interpret a floating point constant
> as an integer, etc... It's the role of CSE, to spot such common
> sub-expressions and factor them out of repeated code, so I doubt it
> would ever decide to replace multiple uses of a "pseudo" (which has
> the cheapest rtx_cost you can get) with SUBREGs which may require
> code generation. And let's not forget all the other possible values
> of op0 which are neither REGs, SUBREGs or MEMs. It's because CSE,
> GCSE and cselib only track the values of pseudos, the RTL expanders
> try and place repeated expressions in their own registers, otherwise
> they wouldn't be able to detect that "SUBREG:SI (REG:SF)" was used
Well, I wonder if we should be seeing the expensive subregs nested in
expression operands at pre-combine time at all:
it is always a win to copy the expensive subreg into register at
expansion time since we want to see them CSEed out (only when not
optimizing, it is probably faster to keep it in, to speed up regalloc).
Only combine for now is in charge to put them into instructions
themselves knowing they are used just once. We don't use expansion
But it is probably better to not mess too much with this can of worms
while messing with another one, so as I mentioned, I don't feel too bad
about this patch ;)
> There is the possibility that fwprop may make some poor/good un-CSE
> decisions, but without a concrete example to stare at or some resulting
> assembly to ponder over, this is all theoretical. As for Junk RTL,
> you're the man with the extensive memory checking scripts.
> Thanks for all of your good work detecting and correcting RTL sharing
> violations. Historically, limited sharing has been "acceptable" (i.e.
> hasn't broken anything), but the current rule is that we shouldn't
> share trees or RTL, and having verifiers that enforce this is a
> good thing. However, its often the case that there's a better way
> to fix a sharing problem than copy_rtx. Blind use of copy_rtx can
> itself leak junk RTL, and once in the source it's difficult to verify
> later that we can safely remove a call to copy_rtx.
Yes, I am aware of that and I am also not at all happy with all the
copy_rtx noise. It is clear that we will se a lot of unwanted sharing,
this is also why I was holding the patch for so long.
I've discussed some alternatives with David on IRC yesterday, since the
amount of copying needed is pretty high and would probably just end up
with very many usually noop copy_rtx calls that is not going to make GCC
faster or source code prettier either. There are multiple special case
scenarios we can consider.
While the tricks of inventing new temporary generally works for
expansion, for splitting/peepholing that happens with no_new_pseudos
set, we need wole a lot of copy_rtx calls.
Ironically, the first incarnation of expand_synth_mult patch also simply
did force subregs into regs as it does now, but then I concluded that I
can't think of valid sequence of events when this extra copy instruction
should survive, so it is better and rather easy to not produce it at
first place :)
> [As a software architect I think there are significant memory savings
> that can be realized through canonicalization and *enforced* sharing
> of immutable directed-graph data structures, but that's not the design
> that GCC follows, and therefore all sharing violations are bugs].
Yep, I also experimented with idea sharing at least the SUBREGs. It is
tricky to do as well, since reload and others rewrite them in place and
fixes the problem just partially - we then see similar failures on other
valid operand RTXes.
> Thanks again for fixing these issues,