This is the mail archive of the
mailing list for the GCC project.
Re: Combine performance regression (was: Fix PR target/18701)
- From: Ulrich Weigand <uweigand at de dot ibm dot com>
- To: hp at bitrange dot com (Hans-Peter Nilsson)
- Cc: Ulrich Weigand <uweigand at de dot ibm dot com>, roger at eyesopen dot com, gcc-patches at gcc dot gnu dot org
- Date: Fri, 21 Jan 2005 00:16:20 +0100 (CET)
- Subject: Re: Combine performance regression (was: Fix PR target/18701)
Hans-Peter Nilsson wrote:
> On Thu, 20 Jan 2005, Ulrich Weigand wrote:
> > A crucial optimization for the bzip2 case is thus to combine a
> > mem-to-reg load into such an AND and form a mem-to-reg zero-extend
> > pattern from it. However, the AND in question necessarily uses a
> > paradoxical subreg, so we have this situation:
> No, I don't think it's necessary, or at least that the bugs
> that make it seem necessary should be fixed.
If you use AND to express zero-extend (which is b.t.w. the default
method optab uses), then you necessarily need to use a paradoxical
subreg (of a REG). I do not see how this constitutes a bug ...
> > Combine would now substitute the (mem ...) for (reg 79) in
> > insn 226, and further optimization would detect that the
> > (and (subreg (mem))) pattern is actually equivalent to a
> > zero-extend.
> And that's my beef: it should have been expressed as a
> zero_extend here too.
Well, it would have *been* simplified to a zero_extend, during
this very invocation of try_combine, if your patch hadn't prevented
that simplification from being performed!
> > Thus, I do not like this patch ;-) In fact, I rather disagree
> > with the reasoning leading up to it.
> What reasoning? Do you mean that it's ok to widen MEM accesses
> or do you mean that paradoxical mem subregs are a valid method
> to express sign- and zero extension?
I mean the reasoning that a paradoxical subreg of a MEM necessarily
implies "widening a MEM access". As I said, a paradoxical MEM
subreg (in the absence of LOAD_EXTEND_OP) is supposed to have exactly
the same meaning as loading that MEM (in its original mode) into a
register, and then accessing a paradoxical subreg of that register.
> > After combine has finished the whole simplifcation process it
> > will re-recognize the insn. If a target doesn't want to accept
> > this particular construct, it should be rejected *here*, if it
> > is still present, not earlier.
> Well, the widening of a mem access must be stopped at the point
> where the widening, the change to a real access happens; it
> can't be done later, because then we no longer know that it's
> widened. :-)
Well, as I see it, simply forming a paradoxical MEM subreg
constitutes neither a widening nor any other change to the
'real access' to memory, so nothing needs to be stopped.
If some *later* pass thinks to convert a paradoxical MEM subreg
into an actual widened MEM access, then *that* pass needs to verify
whether this is OK. (E.g. as reload does when it converts paradoxical
subregs of spilled pseudos to widened MEMs.)
> > (Alternatively, it might be
> > accepted, and later on cleaned up by reload -- there is code in
> > reload to do so, have you investigated why it didn't fix your
> > problem?)
> This suggestion suggests that you did not look at the whole of
> the analysis. This was a secondary patch, for a bug discovered
> through the *other* bug.
I see I had misinterpreted the somewhat lengthy discussion; I
had thought that after the gen_lowpart_for_combine patch (which I
agree is correct) was applied, you were still seeing a bug.
> The whole point of *this* patch is to
> stop a paradoxical mem subreg from being *widened*. If the
> s390x port relies on that happening, I suggest that's incorrect.
The only thing s390 relies upon is that combine is able to
combine a mem->reg load with a reg->reg zero-extend (expressed
using AND) into a mem->reg zero-extend (expressed as ZERO_EXTEND).
> I don't see your argument why actually widening is ok. Perhaps
> I missed the forest for the trees?
Actually widening the MEM is *not* OK. B.t.w. this is also what
gen_lowpart_for_combine did wrong before you patch: it would
replace a mem:HI with a mem:SI in your example. This is *wrong*,
even if it had computed the address properly. After your
patch it should now return a (subreg:SI (mem:HI)), which is
just fine in my opinion. This construct can then be either
rejected in recog, or else fixed up by reload.
Dr. Ulrich Weigand
Linux on zSeries Development