This is the mail archive of the
mailing list for the GCC project.
Re: Serious performance regression -- some tree optimizer questions
- From: Zdenek Dvorak <rakdver at atrey dot karlin dot mff dot cuni dot cz>
- To: Ulrich Weigand <Ulrich dot Weigand at de dot ibm dot com>
- Cc: Zdenek Dvorak <dvorakz at suse dot de>, gcc at gcc dot gnu dot org, Michael Matz <matz at suse dot de>
- Date: Mon, 3 Jan 2005 18:15:32 +0100
- Subject: Re: Serious performance regression -- some tree optimizer questions
- References: <20041229160630.GA13438@atrey.karlin.mff.cuni.cz> <OFE35FE4D3.C4B180C6-ON41256F7E.005D050F-41256F7E.005DD34D@de.ibm.com>
> Zdenek Dvorak <firstname.lastname@example.org> wrote on 12/29/2004
> 05:06:30 PM:
> > > Together with --param iv-consider-all-candidates-bound=100
> > > I'm now getting quite good code for the resid routine.
> > >
> > > The reason why I still need that param appears to be that
> > > ivopts does not recognize that an IV like &A[i] is related
> > > to a use like &A[i+1]. It appears that add_address_candidates
> > > would be supposed to handle this, but it doesn't -- it only
> > > sees through array references with a constant offset, not those
> > > with an offset that itself has a variable and a constant part.
> > >
> > > I've tried to add code recognizing that case there, but then
> > > the candidates still aren't chosen because they get assigned
> > > very high cost; this is because fold-const is unable to
> > > determine that &A[i+1] - &A[i] is a constant ...
> > try including these patches:
> > http://gcc.gnu.org/ml/gcc-patches/2004-12/msg01381.html
> > I think they might help with some of the problems you mention
> > above.
> Indeed, the patch #7 on that mail does generate the correct
> candidates in add_address_candidates, and it also assigns them
> low cost so they are chosen. In fact, with your sign-extend
> patch and patch #7 I'm getting exactly the optimal set of
> IVs selected.
> Unfortunately, when ivopts then tries to adapt the uses of the
> form &A[i+1] to an IV representing &A[i], it simply emits code
> of the form IV + &A[i+1] - &A[i], and expects fold to clean
> this up to IV + sizeof (A[0]) -- which doesn't happen.
> I've added a hack to get_computation_at to *also* call
> strip_offset (just like get_computation_cost_at now does
> when patch #7 is applied), and now I'm getting good code ...
This is a bit dangerous -- I am fairly sure that strip_offset is
sometimes wrong. That is OK in its current use (where it only serves
the heuristics that choose the right candidates), but using it
for code generation would almost surely lead to miscompilations.
I will try to come up with some solution for the problem.
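
For readers following the thread, here is a rough C sketch of the
source-level effect being discussed: a single pointer IV standing for
&A[i], with the use &A[i+1] expressed as that IV plus a constant
offset. The function names and the double element type are made up
for illustration; this is not GCC code and says nothing about how
ivopts or strip_offset work internally.

```c
#include <stddef.h>

/* Indexed form: &a[i] and &a[i + 1] are both derived from i. */
double sum_pairs_indexed(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i + 1 < n; i++)
        s += a[i] * a[i + 1];
    return s;
}

/* Hand-written equivalent of the desired result: one pointer IV p
   plays the role of &a[i], and &a[i + 1] is p plus a constant. */
double sum_pairs_pointer_iv(const double *a, size_t n)
{
    double s = 0.0;
    if (n < 2)
        return s;
    const double *end = a + (n - 1);
    for (const double *p = a; p < end; p++)
        s += p[0] * p[1];
    return s;
}

/* Byte distance between &a[i + 1] and &a[i]; it does not depend on i.
   The caller must ensure a[i + 1] lies within the array. */
ptrdiff_t adjacent_byte_offset(const double *a, size_t i)
{
    return (const char *)&a[i + 1] - (const char *)&a[i];
}
```

Both loops compute the same sum; the second just makes explicit that
the difference between the two addresses is one element size, which
is the constant fold is expected to produce.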