This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Re: Gcc 3.1 performance regressions with respect to 2.95.3

From: Jan Hubicka <jh at suse dot cz>
To: Jason Merrill <jason at redhat dot com>
Cc: Jan Hubicka <jh at suse dot cz>, Peter Schmid <schmid at snake dot iap dot physik dot tu-darmstadt dot de>, gcc at gcc dot gnu dot org, libstdc++ at gcc dot gnu dot org
Date: Sun, 17 Mar 2002 14:53:36 +0100
Subject: Re: Gcc 3.1 performance regressions with respect to 2.95.3
References: <Pine.LNX.4.30.0203120018400.27249-100000@snake.iap.physik.tu-darmstadt.de> <20020312102937.GA2673@atrey.karlin.mff.cuni.cz> <wvlpu23gvdo.fsf@prospero.cambridge.redhat.com>

> >> Gcc 3.1 is slower in the areas: E exception handling, L loop
> >> overhead, G io and S Stepanov.
> 
> The loop/Stepanov slowdown is an optimizer bug which also affects C.  For
> the loop in
> 
>   double accumulate(double* first, double* last, double result)
>   {
>     for (; first != last; ++first)
>       result += *first;
>     return result;
>   }
> 
> we used to produce
> 
>         cmpl %edx,%eax
>         je .L4
> .L6:
>         faddl (%eax)
>         addl $8,%eax
>         cmpl %edx,%eax
>         jne .L6
> .L4:
> 
> but now we produce
> 
> .L9:
>         cmpl    %edx, %eax
>         je      .L8
>         faddl   (%eax)
>         addl    $8, %eax
>         jmp     .L9
> .L8:
> 
> which is one insn shorter but about 20% slower on i686.  This pessimization
> seems to be performed by the flow2 pass; until that point the rtl looks
> like the old form.
Uhm, yes, crossjumping is the guilty party in this case.  I am aware of this;
I just hoped it was not that important, as it hardly shows in the spec2000
tests.  I do have code on the cfg branch that solves this properly: it either
re-duplicates the compare in the bb-reorder pass or rotates the loop to get
the jump before the loop instead of after it, depending on the situation
(the expected number of iterations of the loop and whether the loop is in a
hot spot of the program).
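
For reference, the two loop shapes quoted above can be written out in C with
explicit gotos.  This is only an illustrative sketch: the function names are
made up here, and a compiler is free to reshape either function, so it shows
the control flow rather than guaranteeing the generated code.

```c
/* Old shape: the entry test is duplicated before the loop, so the loop
   body ends in a single conditional branch per iteration. */
double accumulate_rotated(const double *first, const double *last,
                          double result)
{
    if (first == last)          /* duplicated entry test, runs once */
        goto done;
loop:
    result += *first;
    ++first;
    if (first != last)          /* loop test at the bottom */
        goto loop;
done:
    return result;
}

/* Shape after the two compares have been merged: one test at the top,
   plus an unconditional jump back on every iteration. */
double accumulate_top_tested(const double *first, const double *last,
                             double result)
{
loop:
    if (first == last)          /* the only remaining compare */
        goto done;
    result += *first;
    ++first;
    goto loop;                  /* extra jump executed each iteration */
done:
    return result;
}
```

Both functions compute the same sum; the second simply executes two jumps
per iteration instead of one.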

I don't know if this can be solved for the branch.  We probably should add a
switch to disable crossjumping; that is desirable in other cases as well (it is
a rather costly optimization and may produce slower code in some cases like
this one).

The old crossjumping optimization missed this case more by mistake than by
design.  Can we come up with a solution that will effectively disable
crossjumping in this case and not pessimize other cases too much?

Perhaps I can prohibit crossjumping for edges whose frequencies differ too
much, so we won't crossjump out of a loop, but that will also miss other cases
that would benefit.  Or I can take loop depths into account and not crossjump
out of loops.  That sounds most plausible, but it will hit the problem that the
loop depth is not completely up to date at that point.
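
The two heuristics could be sketched roughly as below.  This is not GCC
code: struct edge_info, MAX_FREQ_RATIO and both predicates are hypothetical
names invented for illustration, and the threshold value is an arbitrary
assumption.

```c
#include <stdbool.h>

/* Hypothetical per-edge summary; NOT GCC's actual data structures. */
struct edge_info {
    int frequency;   /* estimated execution frequency, arbitrary units */
    int loop_depth;  /* loop nesting depth of the edge's source block */
};

/* Assumed cutoff: how far apart two frequencies may be. */
enum { MAX_FREQ_RATIO = 4 };

/* Heuristic 1: refuse to crossjump when the frequencies differ too much. */
bool may_crossjump_by_frequency(const struct edge_info *a,
                                const struct edge_info *b)
{
    long long lo = a->frequency < b->frequency ? a->frequency : b->frequency;
    long long hi = a->frequency < b->frequency ? b->frequency : a->frequency;

    /* Multiplication instead of division, so a zero frequency is safe. */
    return hi <= lo * MAX_FREQ_RATIO;
}

/* Heuristic 2: refuse to crossjump between different loop depths, i.e.
   never merge code inside a loop with code outside it. */
bool may_crossjump_by_depth(const struct edge_info *a,
                            const struct edge_info *b)
{
    return a->loop_depth == b->loop_depth;
}
```

For the accumulate loop, the entry compare would sit at depth 0 with a low
frequency and the in-loop compare at depth 1 with a high one, so either
predicate would reject the merge, provided the frequency and depth data are
accurate at that point.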

I will send a patch for evaluation later today.

Another solution that comes to mind is to try to use LOOP notes to figure out
whether the loop has a duplicated entry test.

Honza
> 
> Jason

