Gcc 3.1 performance regressions with respect to 2.95.3

Jan Hubicka jh@suse.cz
Sun Mar 17 05:54:00 GMT 2002


> >> Gcc 3.1 is slower in the areas: E exception handling, L loop
> >> overhead, G io and S Stepanov.
> 
> The loop/Stepanov slowdown is an optimizer bug which also affects C.  For
> the loop in
> 
>   double accumulate(double* first, double* last, double result)
>   {
>     for (; first != last; ++first)
>       result += *first;
>     return result;
>   }
> 
> we used to produce
> 
>         cmpl %edx,%eax
>         je .L4
> .L6:
>         faddl (%eax)
>         addl $8,%eax
>         cmpl %edx,%eax
>         jne .L6
> .L4:
> 
> but now we produce
> 
> .L9:
>         cmpl    %edx, %eax
>         je      .L8
>         faddl   (%eax)
>         addl    $8, %eax
>         jmp     .L9
> .L8:
> 
> which is one insn shorter but about 20% slower on i686.  This pessimization
> seems to be performed by the flow2 pass; until that point the rtl looks
> like the old form.
Uhm, yes, crossjumping is the guilty party in this case.  I am aware of this,
I just hoped it was not that important, as it hardly shows up in the spec2000
tests.  I do have code on the cfg branch that solves this properly: in the
bb-reorder pass it either re-duplicates the compare or rotates the loop to get
the jump before the loop instead of after it, depending on the situation
(the expected number of loop iterations and whether the loop is in a hot spot
of the program).
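
To make the two loop shapes concrete, here is a rough C-level sketch of the
same accumulate loop written both ways, with gotos mirroring the jumps in the
assembly quoted above.  The function names are made up for illustration, and
this only shows the control flow, not what any pass actually emits.

```c
#include <stddef.h>

/* Shape after crossjumping: one compare at the top of the loop and an
   unconditional jump back to it, so each iteration executes a conditional
   branch plus a jump.  (Illustrative name, not from GCC.) */
double accumulate_top_test(const double *first, const double *last,
                           double result)
{
top:
    if (first == last)
        goto done;
    result += *first;
    ++first;
    goto top;
done:
    return result;
}

/* Shape with the entry test duplicated ahead of the loop: the compare
   appears twice in the code, but each iteration executes only one
   conditional branch.  (Illustrative name, not from GCC.) */
double accumulate_dup_test(const double *first, const double *last,
                           double result)
{
    if (first == last)
        goto done;
body:
    result += *first;
    ++first;
    if (first != last)
        goto body;
done:
    return result;
}
```

Both functions compute the same sum for any valid range, including an empty
one; the difference is only in how many branches each iteration executes.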

I don't know if this can be solved for the branch.  We probably should add a
switch to disable crossjumping; that is desirable in other cases as well (it is
a rather costly optimization and may produce slower code in some cases like
this one).

The old crossjumping optimization missed this case more by mistake than by
design.  Can we come up with a solution that will effectively disable
crossjumping in this case and not pessimize other cases too much?

Perhaps I can prohibit crossjumping for edges whose frequencies differ too much,
so we won't crossjump out of a loop, but that will miss other beneficial cases
as well.  Or I can take loop depths into account and not crossjump out of
loops.  That sounds most plausible, but will run into the problem that loop
depth is not completely up to date at that point.
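
As a rough illustration of the two candidate checks, something along these
lines.  This is only a sketch: the struct, the function names and the ratio
cutoff are all hypothetical and do not correspond to GCC's real data
structures or to the patch itself.

```c
#include <stdbool.h>

/* Hypothetical summary of a CFG edge; not GCC's actual edge structure. */
struct edge_summary {
    long frequency;   /* estimated execution frequency, assumed >= 0 */
    int  loop_depth;  /* loop nesting depth of the edge's source block */
};

/* Assumed cutoff: edges whose frequencies differ by more than this factor
   are treated as "too different".  The value is arbitrary. */
#define MAX_FREQUENCY_RATIO 4

/* First idea: allow crossjumping only when the two edges have roughly
   similar frequencies. */
bool crossjump_ok_by_frequency(const struct edge_summary *e1,
                               const struct edge_summary *e2)
{
    long hi = e1->frequency > e2->frequency ? e1->frequency : e2->frequency;
    long lo = e1->frequency > e2->frequency ? e2->frequency : e1->frequency;
    return hi <= lo * MAX_FREQUENCY_RATIO;
}

/* Second idea: allow crossjumping only between edges at the same loop
   depth, so that code is never merged across a loop boundary. */
bool crossjump_ok_by_loop_depth(const struct edge_summary *e1,
                                const struct edge_summary *e2)
{
    return e1->loop_depth == e2->loop_depth;
}
```

With a loop entry edge of frequency 100 at depth 0 and a back edge of
frequency 900 at depth 1, both checks refuse the merge.  The frequency check
would also refuse two edges at the same depth that merely differ a lot in
frequency, which is the lost benefit mentioned above.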

I will send a patch for evaluation later today.

Another solution that comes to mind is to try to use the LOOP notes to figure
out whether the loop has a duplicated entry test.

Honza
> 
> Jason


