Out-of-sight compile times for calculate_loop_depth

Brad Lucier lucier@math.purdue.edu
Tue Feb 8 20:08:00 GMT 2000


Michael Hayes
> Brad Lucier writes:
>  > I think that code like this is reasonable, but we've differed on this
>  > point before ...
> 
> Lets's say that is atypical---it is certainly several standard
> deviations from the mean program.

Well, it's extreme even by my standards, but it is no more than 5
times as big as the typical programs I compile, which themselves do not
compile quickly.  Code like this is typical if you want to support
continuations and complete tail-call elimination, things that are
not supported directly in C (so the code implements a trampoline).

> I had to run the compiler overnight on your testcase to get to the bit
> that interested me.  This is what I found:  

I can run some tests myself if that will help you.  Or I can find
you a smaller test case.

> We probably need a bailout mechanism if we detect a CFG with many
> shared loop headers.  These will never be optimised very well anyway.

I don't feel so great about this option.  Some optimizations are already
left out in -O2 if the CFG is big enough, if I remember correctly. That
doesn't affect me directly, because I generally don't use -O2; but I don't
think it's a good idea to bury "problem" optimizations that take a lot
of time and don't produce good results on some programs, because then
there is very little chance that they will ever be improved.

We went through this with the register allocator last fall.  Early mail
from Jeff Law says, essentially, "Your code is atypical, the new register
allocator is much better than the old one, it's reasonable that it takes
883 seconds on your weird code, changing it will likely make it slower/
take more memory on more typical code."  (He said it more politely
than this, of course.)  A quadratic algorithm was used
there, too.  Then Joern weighed in with, "OK, I can change the code to use 
EXECUTE_IF_SET_IN_BITMAP instead of checking each and every bit in
the bitmap in a loop; this will still be quadratic, but it will fall
through very quickly if all the bits in a word are zero."  And then
Jeff made a change to the algorithm that did not unnecessarily record some
conflicts.

In the end, the register allocator code ran in 16 seconds instead of 883
seconds.  The only algorithmic change was the one that Jeff made at the
end, which cut it from 33 seconds to the final 16 seconds.  The rest of
the speed-up happened to come from using the macros that were available.
And it took several people a good bit of time.

So don't turn it off in the development mainline yet.  I don't see a
release in sight; perhaps something fruitful can be done between now and then.

Brad


More information about the Gcc mailing list