Fibonacci and performance

Boehm, Hans
Tue May 1 09:57:00 GMT 2001

> From: Tom Tromey
> Hans> Isn't there a fundamental problem here?  The testing of the
> Hans> "initialized" flag may not be reordered with respect to variable
> Hans> references, and hence should be treated as volatile (and
> Hans> possibly requiring a memory barrier) by the back end.  But if
> Hans> it's treated as volatile, you can't optimize out redundant
> Hans> tests?
> Jeff> I think AG is describing something else.  His method-local flag
> Jeff> serves to eliminate redundant calls to _Jv_InitClass at compile
> Jeff> time.
I'm confused.  Is the method-local flag static?  I had assumed so.
If not, then the reordering issues go away, but you've only solved part of
the problem (which is fine for now).
Tom> But Hans is saying that in AG's scenario the check of the method-local
Tom> flag could be reordered with respect to access to class variables,
Tom> unless the flag is volatile.
Tom> I'm not sure this can actually happen.  The test will look like:
Tom>     if (! method-local-flag) { _Jv_InitClass (...); m-l-f = true; }
Tom> Will the compiler really pull a class variable access before this?  I
Tom> don't see how it could do that.  It seems to me that if the compiler
Tom> could do that then our current approach of always calling
Tom> _Jv_InitClass is also broken.
After I posted the earlier message, I temporarily arrived at the same
conclusion Tom just did.  But after thinking about it some more, I think
that although gcc may not perform this reordering yet, other compilers
might already do so, maybe even with good reason.  And gcc is likely to
start doing this sort of thing.

Let s_f be a static field, and x be a local variable.

I claim that (at least on something with more than 8 registers) the compiler
should usually transform 

  if (! m_l_f) { _Jv_InitClass (...); m_l_f = true; }
  x = s_f;
  ... x ...

to something like

  x = s_f;
  if (! m_l_f) { _Jv_InitClass (...); m_l_f = true; x = s_f; }
  ... x ...

This hides more of the potential load latency for s_f along the fast path.
(Though both loads should happen before the conditional, I suspect that
usually the actual load of m_l_f should still precede the load of s_f in the
final schedule, since it's used first.  But since they're now in the same
basic block, and there is no dependency, I'd be unwilling to bet on that.
The outcome may depend on other considerations.)
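
As a minimal C sketch of the two orderings (init_class, reset_class, and the
value 42 are hypothetical stand-ins, not libgcj code, and m_l_f is assumed to
be static): in a single thread both forms return the same value, which is why
a compiler may consider the hoist legal.  The hazard only appears if another
thread completes initialization and sets m_l_f between the early load of s_f
and the test of the flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the names used in the message. */
static int  s_f;     /* static field, set by class initialization */
static bool m_l_f;   /* the flag, assumed static (shared) here */

/* Stand-in for _Jv_InitClass: runs the "static initializer". */
static void init_class(void)
{
    s_f = 42;
}

/* Original order: test the flag, then load the field. */
int read_original(void)
{
    if (!m_l_f) { init_class(); m_l_f = true; }
    assert(m_l_f);           /* class is initialized past this point */
    int x = s_f;
    return x;
}

/* Hoisted order: speculative load first, reload only on the slow path. */
int read_hoisted(void)
{
    int x = s_f;             /* may observe the pre-initialization value */
    if (!m_l_f) { init_class(); m_l_f = true; x = s_f; }
    assert(m_l_f);
    return x;
}

/* Put the hypothetical class back into its uninitialized state. */
void reset_class(void)
{
    s_f = 0;
    m_l_f = false;
}
```

Calling reset_class() and then either reader gives the initialized value on
the first and on every later call, so the two forms cannot be told apart
without a second thread.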

Probably the more important issue is hardware reordering.  Treating m_l_f as
volatile deals with that issue on Itanium, but it's not sufficient on Alpha.
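
One way to express the ordering that volatile alone does not guarantee on
every architecture is with acquire/release operations.  The sketch below uses
C11 <stdatomic.h>, which postdates this discussion and is purely an
illustrative assumption, not what gcj emits.  init_class is a hypothetical
stand-in for _Jv_InitClass and, unlike the real function, does no locking of
its own, so concurrent first calls would race in the stand-in.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int         s_f;     /* static field */
static atomic_bool m_l_f;   /* flag, static and shared; starts out false */

/* Hypothetical stand-in for _Jv_InitClass (no locking, unlike the real one). */
static void init_class(void)
{
    s_f = 42;
}

int read_with_barriers(void)
{
    /* Acquire: the later load of s_f may not be moved above this load,
       neither by the compiler nor by the hardware. */
    if (!atomic_load_explicit(&m_l_f, memory_order_acquire)) {
        init_class();
        /* Release: the initializer's stores become visible to other
           threads no later than the flag itself. */
        atomic_store_explicit(&m_l_f, true, memory_order_release);
    }
    assert(atomic_load_explicit(&m_l_f, memory_order_relaxed));
    return s_f;
}
```

On a weakly ordered machine the acquire load is where an explicit memory
barrier would be expected in the generated code, which is the cost the
redundant-test optimization has to work around.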

None of this applies if m_l_f is an automatic variable.

