This is the mail archive of the
mailing list for the GCC project.
Re: Some statement counts for gcc
> On Mon, Aug 26, 2002 at 08:18:15AM -0500, Brad Lucier wrote:
> > The problem is that gcc's fp code generator on x87 is broken enough that
> > you can get different results for the same expression, hence the use
> > of the simulator which does not use extended-precision arithmetic
> > by default.
> > I'm not really sure that the simulator is unreasonably slow.
> 33% of user time spent in branch prediction seems like something worth
> at least looking at, to me.
We have a plan to implement a user-level .h file that will do FP on
volatile arguments (a poor man's -fcaller-save) and replace the current
emulation. I guess it will speed things up by at least 10%, making the
problem moot. I haven't had time to implement this yet, but I will try
to do so for the faster-compiler branch.
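A rough sketch of what such a header might contain, assuming the idea is
to push every operand and result through a volatile double so that values
are stored to memory and rounded to double instead of staying in 80-bit
x87 registers. The function names are hypothetical and are not taken from
GCC.

```c
/* Hypothetical helpers, not from the GCC sources.
   Each operand and the result go through a volatile double, so the
   compiler must store them to memory instead of keeping them in an
   extended-precision register. */
static double vol_add(double a, double b)
{
    volatile double x = a;
    volatile double y = b;
    volatile double r = x + y;   /* result is forced out to a 64-bit slot */
    return r;
}

static double vol_mul(double a, double b)
{
    volatile double x = a;
    volatile double y = b;
    volatile double r = x * y;
    return r;
}
```

Callers would write vol_add(a, b) where they now call the emulation
routines; whether this is enough to get identical results on every host
is exactly what would need testing.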
What happens is that the code recursively traces the loop graph, and
Brad's testcase contains a lot of nested loops.
> > > If I remember correctly this code has a very complicated flow graph,
> > > and branch prediction may not help much; perhaps the right thing is
> > > to detect code like this and disable that optimization.
> > This has been the response to several of my recent observations about
> > gcc's algorithms, etc. I'd prefer that if there are problems they be
> > fixed rather than papered over by a -fbrad's_code_don't_optimize flag.
> In general I agree with you. However, do I remember correctly that
> this is the inner loop of a threaded (Forth sense) interpreter? Lots
> of tiny blocks with computed-goto edges both in and out? There really
> isn't much good branch prediction can do on code like that.
You would be surprised, but especially in the case of large complex
functions the code that identifies hot blocks based on basic block
branch prediction works very well.
I was tracing code very similar in nature to Brad's testcase and it
worked with about 90% success (i.e. 90% of the hot blocks of the function
(based on profile) were marked as hot by the prediction code too).
This is because programs largely just obey the loop tree.
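A minimal sketch of how an agreement figure like that could be computed,
assuming a static guess that treats a block as hot when it sits deep
enough in the loop tree. The struct, the function names and both
thresholds are made up for illustration; this is not the prediction code
in GCC.

```c
#include <stddef.h>

/* Hypothetical block record; not GCC's basic block structure. */
struct block {
    int  loop_depth;     /* nesting depth in the loop tree */
    long profile_count;  /* execution count taken from a real profile */
};

/* Static guess: a block is hot if it is nested at least min_depth deep. */
static int predicted_hot(const struct block *b, int min_depth)
{
    return b->loop_depth >= min_depth;
}

/* Fraction of profile-hot blocks that the static guess also marks hot. */
static double hot_agreement(const struct block *blocks, size_t n,
                            int min_depth, long hot_count)
{
    size_t hot = 0, matched = 0;

    for (size_t i = 0; i < n; i++) {
        if (blocks[i].profile_count >= hot_count) {
            hot++;
            if (predicted_hot(&blocks[i], min_depth))
                matched++;
        }
    }
    /* With no profile-hot blocks there is nothing to disagree about. */
    return hot ? (double)matched / (double)hot : 1.0;
}
```

A result of 0.9 from hot_agreement() would correspond to the 90% figure
quoted above, under these assumptions.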
> > Although the GC statistics indicate not much memory use, this code took up
> > to 3.8 GB of swap when running.
> Treat the GC statistics with a pile of salt; they only count data
> still live at end of compilation. Also, I think ra.c has a lot of
> local data allocated with xmalloc, which GC doesn't see at all.