Ideas on basic block reordering location and frequency

Jason Eckhardt
Mon Feb 14 08:41:00 GMT 2000

Jeff Law and I have discussed ideas about when the basic block reordering
code should happen for maximum benefit. Currently, we run the full reordering
pass just before jump2. However, as Jeff's comments indicate (included below),
it might also be useful to run a reduced version of the pass earlier as well.
We would be interested in others comments or opinions on this.

------- Jeff:
My current thinking is that we actually want
to run it twice.  But I think this needs to be discussed on gcc-local and
or the external mailing list.  But the basic idea goes something like this..

The reordering code knows how to take code which is physically inside the
loop, but not part of the loop structure and physically remove it from the
loop.  We may want to run that part of the basic block reordering pass early
in the compiler.

A simple and classic example is the eqntott loop:

       for (i = 0; i < ninputs; i++) {
                aa = a[0]->ptand[i];
                bb = b[0]->ptand[i];
                if (aa == 2)
                        aa = 0;
                if (bb == 2)
                        bb = 0;
                if (aa != bb) {
                        if (aa < bb) {
                                return (-1);
                        else    {
                                return (1);

If code starting with (if aa != bb) will be physically inside the loop, even
though it only executes once (when the loop terminates).  Having this code
inside the loop is bad for a number of reasons:

	1. For machines which do simple static hardware prediction such
	   as BTFNT, we will get a mis-predicted branch in every iteration
	   of the loop.  We can avoid the mis-predicted branch if we can
	   remove the non-looping code from the loop.

	2. With that code physically inside the loop, the loop itself takes
	   more icache lines than is really necessary.  So removing it from
	   the loop will reduce the icache footprint of the loop.

	3. In some cases having non-loop code physically inside the loop
	   will pessimize GCC's loop optimizer.  Obviously this is bad.

[ FWIW, I don't actually care about eqntott, but it is a good simple
  example of this kind of problem.  It's a lot easier to work on that
  some of the other larger codes which exhibit the same kind of behavior. ]

So, if we can run part of the bb reordering pass and get the crud out of
the loop just before the loop optimizer runs, we'll probably derive some
noticeable benefits from bb reordering immediately.

I was thinking that the more general aspect of basic block reordering should
probably run just before or just after the final jump pass.  Though that's
just a gut feeling on my part.

More information about the Gcc mailing list