This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: RFA: Reduce iterations of elimination bookkeeping


Jeff Law wrote:
A few weeks ago I instrumented the top section of the main reload loop. Specifically, I was looking to see how often we have to do elimination bookkeeping.

For an x86 bootstrap, we call reload 101k times and we perform elimination bookkeeping 151k times. Obviously, we're iterating less than half the times we call reload -- that was somewhat of a surprise.

However, there's still 50k iterations of elimination bookkeeping that are worth looking at.

We iterate on elimination bookkeeping any time the size of the frame changes, i.e. whenever we spill a pseudo to memory, allocate a caller-save slot, align the stack, or spill a memory address. We also iterate for a variety of other reasons, such as unexpected changes in elimination offsets, a previously eliminable register no longer being eliminable, or any spill code generation.

About 13k of the iterations occur because we allocated a stack slot for caller-save registers. Yup, that's right, 13k just because we allocated a slot for a caller-save. These iterations can be trivially avoided by allocating caller-save slots before we do the elimination bookkeeping.

About 26k iterations occur because of requested stack alignments. Yes, we allocate a slot to ensure stack alignment *after* elimination bookkeeping. Interestingly enough, we used to align prior to elimination bookkeeping, but the code was moved in response to pr29248 and pr28966. Fortunately, all that was really necessary to fix those PRs was to avoid aligning the stack if none had yet been allocated. Changing the sequencing of elimination bookkeeping and stack alignment was not needed, and it was obviously making more work for reload.

With those two fixes to sequencing, we can eliminate 38k (of the 50k) iterations of elimination bookkeeping. The remaining iterations are almost exclusively due to spilling.
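To make the sequencing argument concrete, here is a toy model in C. It is only a sketch and not reload's actual code: the struct, the function names, the 8-byte slot and the 16-byte alignment are all made up for illustration. It counts how many bookkeeping passes a loop needs when the caller-save slot and the alignment padding are allocated inside the loop versus before it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in GCC's reload.  */
struct toy_frame {
  int size;               /* current frame size in bytes */
  bool need_caller_save;  /* a caller-save slot has not been allocated yet */
};

/* Allocate a (hypothetical) 8-byte caller-save slot at most once.  */
static void toy_alloc_caller_save(struct toy_frame *f)
{
  if (f->need_caller_save) {
    f->size += 8;
    f->need_caller_save = false;
  }
}

/* Round the frame up to 16 bytes, but do not align an empty frame.  */
static void toy_align(struct toy_frame *f)
{
  if (f->size > 0)
    f->size = (f->size + 15) & ~15;
}

/* Count bookkeeping passes until the frame size stops changing.
   With allocate_early set, both allocations happen before the loop.  */
static int toy_passes(struct toy_frame f, bool allocate_early)
{
  int passes = 0;

  assert(f.size >= 0);
  if (allocate_early) {
    toy_alloc_caller_save(&f);
    toy_align(&f);
  }
  for (;;) {
    int start = f.size;
    passes++;                    /* one round of "elimination bookkeeping" */
    toy_alloc_caller_save(&f);
    if (f.size != start)
      continue;                  /* frame grew: offsets are stale, redo */
    toy_align(&f);
    if (f.size != start)
      continue;                  /* padding added: redo again */
    return passes;
  }
}
```

In this model, a 20-byte frame that still needs a caller-save slot takes three passes with the late ordering (20 -> 28 -> 32, then a final stable pass) and a single pass when both allocations are done up front.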

From a code generation standpoint, these changes permute where objects land in the frame, so it's possible we can get minor code generation differences on targets with restricted displacements in reg+d addressing modes. We can also get some changes in cache behaviour. I would expect both effects to be neutral overall.

From a compile-time standpoint, we're clearly doing less work, so I'd expect some minor (possibly unmeasurable) compile-time improvements.

Jeff, sorry. I also thought the patch would make the compiler faster (it seemed quite obvious to me). But valgrind --tool=lackey actually shows 0.3% more executed insns for -O2 combine.i on x86 after applying the patch. I think the problem may be (at least partially) larger generated code, e.g. for combine.i:

   text    data     bss     dec     hex filename
  83082       0     792   83874   147a2 c0.o
  83098       0     792   83890   147b2 c1.o <- after the patch


I also see that the compiler allocates more stack space for many functions after applying the patch, even in cases where all stack displacements are the same. I have no idea why that is.
Bootstrapped and regression tested on i686-pc-linux-gnu. I also verified pr29248 and pr28966 continue to generate the desired code.

