The old RTL loop optimizer

loop.c has now been removed from the source tree for GCC 4.2 and above.

GCC currently has three different loop optimization frameworks:

  1. Loop optimizers that work on GIMPLE in the tree-ssa framework (including the linear loop transformations for convenience).
  2. RTL loop optimizations written to use [gccsource:cfglayout.c] and other recently added infrastructure such as the shared dataflow module. Each optimization is in a separate pass and a separate file.
  3. RTL loop optimizations in [gccsource:loop.c] and previously in unroll.c (which is removed) that are part of the "[ancient world||olden times]" of optimizers that were written at the RTL level. This set of passes is collectively called the old loop optimizer.

The main issue with the old loop optimizer is that it completely destroys the higher-level representations such as the control flow graph (CFG) and the loops structure. In fact, amazingly, the old loop optimizer does not even know about the CFG. This means that before the old loop passes are run, the CFG must be dropped, along with most of the information contained in it, such as profile information and branch predictions, and that after running the old loop optimizer the CFG must be reconstructed. The information in the CFG is not entirely lost, but is stuffed into e.g. REG_NOTES and on-the-side tables. But obviously it would be better, for both accuracy and speed, not to destroy the CFG to begin with.

In addition to not being CFG-aware, the old loop optimizer has a number of other issues, including,

It has been suggested many times in recent years that the old loop optimizer should just die. Unfortunately, there has never been a real collective effort to get this rather large job done. Still, almost all the things loop.c does are now done elsewhere too, so we should be very close to being able to kill loop.c.

A comparison of the old loop optimizer with the new ones

This is an (incomplete!) list of things loop.c and unroll.c used to be able to do, that are now done elsewhere:

Things that the new loop optimizers can do, that were never in the old loop optimizer:

Things that the old loop optimizer can do, that still need a replacement:

  1. GIV splitting
  2. Loop prefetching of arrays
  3. Loop reversal

For the first one, the webizer can fill in, but this is useless anyway without register renaming (according to SPEC testing on PPC and IA-64). The latter two should both be replaced with equivalent GIMPLE-based passes. A new loop prefetching pass has already been proposed a number of times on the gcc-patches mailing list (see below), but so far it remains unreviewed. Loop reversal causes us to compare to zero instead of an arbitrary loop upper bound, and possibly saves one register inside the loop. For example,

   for (i = 0; i < n; i++) a[i]=i;

is reversed into

   for (i = n - 1; i >= 0; i--) a[i]=i;

Implementing this in GIMPLE is trivial within the linear loop transformations framework. The problem is to decide when it is really useful: the code in question was designed to allow us to take advantage of dbCC instructions on VAXen, m68ks, HP-PA and other, well, old junk ;-) Such instructions were usually faster than separate decrement and branch instructions. But on modern CPUs, reversing loops this way may, for example, have adverse effects on hardware memory prefetching heuristics, which usually prefer loops that walk forward through memory.
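As a minimal sketch of why the reversal is legal for this particular loop: each iteration writes only a[i] and reads nothing written by another iteration, so the order of iterations does not matter. The function names below (fill_forward, fill_reversed, same_result) are hypothetical and exist only for illustration; they are not part of GCC. Note that the reversed form relies on a signed counter, since i >= 0 would always be true for an unsigned one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Forward form: counts up and compares against the bound n. */
static void fill_forward(int *a, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = i;
}

/* Reversed form: counts down and compares against zero.
   The counter must be signed for the exit test to work. */
static void fill_reversed(int *a, int n)
{
    for (int i = n - 1; i >= 0; i--)
        a[i] = i;
}

/* Hypothetical helper: returns 1 if both forms leave the same
   contents in an n-element array (n is limited to 16 here). */
static int same_result(int n)
{
    int x[16], y[16];

    assert(n >= 0 && n <= 16);
    fill_forward(x, n);
    fill_reversed(y, n);
    return memcmp(x, y, (size_t)n * sizeof(int)) == 0;
}
```

A loop whose body reads a value stored by an earlier iteration, such as a[i] = a[i - 1] + 1, has no such freedom, which is one reason a reversal pass needs dependence information before it can fire.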

Effects on performance from disabling the old loop optimizer

Actually, SPEC scores improve slightly with the old loop optimizer disabled and a few patches for mainline. On the LNO branch it was already disabled by default, and it is also disabled on the tree-profiling-branch. For mainline, some SPEC numbers are reported in and its follow-ups, and more recently in

Some special cases have to be handled more carefully to mimic the wisdom of the old loop optimizer. Issues holding up the removal of loop.c are being tracked in [gccbug:22366].

None: old_loop_optimizer (last edited 2008-01-10 19:38:37 by localhost)