The old RTL loop optimizer

The file loop.c has now been removed from the source tree for GCC 4.2 and later.

GCC currently has three different loop optimization frameworks:

1. Loop optimizers that work on GIMPLE in the tree-ssa framework (for convenience, the linear loop transformations are counted here too).
2. RTL loop optimizations written to use [gccsource:cfglayout.c] and other recently added infrastructure, such as the shared dataflow module. Each optimization is in a separate pass and a separate file.
3. RTL loop optimizations in [gccsource:loop.c], and previously in unroll.c (which has since been removed), that are part of the "olden times" of optimizers written at the RTL level. This set of passes is collectively called the old loop optimizer.

The main issue with the old loop optimizer is that it completely destroys higher-level representations such as the control flow graph and the loop structure. In fact, amazingly, the old loop optimizer does not even know about the CFG. This means that the CFG must be dropped before the old loop passes are run, along with most of the information it contains, such as profile information and branch predictions, and that the CFG must be reconstructed after the old loop optimizer has run. The information in the CFG is not entirely lost, but is stuffed into, e.g., REG_NOTES and on-the-side tables. Obviously, though, it would be better for both accuracy and speed not to destroy the CFG in the first place.

In addition to not being CFG-aware, the old loop optimizer has a number of other issues, including:

1. It is one of those infamous GCC "I-do-everything" style passes, even in its current half-subsumed form.
2. It requires loop notes to find loops (because it obviously cannot find loops in the CFG if it does not know about the CFG), so there is a separate pass just to construct loop notes.
3. The optimizations still implemented in [gccsource:loop.c] are notoriously buggy, and notoriously difficult to debug too.

It has been suggested many times in recent years that the old loop optimizer should just die. Unfortunately, there has never been a real collective effort to get this rather large job done. Still, almost everything loop.c does is now done elsewhere too, so we should be very close to being able to kill loop.c.

A comparison of the old loop optimizer with the new ones

This is an (incomplete!) list of things loop.c and unroll.c used to be able to do, that are now done elsewhere:

1. Loop unrolling was in the now-removed unroll.c. RTL loop unrolling is now done in [gccsource:loop-unroll.c], as part of modulo scheduling in [gccsource:modulo-sched.c], and in [gccsource:tree-ssa-loop-ivcanon.c] for GIMPLE loops that can be completely unrolled. Basic induction variables are split in loop-unroll.c. There is no pass to split general induction variables ([gccbug:20376]), but [gccsource:web.c] could do this quite easily.
2. Induction variable optimizations used to be done in loop.c, and the code to do them is still there, but it should be subsumed by IVopts in [gccsource:tree-ssa-loop-ivopts.c]. This includes things like induction variable selection, final value replacement, and strength reduction.
3. Loop invariant code motion is in [gccsource:loop-invariant.c]. There is a pending patch (see below) for this pass that is necessary to make its capabilities equivalent to the existing code in loop.c. The only thing loop-invariant.c cannot do is move libcall blocks, but libcall notes should go away too.
4. Doloop, the decrement-and-branch instruction optimization, is in [gccsource:loop-doloop.c].

Things that the new loop optimizers can do, that were never in the old loop optimizer:

1. Loop unswitching is easy when you use the CFG. GCC can unswitch RTL loops in [gccsource:loop-unswitch.c] and GIMPLE loops in [gccsource:tree-ssa-loop-unswitch.c].
2. Vectorization and High Level Loop Transformations (e.g. loop interchange).
3. Swing Modulo Scheduling, while not strictly part of the loop optimizers, uses the infrastructure provided by the new RTL loop optimizers.

Things that the old loop optimizer can do, that still need a replacement:

1. GIV splitting
2. Loop prefetching of arrays
3. Loop reversal

For the first one, the webizer can fill in. But GIV splitting is useless anyway without register renaming (according to SPEC testing on PPC and IA-64). The latter two should both be replaced with equivalent GIMPLE-based passes. A new loop prefetching pass has already been proposed a number of times on the gcc-patches mailing list (see below), but so far it remains unreviewed. Loop reversal causes us to compare against zero instead of an arbitrary loop upper bound, and possibly saves one register inside the loop. For example,

   for (i = 0; i < n; i++) a[i]=i;

would be replaced with

   for (i = n - 1; i >= 0; i--) a[i]=i;

Implementing this in GIMPLE is trivial within the linear loop transformations framework. The problem is deciding when it is really useful. The code in question was designed to let us take advantage of dbCC instructions on VAXen, m68ks, HP-PA and other, well, old junk. Such instructions were usually faster than separate decrement and branch instructions. But on modern CPUs, reversing loops this way may, for example, have adverse effects on hardware memory prefetching heuristics, which usually prefer loops that walk forward through memory.

Effects on performance from disabling the old loop optimizer

Actually, SPEC scores improve slightly with the old loop optimizer disabled and a few patches applied to mainline. On the LNO branch it was already disabled by default, and it is also disabled on the tree-profiling branch. For mainline, some SPEC numbers are reported in http://gcc.gnu.org/ml/gcc/2004-09/msg01476.html and its follow-ups, and more recently in http://gcc.gnu.org/ml/gcc-patches/2005-03/msg02880.html.

Some special cases have to be handled more carefully to mimic the wisdom of the old loop optimizer. Issues holding up the removal of loop.c are being tracked in [gccbug:22366].