# Code Factoring Optimizations

## Latest News

2005-08-11

Add sequence abstraction on tree, add sinking part of local factoring on Tree-SSA.

2005-02-02

Some more details about the algorithms and some preliminary results added. To do list published.

2004-11-16

The branch is now open.

## Introduction

Code factoring is the name of a class of useful optimization techniques developed especially for code size reduction. These approaches aim to reduce size by restructuring the code.

The goal of this project is to add a new extension for improving the code size optimization of GCC with code factoring methods (code motion and merging algorithms). The implementation currently resides on the branch.

## Contributing

Checkout the cfo-branch branch from our respository.

When posting to the development lists, please mark messages and patches with [cfo] in the subject. The usual contribution and testing rules apply. This branch is maintained by Gabor Loki.

## Documentation

The project includes the following two code factoring algorithms:

• Local code factoring: This is a code motion technique. It moves identical instructions from basic blocks to their common predecessor or successor, if they have any. The semantics of the program have to be preserved of course, therefore only those instructions may be moved, which neither invalidate any existing dependencies nor introduce new ones. To obtain the best size reduction some of the instructions are moved upward (code hoisting) to the common predecessor, while some are moved downward (code sinking) to the common successor.
C code example:
```// Original source            // After local factoring
{                             {
I1;   // <= HOIST
if ( cond )                   if ( cond )
{                             {
I0;                           I0;
I1;   // <= HOIST
I2;   // <= SINK
I3;   // <= SINK
}                             }
else                          else
{                             {
I1;   // <= HOIST
I4;                           I4;
I5;                           I5;
I2;   // <= SINK
I3;   // <= SINK
}                             }
I2;   // <= SINK
I3;   // <= SINK
}                             }
```
• Sequence abstraction: It is a size optimization method. Unlike local factoring, this works with whole basic blocks instead of single instructions. The main idea of this technique is to find identical sequences of code, which can be turned into procedures and then replace all occurrences with calls to the newly created subrutine. It is kind of an opposite of function inlining.
C code example:
```// Original source            // After sequence abstraction
{                             {
void *jump_label;
...                           ...
jump_label = &&exit_0;
entry_0:
I0;                           I0;
I1;                           I1;
I2;                           I2;
I3;                           I3;
goto *jump_label;
exit_0:
...                           ...
jump_label = &&exit_1;
goto entry_0;
I0;
I1;
I2;
I3;
exit_1:
...                           ...
jump_label = &&exit_2;
goto entry_0;
I0;
I1;
I2;
I3;
exit_2:
...                           ...
jump_label = &&exit_3;
goto entry_0;
I0;
I1;
I2;
I3;
exit_3:
...                           ...
}                             }
```

Both algorithms have an opportunity of working on two different levels (Tree and RTL). Both have their own advantages and disadvantages. This project holds both types for future investigations.

For more information about code factoring see the article in the GCC Summit Proceedings (2004).

## Features

Currently the following algorithms are implemented on the branch:

• Local factoring on RTL (`-frtl-lfact`)
• Local factoring on Tree-SSA (`-ftree-lfact`)
• Sequence abstraction on RTL (`-frtl-seqabstr`)
• Sequence abstraction on Tree (`-ftree-seqabstr`)

## Preliminary results

The following results have been prepared using the CSiBE benchmark with respect to the mainline at the last merge (2005-07-11).

Code size save Compilation time multiplier
Target arm-elf i686-linux arm-elf i686-linux
Sequence abstraction on RTL 1.07% 1.43% 1.94 1.26
Sequence abstraction on Tree 0.34% 0.18% 1.25 1.25
Local factoring on RTL 0.11% 0.19% 1.01 0.99
Local factoring on Tree-SSA 0.31% 0.24% 1.02 1.01
Overall 1.96% 1.94% 2.08 1.49

## To do

• Implement procedural abstraction on IPA (interprocedural version of sequence abstraction).
• Improve the compilation time of the sequence abstraction using fingerprinting.
• Improve algorithms with merging similar instead of identical instructions/sequences.