Hi,
We've found that constant splitting on ARM can be very inefficient if
it's done inside a loop.
For example, the expression
a = a & 0xff00ff00;
will be translated into the following code (on ARM, an immediate operand
must be an 8-bit value rotated right by an even number of bits):
bic r0, r0, #16711680
bic r0, r0, #255
This makes perfect sense unless the code is in a loop and many
instructions use the same bit mask. In that case, we would want to
load the constant 0xff00ff00 into a register, let
pass_rtl_move_loop_invariants hoist it out of the loop, and reuse it
for every matching bitwise AND inside the loop.
This is a real-life example (from the evas rasterization library), where
fixing this issue speeds up the expedite test suite by 6% on average and
by up to 20% on several tests.
Why does the splitting happen?
In GCC 4.4, the only problem was GCSE, which propagated a constant held
in a separate pseudo register into the consuming insn, i.e.
r123 = 0xff00ff00; r124 = r125 & r123
was transformed into
r124 = r125 & 0xff00ff00
After that, the constant within the AND expression is no longer
considered loop invariant, so it is not moved out of the loop. This can
be fixed by checking whether the insn transformed by GCSE will require
splitting and, if it does, not performing the transformation in the
earlier GCSE passes. We can check this by comparing the rtx_cost of the
constant that GCSE is going to propagate with the rtx_cost of
const_int(1).
If moving the loop invariant fails (e.g. due to register pressure),
pass_combine can still propagate the constant into the AND, so in that
case we end up with the same code as before.
After this patch,
http://gcc.gnu.org/ml/gcc-patches/2009-08/msg01032.html , such
constants are split as early as the expand pass, so the loop invariant
code motion pass never gets a chance to deal with them.
So, the questions are:
1) Is it really necessary to split constants on ARM at expand time?
At the very least, loop invariant code motion could work better if the
splitting happened later.