[PATCH][combine] Do not call rtx costs on potentially unrecognisable rtxes in combine

Mon Apr 20 14:04:00 GMT 2015

Hi all,

I'm trying to reduce the cases where the midend calls the backend rtx costs on bogus RTL that the
backend has no patterns for and no way of handling. Having to handle these kinds of rtxes sanely
bloats the costs functions and makes them harder to maintain.

One of the places where this occurs is combine, in distribute_and_simplify_rtx in particular.
Quoting the comment on that function:
" See if X is of the form (* (+ A B) C), and if so convert to
    (+ (* A C) (* B C)) and try to simplify.
  Most of the time, this results in no change.  However, if some of
    the operands are the same or inverses of each other, simplifications
    will result."
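
To make the quoted comment concrete, here is a rough toy model of the transformation. It is not
GCC code: expressions are nested Python tuples, and distribute() and simplify() are hypothetical
helpers with just two illustrative folding rules. It only shows why distributing usually changes
nothing but can collapse the expression when the operands are inverses of each other.

```python
# Toy model only: expressions are nested tuples (op, lhs, rhs) or ("neg", x);
# leaves are strings or ints. None of this is GCC code.

def distribute(x):
    """Rewrite (* (+ A B) C) as (+ (* A C) (* B C)); leave anything else alone."""
    if isinstance(x, tuple) and x[0] == "*":
        inner, c = x[1], x[2]
        if isinstance(inner, tuple) and inner[0] == "+":
            return ("+", ("*", inner[1], c), ("*", inner[2], c))
    return x

def simplify(x):
    """Apply two illustrative rules bottom-up (hypothetical, far from complete)."""
    if not isinstance(x, tuple):
        return x
    # Simplify the operands first, then look at the node itself.
    x = (x[0],) + tuple(simplify(op) for op in x[1:])
    # (* (neg A) C) -> (neg (* A C))
    if x[0] == "*" and isinstance(x[1], tuple) and x[1][0] == "neg":
        return ("neg", ("*", x[1][1], x[2]))
    # (+ T (neg T)) -> 0
    if x[0] == "+" and x[2] == ("neg", x[1]):
        return 0
    return x

# Unrelated operands: the distributed form does not fold at all.
no_change = simplify(distribute(("*", ("+", "a", "b"), "c")))
# Inverse operands: the two products cancel after distribution.
cancelled = simplify(distribute(("*", ("+", "a", ("neg", "a")), "c")))
```

In this sketch no_change keeps the (+ (* a c) (* b c)) shape, while cancelled folds to the constant 0.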

The problem is that after it applies the distributive law it calls rtx costs
to figure out whether the rtx became simpler. This rtx can get pretty complex.
For example, on arm I've seen it try to cost:
(plus:SI (mult:SI (plus:SI (reg:SI 232 [ m1 ])
             (const_int 1 [0x1]))
         (reg:SI 232 [ m1 ]))
     (plus:SI (reg:SI 232 [ m1 ])
         (const_int 1 [0x1])))

which is never going to match anything on arm anyway, so why should the costs function handle it?
In any case, I believe combine's design is such that it should first be attempting to call
recog and split on the rtxes, and only if that succeeds should it be making a target-specific
decision on which rtx to prefer. distribute_and_simplify_rtx goes against that by calling
rtx costs on an unverified rtx in an attempt to gauge its complexity.

This patch remedies that by removing the call to rtx costs and instead manually performing
a relatively simple check on whether the resultant rtx was simplified. That is, using the example
from the comment, whether (+ (* A C) (* B C)) still has + at the top and * in the two operands.
This should give a good indication of whether any meaningful simplification was made. (The '+' and '*'
operators in the example can be any operators that can be distributed over.)
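
As a rough illustration of the kind of check described above (a toy sketch, not the code in the
patch), the test boils down to comparing operator codes. The tuple representation and the names
code_of() and was_simplified() are made up for this example; the real code works on rtxes.

```python
# Toy model only: an expression is a tuple (code, op0, op1) or a leaf string.
# The names below are hypothetical and do not mirror the actual patch.

def code_of(x):
    """Return the operator of X, or None for a leaf."""
    return x[0] if isinstance(x, tuple) else None

def was_simplified(x, top="+", operand="*"):
    """False if X still looks like (top (operand ...) (operand ...)).

    With the defaults this asks whether (+ (* A C) (* B C)) kept its shape,
    i.e. whether distributing achieved nothing.
    """
    still_distributed = (code_of(x) == top
                         and code_of(x[1]) == operand
                         and code_of(x[2]) == operand)
    return not still_distributed

# Same shape as the freshly distributed form: no simplification happened.
unchanged = ("+", ("*", "a", "c"), ("*", "b", "c"))
# One product folded away, so the shape changed.
folded = ("+", ("*", "a", "c"), "c")
```

Here was_simplified(unchanged) is False and was_simplified(folded) is True. No cost function is
consulted at any point, which is the property the patch is after.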

Initially, I wanted to just return the distributed version and let recog reject the invalid rtxes
but that caused some code quality regressions on arm where the original rtx would not recog but
would match a beneficial splitter, whereas the distributed rtx would not.

With this patch I saw almost no codegen differences on arm for the whole of SPEC2006.
The one exception was 416.gamess where it managed to merge a mul and an add into an mla
which resulted in a slightly better code sequence. That was in a pretty large file and I
don't speak Fortran'ese, so I couldn't really reduce a testcase for it. My guess is that
before the patch the costs function would return some essentially random value for the
arbitrarily complex rtx it was passed. That changed the decision in distribute_and_simplify_rtx
on whether to return the distributed rtx, which could have impacted further optimisations in combine.

I tried it on x86_64 as well. Again, there were almost no codegen differences. The exceptions
were tonto and wrf, where a few instructions were eliminated, but with no significant difference.
The resultant binaries for these two were a tiny bit smaller, with no impact on runtime.

Therefore I claim that this is a safe thing to do, as it leaves the target-specific rtx cost
judgements in combine to be made only on valid recog-ed rtxes, rather than having them cancel
optimisations early because rtx costs do not handle arbitrary rtxes well.

Bootstrapped on arm, x86_64, aarch64 (all linux). Tested on arm, aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-04-20  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * combine.c (distribute_and_simplify_rtx): Do not check rtx costs.
     Look at the rtx codes to see if a simplification occurred.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: combine-distribute-simplify.patch
Type: text/x-patch
Size: 1032 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20150420/937f4009/attachment.bin>