extend fwprop optimization

Wei Mi wmi@google.com
Mon Feb 25 23:32:00 GMT 2013


Hello,

I have a patch that extends fwprop to propagate complex expressions,
and I am posting it for discussion. The existing fwprop can propagate
the source of simple def insns (such as ra = const, ra = rb, or
ra = subreg(rb)) into their uses, but it cannot propagate a def insn
such as ra = rb + rc. The motivating example is linked below. The
existing fwprop cannot handle it because the def insn is not of the
const/reg/subreg form. The combine pass cannot handle it either,
because combine works along LOG_LINKS and cannot handle a single def
with multiple downstream uses.
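To make the distinction concrete, here is a hypothetical C sketch of which def sources are propagation candidates with and without the extension. The real pass inspects RTL (CONST_INT, REG, SUBREG, PLUS, ...); the enum def_src_kind and the function is_propagation_candidate are illustrative names, not code from fwprop or from the patch, and the list of simple shapes only mirrors the examples given above.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a def insn's source.  These
   enumerators only stand in for RTL shapes; they are not GCC types.  */
enum def_src_kind
{
  SRC_CONST,   /* ra = const        */
  SRC_REG,     /* ra = rb           */
  SRC_SUBREG,  /* ra = subreg (rb)  */
  SRC_PLUS     /* ra = rb + rc      */
};

/* Return true if a def whose source has shape KIND is tried for
   propagation.  Without the extension (EXTENDED false) only the simple
   shapes qualify; with it, a complex source such as rb + rc is tried
   too, and a cost model decides whether the resulting changes stay.  */
bool
is_propagation_candidate (enum def_src_kind kind, bool extended)
{
  switch (kind)
    {
    case SRC_CONST:
    case SRC_REG:
    case SRC_SUBREG:
      return true;
    case SRC_PLUS:
      return extended;
    }
  return false;
}
```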

The motivating case:
http://gcc.gnu.org/ml/gcc/2013-01/msg00181.html
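In the motivating case, the rewrite of "a << (b & 63)" into "a << b" is done by insn splitting. That rewrite is only valid because the target's shift instruction already ignores the high bits of the count. The following minimal C model illustrates the property, assuming an x86-64-style 64-bit shift that uses only the low 6 bits of its count; hw_shl64 and mask_is_redundant are hypothetical helpers, not GCC code. In C itself a << b with b >= 64 is undefined, which is why the model masks the count explicitly.

```c
#include <stdint.h>

/* Model of a 64-bit hardware shift that, like x86-64 SHL with a 64-bit
   operand, uses only the low 6 bits of the shift count.  */
uint64_t
hw_shl64 (uint64_t a, unsigned count)
{
  return a << (count & 63);
}

/* Because the hardware already masks the count, an explicit "& 63" on
   the count is redundant: shifting by (b & 63) gives the same result as
   shifting by b.  Returns nonzero when the two shifts agree.  */
int
mask_is_redundant (uint64_t a, unsigned b)
{
  return hw_shl64 (a, b & 63) == hw_shl64 (a, b);
}
```

Under this model, hw_shl64 (1, 65) shifts by 1, so the result is 2.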

The extended fwprop iterates over each def and tries to propagate it
to its multiple downstream uses, even when the def is a complex
expression. The propagation creates a series of change candidates,
and we consider their costs as a group (the existing fwprop considers
def-use pairs one by one). If all the uses of the def can be
replaced, may_confirm_whole_group is true, which indicates that the
def insn can be removed after all the changes are applied.

The benefit of each change is the cost before the change minus the
cost after it. We also take insn splitting and peephole into
consideration, i.e., the cost after a change is the cost after any
insn splitting and peephole that may be applied to the changed insn.
This is important for the motivating case, where the transformation
from "a << (b & 63)" to "a << b" is done by insn splitting, so we
need to measure the cost after splitting.

total_benefit is the sum of the benefits of all the changes, and
total_positive_benefit is the sum of only the positive benefits.
extra_benefit is the benefit of removing the def insn when
may_confirm_whole_group is true. If total_benefit + extra_benefit >=
total_positive_benefit, we apply all the changes and remove the def
insn. Otherwise, we apply only the changes with a positive benefit,
one by one. In other words, the group is applied as a whole only when
doing so gains at least as much as applying just the profitable
changes individually.
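As a rough illustration of the group decision rule, here is a minimal C sketch. The names struct change_candidate, decide_group and def_cost are hypothetical, not taken from the patch, and costs are plain integers rather than GCC's rtx costs. Each cost_after is assumed to already reflect any insn splitting and peephole. The sketch also assumes extra_benefit is zero unless may_confirm_whole_group is true, and it only takes the apply-all path when the whole group can be confirmed, since otherwise the def insn cannot be deleted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one change candidate: the cost of the use insn
   before propagation, and its cost after propagation (assumed to be
   measured after any insn splitting / peephole that would apply).  */
struct change_candidate
{
  int cost_before;
  int cost_after;
};

enum group_decision
{
  APPLY_ALL_AND_REMOVE_DEF,
  APPLY_POSITIVE_ONLY
};

/* Decide how to apply the N changes created for one def.  DEF_COST is
   the (hypothetical) cost saved by deleting the def insn; it counts as
   extra_benefit only when MAY_CONFIRM_WHOLE_GROUP is true.  */
enum group_decision
decide_group (const struct change_candidate *changes, size_t n,
              bool may_confirm_whole_group, int def_cost)
{
  int total_benefit = 0;
  int total_positive_benefit = 0;
  int extra_benefit = may_confirm_whole_group ? def_cost : 0;

  for (size_t i = 0; i < n; i++)
    {
      /* Benefit = cost before the change minus cost after it.  */
      int benefit = changes[i].cost_before - changes[i].cost_after;
      total_benefit += benefit;
      if (benefit > 0)
        total_positive_benefit += benefit;
    }

  if (may_confirm_whole_group
      && total_benefit + extra_benefit >= total_positive_benefit)
    return APPLY_ALL_AND_REMOVE_DEF;
  return APPLY_POSITIVE_ONLY;
}
```

For example, with two changes whose benefits are 3 and -1, total_benefit is 2 and total_positive_benefit is 3, so the whole group wins only if removing the def saves at least 1.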

Testing results:
Bootstrapped OK. There are a small number of regression test
failures, caused by limitations of the testcases themselves.

base: gcc r195411 -O2
test: gcc r195411 + fwprop extension -O2
The dynamic instruction count was measured with "perf stat".

SPEC CPU2000 -O2 C/C++ benchmark results:
CPU2000 INT      perf improvement (%)   dynamic insn count reduced (%)
164.gzip                        -0.27                             0.18
175.vpr                             0                             0.22
176.gcc                          0.71                             0.06
181.mcf                          0.91                            -0.02
186.crafty                          0                             0.23
197.parser                      -0.46                             0.32
252.eon                      *** 8.78                             xxxx
253.perlbmk                      2.47                             1.92
254.gap                             0                             0.30
255.vortex                       2.17                             0.22
256.bzip2                       -0.33                             1.37
300.twolf                        0.11                             0.11

CPU2000 FP       perf improvement (%)   dynamic insn count reduced (%)
177.mesa                         0.28                             0.60
179.art                          0.64                             1.24
183.equake                      -0.38                             0.01
188.ammp                            0                             0.09

SPEC CPU2006 -O2 C/C++ benchmark results:
CPU2006 INT      perf improvement (%)   dynamic insn count reduced (%)
400.perlbench                    2.47                            -0.06
401.bzip2                        1.28                             0.73
403.gcc                             0                             0.10
429.mcf                             0                            -0.06
445.gobmk                        0.68                             0.33
456.hmmer                        0.23                            -0.01
458.sjeng                       -1.14                                0
462.libquantum               *** 7.52                            13.01
464.h264ref                      xxxx                             xxxx
471.omnetpp                     -0.61                             0.06
473.astar                           0                             0.45
483.xalancbmk                    1.30                             0.02

CPU2006 FP       perf improvement (%)   dynamic insn count reduced (%)
433.milc                            0                             0.01
444.namd                        -0.25                                0
447.dealII                       xxxx                             xxxx
450.soplex                       0.84                             0.30
453.povray                          0                             0.09
470.lbm                         -0.35                                0
482.sphinx3                      0.18                                0

*** Although eon and libquantum improve a lot, those improvements are
not caused by the fwprop extension itself. The eon difference is due
to a code layout change. The libquantum difference arises because,
after the fwprop extension, a harmful PRE transformation no longer
happens in the hottest loop.
*** 464.h264ref ran endlessly and 447.dealII failed to compile, both
with and without my changes. This is probably due to my SPEC
configuration or options, so I skipped those two benchmarks for
simplicity.

Thanks,
Wei.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch
Type: application/octet-stream
Size: 42684 bytes
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20130225/af1f8824/attachment.obj>


More information about the Gcc-patches mailing list