[PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling
Kewen.Lin
linkw@linux.ibm.com
Thu Jan 16 09:41:00 GMT 2020
Hi,
As discussed in the thread
https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00196.html
(original: https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00104.html),
I'm working to teach IVOPTs to consider D-form group access during unrolling.
The difference between D-form and other forms during unrolling is that we can
put the stride into the displacement field to avoid additional step
increments. E.g.:
With X-form (uf step increments):
...
LD A = baseA, X
LD B = baseB, X
ST C = baseC, X
X = X + stride
LD A = baseA, X
LD B = baseB, X
ST C = baseC, X
X = X + stride
LD A = baseA, X
LD B = baseB, X
ST C = baseC, X
X = X + stride
...
With D-form (one step increment for each base):
...
LD A = baseA, OFF
LD B = baseB, OFF
ST C = baseC, OFF
LD A = baseA, OFF+stride
LD B = baseB, OFF+stride
ST C = baseC, OFF+stride
LD A = baseA, OFF+2*stride
LD B = baseB, OFF+2*stride
ST C = baseC, OFF+2*stride
...
baseA += stride * uf
baseB += stride * uf
baseC += stride * uf
Suppose the loop gets unrolled 8 times: D-form then needs 3 step updates,
versus 8 step updates with X-form. Here we only need to check that the stride
meets the D-form field requirement, since if OFF doesn't meet it, we can
construct baseA' as baseA + OFF.
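To make the comparison concrete, here is a minimal sketch in C of the two
counts and of a stride check. The helper names are hypothetical and are not
part of the patch set. The check assumes a signed 16-bit displacement field
and that OFF has already been folded into the base, so the displacements used
are 0, stride, ..., (uf - 1) * stride. Any alignment constraints a particular
instruction form may add are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* X-form: the shared index register is bumped after every unrolled copy,
   so there are uf step updates per unrolled loop body.  */
static unsigned
xform_step_updates (unsigned uf)
{
  return uf;
}

/* D-form: each base pointer is bumped once per unrolled loop body,
   regardless of the unroll factor.  */
static unsigned
dform_step_updates (unsigned n_bases)
{
  return n_bases;
}

/* Hypothetical check (assumption: signed 16-bit displacement field).
   With OFF folded into the base, the displacement furthest from zero
   is (uf - 1) * stride, so it is enough to test that one value.  */
static bool
displacements_fit (int64_t stride, unsigned uf)
{
  if (uf == 0)
    return false;
  int64_t max_disp = (int64_t) (uf - 1) * stride;
  return max_disp >= INT16_MIN && max_disp <= INT16_MAX;
}
```

For the example above (3 bases, uf = 8), dform_step_updates (3) gives 3 and
xform_step_updates (8) gives 8.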
This patch set consists of four parts:
[PATCH 1/4 GCC11] Add middle-end unroll factor estimation
Add unroll factor estimation in the middle-end. It mainly follows the current
RTL unroll factor determination in function decide_unrolling and its
callees. As Richard B. suggested, we could probably force the unroll factor
with this and avoid duplicate unroll factor calculation, but I think that
needs more benchmarking work and should be handled separately.
[PATCH 2/4 GCC11] Add target hook stride_dform_valid_p
Add one target hook to determine whether the current memory access with
the given mode, stride and other flags has D-form support available.
[PATCH 3/4 GCC11] IVOPTs Consider cost_step on different forms during unrolling
Teach IVOPTs to identify address-type iv groups that prefer D-form, and
set the dform_p flag on their derived iv cands. Taking the unroll factor
into account, increase the iv cost by (uf - 1) * cost_step if it's not a
dform iv cand.
[PATCH 4/4 GCC11] rs6000: P9 D-form test cases
Add some test cases, mainly copied from Kelvin's patch.
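The cost adjustment described for patch 3/4 can be sketched as below. This is
only an illustration of the formula (uf - 1) * cost_step, not the code in
tree-ssa-loop-ivopts.c. The struct iv_cand_sketch and the function
adjusted_iv_cost are hypothetical names. Only dform_p and cost_step come from
the description above.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for an iv candidate.  */
struct iv_cand_sketch
{
  unsigned cost;      /* Base cost of the candidate.  */
  unsigned cost_step; /* Cost of one step increment.  */
  bool dform_p;       /* True if derived from a D-form preferred group.  */
};

/* Return the candidate cost adjusted for unroll factor UF.  A non-dform
   candidate needs one step update per unrolled copy, i.e. (uf - 1) more
   than in the rolled loop; a dform candidate needs none.  */
static unsigned
adjusted_iv_cost (const struct iv_cand_sketch *cand, unsigned uf)
{
  if (cand->dform_p || uf <= 1)
    return cand->cost;
  return cand->cost + (uf - 1) * cand->cost_step;
}
```

With cost = 4, cost_step = 1 and uf = 8, a non-dform candidate ends up at 11
while a dform candidate stays at 4.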
Bootstrapped and regress tested on powerpc64le-linux-gnu.
I'll be on leave for two weeks soon, so please expect late responses.
Thanks a lot in advance!
BR,
Kewen
------------
gcc/cfgloop.h | 3 +
gcc/config/rs6000/rs6000.c | 56 ++++++++++++++++-
gcc/doc/tm.texi | 14 +++++
gcc/doc/tm.texi.in | 4 ++
gcc/target.def | 21 ++++++-
gcc/testsuite/gcc.target/powerpc/p9-dform-0.c | 43 +++++++++++++
gcc/testsuite/gcc.target/powerpc/p9-dform-1.c | 55 +++++++++++++++++
gcc/testsuite/gcc.target/powerpc/p9-dform-2.c | 12 ++++
gcc/testsuite/gcc.target/powerpc/p9-dform-3.c | 15 +++++
gcc/testsuite/gcc.target/powerpc/p9-dform-4.c | 12 ++++
gcc/testsuite/gcc.target/powerpc/p9-dform-generic.h | 34 +++++++++++
gcc/tree-ssa-loop-ivopts.c | 84 +++++++++++++++++++++++++-
gcc/tree-ssa-loop-manip.c | 254 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
gcc/tree-ssa-loop-manip.h | 3 +-
gcc/tree-ssa-loop.c | 33 ++++++++++
gcc/tree-ssa-loop.h | 2 +
16 files changed, 640 insertions(+), 5 deletions(-)