This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[RFC] type promotion pass

I have attached revamped version of Kugan's original patch for type promotion
rebased on r249469. The motivation of the pass is to minimize
generation of subregs
to avoid redundant zero/sign extensions by carrying out computations
as much as possible on tree-level.

* Working of pass
The pass is dominator-based, and tries to promote type of def to
PROMOTE_MODE in each gimple stmt. Before beginning domwalk, all the
default definitions are promoted to PROMOTE_MODE
in promote_all_ssa_defined_with_nop (). The patch adds a new tree code
SEXT_EXPR to represent sign-extension on tree level. CONVERT_EXPR is
either replaced by an explicit sign or zero extension depending on the
signedness of operands.

The core of the pass is in following two routines:
a) promote_ssa: This pass looks at the def of gimple_stmt, and
promotes type of def to promoted_type in-place if possible. If it
cannot promote def in-place, then
it transforms:
def = def_stmt
new_def = def_stmt
def = convert_expr new_def
where new_def is a clone of def, and type of def is set to promoted_type.

b) fixup_use: The main intent is to "fix" uses of a promoted variable
to preserve semantics
of the code, for instance if the variable is used in stmt where it's
original type is required.
Another case is when def is not promoted by promote_ssa, but some uses
could be promoted.

promote_all_stmts () is the driver function that calls fixup_use and
promote_ssa for each stmt
within the basic block. The pass relies extensively on dom and vrp to
remove redundancies generated by the pass and is thus enabled only if
vrp is enabled.

1] Pass ordering: type-promote pass generates too many redundancies
which can hamper other optimizations. One case I observed was on arm
when it inhibited path splitting optimization because the number of
stmts in basic block exceeded the value of
param-max-jump-thread-duplication-stmts. So I placed the pass just
before dom. I am not sure if this is a good approach. Maybe the pass
should avoid generating redundant statements and not rely too much on
dom and vrp ?

2] Redundant copies left after the pass: When it's safe, vrp optimzies
_1 = op1 sext op2
_1 = op1

which leaves redundant copies that are not optimized away at GIMPLE level.
I placed pass_copy_prop after vrp to eliminate these copies but not sure if that
is ideal. Maybe we should do this during vrp itself as this is the
only case that
generates redundant copies ?

3] Backward phi's: Since we traverse in dominated order, fixup_use()
gets called first on a ssa_name that is an argument of a backward-phi.
If it promotes the type and later if promote_ssa() decides that the
ssa_name should not be promoted, then we have to again walk the
backward phi's to undo the "incorrect" promotion, which is done by
emitting a convert_expr back to the original type from promoted_type.
While I suppose it shouldn't affect correctness, it generates
redundant casts and was wondering if there's a better approach to
handle this issue.

* SPEC2k6 benchmarking:

Results of  benchmarking the patch for aarch64-linux-gnu cortex-a57:

401.bzip2          +2.11%
459.GemsFDTD  +1.42%
464.h264ref       +1.96%
471.omnetpp      +1.05%
481.wrf                +0.99%

447.dealII            -1.34%
450.soplex          -1.54%
456.hmmer          -3.79%
482.sphinx3         -2.95%

The remaining benchmarks didn't have much difference. However there
was some noise
in the above run, and I suppose the numbers are not precise.
I will give another run with benchmarking. There was no significant
difference in code-size with and without patch for SPEC2k6.

* Validation

The patch passes bootstrap+test on x86_64-unknown-linux-gnu,
aarch64-linux-gnu and ppc64le-linux-gnu. On arm-linux-gnueabihf, and
aarch64-linux-gnu, there is following fallout:

1] gcc.dg/attr-alloc_size-11.c missing range info for signed char
(test for warnings, line 50)
The issue seems to be that we call reset_flow_sensitive_info(def) if
def is promoted and that invalidates value range which is probably
causing the regression.

2] gcc.dg/fold-narrowbopcst-1.c scan-tree-dump optimized " = _.* \\\\+ 156"
This requires adjusting the scan-dump.

On ppc64le-linux-gnu, I am observing several regressions in the
testsuite. Most of these seem to be because type-promotion is
interfering with other optimizations, especially widening_mul and
bswap pass. Also observed fallout for some cases due to interference
with strlen, tail-call and jump-threading passes. I suppose I am not
seeing these on arm/aarch64 because ppc64 defines
PROMOTE_MODE to be DImode and aarch64 defines it to be SImode ? I am
working to fix performance regressions. However I would be grateful to
receive feedback on the pass, if it's going in the right direction.


Attachment: latest-8.diff
Description: Text document

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]