This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][ARM][0/4] Fixing PR target/65932


Ping on the series:
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01717.html
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01714.html
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01716.html (already approved by Richi)
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01715.html

as well as
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02132.html
as described in the cover letter

Thanks,
Kyrill

On 22/01/16 09:52, Kyrill Tkachov wrote:
Hi all,

PR target/65932 is a wrong-code bug affecting arm and has manifested itself
when compiling the Linux kernel, so it's something that we really
ought to fix. The problem stems from the fact that PROMOTE_MODE and
TARGET_PROMOTE_FUNCTION_MODE don't match up on arm.
PROMOTE_MODE also marks the promotion as unsigned, whereas the
TARGET_PROMOTE_FUNCTION_MODE doesn't. This can lead to short variables
being wrongly zero-extended instead of sign-extended.  This also occurs
in PR target/67714.

Jim Wilson tried a few approaches and from the discussion
on the PR and on the ML (https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00814.html)
the preferred approach is to make PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE
match up. Changing TARGET_PROMOTE_FUNCTION_MODE to zero-extend would be an ABI
change so we don't want to do that.  Changing PROMOTE_MODE to not zero-extend
fixes both PR 65932 and 67714.  So Jim's patch is the first patch in this
series.

It has been posted at https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02132.html
and this series is based on top of the arm.h hunk of that patch.

There have been some concerns about the codegen quality fallout from Jim's
patch, which is what the remaining patches in this series address:

* 3 new arm testsuite failures: gcc.target/arm/wmul-[123].c.
wmul-1.c and wmul-2.c are cases where we no longer generate
sign-extend+multiply (and accumulate) instructions but instead generate
the normal full-width multiplies (the operands are sign-extended from
preceeding sign-extending loads).  This is a regression for some targets
on which the sign-extending form is faster. Patches 2 and 3 address this.
gcc.target/arm/wmul-3.c is a test where we actually end up generating
better code and so the testcase just needs to be adjusted.
Patch 4 deals with that.

* Sign-extending rather than zero-extending short values means we make more
use of the ldrsb and ldrsh arm instructions rather than the zero-extending
ldrb and ldrh.  On Thumb1 targets ldrsb and ldrsh have more limited addressing
modes (only REG + REG), which could hurt code size. However, the change also
means that we can now merge sequences of load-zero-extend followed by a sign-extend
into a single load-sign-extend.
So we'd turn a (ldrh ; sxth) into an (ldrsh).

I've built a few popular embedded benchmarks for a Cortex-M0 target (Thumb1) and looked
at the generated assembly for -Os and -O2. I did see both effects. That is,
ldrh instructions with an immediate being turned into two instructions:
    ldrh    r4, [r4, #12]
->
    movs    r5, #12
    ldrsh    r4, [r4, r5]

But I also observed the beneficial effect. Various register allocation
perturbations also featured in the changes
Overall, for every codebase I've looked at both -Os and -O2 the new code was
slightly smaller. Probably not enough to call this an outright win, but
definitely not a regression overall.

As for ARM and Thumb2 states, this series doesn't have a major impact.
SPEC2006 bencharks are slightly reduced in size (but nothing to write home
about) and there are no code quality regressions. And even the regressions
caught by the testsuite in the wmul-[12].c tests don't actually manifest
in practice in SPEC, but they are fixed anyway.

The series contains one target-independent change to CSE in patch 3 that
I'll explain there.

The series has been bootstrapped and tested on arm, aarch64 and x86_64.
Is this ok for trunk together with Jim's arm.h hunk from
g ?

P.S. This also fixes PR middlen-end/67295 which is a testsuite regression on arm.
Refer to the bugzilla entry there for the analysis of how this issue affects
that PR.

Thanks,
Kyrill



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]