[PATCH][ARM][0/4] Fixing PR target/65932
Thu Feb 4 19:35:00 GMT 2016
On Thu, Feb 4, 2016 at 1:34 AM, Kyrill Tkachov
> On 04/02/16 09:13, Ramana Radhakrishnan wrote:
>> On Fri, Jan 22, 2016 at 9:52 AM, Kyrill Tkachov
>> <firstname.lastname@example.org> wrote:
>>> Hi all,
>>> PR target/65932 is a wrong-code bug affecting arm and has manifested
>>> when compiling the Linux kernel, so it's something that we really
>>> ought to fix. The problem stems from the fact that PROMOTE_MODE and
>>> TARGET_PROMOTE_FUNCTION_MODE don't match up on arm.
>>> PROMOTE_MODE also marks the promotion as unsigned, whereas the
>>> TARGET_PROMOTE_FUNCTION_MODE doesn't. This can lead to short variables
>>> being wrongly zero-extended instead of sign-extended. This also occurs
>>> in PR target/67714.
>>> Jim Wilson tried a few approaches and from the discussion
>>> on the PR and on the ML
>>> the preferred approach is to make PROMOTE_MODE and
>>> match up. Changing TARGET_PROMOTE_FUNCTION_MODE to zero-extend would be
>>> change so we don't want to do that. Changing PROMOTE_MODE to not
>>> fixes both PR 65932 and 67714. So Jim's patch is the first patch in this
>>> It has been posted at
>>> and this series is based on top of the arm.h hunk of that patch.
>>> There have been some concerns about the codegen quality fallout from
>>> patch, which is what the remaining patches in this series address:
>>> * 3 new arm testsuite failures: gcc.target/arm/wmul-.c.
>>> wmul-1.c and wmul-2.c are cases where we no longer generate
>>> sign-extend+multiply (and accumulate) instructions but instead generate
>>> the normal full-width multiplies (the operands are sign-extended from
>>> preceeding sign-extending loads). This is a regression for some targets
>>> on which the sign-extending form is faster. Patches 2 and 3 address this.
>>> gcc.target/arm/wmul-3.c is a test where we actually end up generating
>>> better code and so the testcase just needs to be adjusted.
>>> Patch 4 deals with that.
>>> * Sign-extending rather than zero-extending short values means we make
>>> use of the ldrsb and ldrsh arm instructions rather than the
>>> ldrb and ldrh. On Thumb1 targets ldrsb and ldrsh have more limited
>>> modes (only REG + REG), which could hurt code size. However, the change
>>> means that we can now merge sequences of load-zero-extend followed by a
>>> into a single load-sign-extend.
>>> So we'd turn a (ldrh ; sxth) into an (ldrsh).
>>> I've built a few popular embedded benchmarks for a Cortex-M0 target
>>> and looked
>>> at the generated assembly for -Os and -O2. I did see both effects. That
>>> ldrh instructions with an immediate being turned into two instructions:
>>> ldrh r4, [r4, #12]
>>> movs r5, #12
>>> ldrsh r4, [r4, r5]
>>> But I also observed the beneficial effect. Various register allocation
>>> perturbations also featured in the changes
>>> Overall, for every codebase I've looked at both -Os and -O2 the new code
>>> slightly smaller. Probably not enough to call this an outright win, but
>>> definitely not a regression overall.
>>> As for ARM and Thumb2 states, this series doesn't have a major impact.
>>> SPEC2006 bencharks are slightly reduced in size (but nothing to write
>>> about) and there are no code quality regressions. And even the
>>> caught by the testsuite in the wmul-.c tests don't actually manifest
>>> in practice in SPEC, but they are fixed anyway.
>>> The series contains one target-independent change to CSE in patch 3 that
>>> I'll explain there.
>>> The series has been bootstrapped and tested on arm, aarch64 and x86_64.
>>> Is this ok for trunk together with Jim's arm.h hunk from
>>> g ?
>> The whole series is OK provided the middle-end hunk has been approved.
> Richi approved the midend hunk at:
More information about the Gcc-patches