This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH][AArch64] Fix wrong-code bug in right-shift SISD patterns
- From: Maxim Kuvyrkov <maxim dot kuvyrkov at linaro dot org>
- To: Kyrill Tkachov <kyrylo dot tkachov at arm dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Marcus Shawcroft <Marcus dot Shawcroft at arm dot com>, Richard Earnshaw <Richard dot Earnshaw at arm dot com>
- Date: Wed, 18 Feb 2015 16:46:24 +0300
- Subject: Re: [PATCH][AArch64] Fix wrong-code bug in right-shift SISD patterns
- Authentication-results: sourceware.org; auth=none
- References: <54E4791D dot 4080401 at arm dot com> <AB30B1BC-2119-4069-ACEB-BAF187023C34 at linaro dot org> <54E496C4 dot 2050806 at arm dot com>
On Feb 18, 2015, at 4:42 PM, Kyrill Tkachov <email@example.com> wrote:
> On 18/02/15 12:32, Maxim Kuvyrkov wrote:
>> On Feb 18, 2015, at 2:35 PM, Kyrill Tkachov <firstname.lastname@example.org> wrote:
>>> Hi all,
>>> This patch fixes a wrong-code bug with the *aarch64_lshr_sisd_or_int_<mode>3
>>> pattern and its associated splitters. The problem is that for the 2nd
>>> alternative it will split a right-shift into a SISD left-shift by the negated
>>> amount to be shifted by (the ushl instruction allows such semantics).
>>> The splitter generates this RTL:
>>> (set (match_dup 2)
>>> (unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
>>> (set (match_dup 0)
>>> (unspec:SI [(match_dup 1) (match_dup 2)] UNSPEC_USHL_2S))
>>> The problem here is that the shift amount register is negated without telling
>>> the register allocator about it (and it can't figure it out itself).
>>> So if you try to use the register that operand 2 is assigned to later on,
>>> you get the negated shift amount instead!
>>> The testcase in the patch demonstrates the simple code that can get miscompiled
>>> due to this behaviour.
>>> The solution in this patch is to negate the shift amount into the output
>>> operand (operand 0) and mark it as an earlyclobber in that alternative.
>>> This is actually exactly what the very similar
>>> *aarch64_ashr_sisd_or_int_<mode>3 pattern does below.
>>> I believe this is the safest and simplest fix at this stage.
>>> This bug was exposed on the Linaro 4.9 branch that happened to have the perfect
>>> storm of costs and register pressure and ended up miscompiling
>>> the TEST_BIT macro in ira-costs.c during a build of trunk by the generated
>>> Linaro compiler, generating essentially code like:
>>> neg d8, d8 //d8 negated!
>>> ushl v0.2s, v11.2s, v8.2s // shift right => shift left by neg amount
>>> fmov w0, s0
>>> <...irrelevant code...>
>>> b .L140
>>> fmov w0, s8 // s8/d8 used and incremented assuming it had not changed at L141
>>> add w0, w0, 1
>>> fmov s8, w0
>>> fmov w1, s10
>>> cmp w0, w1
>>> bne .L141
>>> Basically d8 is negated and later used as if it had not been at .L140 leading
>>> to completely wrong behaviour.
>>> With this patch that particular part of the assembly now contains at L141:
>>> neg d0, d8
>>> ushl v0.2s, v11.2s, v0.2s
>>> fmov w0, s0
>>> Leaving the original shift amount in d8 intact.
>>> This bug occurs on FSF trunk and 4.9 branch (not on 4.8 as the offending
>>> pattern was introduced for 4.9)
>>> Bootstrapped and tested on trunk and 4.9.
>>> Ok for trunk and 4.9?
>> First of all, applauses! I realize how difficult it was to reduce this problem.
>> Your patch looks OK to me, but I can't shake off feeling that it will pessimize cases when d8 is not used afterwards. In particular, your patch makes it impossible to use same register for output (operand 0) and inputs (operands 1 and 2).
>> Did you consider using SCRATCHes instead of re-using operand 0 with early clobber like in the attached [untested] patch? If I got it all correct, register allocator will get more freedom in deciding which register to use for negated shift temporary, while still allowing reusing register from operand 0 for one of the inputs.
> I considered it (but didn't try it) because we end up demanding a scratch register unnecessarily for the two alternatives that don't split which might pessimize register allocation.
That's not the case. The "X" constraint in (match_scratch) is special; it tells RA to not allocate register.
> For stage 4 I think my proposed fix is the minimal one and it keeps consistent with the other patterns in that area that were added all together with:
I think this approach is OK, as long as we revisit the possibility of using SCRATCHes in these and similar patterns at stage 1.