This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
- From: Christophe Lyon <christophe dot lyon at linaro dot org>
- To: Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>
- Cc: Ramana Radhakrishnan <ramana dot gcc at googlemail dot com>, Jim Wilson <jim dot wilson at linaro dot org>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 18 Feb 2016 11:16:19 +0100
- Subject: Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
- Authentication-results: sourceware.org; auth=none
- References: <CABXYE2VXjy7=5Y=c1TCxLE8KuwLtwBYBhTB24xrWDvWAeiBwbQ at mail dot gmail dot com> <CAJA7tRbQ751-o+T+sx9Z0Uaq1YoEocR507y4KtYQmYurTht8AA at mail dot gmail dot com> <56C1B74D dot 4070009 at foss dot arm dot com> <CAKdteOaepm0zCPvPsT1E9TtjFxopmDyKHmWKwZcdayoY6qQ6KA at mail dot gmail dot com> <56C445F6 dot 6040004 at foss dot arm dot com> <CAKdteOZZzE+aixcoiyqPRQ7qfwZF_QLM=eW2iVLuC39XYs6KDA at mail dot gmail dot com> <56C449CE dot 2020800 at foss dot arm dot com>
On 17 February 2016 at 11:22, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> On 17/02/16 10:20, Christophe Lyon wrote:
>>
>> On 17 February 2016 at 11:05, Kyrill Tkachov
>> <kyrylo.tkachov@foss.arm.com> wrote:
>>>
>>> On 17/02/16 10:03, Christophe Lyon wrote:
>>>>
>>>> On 15 February 2016 at 12:32, Kyrill Tkachov
>>>> <kyrylo.tkachov@foss.arm.com> wrote:
>>>>>
>>>>> On 04/02/16 08:58, Ramana Radhakrishnan wrote:
>>>>>>
>>>>>> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson <jim.wilson@linaro.org>
>>>>>> wrote:
>>>>>>>
>>>>>>> This is my suggested fix for PR 65932, which is a linux kernel
>>>>>>> miscompile with gcc-5.1.
>>>>>>>
>>>>>>> The problem here is caused by a chain of events. The first is that
>>>>>>> the relatively new eipa_sra pass creates fake parameters that behave
>>>>>>> slightly differently than normal parameters. The second is that the
>>>>>>> optimizer creates phi nodes that copy local variables to fake
>>>>>>> parameters and/or vice versa. The third is that the ouf-of-ssa pass
>>>>>>> assumes that it can emit simple move instructions for these phi
>>>>>>> nodes.
>>>>>>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>>>>>>> forces QImode and HImode to unsigned, but a
>>>>>>> TARGET_PROMOTE_FUNCTION_MODE hook that does not. So signed char and
>>>>>>> short parameters have different in register representations than
>>>>>>> local
>>>>>>> variables, and require a conversion when copying between them, a
>>>>>>> conversion that the out-of-ssa pass can't easily emit.
>>>>>>>
>>>>>>> Ultimately, I think this is a problem in the arm backend. It should
>>>>>>> not have a PROMOTE_MODE macro that is changing the sign of char and
>>>>>>> short local variables. I also think that we should merge the
>>>>>>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>>>>>>> prevent this from happening again.
>>>>>>>
>>>>>>> I see four general problems with the current ARM PROMOTE_MODE
>>>>>>> definition.
>>>>>>> 1) Unsigned char is only faster for armv5 and earlier, before the
>>>>>>> sxtb
>>>>>>> instruction was added. It is a lose for armv6 and later.
>>>>>>> 2) Unsigned short was only faster for targets that don't support
>>>>>>> unaligned accesses. Support for these targets was removed a while
>>>>>>> ago, and this PROMODE_MODE hunk should have been removed at the same
>>>>>>> time. It was accidentally left behind.
>>>>>>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it
>>>>>>> was
>>>>>>> converted to a function, the PROMOTE_MODE code was copied without the
>>>>>>> UNSIGNEDP changes. Thus it is only an accident that
>>>>>>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree. Changing
>>>>>>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>>>>>>> changes to resolve the difference are safe.
>>>>>>> 4) There is a general principle that you should only change
>>>>>>> signedness
>>>>>>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>>>>>>> in extra conversion instructions that make code slower. The mips64
>>>>>>> hardware for instance requires that 32-bit values be sign-extended
>>>>>>> regardless of type, and instructions may trap if this is not true.
>>>>>>> However, it has a set of 32-bit instructions that operate on these
>>>>>>> values, and hence no conversions are required. There is no similar
>>>>>>> case on ARM. Thus the conversions are unnecessary and unwise. This
>>>>>>> can be seen in the testcases where gcc emits both a zero-extend and a
>>>>>>> sign-extend inside a loop, as the sign-extend is required for a
>>>>>>> compare, and the zero-extend is required by PROMOTE_MODE.
>>>>>>
>>>>>> Given Kyrill's testing with the patch and the reasonably detailed
>>>>>> check of the effects of code generation changes - The arm.h hunk is ok
>>>>>> - I do think we should make this explicit in the documentation that
>>>>>> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
>>>>>> better still maybe put in a checking assert for the same in the
>>>>>> mid-end but that could be the subject of a follow-up patch.
>>>>>>
>>>>>> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
>>>>>> the testsuite fallout separately.
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I'd like to backport the arm.h from this ( r233130) to the GCC 5
>>>>> branch. As the CSE patch from my series had some fallout on x86_64
>>>>> due to a deficiency in the AVX patterns that is too invasive to fix
>>>>> at this stage (and presumably backport), I'd like to just backport
>>>>> this arm.h fix and adjust the tests to XFAIL the fallout that comes
>>>>> with not applying the CSE patch. The attached patch does that.
>>>>>
>>>>> The code quality fallout on code outside the testsuite is not
>>>>> that gread. The SPEC benchmarks are not affected by not applying
>>>>> the CSE change, and only a single sequence in a popular embedded
>>>>> benchmark
>>>>> shows some degradation for -mtune=cortex-a9 in the same way as the
>>>>> wmul-1.c and wmul-2.c tests.
>>>>>
>>>>> I think that's a fair tradeoff for fixing the wrong code bug on that
>>>>> branch.
>>>>>
>>>>> Ok to backport r233130 and the attached testsuite patch to the GCC 5
>>>>> branch?
>>>>>
>>>>> Thanks,
>>>>> Kyrill
>>>>>
>>>>> 2016-02-15 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>>>
>>>>> PR target/65932
>>>>> * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
>>>>> xfail the scan-assembler test.
>>>>> * gcc.target/arm/wmul-2.c: Likewise.
>>>>> * gcc.target/arm/wmul-3.c: Simplify test to generate a single
>>>>> smulbb.
>>>>>
>>>>>
>>>> Hi Kyrill,
>>>>
>>>> I've noticed that wmul-3 still fails on the gcc-5 branch when forcing
>>>> GCC
>>>> configuration to:
>>>> --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
>>>> (target arm-none-linux-gnueabihf)
>>>>
>>>> The generated code is:
>>>> sxth r0, r0
>>>> sxth r1, r1
>>>> mul r0, r1, r0
>>>> instead of
>>>> smulbb r0, r1, r0
>>>> on trunk.
>>>>
>>>> I guess we don't worry?
>>>
>>>
>>> Hi Christophe,
>>> Hmmm, I suspect we might want to backport
>>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01714.html
>>> to fix backend the costing logic of smulbb.
>>> Could you please try that patch to see if it helps?
>>>
>> Ha indeed, with the attached patch, we now generate smulbb.
>> I didn't run a full make check though.
>
>
> Thanks for checking.
>
>> OK with a suitable ChangeLog entry?
>
>
> Can you please do a full test run when you get the chance?
Done, I noticed no other difference.
> Since the patch is mine we'll need an ok from another arm maintainer ;)
OK :)
> Thanks,
> Kyrill
>
>
>>
>> Christophe.
>>
>>> Thanks,
>>> Kyrill
>>>
>>>
>>>>>> regards
>>>>>> Ramana
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> My change was tested with an arm bootstrap, make check, and SPEC
>>>>>>> CPU2000 run. The original poster verified that this gives a linux
>>>>>>> kernel that boots correctly.
>>>>>>>
>>>>>>> The PRMOTE_MODE change causes 3 testsuite testcases to fail. These
>>>>>>> are tests to verify that smulbb and/or smlabb are generated.
>>>>>>> Eliminating the unnecessary sign conversions causes us to get better
>>>>>>> code that doesn't include the smulbb and smlabb instructions. I had
>>>>>>> to modify the testcases to get them to emit the desired instructions.
>>>>>>> With the testcase changes there are no additional testsuite failures,
>>>>>>> though I'm concerned that these testcases with the changes may be
>>>>>>> fragile, and future changes may break them again.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> If there are ARM parts where smulbb/smlabb are faster than mul/mla,
>>>>>>> then maybe we should try to add new patterns to get the instructions
>>>>>>> emitted again for the unmodified testcases.
>>>>>>>
>>>>>>> Jim
>>>>>
>>>>>
>