This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [RFC][ARM]: Fix reload spill failure (PR 60617)
- From: Christophe Lyon <christophe dot lyon at linaro dot org>
- To: Ramana Radhakrishnan <ramrad01 at arm dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Patch Tracking <patch at linaro dot org>, Maxim Kuvyrkov <maxim dot kuvyrkov at linaro dot org>, doko at gcc dot gnu dot org
- Date: Fri, 26 Sep 2014 16:26:41 +0200
- Subject: Re: [RFC][ARM]: Fix reload spill failure (PR 60617)
- Authentication-results: sourceware.org; auth=none
- References: <CAJK_mQ2UXGt+zrJP_NLbn4V3rE0cK+3kaC0FnWXTT=3xidC1HA at mail dot gmail dot com> <CAJA7tRYgmNGLnB7yNX9yP6iKUO7Hp9xT+=SWwc3niYEQva1uVg at mail dot gmail dot com> <CAJK_mQ3vwTsPGtoL4bj2=cPDNggu10J3-G9FKPV6r-j7zbELgQ at mail dot gmail dot com> <CAJK_mQ3Lz-ntRBwyo54iQyCUW7HV0XPFFNAyw-0oa+yq+TSXdQ at mail dot gmail dot com>
Ramana,
On 7 July 2014 13:48, Venkataramanan Kumar
<venkataramanan.kumar@linaro.org> wrote:
> Hi Ramana/Maxim,
>
>
> On 18 June 2014 16:05, Venkataramanan Kumar
> <venkataramanan.kumar@linaro.org> wrote:
>> Hi Ramana,
>>
>> On 18 June 2014 15:29, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
>>> On Mon, Jun 16, 2014 at 1:53 PM, Venkataramanan Kumar
>>> <venkataramanan.kumar@linaro.org> wrote:
>>>> Hi Maintainers,
>>>>
>>>> This patch fixes the PR 60617 that occurs when we turn on reload pass
>>>> in thumb2 mode.
>>>>
>>>> It occurs for the pattern "*ior_scc_scc" that gets generated for the 3
>>>> argument of the below function call.
>>>>
>>>> JIT:emitStoreInt32(dst,regT0m, (op1 == dst || op2 == dst)));
>>>>
>>>>
>>>> (----snip---)
>>>> (insn 634 633 635 27 (parallel [
>>>> (set (reg:SI 3 r3)
>>>> (ior:SI (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
>>>> r5 is registers gets assigned
>>>> (reg/v:SI 112 [ op2 ]))
>>>> (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
>>>> (reg/v:SI 111 [ op1 ]))))
>>>> (clobber (reg:CC 100 cc))
>>>> ]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300
>>>> {*ior_scc_scc
>>>> (----snip---)
>>>>
>>>> The issue here is that the above pattern demands 5 registers (LO_REGS).
>>>>
>>>> But when we are in reload, registers r0 is used for pointer to the
>>>> class, r1 and r2 for first and second argument. r7 is used for stack
>>>> pointer.
>>>>
>>>> So we are left with r3,r4,r5 and r6. But the above patterns needs five
>>>> LO_REGS. Hence we get spill failure when processing the last register
>>>> operand in that pattern,
>>>>
>>>> In ARM port, TARGET_LIKELY_SPILLED_CLASS is defined for Thumb-1 and
>>>> for thumb 2 mode there is mention of using LO_REG in the comment as
>>>> below.
>>>>
>>>> "Care should be taken to avoid adding thumb-2 patterns that require
>>>> many low registers"
>>>>
>>>> So conservative fix is not to allow this pattern for Thumb-2 mode.
>>>
>>> I don't have an additional solution off the top of my head and
>>> probably need to go do some digging.
>>>
>>> It sounds like the conservative fix but what's the impact of doing so
>>> ? Have you measured that in terms of performance or code size on a
>>> range of benchmarks ?
>>>
>>>>
>>
>> I haven't done any benchmark testing. I will try and run some
>> benchmarks with my patch.
>>
>>
>>>> I allowed these pattern for Thumb2 when we have constant operands for
>>>> comparison. That makes the target tests arm/thum2-cond-cmp-1.c to
>>>> thum2-cond-cmp-4.c pass.
>>>
>>> That sounds fine and fair - no trouble there.
>>>
>>> My concern is with removing the register alternatives and loosing the
>>> ability to trigger conditional compares on 4.9 and trunk for Thumb1
>>> till the time the "new" conditional compare work makes it in.
>>>
>>> Ramana
>
> I tested this conservative fix with Coremark (ran it on chromebook)and
> SPEC cpu2006 (cross compiled and compared assembly differences).
>
> With Coremark there are no performance issues. In fact there no
> assembly differences with CPU flags for A15 and A9.
>
> For SPEC2006 I cross compiled and compared assembly differences with
> and without patch (-O3 -fno-common).
> I have not run these bechmarks.
>
> There are major code differences and are due to conditional compare
> instructions not getting generated as you expected. This also results
> in different physical register numbers assigned in the generated code
> and also there are code scheduling differences when comparing it with
> base.
>
>
> I am showing a simple test case to showcase the conditional compare
> difference I am seeing in SPEC2006 benchmarks.
>
> char a,b;
> int i;
> int f( int j)
> {
> if ( (i >= a) ? (j <= a) : 1)
> return 1;
> else
> return 0;
> }
>
> GCC FSF 4.9
> -----------
>
> movw r2, #:lower16:a
> movw r3, #:lower16:i
> movt r2, #:upper16:a
> movt r3, #:upper16:i
> ldrb r2, [r2] @ zero_extendqisi2
> ldr r3, [r3]
> cmp r2, r3
> it le
> cmple r2, r0 <== conditional compare instrucion generated.
> ite ge
> movge r0, #1
> movlt r0, #0
> bx lr
>
>
> With patch
> ---------
>
> movw r2, #:lower16:a
> movw r3, #:lower16:i
> movt r2, #:upper16:a
> movt r3, #:upper16:i
> ldr r3, [r3]
> ldrb r2, [r2] @ zero_extendqisi2
> cmp r2, r3
> ite le
> movle r3, #0 <== Unoptimal moves generated.
> movgt r3, #1 <==
> cmp r2, r0
> ite lt
> movlt r0, r3
> orrge r0, r3, #1<==
> bx lr
>
> The following benchmarks have maximum number of conditional compare
> pattern differences and also code scheduling changes/different
> physical register numbers in generated code.
> 416.gamess/
> 434.zeusmp/
> 400.perlbench
> 403.gcc/
> 445.gobmk
> 483.xalancbmk/
> 401.bzip2
> 433.milc/
> 435.gromacs
>
> Assembly files differences seen in the below benchmarks buy are not very large.
>
> 436.cactusADM/
> 447.dealII
> 454.calculix
> 456.hmmer
> 453.povray/
> 465.tonto/
> 471.omnetpp
> 459.GemsFDTD
> 437.leslie3d
> 464.h264ref
>
>
> Based on the results, I am of the opinion that this patch would cause
> some performance hit if we have it up streamed on trunk or GCC 4.9.
>
> Another thing to be noted is that this bug occurs when only when we
> compile with -fno-tree-dce and -mno-lra on GCC 4.9.
>
> Also with current FSF trunk and GCC 4.9, we have LRA as default for
> ARM, which does not have this bug.
>
> My opinion is that reporter can use this fix to build the package that
> caused it and if it has to be built with -fno-tree-dce.
>
> I am not sure if we can do anything in machine descriptions level to
> prevent this bug. Also in reload pass I am not sure if it is easy to
> fix.
>
> Do you have any suggestions?
>
So this is a 4.8-only bug, or a 4.9/trunk when disabling LRA.
Ramana, do you think it's worth fixing it in 4.8, at the price of a
likely performance degradation, or leaving it open as "known bug" in
that branch?
Thanks,
Christophe.
> regards,
> Venkat.
>
>>
>> This bug does not occur when LRA is enabled. In 4.9 FSF and trunk, the
>> LRA pass is enabled by default now .
>> May be too conservative, but is there a way to enable this pattern
>> when we have LRA pass and prevent it we have old reload pass?
>>
>> regards,
>> Venkat.
>>
>>
>>
>>>
>>>>
>>>> Regression tested with gcc 4.9 branch since in trunk this bug is
>>>> masked revision 209897.
>>>>
>>>> Please provide your suggestion on this patch
>>>>
>>>> regards,
>>>> Venkat.