This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC][ARM]: Fix reload spill failure (PR 60617)


Ramana,

On 7 July 2014 13:48, Venkataramanan Kumar
<venkataramanan.kumar@linaro.org> wrote:
> Hi Ramana/Maxim,
>
>
> On 18 June 2014 16:05, Venkataramanan Kumar
> <venkataramanan.kumar@linaro.org> wrote:
>> Hi Ramana,
>>
>> On 18 June 2014 15:29, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
>>> On Mon, Jun 16, 2014 at 1:53 PM, Venkataramanan Kumar
>>> <venkataramanan.kumar@linaro.org> wrote:
>>>> Hi Maintainers,
>>>>
>>>> This patch fixes the PR 60617 that occurs when we turn on reload pass
>>>> in thumb2 mode.
>>>>
>>>> It occurs for the pattern "*ior_scc_scc" that gets generated for the 3
>>>> argument of the below function call.
>>>>
>>>> JIT:emitStoreInt32(dst,regT0m, (op1 == dst || op2 == dst)));
>>>>
>>>>
>>>> (----snip---)
>>>> (insn 634 633 635 27 (parallel [
>>>>             (set (reg:SI 3 r3)
>>>>                 (ior:SI (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
>>>> r5 is registers gets assigned
>>>>                         (reg/v:SI 112 [ op2 ]))
>>>>                     (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
>>>>                         (reg/v:SI 111 [ op1 ]))))
>>>>             (clobber (reg:CC 100 cc))
>>>>         ]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300
>>>> {*ior_scc_scc
>>>> (----snip---)
>>>>
>>>> The issue here is that the above pattern demands 5 registers (LO_REGS).
>>>>
>>>> But when we are in reload, registers r0 is used for pointer to the
>>>> class, r1 and r2 for first and second argument. r7 is used for stack
>>>> pointer.
>>>>
>>>> So we are left with r3,r4,r5 and r6. But the above patterns needs five
>>>> LO_REGS. Hence we get spill failure when processing the last register
>>>> operand in that pattern,
>>>>
>>>> In ARM port,  TARGET_LIKELY_SPILLED_CLASS is defined for Thumb-1 and
>>>> for thumb 2 mode there is mention of using LO_REG in the comment as
>>>> below.
>>>>
>>>> "Care should be taken to avoid adding thumb-2 patterns that require
>>>> many low registers"
>>>>
>>>> So conservative fix is not to allow this pattern for Thumb-2 mode.
>>>
>>> I don't have an additional solution off the top of my head and
>>> probably need to go do some digging.
>>>
>>> It sounds like the conservative fix but what's the impact of doing so
>>> ? Have you measured that in terms of performance or code size on a
>>> range of benchmarks ?
>>>
>>>>
>>
>> I haven't done any benchmark testing. I will try and run some
>> benchmarks with my patch.
>>
>>
>>>> I allowed these pattern for Thumb2 when we have constant operands for
>>>> comparison. That makes the target tests arm/thum2-cond-cmp-1.c to
>>>> thum2-cond-cmp-4.c pass.
>>>
>>> That sounds fine and fair - no trouble there.
>>>
>>> My concern is with removing the register alternatives and loosing the
>>> ability to trigger conditional compares on 4.9 and trunk for Thumb1
>>> till the time the "new" conditional compare work makes it in.
>>>
>>> Ramana
>
> I tested this conservative fix with Coremark (ran it on chromebook)and
> SPEC cpu2006 (cross compiled and compared assembly differences).
>
> With Coremark there are no performance issues. In fact there no
> assembly differences with CPU flags for A15 and A9.
>
> For SPEC2006 I cross compiled and compared assembly differences with
> and without patch (-O3 -fno-common).
> I have not run these bechmarks.
>
> There are major code differences and are due to conditional compare
> instructions not getting generated as you expected. This also results
> in different physical register numbers assigned in the generated code
> and also there are code scheduling differences when comparing it with
> base.
>
>
> I am showing a simple test case to showcase the conditional compare
> difference I am seeing in SPEC2006 benchmarks.
>
> char a,b;
> int i;
> int f( int j)
> {
>   if ( (i >= a) ? (j <= a) : 1)
>     return 1;
>   else
>     return 0;
> }
>
> GCC FSF 4.9
> -----------
>
>         movw    r2, #:lower16:a
>         movw    r3, #:lower16:i
>         movt    r2, #:upper16:a
>         movt    r3, #:upper16:i
>         ldrb    r2, [r2]        @ zero_extendqisi2
>         ldr     r3, [r3]
>         cmp     r2, r3
>         it      le
>         cmple   r2, r0          <== conditional compare instrucion generated.
>         ite     ge
>         movge   r0, #1
>         movlt   r0, #0
>         bx      lr
>
>
> With patch
> ---------
>
>         movw    r2, #:lower16:a
>         movw    r3, #:lower16:i
>         movt    r2, #:upper16:a
>         movt    r3, #:upper16:i
>         ldr     r3, [r3]
>         ldrb    r2, [r2]        @ zero_extendqisi2
>         cmp     r2, r3
>         ite     le
>         movle   r3, #0 <== Unoptimal moves generated.
>         movgt   r3, #1 <==
>         cmp     r2, r0
>         ite     lt
>         movlt   r0, r3
>         orrge   r0, r3, #1<==
>         bx      lr
>
> The following benchmarks have maximum number of conditional compare
> pattern differences and also code scheduling changes/different
> physical register numbers in generated code.
> 416.gamess/
> 434.zeusmp/
> 400.perlbench
> 403.gcc/
> 445.gobmk
> 483.xalancbmk/
> 401.bzip2
> 433.milc/
> 435.gromacs
>
> Assembly files differences seen in the below benchmarks buy are not very large.
>
> 436.cactusADM/
> 447.dealII
> 454.calculix
> 456.hmmer
> 453.povray/
> 465.tonto/
> 471.omnetpp
> 459.GemsFDTD
> 437.leslie3d
> 464.h264ref
>
>
> Based on the results, I am of the opinion that this patch would cause
> some performance hit if we have it up streamed on trunk or GCC 4.9.
>
> Another thing to be noted is that this bug occurs when only when we
> compile with -fno-tree-dce and -mno-lra on GCC 4.9.
>
> Also with current FSF trunk and GCC 4.9,  we have LRA as default for
> ARM, which does not have this bug.
>
> My opinion is that reporter can use this fix to build the package that
> caused it and if it has to be built with -fno-tree-dce.
>
> I am not sure if we can do anything in machine descriptions level to
> prevent this bug. Also in reload pass I am not sure if it is easy to
> fix.
>
> Do you have any suggestions?
>
So this is a 4.8-only bug, or a 4.9/trunk when disabling LRA.
Ramana, do you think it's worth fixing it in 4.8, at the price of a
likely performance degradation, or leaving it open as "known bug" in
that branch?

Thanks,

Christophe.


> regards,
> Venkat.
>
>>
>> This bug does not occur when LRA is enabled. In 4.9 FSF and trunk, the
>>  LRA pass is enabled by default now .
>> May be too conservative, but is there a way to enable this pattern
>> when we have LRA pass and prevent it we have old reload pass?
>>
>> regards,
>> Venkat.
>>
>>
>>
>>>
>>>>
>>>> Regression tested with gcc 4.9 branch since in trunk this bug is
>>>> masked revision 209897.
>>>>
>>>> Please provide your suggestion on this patch
>>>>
>>>> regards,
>>>> Venkat.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]