This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH PR69042/01] Add IV candidate for use with constant offset stripped in base.


On Wed, Mar 23, 2016 at 2:58 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Tue, Mar 22, 2016 at 11:01 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Tue, Mar 22, 2016 at 11:22 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Wed, Mar 16, 2016 at 10:06 AM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>>
>>>> On Wed, Mar 16, 2016 at 10:48 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>> > Hi,
>>>> > When I tried to decrease the number of IV candidates, I removed the code that adds IV candidates for a use with the constant offset stripped from use->base.  That was too aggressive and triggers PR69042, so this patch adds back the missing candidates.  Honestly, the patch doesn't truly fix the issue; it just restores the original behavior on the IVOPT side (which is still the right thing to do, I think).  The issue still depends on the PIC_OFFSET register used on the x86 target, as discussed in https://gcc.gnu.org/ml/gcc/2016-02/msg00040.html.  Furthermore, the real problem could be in the register pressure modeling of the PIC_OFFSET symbol in IVOPT.
>>>> >
>>>> > On AArch64, the overall spec2k number isn't changed, though 173.applu regresses by ~4% because a couple of loops' number of candidates now hits "--param iv-consider-all-candidates-bound=30".  For spec2k6 on AArch64, INT is not affected and FP overall is unchanged; for specific cases, 459.GemsFDTD regresses by 2% and 433.milc improves by 2%.  To address the regression, I will send another patch increasing the parameter bound.
>>>> >
>>>> > Bootstrapped & tested on x86_64 and AArch64; is it OK?  In the meantime, I will collect spec2k6 data on x86_64.
>>>>
>>>> Ok.
>>> Hi Richard,
>>> Hmm, I got spec2k6 data on my x86_64; the patch (along with the patch
>>> increasing param iv-consider-all-candidates-bound) causes a 1%
>>> regression for 436.cactusADM in my run.  I looked into the code: for
>>> the function bench_staggeredleapfrog2_ (which takes 99% of the running
>>> time after patching), IVOPT chooses one fewer candidate for the outer
>>> loop, but that does result in a couple more instructions there.
>>
>> You mean IVOPTs chooses one fewer IV for the outer loop?
> Yes.
>>
>>> For this case, register pressure is the more interesting issue (36
>>> candidates are chosen in the outer loop, with many stack accesses).
>>> I'm not sure whether this 1% regression blocks the patch at this
>>> stage, or not?
>>
>> Is this with or without the increase of the param?  What compiler options and
>> on what sub-architecture was this?
> It's not the param.  We have both address and generic iv_uses in the
> form {base + 4, step}, and there is another address iv_use
> {base + 8, step}.  With the patch we now add the candidate
> {base, step}, which is used for all of these uses.  Before the patch,
> the candidates {base + 4, step} and {base + 8, step} were chosen for
> these iv_uses respectively.
> In fact, {base + 4, step} would be the best choice, but we can't
> choose it because a candidate added for one iv_use is not related to
> the other iv_uses, and we exceed the bound of param
> iv-consider-all-candidates-bound = 30.

I suppose upping that to 40 doesn't help here either (AFAIR the loops are quite
big).
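
For concreteness, the shape described above corresponds to roughly the
following loop (a purely hypothetical sketch, not the actual
436.cactusADM source), where 4-byte elements give two address iv_uses
at constant offsets 4 and 8 from the same base:

    /* Hypothetical sketch, not the actual cactusADM source.  With
       4-byte elements, the loads from a[i + 1] and a[i + 2] are the
       iv_uses {base + 4, step} and {base + 8, step}.  Stripping the
       constant offset from either use's base yields the common
       candidate {base, step} that the patch adds back.  */
    void
    f (float *restrict a, float *restrict b, int n)
    {
      for (int i = 0; i < n; i++)
        b[i] = a[i + 1] + a[i + 2];
    }

(Raising the bound for such an experiment is just a command-line
switch, e.g. gcc -O2 --param iv-consider-all-candidates-bound=40.)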

>>
>> I think if the IVO choice looks optimal before the patch and not optimal
>> after, then it's worth blocking; but it sounds like the IVO choice is a
>> mess anyway?
> Yes, it's a mess here.  For the three outermost loops in 436, IVO
> chooses more than 32 candidates for each loop.  I found at least one
> bug in the register pressure calculation that is responsible for
> this.  Fixing it (plus one other local patch) reduces the number of
> candidates for these loops from 36/36/32 to 12/12/7, and the number
> of lines of assembly from ~7150 to ~6850.  The interesting thing is
> that performance isn't improved...  That may be because they are
> outermost loops.

Yeah, maybe a good default heuristic would be to keep the existing BIVs
and add no related candidates at all for outer loops.  Or to adjust the
heuristic to prefer the smallest number of IVs for those.
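
Something like the sketch below, say (purely hypothetical; the helper
names are invented, not actual tree-ssa-loop-ivopts.c interfaces):

    /* Hypothetical sketch of the heuristic, with invented helper
       names.  loop->inner is non-NULL when the loop contains nested
       loops, i.e. it is an outer loop: keep the existing BIVs as
       candidates but skip the related candidates derived from each
       iv_use.  */
    if (loop->inner != NULL)
      add_candidates_for_bivs_only (data);      /* invented helper */
    else
      add_candidates_for_bivs_and_uses (data);  /* invented helper */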

>> [can you maybe check IV choice by ICC?]
> I don't have ICC at hand, but from past experiments, ICC tends to
> reuse common IVs more than GCC does.

Ok.

I think the patch is ok and eventually we need to revisit the outer loop
handling with this as an excuse.

Richard.

> Thanks,
> bin

