This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH PR65447]Improve IV handling by grouping address type uses with same base and step


On Wed, May 27, 2015 at 11:44 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
> Hi Bin,
>
>
> On 08/05/15 11:47, Bin Cheng wrote:
>>
>> Hi,
>> GCC's IVO currently handles every IV use independently, which is not right
>> by learning from cases reported in PR65447.
>>
>> The rationale is:
>> 1) Lots of address type IVs refer to the same memory object, share similar
>> base and have same step.  We should handle these IVs as a group in order
>> to
>> maximize CSE opportunities, prefer reg+offset addressing mode.
>> 2) GCC's IVO algorithm is expensive and only is run when candidate set is
>> small enough.  By grouping same family uses, we can decrease the number of
>> both uses and candidates.  Before this patch, number of candidates for
>> PR65447 is too big to run expensive IVO algorithm, resulting in bad
>> assembly
>> code on targets like AArch64 and Mips.
>> 3) Even for cases the assembly code isn't improved, we can still get
>> compilation time benefit with this patch.
>> 4) This is a prerequisite for enabling auto-increment support in IVO on
>> AArch64.
>>
>> For now, this is only done to address type IVs, in the future I may extend
>> it to general IVs too.
>>
>> For AArch64:
>> Benchmarks 470.lbm/spec2k6 and 173.applu/spec2k are improved obviously by
>> this patch.  A couple of cases from spec2k/fp appear regressed.  I looked
>> into generated assembly code and can confirm the regression is false alarm
>> except one case (189.lucas).  For that case, I think it's another issue
>> exposed by this patch (GCC failed to CSE candidate setup code, resulting
>> in
>> bloated loop header).  Anyway, I also fined tuned the patch to minimize
>> the
>> impact.
>>
>> For AArch32, this patch seems to be able to improve spec2kfp too, but I
>> didn't look deep into it.  I guess the reason is it can make life for
>> auto-increment support in IVO better.
>>
>> One of defects of this patch is computation of max offset in
>> compute_max_addr_offset is basically borrowed from get_address_cost.  The
>> comment says we should find a better way to compute all information.
>> People
>> also complained we need to refactor that part of code.  I don't have good
>> solution to that yet, though I did try best to keep
>> compute_max_addr_offset
>> simple.
>>
>> I believe this is a generally wanted change, bootstrap and test on x86_64
>> and AArch64, so is it ok?
>>
>>
>> 2015-05-08  Bin Cheng  <bin.cheng@arm.com>
>>
>>         PR tree-optimization/65447
>>         * tree-ssa-loop-ivopts.c (struct iv_use): New fields.
>>         (dump_use, dump_uses): Support to dump sub use.
>>         (record_use): New parameters to support sub use.  Remove call to
>>         dump_use.
>>         (record_sub_use, record_group_use): New functions.
>>         (compute_max_addr_offset, split_all_small_groups): New functions.
>>         (group_address_uses, rewrite_use_address): New functions.
>>         (strip_offset): New declaration.
>>         (find_interesting_uses_address): Call record_group_use.
>>         (add_candidate): New assertion.
>>         (infinite_cost_p): Move definition forward.
>>         (add_costs): Check INFTY cost and return immediately.
>>         (get_computation_cost_at): Clear setup cost and dependent bitmap
>>         for sub uses.
>>         (determine_use_iv_cost_address): Compute cost for sub uses.
>>         (rewrite_use_address_1): Rename from old rewrite_use_address.
>>         (free_loop_data): Free sub uses.
>>         (tree_ssa_iv_optimize_loop): Call group_address_uses.
>>
>> gcc/testsuite/ChangeLog
>> 2015-05-08  Bin Cheng  <bin.cheng@arm.com>
>>
>>         PR tree-optimization/65447
>>         * gcc.dg/tree-ssa/pr65447.c: New test.
>
>
> I see this test failing on arm-none-eabi with a compiler at r223737.
> My configure options are: --enable-checking=yes --with-newlib
> --with-fpu=neon-fp-armv8 --with-arch=armv8-a --without-isl

Hi Kyrill,
Thank you for reporting this.  I will have a look.

Thanks,
bin
>
> Kyrill
>
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]