This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH PR65447]Improve IV handling by grouping address type uses with same base and step


On Wed, May 13, 2015 at 7:38 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, May 8, 2015 at 12:47 PM, Bin Cheng <bin.cheng@arm.com> wrote:
>> Hi,
>> GCC's IVO currently handles every IV use independently, which is not right
>> by learning from cases reported in PR65447.
>>
>> The rationale is:
>> 1) Lots of address type IVs refer to the same memory object, share similar
>> base and have same step.  We should handle these IVs as a group in order to
>> maximize CSE opportunities, prefer reg+offset addressing mode.
>> 2) GCC's IVO algorithm is expensive and only is run when candidate set is
>> small enough.  By grouping same family uses, we can decrease the number of
>> both uses and candidates.  Before this patch, number of candidates for
>> PR65447 is too big to run expensive IVO algorithm, resulting in bad assembly
>> code on targets like AArch64 and Mips.
>> 3) Even for cases the assembly code isn't improved, we can still get
>> compilation time benefit with this patch.
>> 4) This is a prerequisite for enabling auto-increment support in IVO on
>> AArch64.
>>
>> For now, this is only done to address type IVs, in the future I may extend
>> it to general IVs too.
>>
>> For AArch64:
>> Benchmarks 470.lbm/spec2k6 and 173.applu/spec2k are improved obviously by
>> this patch.  A couple of cases from spec2k/fp appear regressed.  I looked
>> into generated assembly code and can confirm the regression is false alarm
>> except one case (189.lucas).  For that case, I think it's another issue
>> exposed by this patch (GCC failed to CSE candidate setup code, resulting in
>> bloated loop header).  Anyway, I also fined tuned the patch to minimize the
>> impact.
>>
>> For AArch32, this patch seems to be able to improve spec2kfp too, but I
>> didn't look deep into it.  I guess the reason is it can make life for
>> auto-increment support in IVO better.
>>
>> One of defects of this patch is computation of max offset in
>> compute_max_addr_offset is basically borrowed from get_address_cost.  The
>> comment says we should find a better way to compute all information.  People
>> also complained we need to refactor that part of code.  I don't have good
>> solution to that yet, though I did try best to keep compute_max_addr_offset
>> simple.
>>
>> I believe this is a generally wanted change, bootstrap and test on x86_64
>> and AArch64, so is it ok?
>
> I'm a little bit worried about the linked list of sub-uses and the "sorting"
> (that's quadratic).  A little.  I don't have any good idea but to use a tree...
> We don't seem to limit the number of sub-uses (if we'd do that it would
> become O(1)).
>
> Similar is searching in the list of uses for a group with same base/step
> (but ISTR IVOPTs has multiple similar loops?)

Hi Richard,
Thanks for reviewing.  Instead of tree, I can also keep the linked
list, then quick sort it by using a vector as temporary storage.  This
can avoid the complexity of BST operations without non-trivial
overload.

For the searching routine, a local hash table could help.
>
> Overall the patch looks like a good improvement to how we do IVO, so I think
> it is ok as-is.
Do you want me to do this change now, or I can pick it up later when
dealing with compilation time issue?  Also it would be nice if any
compilation time issue will be reported after applying this version
patch.  Because we will have a IVO compilation time benchmark then.

Thanks,
bin

>
> Thanks,
> Richard.
>
>
>>
>> 2015-05-08  Bin Cheng  <bin.cheng@arm.com>
>>
>>         PR tree-optimization/65447
>>         * tree-ssa-loop-ivopts.c (struct iv_use): New fields.
>>         (dump_use, dump_uses): Support to dump sub use.
>>         (record_use): New parameters to support sub use.  Remove call to
>>         dump_use.
>>         (record_sub_use, record_group_use): New functions.
>>         (compute_max_addr_offset, split_all_small_groups): New functions.
>>         (group_address_uses, rewrite_use_address): New functions.
>>         (strip_offset): New declaration.
>>         (find_interesting_uses_address): Call record_group_use.
>>         (add_candidate): New assertion.
>>         (infinite_cost_p): Move definition forward.
>>         (add_costs): Check INFTY cost and return immediately.
>>         (get_computation_cost_at): Clear setup cost and dependent bitmap
>>         for sub uses.
>>         (determine_use_iv_cost_address): Compute cost for sub uses.
>>         (rewrite_use_address_1): Rename from old rewrite_use_address.
>>         (free_loop_data): Free sub uses.
>>         (tree_ssa_iv_optimize_loop): Call group_address_uses.
>>
>> gcc/testsuite/ChangeLog
>> 2015-05-08  Bin Cheng  <bin.cheng@arm.com>
>>
>>         PR tree-optimization/65447
>>         * gcc.dg/tree-ssa/pr65447.c: New test.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]