This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH PR65447]Improve IV handling by grouping address type uses with same base and step
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: "Bin.Cheng" <amker dot cheng at gmail dot com>
- Cc: Bin Cheng <bin dot cheng at arm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 18 May 2015 11:14:35 +0200
- Subject: Re: [PATCH PR65447]Improve IV handling by grouping address type uses with same base and step
- Authentication-results: sourceware.org; auth=none
- References: <000001d0897c$57b00d90$071028b0$ at arm dot com> <CAFiYyc05nBwHFbnCqDefduQfXjTxNyW2c=H1=1Z0ifWJfr0ntA at mail dot gmail dot com> <CAHFci29c7tedsEuKvZT9w-X=rDwAxv0HMMCaCpXnd5skWRohMw at mail dot gmail dot com>
On Thu, May 14, 2015 at 1:11 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Wed, May 13, 2015 at 7:38 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Fri, May 8, 2015 at 12:47 PM, Bin Cheng <bin.cheng@arm.com> wrote:
>>> Hi,
>>> GCC's IVO currently handles every IV use independently, which is not right
>>> by learning from cases reported in PR65447.
>>>
>>> The rationale is:
>>> 1) Lots of address type IVs refer to the same memory object, share similar
>>> base and have same step. We should handle these IVs as a group in order to
>>> maximize CSE opportunities, prefer reg+offset addressing mode.
>>> 2) GCC's IVO algorithm is expensive and only is run when candidate set is
>>> small enough. By grouping same family uses, we can decrease the number of
>>> both uses and candidates. Before this patch, number of candidates for
>>> PR65447 is too big to run expensive IVO algorithm, resulting in bad assembly
>>> code on targets like AArch64 and Mips.
>>> 3) Even for cases the assembly code isn't improved, we can still get
>>> compilation time benefit with this patch.
>>> 4) This is a prerequisite for enabling auto-increment support in IVO on
>>> AArch64.
>>>
>>> For now, this is only done to address type IVs, in the future I may extend
>>> it to general IVs too.
>>>
>>> For AArch64:
>>> Benchmarks 470.lbm/spec2k6 and 173.applu/spec2k are improved obviously by
>>> this patch. A couple of cases from spec2k/fp appear regressed. I looked
>>> into generated assembly code and can confirm the regression is false alarm
>>> except one case (189.lucas). For that case, I think it's another issue
>>> exposed by this patch (GCC failed to CSE candidate setup code, resulting in
>>> bloated loop header). Anyway, I also fined tuned the patch to minimize the
>>> impact.
>>>
>>> For AArch32, this patch seems to be able to improve spec2kfp too, but I
>>> didn't look deep into it. I guess the reason is it can make life for
>>> auto-increment support in IVO better.
>>>
>>> One of defects of this patch is computation of max offset in
>>> compute_max_addr_offset is basically borrowed from get_address_cost. The
>>> comment says we should find a better way to compute all information. People
>>> also complained we need to refactor that part of code. I don't have good
>>> solution to that yet, though I did try best to keep compute_max_addr_offset
>>> simple.
>>>
>>> I believe this is a generally wanted change, bootstrap and test on x86_64
>>> and AArch64, so is it ok?
>>
>> I'm a little bit worried about the linked list of sub-uses and the "sorting"
>> (that's quadratic). A little. I don't have any good idea but to use a tree...
>> We don't seem to limit the number of sub-uses (if we'd do that it would
>> become O(1)).
>>
>> Similar is searching in the list of uses for a group with same base/step
>> (but ISTR IVOPTs has multiple similar loops?)
>
> Hi Richard,
> Thanks for reviewing. Instead of tree, I can also keep the linked
> list, then quick sort it by using a vector as temporary storage. This
> can avoid the complexity of BST operations without non-trivial
> overload.
Sounds good if constructing & querying are not done at the same time.
> For the searching routine, a local hash table could help.
>>
>> Overall the patch looks like a good improvement to how we do IVO, so I think
>> it is ok as-is.
> Do you want me to do this change now, or I can pick it up later when
> dealing with compilation time issue? Also it would be nice if any
> compilation time issue will be reported after applying this version
> patch. Because we will have a IVO compilation time benchmark then.
You can pick it up later.
Richard.
> Thanks,
> bin
>
>>
>> Thanks,
>> Richard.
>>
>>
>>>
>>> 2015-05-08 Bin Cheng <bin.cheng@arm.com>
>>>
>>> PR tree-optimization/65447
>>> * tree-ssa-loop-ivopts.c (struct iv_use): New fields.
>>> (dump_use, dump_uses): Support to dump sub use.
>>> (record_use): New parameters to support sub use. Remove call to
>>> dump_use.
>>> (record_sub_use, record_group_use): New functions.
>>> (compute_max_addr_offset, split_all_small_groups): New functions.
>>> (group_address_uses, rewrite_use_address): New functions.
>>> (strip_offset): New declaration.
>>> (find_interesting_uses_address): Call record_group_use.
>>> (add_candidate): New assertion.
>>> (infinite_cost_p): Move definition forward.
>>> (add_costs): Check INFTY cost and return immediately.
>>> (get_computation_cost_at): Clear setup cost and dependent bitmap
>>> for sub uses.
>>> (determine_use_iv_cost_address): Compute cost for sub uses.
>>> (rewrite_use_address_1): Rename from old rewrite_use_address.
>>> (free_loop_data): Free sub uses.
>>> (tree_ssa_iv_optimize_loop): Call group_address_uses.
>>>
>>> gcc/testsuite/ChangeLog
>>> 2015-05-08 Bin Cheng <bin.cheng@arm.com>
>>>
>>> PR tree-optimization/65447
>>> * gcc.dg/tree-ssa/pr65447.c: New test.