This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH PR65447]Improve IV handling by grouping address type uses with same base and step


Hi Bin,

On 08/05/15 11:47, Bin Cheng wrote:
Hi,
GCC's IVO currently handles every IV use independently, which is not right
by learning from cases reported in PR65447.

The rationale is:
1) Lots of address type IVs refer to the same memory object, share similar
base and have same step.  We should handle these IVs as a group in order to
maximize CSE opportunities, prefer reg+offset addressing mode.
2) GCC's IVO algorithm is expensive and only is run when candidate set is
small enough.  By grouping same family uses, we can decrease the number of
both uses and candidates.  Before this patch, number of candidates for
PR65447 is too big to run expensive IVO algorithm, resulting in bad assembly
code on targets like AArch64 and Mips.
3) Even for cases the assembly code isn't improved, we can still get
compilation time benefit with this patch.
4) This is a prerequisite for enabling auto-increment support in IVO on
AArch64.

For now, this is only done to address type IVs, in the future I may extend
it to general IVs too.

For AArch64:
Benchmarks 470.lbm/spec2k6 and 173.applu/spec2k are improved obviously by
this patch.  A couple of cases from spec2k/fp appear regressed.  I looked
into generated assembly code and can confirm the regression is false alarm
except one case (189.lucas).  For that case, I think it's another issue
exposed by this patch (GCC failed to CSE candidate setup code, resulting in
bloated loop header).  Anyway, I also fined tuned the patch to minimize the
impact.

For AArch32, this patch seems to be able to improve spec2kfp too, but I
didn't look deep into it.  I guess the reason is it can make life for
auto-increment support in IVO better.

One of defects of this patch is computation of max offset in
compute_max_addr_offset is basically borrowed from get_address_cost.  The
comment says we should find a better way to compute all information.  People
also complained we need to refactor that part of code.  I don't have good
solution to that yet, though I did try best to keep compute_max_addr_offset
simple.

I believe this is a generally wanted change, bootstrap and test on x86_64
and AArch64, so is it ok?


2015-05-08  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/65447
	* tree-ssa-loop-ivopts.c (struct iv_use): New fields.
	(dump_use, dump_uses): Support to dump sub use.
	(record_use): New parameters to support sub use.  Remove call to
	dump_use.
	(record_sub_use, record_group_use): New functions.
	(compute_max_addr_offset, split_all_small_groups): New functions.
	(group_address_uses, rewrite_use_address): New functions.
	(strip_offset): New declaration.
	(find_interesting_uses_address): Call record_group_use.
	(add_candidate): New assertion.
	(infinite_cost_p): Move definition forward.
	(add_costs): Check INFTY cost and return immediately.
	(get_computation_cost_at): Clear setup cost and dependent bitmap
	for sub uses.
	(determine_use_iv_cost_address): Compute cost for sub uses.
	(rewrite_use_address_1): Rename from old rewrite_use_address.
	(free_loop_data): Free sub uses.
	(tree_ssa_iv_optimize_loop): Call group_address_uses.

gcc/testsuite/ChangeLog
2015-05-08  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/65447
	* gcc.dg/tree-ssa/pr65447.c: New test.

I see this test failing on arm-none-eabi with a compiler at r223737.
My configure options are: --enable-checking=yes --with-newlib --with-fpu=neon-fp-armv8 --with-arch=armv8-a --without-isl

Kyrill



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]