This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH PR41488]Recognize more induction variables by simplifying PEELED chrec in scalar evolution


Hi,
Entry pr41488 shows a case in which induction variable can't be recognized
or coalesced.  As noted in the bug entry, one possible fix is to teach PRE
not to do some specific transformation.  However, it's in nature a scalar
evolution issue.  Considering below snippet:

   # i_17 = PHI <i_13(5), 0(3)>
   # _20 = PHI <_5(5), start_4(D)(3)>
   ...
   i_13 = i_17 + 1;
   _5 = start_4(D) + i_13;

Variable _20 appears in the form of PEELED chrec like (start_4, _5)_LOOP,
meaning it has value "start_4" in the 1st iteration, then changes to _5 in
following iterations.  PEELED chrec is not implemented by GCC right now, it
can be simplified into either POLYNOMIAL or PERIOD one.  The POLYNOMIAL
chrec is processed by GCC after being recognized, as in the examle, _20 is
actually {start_4, 1}_LOOP.

This patch modifies scalar evolution by trying to simplify PEELED chrec.
Without this patch, generated code for pr41488.c is like:
foo:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	cmp	r1, r2
	bge	.L7
	push	{r4}
	mov	r3, r1
	ldr	r4, [r0]
	adds	r2, r2, #1
	adds	r1, r1, #1
	movs	r0, #0
.L3:
	str	r0, [r4, r3, lsl #2]
	mov	r3, r1
	adds	r1, r1, #1
	cmp	r1, r2
	bne	.L3
	ldr	r4, [sp], #4
.L7:
	bx	lr
	.size	foo, .-foo

Which is improved to:
foo:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	cmp	r1, r2
	bge	.L1
	ldr	r3, [r0]
	movs	r0, #0
	add	r1, r3, r1, lsl #2
	add	r2, r3, r2, lsl #2
.L3:
	str	r0, [r1], #4
	cmp	r1, r2
	bne	.L3
.L1:
	bx	lr
	.size	foo, .-foo

One point needs to be clarified that I use tree_to_aff_combination_expand in
the patch.  Rational is the number of the specific PEELED_CHRECs should be
moderate, I also check the equality literally firstly and resort to affine
facility if that fails.  By measuring bootstrap of gcc and spec2kint, I
collected the number of opportunities caught by this patch like below:
                            literal comparison/affine comparison
GCC                          ~1200/~250
Spec2Kint              93/34

I could back trace the ssa chain instead of using affine functions, but that
would miss some cases.

Bootstrap and test on x86/x86_64/arm.  Is it OK?

Thanks,
bin

2013-12-06  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/41488
	* tree-ssa-loop-ivopts.c (add_old_iv_candidates): Don't add cand
	for PEELED_CHREC kind of IV.
	* tree-scalar-evolution.c: Include necessary header files.
	(peeled_chrec_map, simplify_peeled_chrec): New.
	(analyze_evolution_in_loop): New static variable.
	Call simplify_peeled_chrec.
	(scev_initialize): Initialize peeled_chrec_map.
	(scev_reset, scev_finalize): Reset and release peeled_chrec_map.

gcc/testsuite/ChangeLog
2013-12-06  Bin Cheng  <bin.cheng@arm.com>

	* gcc.dg/tree-ssa/scev-7.c: New test.
	* gcc.dg/pr41488.c: New test.

Attachment: pr41488-20131205-a.txt
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]