[PATCH]Enable elimination of IV use with unsigned type candidate

Bin Cheng bin.cheng@arm.com
Mon Jun 23 09:49:00 GMT 2014


Hi,
For below simplified case:

#define LEN (32000)
__attribute__((aligned(16))) float a[LEN],b[LEN];

int foo (int M)
{
  for (int i = 0; i < M; i++)
    a[i+M] = a[i] + b[i];
}

Compiling it with command like:
$ aarch64-elf-gcc -O3 -S foo.c -o foo.S -std=c99

The assembly code of vectorized loop is in below form:
        mov  x1, 0
        mov  w2, 0
.L4:
        ldr     q0, [x1, x3]
        add     w2, w2, 1
        ldr     q1, [x1, x4]
        cmp     w2, w5
        fadd    v0.4s, v0.4s, v1.4s
        str     q0, [x6, x1]
        add     x1, x1, 16
        bcc     .L4

Induction variable w2 is unnecessary and can be eliminated with x1.  This is
safe because X1 will never overflow during all iterations of loop. The
optimal assembly should be like:
        mov  x1, 0
.L4:
        ldr     q0, [x1, x2]
        ldr     q1, [x1, x4]
        fadd    v0.4s, v0.4s, v1.4s
        str     q0, [x5, x1]
        add     x1, x1, 16
        cmp     x1, x3
        bcc     .L4

This case can be handled if we do more complex overflow check on unsigned
type in function iv_elimination_compare_lt.

Also there is another blocker for the transformation, function
number_of_iterations_lt calls fold_build2 to build folded form of
may_be_zero, while iv_elimination_compare_lt only handles simple form tree
expressions.  It's possible to have iv_elimination_compare_lt to do some
undo transformation on may_be_zero, but I found it's difficult for cases
involving signed/unsigned conversion like case loop-41.c.  Since I think
there is no obvious benefit to fold may_be_zero here (somehow because the
two operands are already in folded forms), this patch just calls build2_loc
instead.

This optimization is picked up by patch B, but patch A is necessary since
the check in iv_elimination_compare_lt of two aff_trees isn't enough when
two different types (one in signed type, the other in unsigned) involved.  I
have to use tree comparison here instead.  Considering below simple case:

Analyzing # of iterations of loop 5
  exit condition [1, + , 1](no_overflow) <= i_1
  bounds on difference of bases: -3 ... 1
  result:
    zero if i_1 + 1 < 1
    # of iterations (unsigned int) i_1, bounded by 2
  number of iterations (unsigned int) i_1; zero if i_1 + 1 < 1

use 0
  compare
  in statement if (S.7_9 > i_1)

  at position 
  type integer(kind=4)
  base 1
  step 1
  is a biv
  related candidates 

candidate 0 (important)
  var_before ivtmp.28
  var_after ivtmp.28
  incremented at end
  type unsigned int
  base 0
  step 1

When GCC trying to eliminate use 0 with cand 0, the miscellaneous trees in
iv_elimination_compare_lt are like below with i_1 of signed type:
B:             i_1 + 1
A:             0
niter->niter:  (unsigned int)i_1

Apparently, (B-A-1) is "i_1", which doesn't equal to "(unsigned int)i_1".
Without this patch, it is considered equal to each other.

Note that the induction variable IS necessary on 32 bit systems since
otherwise there is type overflow.

These two patch fix the mentioned problem.  
They pass bootstrap and regression test on x86_64/x86/aarch64/arm, so any
comments?

Thanks,
bin

PATCH A)

2014-06-23  Bin Cheng  <bin.cheng@arm.com>

	* tree-ssa-loop-ivopts.c (iv_elimination_compare_lt): Check number
	of iteration using tree comparison.

PATCH B)

2014-06-23  Bin Cheng  <bin.cheng@arm.com>

	* tree-ssa-loop-niter.c (number_of_iterations_lt): Build unfolded
	form of may_be_zero.
	* tree-ssa-loop-ivopts.c (iv_nowrap_period)
	(nowrap_cand_for_loop_niter_p): New functions.
	(period_greater_niter_exit): New function refactored from
	may_eliminate_iv.
	(iv_elimination_compare_lt): New parameter.  Relax overflow check.
	Handle special forms may_be_zero expression.
	(may_eliminate_iv): Call period_greater_niter_exit.  Pass new
	argument for iv_elimination_compare_lt.

gcc/testsuite/ChangeLog
2014-06-23  Bin Cheng  <bin.cheng@arm.com>

	* gcc.dg/tree-ssa/loop-40.c: New test.
	* gcc.dg/tree-ssa/loop-41.c: New test.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: condition-iv_use-elimination-20140623-b.txt
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20140623/8634b65d/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: condition-iv_use-elimination-20140623-a.txt
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20140623/8634b65d/attachment-0001.txt>


More information about the Gcc-patches mailing list