[PATCH] Improve cost computations in IVOPTS

Mon May 25 11:08:00 GMT 2009

On Mon, May 25, 2009 at 12:55 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
> Hi,
>
> for loops with non-constant iteration origin, e.g.:
>
> extern char arr1[];
>
> int foo1(char v, int start, int end)
> {
>  int i;
>
>  for (i = start; i < end; i++)
>    arr1[i] = v;
> }
>
> extern int arr2[];
>
> int foo2(int v, int start, int end)
> {
>  int i;
>
>  for (i = start; i < end; i++)
>    arr2[i] = v;
> }
>
> IVOPS generates code that uses one more register than needed on x86 at -O:
>
> .L3:
>        movb    %bl, (%edx)
>        addl    $1, %eax
>        addl    $1, %edx
>        cmpl    %eax, %ecx
>        jg      .L3
>
> .L8:
>        movl    %ebx, (%edx)
>        addl    $1, %eax
>        addl    $4, %edx
>        cmpl    %eax, %ecx
>        jg      .L8
>
> i.e. 3 registers for the iteration instead of 2.  That's problematic for very
> nested inner loops.
>
>
> The problem comes for the cost computed for the IV associated with the use:
>
> use 0
>  address
>  in statement arr1[i_12] = v_5(D);
>
>  at position arr1[i_12]
>  type char *
>  base &arr1[start_2(D)]
>  step 1
>  base object (void *) &arr1
>  related candidates
>
> Use 0:
>  cand  cost    compl.  depends on
>  0     19      1       1
>  1     27      2       1
>  2     27      1       1
>  3     27      1       1
>  4     2       0
>
> candidate 1 (important)
>  incremented before exit test
>  type unsigned int
>  base (unsigned int) (start_2(D) + 1)
>  step 1
> candidate 2 (important)
>  incremented before exit test
>  type unsigned int
>  base (unsigned int) start_2(D)
>  step 1
> candidate 3 (important)
>  original biv
>  type unsigned int
>  base (unsigned int) start_2(D)
>  step 1
>
> The costs for candidates 1, 2 and 3 are greatly skewed because the compiler
> doesn't correctly evaluate the cost of &arr1[start_2(D)] - start_2(D).
>
>
> The attached patch is aimed at fixing this.  It generates for the above loops:
>
> .L3:
>        movb    %cl, arr1(%eax)
>        addl    $1, %eax
>        cmpl    %eax, %edx
>        jg      .L3
>
> .L8:
>        movl    %ecx, arr2(,%eax,4)
>        addl    $1, %eax
>        cmpl    %eax, %edx
>        jg      .L8
>
>
> Tested on i586-suse-linux, OK for mainline?

+      if (TREE_CODE (op1) != INTEGER_CST)
+       return orig_expr;

should test host_integerp (op1, 0).

There seem to be unrelated changes in the patch, like

-  if (TYPE_PRECISION (utype) != TYPE_PRECISION (ctype))
+  if (TYPE_PRECISION (utype) < TYPE_PRECISION (ctype))

and

+  STRIP_NOPS (cbase);
+  ctype = TREE_TYPE (cbase);

are you sure you want to strip sign-changes here?  I am not.
I would use STRIP_USELESS_TYPE_CONVERSIONS here.

Otherwise the result of course looks promising.  I'd like Zdenek
to ack this change though.

Thanks,
Richard.

>
> 2009-05-25  Eric Botcazou  <ebotcazou@adacore.com>
>
>        * tree-ssa-loop-ivopts.c (strip_offset_1) <MULT_EXPR>: New case.
>        (force_expr_to_var_cost) <NEGATE_EXPR>: Likewise.
>        (ptr_difference_cost): Use affine combinations to compute it.
>        (difference_cost): Likewise.
>        (get_computation_cost_at): Compute more accurate cost for addresses
>        if the ratio is a multiplier allowed in addresses.
>        For non-addresses, consider that an additional offset or symbol is
>        added only once.
>
>
> --
> Eric Botcazou
>