[PATCH] Improve cost computations in IVOPTS
Richard Guenther
richard.guenther@gmail.com
Mon May 25 11:08:00 GMT 2009
On Mon, May 25, 2009 at 12:55 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
> Hi,
>
> for loops with non-constant iteration origin, e.g.:
>
> extern char arr1[];
>
> int foo1(char v, int start, int end)
> {
> int i;
>
> for (i = start; i < end; i++)
> arr1[i] = v;
> }
>
> extern int arr2[];
>
> int foo2(int v, int start, int end)
> {
> int i;
>
> for (i = start; i < end; i++)
> arr2[i] = v;
> }
>
> IVOPS generates code that uses one more register than needed on x86 at -O:
>
> .L3:
> movb %bl, (%edx)
> addl $1, %eax
> addl $1, %edx
> cmpl %eax, %ecx
> jg .L3
>
> .L8:
> movl %ebx, (%edx)
> addl $1, %eax
> addl $4, %edx
> cmpl %eax, %ecx
> jg .L8
>
> i.e. 3 registers for the iteration instead of 2. That's problematic for very
> nested inner loops.
>
>
> The problem comes for the cost computed for the IV associated with the use:
>
> use 0
> address
> in statement arr1[i_12] = v_5(D);
>
> at position arr1[i_12]
> type char *
> base &arr1[start_2(D)]
> step 1
> base object (void *) &arr1
> related candidates
>
> Use 0:
> cand cost compl. depends on
> 0 19 1 1
> 1 27 2 1
> 2 27 1 1
> 3 27 1 1
> 4 2 0
>
> candidate 1 (important)
> incremented before exit test
> type unsigned int
> base (unsigned int) (start_2(D) + 1)
> step 1
> candidate 2 (important)
> incremented before exit test
> type unsigned int
> base (unsigned int) start_2(D)
> step 1
> candidate 3 (important)
> original biv
> type unsigned int
> base (unsigned int) start_2(D)
> step 1
>
> The costs for candidates 1, 2 and 3 are greatly skewed because the compiler
> doesn't correctly evaluate the cost of &arr1[start_2(D)] - start_2(D).
>
>
> The attached patch is aimed at fixing this. It generates for the above loops:
>
> .L3:
> movb %cl, arr1(%eax)
> addl $1, %eax
> cmpl %eax, %edx
> jg .L3
>
> .L8:
> movl %ecx, arr2(,%eax,4)
> addl $1, %eax
> cmpl %eax, %edx
> jg .L8
>
>
> Tested on i586-suse-linux, OK for mainline?
+ if (TREE_CODE (op1) != INTEGER_CST)
+ return orig_expr;
should test host_integerp (op1, 0).
There seem to be unrelated changes in the patch, like
- if (TYPE_PRECISION (utype) != TYPE_PRECISION (ctype))
+ if (TYPE_PRECISION (utype) < TYPE_PRECISION (ctype))
and
+ STRIP_NOPS (cbase);
+ ctype = TREE_TYPE (cbase);
are you sure you want to strip sign-changes here? I am not.
I would use STRIP_USELESS_TYPE_CONVERSIONS here.
Otherwise the result of course looks promising. I'd like Zdenek
to ack this change though.
Thanks,
Richard.
>
> 2009-05-25 Eric Botcazou <ebotcazou@adacore.com>
>
> * tree-ssa-loop-ivopts.c (strip_offset_1) <MULT_EXPR>: New case.
> (force_expr_to_var_cost) <NEGATE_EXPR>: Likewise.
> (ptr_difference_cost): Use affine combinations to compute it.
> (difference_cost): Likewise.
> (get_computation_cost_at): Compute more accurate cost for addresses
> if the ratio is a multiplier allowed in addresses.
> For non-addresses, consider that an additional offset or symbol is
> added only once.
>
>
> --
> Eric Botcazou
>
More information about the Gcc-patches
mailing list