This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Improve cost computations in IVOPTS


On Mon, May 25, 2009 at 12:55 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
> Hi,
>
> for loops with non-constant iteration origin, e.g.:
>
> extern char arr1[];
>
> int foo1(char v, int start, int end)
> {
> ?int i;
>
> ?for (i = start; i < end; i++)
> ? ?arr1[i] = v;
> }
>
> extern int arr2[];
>
> int foo2(int v, int start, int end)
> {
> ?int i;
>
> ?for (i = start; i < end; i++)
> ? ?arr2[i] = v;
> }
>
> IVOPS generates code that uses one more register than needed on x86 at -O:
>
> .L3:
> ? ? ? ?movb ? ?%bl, (%edx)
> ? ? ? ?addl ? ?$1, %eax
> ? ? ? ?addl ? ?$1, %edx
> ? ? ? ?cmpl ? ?%eax, %ecx
> ? ? ? ?jg ? ? ?.L3
>
> .L8:
> ? ? ? ?movl ? ?%ebx, (%edx)
> ? ? ? ?addl ? ?$1, %eax
> ? ? ? ?addl ? ?$4, %edx
> ? ? ? ?cmpl ? ?%eax, %ecx
> ? ? ? ?jg ? ? ?.L8
>
> i.e. 3 registers for the iteration instead of 2. ?That's problematic for very
> nested inner loops.
>
>
> The problem comes for the cost computed for the IV associated with the use:
>
> use 0
> ?address
> ?in statement arr1[i_12] = v_5(D);
>
> ?at position arr1[i_12]
> ?type char *
> ?base &arr1[start_2(D)]
> ?step 1
> ?base object (void *) &arr1
> ?related candidates
>
> Use 0:
> ?cand ?cost ? ?compl. ?depends on
> ?0 ? ? 19 ? ? ?1 ? ? ? 1
> ?1 ? ? 27 ? ? ?2 ? ? ? 1
> ?2 ? ? 27 ? ? ?1 ? ? ? 1
> ?3 ? ? 27 ? ? ?1 ? ? ? 1
> ?4 ? ? 2 ? ? ? 0
>
> candidate 1 (important)
> ?incremented before exit test
> ?type unsigned int
> ?base (unsigned int) (start_2(D) + 1)
> ?step 1
> candidate 2 (important)
> ?incremented before exit test
> ?type unsigned int
> ?base (unsigned int) start_2(D)
> ?step 1
> candidate 3 (important)
> ?original biv
> ?type unsigned int
> ?base (unsigned int) start_2(D)
> ?step 1
>
> The costs for candidates 1, 2 and 3 are greatly skewed because the compiler
> doesn't correctly evaluate the cost of &arr1[start_2(D)] - start_2(D).
>
>
> The attached patch is aimed at fixing this. ?It generates for the above loops:
>
> .L3:
> ? ? ? ?movb ? ?%cl, arr1(%eax)
> ? ? ? ?addl ? ?$1, %eax
> ? ? ? ?cmpl ? ?%eax, %edx
> ? ? ? ?jg ? ? ?.L3
>
> .L8:
> ? ? ? ?movl ? ?%ecx, arr2(,%eax,4)
> ? ? ? ?addl ? ?$1, %eax
> ? ? ? ?cmpl ? ?%eax, %edx
> ? ? ? ?jg ? ? ?.L8
>
>
> Tested on i586-suse-linux, OK for mainline?

+      if (TREE_CODE (op1) != INTEGER_CST)
+       return orig_expr;

should test host_integerp (op1, 0).

There seem to be unrelated changes in the patch, like

-  if (TYPE_PRECISION (utype) != TYPE_PRECISION (ctype))
+  if (TYPE_PRECISION (utype) < TYPE_PRECISION (ctype))

and

+  STRIP_NOPS (cbase);
+  ctype = TREE_TYPE (cbase);

are you sure you want to strip sign-changes here?  I am not.
I would use STRIP_USELESS_TYPE_CONVERSIONS here.

Otherwise the result of course looks promising.  I'd like Zdenek
to ack this change though.

Thanks,
Richard.


>
> 2009-05-25 ?Eric Botcazou ?<ebotcazou@adacore.com>
>
> ? ? ? ?* tree-ssa-loop-ivopts.c (strip_offset_1) <MULT_EXPR>: New case.
> ? ? ? ?(force_expr_to_var_cost) <NEGATE_EXPR>: Likewise.
> ? ? ? ?(ptr_difference_cost): Use affine combinations to compute it.
> ? ? ? ?(difference_cost): Likewise.
> ? ? ? ?(get_computation_cost_at): Compute more accurate cost for addresses
> ? ? ? ?if the ratio is a multiplier allowed in addresses.
> ? ? ? ?For non-addresses, consider that an additional offset or symbol is
> ? ? ? ?added only once.
>
>
> --
> Eric Botcazou
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]