Bug 32949 - suboptimal address generation for int indices on 64-bit targets
Summary: suboptimal address generation for int indices on 64-bit targets
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.3.0
: P3 enhancement
Target Milestone: 4.3.0
Assignee: Zdenek Dvorak
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2007-07-31 17:28 UTC by Alexander Monakov
Modified: 2009-01-01 05:34 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2007-08-21 21:29:28


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Alexander Monakov 2007-07-31 17:28:56 UTC
When compiling with -O2

void foo(int N, int k, int di, double x[N])
{       
  long i;
  for (i = 0; k; k--, i += di)
    x[i] = 0;   
}

GCC produces for x86_64:

.L9:
        movq    $0, (%rcx)
        addq    %rax, %rcx
        subl    $1, %esi
        jne     .L9

and for ia64:

.L10:
        .mib
        stfd [r35] = f0
        add r35 = r35, r34
        br.cloop.sptk.few .L10
        ;;

However, when `long i' is changed to `int i', generated code is worse:

.L3:
        movslq  %edi,%rax
        addl    %edx, %edi
        subl    $1, %esi
        movq    $0, (%rcx,%rax,8)
        jne     .L3

and for ia64:

.L3:
        .mii
        nop 0
        sxt4 r14 = r15
        add r15 = r15, r34
        ;;
        .mii
        shladd r14 = r14, 3, r35
        nop 0
        ;;
        nop 0
        .mmb
        stfd [r14] = f0
        nop 0
        br.cloop.sptk.few .L3
        ;;

Basically, unadjusted loop index (i) is copied into temporary, which is then multiplied by sizeof(x[0]), added to base address of array and used for addressing.  I assume this can be improved on the grounds that signed integer overflow is undefined.
Comment 1 Zdenek Dvorak 2007-08-21 21:29:28 UTC
This patch fixes the problem:

Index: tree-ssa-loop-niter.c
===================================================================
*** tree-ssa-loop-niter.c       (revision 127674)
--- tree-ssa-loop-niter.c       (working copy)
*************** scev_probably_wraps_p (tree base, tree s
*** 2969,2977 ****
       2032, 2040, 0, 8, ..., but the code is still legal.  */

    if (chrec_contains_undetermined (base)
!       || chrec_contains_undetermined (step)
!       || TREE_CODE (step) != INTEGER_CST)
!     return true;

    if (integer_zerop (step))
      return false;
--- 2969,2975 ----
       2032, 2040, 0, 8, ..., but the code is still legal.  */

    if (chrec_contains_undetermined (base)
!       || chrec_contains_undetermined (step))

    if (integer_zerop (step))
      return false;
*************** scev_probably_wraps_p (tree base, tree s
*** 2981,2986 ****
--- 2979,2989 ----
    if (use_overflow_semantics && nowrap_type_p (type))
      return false;

+   /* To be able to use estimates on number of iterations of the loop,
+      we must have an upper bound on the absolute value of the step.  */
+   if (TREE_CODE (step) != INTEGER_CST)
+     return true;
+
    /* Don't issue signed overflow warnings.  */
    fold_defer_overflow_warnings ();
Comment 2 Alexander Monakov 2007-08-22 10:13:02 UTC
> *************** scev_probably_wraps_p (tree base, tree s
> *** 2969,2977 ****
>        2032, 2040, 0, 8, ..., but the code is still legal.  */
> 
>     if (chrec_contains_undetermined (base)
> !       || chrec_contains_undetermined (step)
> !       || TREE_CODE (step) != INTEGER_CST)
> !     return true;
> 
>     if (integer_zerop (step))
>       return false;
> --- 2969,2975 ----
>        2032, 2040, 0, 8, ..., but the code is still legal.  */
> 
>     if (chrec_contains_undetermined (base)
> !       || chrec_contains_undetermined (step))
> 
>     if (integer_zerop (step))
>       return false;

Zdenek, isn't 'return true' missing here?
Comment 3 Alexander Monakov 2007-08-22 21:29:33 UTC
With first hunk modified not to delete 'return true', this patch passes bootstrap with all default languages on ia64 and x86_64 with --disable-multilib, and passes regtest with no new regressions (all said with 10-days-old trunk, revision 127475).
Comment 4 Zdenek Dvorak 2007-08-22 23:05:19 UTC
Subject: Bug 32949

Author: rakdver
Date: Wed Aug 22 23:05:05 2007
New Revision: 127720

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=127720
Log:
2007-08-22  Zdenek Dvorak  <ook@ucw.cz>

	PR tree-optimization/32949
	* tree-ssa-loop-niter.c (scev_probably_wraps_p): Test nowrap_type_p
	before failing for ivs with non-constant step.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-loop-niter.c

Comment 5 Andrew Pinski 2009-01-01 05:34:18 UTC
Fixed.