Bug 18431 - Code for arrays and pointers are not the same
Summary: Code for arrays and pointers are not the same
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.0.0
: P2 enhancement
Target Milestone: 4.0.0
Assignee: Zdenek Dvorak
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2004-11-11 18:08 UTC by Andrew Pinski
Modified: 2005-07-23 22:49 UTC (History)
2 users (show)

See Also:
Host:
Target: powerpc-darwin
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Pinski 2004-11-11 18:08:54 UTC
We should produce the same code in the two loops
unsigned short *q;
#define NOSB 10
int last;
void h1()
{
 int i;
 for (i=0;i<last+NOSB;i++)
   {
     q[i] = 0;
   }
}
-----
unsigned short q[100];
#define NOSB 10
int last;
void h1()
{
 int i;
 for (i=0;i<last+NOSB;i++)
   {
     q[i] = 0;
   }
}

For the array one we produce:
L4:
        sth r0,0(r2)
        addi r2,r2,2
        bdnz L4
which is optimial but for the pointer one we produce:
L4:
        slwi r2,r9,1
        addi r9,r9,1
        sthx r0,r2,r11
        bdnz L4
which sucks as we have tree instructions in the loop instead of two.

I think this is a place where IV-OPTS is failing.
Comment 1 Andrew Pinski 2004-11-11 18:15:18 UTC
Note with powerpc64 we get much worse than array case which stays the same:
L4:
        sldi r2,r9,1
        addi r0,r9,1
        sthx r10,r2,r11
        extsw r9,r0
        bdnz L4
Comment 2 Andrew Pinski 2004-11-11 20:39:44 UTC
Rewritting the code this way, shows how we ahould be optimizing the code:
void h()
{
 int i;
 unsigned short *q1 = q;
 for (i=0;i<last+NOSB;i++)
   {
     *q1 = 0;
     q1++;
   }
}
Comment 3 Nathan Sidwell 2004-11-12 09:32:58 UTC
It is not what I thought it was.  The array case is optimized at the tree level,
the pointer case is optimized at the rtl level.
Comment 4 Zdenek Dvorak 2004-11-12 11:24:13 UTC
The problem seems to be that licm does not move load of q from the loop.  Ivopts
then do not recognize q + 2*i as induction variable, and thus they are
optimizing it not like an address of memory reference, but just like a
computation of 2*i, for which it obviously does not make sense to perform
strength reduction.
Comment 5 Andrew Pinski 2004-11-12 14:38:20 UTC
Patch here which should fix not pulling the load of q out of the loop:
http://gcc.gnu.org/ml/gcc-patches/2004-11/msg00957.html

Then the only thing left is for IV-OPTS to be fixed.
Comment 6 Zdenek Dvorak 2004-11-12 14:42:27 UTC
Subject: Re:  Code for arrays and pointers are not the same

> Patch here which should fix not pulling the load of q out of the loop:
> http://gcc.gnu.org/ml/gcc-patches/2004-11/msg00957.html
> 
> Then the only thing left is for IV-OPTS to be fixed.

no -- once this is done, ivopts work just fine, producing the same code
as for array access.
Comment 7 Andrew Pinski 2004-11-12 14:44:54 UTC
huh a compiler built with that patch gives:
L4:
        slwi r2,r9,1
        addi r9,r9,1
        sthx r0,r2,r11
        bdnz L4

Also pulling the load manually out loop also produce the same asm as I just produced:
unsigned short *q;
#define NOSB 10
int last;
void h1()
{
 int i;
unsigned short *q1 = q;
 for (i=0;i<last+NOSB;i++)
   {
     q1[i] = 0;
   }
}
Comment 8 Zdenek Dvorak 2004-11-12 14:46:49 UTC
Subject: Re:  Code for arrays and pointers are not the same

> > Patch here which should fix not pulling the load of q out of the loop:
> > http://gcc.gnu.org/ml/gcc-patches/2004-11/msg00957.html
> > 
> > Then the only thing left is for IV-OPTS to be fixed.
> 
> no -- once this is done, ivopts work just fine, producing the same code
> as for array access.

btw. the patch above might be wrong -- I think ssa names also pass the
is_gimple_addressable test:

is_gimple_addressable -> is_gimple_id -> is_gimple_variable -> SSA_NAME.

So the correct patch would be

Index: tree-ssa-loop-im.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-im.c,v
retrieving revision 2.21
diff -c -3 -p -r2.21 tree-ssa-loop-im.c
*** tree-ssa-loop-im.c	8 Nov 2004 13:54:39 -0000	2.21
--- tree-ssa-loop-im.c	12 Nov 2004 11:29:40 -0000
*************** stmt_cost (tree stmt)
*** 367,373 ****
    /* Hoisting memory references out should almost surely be a win.  */
    if (!is_gimple_variable (lhs))
      cost += 20;
!   if (is_gimple_addressable (rhs) && !is_gimple_variable (rhs))
      cost += 20;
  
    switch (TREE_CODE (rhs))
--- 367,373 ----
    /* Hoisting memory references out should almost surely be a win.  */
    if (!is_gimple_variable (lhs))
      cost += 20;
!   if (is_gimple_addressable (rhs) && TREE_CODE (rhs) != SSA_NAME)
      cost += 20;
  
    switch (TREE_CODE (rhs))
Comment 9 Zdenek Dvorak 2004-11-12 14:48:02 UTC
Subject: Re:  Code for arrays and pointers are not the same

> huh a compiler built with that patch gives:
> L4:
>         slwi r2,r9,1
>         addi r9,r9,1
>         sthx r0,r2,r11
>         bdnz L4
> 
> Also pulling the load manually out loop also produce the same asm as I just produced:
> unsigned short *q;
> #define NOSB 10
> int last;
> void h1()
> {
>  int i;
> unsigned short *q1 = q;
>  for (i=0;i<last+NOSB;i++)
>    {
>      q1[i] = 0;
>    }
> }

not for me:

.L4:
    sth 0,0(9)
    addi 9,9,2
    bdnz .L4

what architecture/flags are you using?

Zdenek
Comment 10 Andrew Pinski 2004-11-12 14:52:02 UTC
powerpc-darwin

just -O3

hmm, must be a local modification which changes it.
Comment 11 Andrew Pinski 2004-11-12 14:53:22 UTC
(In reply to comment #8)
> 
> is_gimple_addressable -> is_gimple_id -> is_gimple_variable -> SSA_NAME.
> 
> So the correct patch would be

I did post that patch also before it was rejected as I did not look into the aliasing problem we had.
Comment 12 Zdenek Dvorak 2004-11-12 15:10:23 UTC
Subject: Re:  Code for arrays and pointers are not the same

> powerpc-darwin
> 
> just -O3
> 
> hmm, must be a local modification which changes it.

maybe you are checking 64 bit?  That still looks terrible.
Comment 13 Andrew Pinski 2004-11-12 15:33:55 UTC
The local patch which I had in which caused this was a not so correct for PR 18293 (which we remove 
an extra copy RTL as we expand it so it looks like the cost analysis is doing something wrong which is 
why we don't do the IV-OPT when at the RTL either). 
Comment 14 GCC Commits 2004-11-14 18:04:37 UTC
Subject: Bug 18431

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	rakdver@gcc.gnu.org	2004-11-14 18:04:26

Modified files:
	gcc            : ChangeLog tree-flow.h tree-ssa-loop-im.c 
	                 tree-ssa.c 

Log message:
	PR tree-optimization/18431
	* tree-flow.h (stmt_references_memory_p): Declare.
	* tree-ssa-loop-im.c (stmt_cost): Use stmt_references_memory_p.
	* tree-ssa.c (stmt_references_memory_p): New function.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6332&r2=2.6333
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-flow.h.diff?cvsroot=gcc&r1=2.62&r2=2.63
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-loop-im.c.diff?cvsroot=gcc&r1=2.21&r2=2.22
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa.c.diff?cvsroot=gcc&r1=2.57&r2=2.58

Comment 15 GCC Commits 2004-11-15 00:18:44 UTC
Subject: Bug 18431

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	rakdver@gcc.gnu.org	2004-11-15 00:18:37

Modified files:
	gcc            : ChangeLog fold-const.c tree-ssa-loop-niter.c 
	                 tree.c tree.h 
	gcc/testsuite  : ChangeLog 
Added files:
	gcc/testsuite/gcc.c-torture/execute: 20041114-1.c 

Log message:
	PR tree-optimization/18431
	* fold-const.c (associate_trees): Do not produce x + 0.
	(fold_widened_comparison, fold_sign_changed_comparison): New functions.
	(fold): Use them.
	* tree-ssa-loop-niter.c (upper_bound_in_type, lower_bound_in_type):
	Moved ...
	* tree.c (upper_bound_in_type, lower_bound_in_type): Here.
	* tree.h (upper_bound_in_type, lower_bound_in_type): Declare.
	
	* testsuite/gcc.c-torture/execute/20041114-1.c: New test.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6340&r2=2.6341
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/fold-const.c.diff?cvsroot=gcc&r1=1.473&r2=1.474
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-loop-niter.c.diff?cvsroot=gcc&r1=2.15&r2=2.16
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree.c.diff?cvsroot=gcc&r1=1.447&r2=1.448
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree.h.diff?cvsroot=gcc&r1=1.654&r2=1.655
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/ChangeLog.diff?cvsroot=gcc&r1=1.4602&r2=1.4603
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/gcc.c-torture/execute/20041114-1.c.diff?cvsroot=gcc&r1=NONE&r2=1.1

Comment 16 Zdenek Dvorak 2004-11-15 00:20:04 UTC
Fixed.
Comment 17 Pat Haugen 2005-06-02 18:46:45 UTC
Does this need to be reopened?  I see the following for mainline.

  -m32:
.L4:
	slwi %r9,%r11,1	 #, tmp130, i
	addi %r11,%r11,1	 # i, i,
	sthx %r0,%r9,%r10	 #* q.1, tmp132
	bdnz .L4	 #



  -m64:
.L4:
	sthx %r0,%r11,%r10	 #* ivtmp.11, tmp132
	addi %r10,%r10,2	 # ivtmp.11, ivtmp.11,
	bdnz .L4	 #


Both can be improved to use a non-indexed store and increment of that base reg,
similar to what Zdenek said he saw in comment #9.