Bug 2462 - "restrict" implementation bug
Summary: "restrict" implementation bug
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 3.1
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: alias, missed-optimization
Depends on:
Blocks:
 
Reported: 2001-04-02 14:36 UTC by Dan Nicolaescu
Modified: 2010-07-13 11:28 UTC
CC List: 4 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2005-12-30 22:25:12


Description Dan Nicolaescu 2001-04-02 14:36:01 UTC
According to http://wwwold.dkuug.dk/JTC1/SC22/WG14/www/docs/n897.pdf
(pointed to by a link from http://gcc.gnu.org/readings.html), section
6.7.3.1 paragraph 9, a global variable with a restrict qualifier should
act "as if it were declared as an array".
And according to section 6.7.3.1 paragraph 4, a pointer returned from a
call to "malloc" is the initial single means of accessing an array.
The conclusion from these two is that a pointer from malloc cannot alias
a global restricted variable.
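A minimal sketch of the "as if it were declared as an array" reading, applied to bar () from the example below. The array d_storage, its size N and the name bar_array_model are hypothetical: the restrict-qualified global is modelled as a distinct array object, so the fresh object returned by malloc cannot overlap it, and a store to d_storage[i] can never change f[i].

```c
#include <stdlib.h>

/* Hypothetical fixed size: the model treats the restrict-qualified
   global d as designating its own array object, d_storage.  */
#define N 8
static float d_storage[N];

/* bar () with the global restrict pointer replaced by a real array.
   f comes from malloc, so it points to a fresh object that cannot
   overlap d_storage: the store to d_storage[i] cannot change f[i],
   and f[i] need not be reloaded before the "+=".  */
float *
bar_array_model (const float *b, const float *c, int n)
{
  if (n <= 0 || n > N)
    return NULL;  /* d_storage only holds N elements */

  float *f = malloc (n * sizeof *f);
  if (f == NULL)
    return NULL;

  for (int i = 0; i < n; i++)
    {
      f[i] = b[i] + c[i];
      d_storage[i] = b[i] + c[i];
      f[i] += b[i] + c[i];
    }
  return f;
}
```

Under this model the compiler may treat f and d_storage exactly like two unrelated arrays, which is the freedom the report expects for a restrict-qualified global.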

It seems that there is a bug when a restricted global variable and a
pointer obtained from a "malloc" call are used together.
As the example below shows, "restrict" works correctly when either of
them is used on its own.

Here is an example:

#include <stdlib.h>

float *  __restrict__ d;
float *  __restrict__ g;
float *  __restrict__ h;

struct two_floatp {
  float *first;
  float *second;
};

/* a malloced pointer, local restricted vars and a global restricted var */
float* 
foo (float * a, float * b, float * c, int n)
{
  int i;
  float *f;
  f = (float*) malloc (n * sizeof (float));


  {
    float * __restrict__ p = a;
    float * __restrict__ q = b;
    float * __restrict__ r = c;
    
    for (i = 0;i < n; i++)
      {
        f[i] = p[i] + q[i];
        d[i] = p[i] + r[i]; 
        f[i] += p[i] + q[i];
      }

  }
  return f;
}


/* a malloced pointer and a restricted global var */
float* 
bar (float *  __restrict__ b,
     float *  __restrict__ c, int n)
{
  int i;

  float * f = (float*) malloc (n * sizeof (float));
  
  for (i = 0;i < n; i++)
    {
      f[i] = b[i] + c[i];
      d[i] = b[i] + c[i]; 
      f[i] += b[i] + c[i];
    }
  return f;
}


/* 2 malloced pointers */
struct two_floatp
foobar (float *  __restrict__ b,
        float *  __restrict__ c, int n)
{
  int i;

  struct two_floatp retval;

  float * f = (float*) malloc (n * sizeof (float));
  float * ff = (float*) malloc (n * sizeof (float));
  
  for (i = 0;i < n; i++)
    {
      f[i] = b[i] + c[i];
      ff[i] = b[i] + c[i]; 
      f[i] += b[i] + c[i];
    }
  retval.first = f;
  retval.second = ff;
  return retval;
}

/* 2 restricted global vars */
float* 
baz (float *  __restrict__ b,
     float *  __restrict__ c, int n)
{
  int i;
  for (i = 0;i < n; i++)
    {
      g[i] = b[i] + c[i];
      d[i] = b[i] + c[i]; 
      g[i] += b[i] + c[i];
    }
  return g;
}

Following is the SPARC assembly for just the loops of all four functions.
	

foo:
[snip]	
.LL5:
	sll	%o2, 2, %o0
	ld	[%i0+%o0], %f3
	add	%o2, 1, %o2
	ld	[%i1+%o0], %f4
	cmp	%o2, %i3
	fadds	%f3, %f4, %f4
	ld	[%i2+%o0], %f2
	fadds	%f3, %f2, %f3
	st	%f4, [%o1+%o0]
	st	%f3, [%o3+%o0]
	ld	[%o1+%o0], %f2
	fadds	%f2, %f4, %f2
	bl	.LL5
	st	%f2, [%o1+%o0]
	
Note there are 2 extra loads

		
bar:
[snip]	
.LL12:
	sll	%o2, 2, %o0
	ld	[%i0+%o0], %f3
	add	%o2, 1, %o2
	ld	[%i1+%o0], %f2
	cmp	%o2, %i2
	fadds	%f3, %f2, %f3
	st	%f3, [%o1+%o0]
	st	%f3, [%o3+%o0]
	ld	[%o1+%o0], %f2
	fadds	%f2, %f3, %f2
	bl	.LL12
	st	%f2, [%o1+%o0]
	
Note one extra load
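For comparison, a sketch of bar ()'s loop written out by hand the way the report expects it to be compiled, assuming, as argued above, that the global restrict pointer d never aliases the malloc'd f. The name bar_scalarized is hypothetical; it mirrors the foobar and baz loops, where b[i] + c[i] is computed once and doubled (fadds %f2, %f2, %f4) with no reload.

```c
#include <stdlib.h>

/* Stands in for the report's global "float * __restrict__ d".  */
float *restrict d;

/* Hypothetical hand-optimized version of bar (): because d and f do
   not alias, the store to d[i] cannot modify f[i], so
     f[i] = s; d[i] = s; f[i] += s;
   can become d[i] = s; f[i] = s + s; with f[i] never reloaded.  */
float *
bar_scalarized (const float *restrict b, const float *restrict c, int n)
{
  if (n <= 0)
    return NULL;

  float *f = malloc (n * sizeof *f);
  if (f == NULL)
    return NULL;

  for (int i = 0; i < n; i++)
    {
      float s = b[i] + c[i];  /* computed once, kept in a register */
      d[i] = s;
      f[i] = s + s;           /* single store, no reload of f[i] */
    }
  return f;
}
```

The extra "ld [%o1+%o0]" in the bar loop is exactly the reload of f[i] that this version avoids.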
		
foobar:
[snip]	
.LL19:
	sll	%o1, 2, %o0
	ld	[%i0+%o0], %f2
	add	%o1, 1, %o1
	ld	[%i1+%o0], %f3
	cmp	%o1, %i2
	fadds	%f2, %f3, %f2
	fadds	%f2, %f2, %f4
	st	%f2, [%o2+%o0]
	bl	.LL19
	st	%f4, [%l1+%o0]

This one is fine.
		
baz:
[snip]	
.LL26:
	sll	%i3, 2, %i0
	ld	[%o7+%i0], %f2
	add	%i3, 1, %i3
	ld	[%i1+%i0], %f3
	cmp	%i3, %i2
	fadds	%f2, %f3, %f2
	fadds	%f2, %f2, %f4
	st	%f2, [%i5+%i0]
	bl	.LL26
	st	%f4, [%i4+%i0]
	b	.LL30
	ld	[%g1+%lo(g)], %i0
	
As is this one.

Release:
gcc version 3.1 20010402 (experimental)

Environment:
sparc-sun-solaris2.7
But the problem is architecture-independent.
The problem is also present in gcc-2.95.2

How-To-Repeat:
Compile with gcc -O2 -fstrict-aliasing -S 
and look at the assembly for the foo and bar functions.
Comment 1 Dan Nicolaescu 2003-03-24 17:53:26 UTC
From: Dan Nicolaescu <dann@ics.uci.edu>
To: bangerth@dealii.org
Cc: gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org, nobody@gcc.gnu.org,
   gcc-gnats@gcc.gnu.org
Subject: Re: c/2462: "restrict" implementation bug
Date: Mon, 24 Mar 2003 17:53:26 -0800

 bangerth@dealii.org writes:
 
   > Synopsis: "restrict" implementation bug
   > 
   > State-Changed-From-To: open->feedback
   > State-Changed-By: bangerth
   > State-Changed-When: Tue Mar 25 01:28:38 2003
   > State-Changed-Why:
   >     Dan, I don't have this platform, so have to ask: this report
   >     is now almost 2 years old, can you say anything about whether
   >     the problem still persists with present versions of gcc?
 
 Yes, it does: CVS GCC still has the same problem on SPARC.
 
 
 
   >     Thanks
   >       Wolfgang
   > 
   > http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=2462
Comment 2 Wolfgang Bangerth 2003-03-25 01:28:38 UTC
State-Changed-From-To: open->feedback
State-Changed-Why: Dan, I don't have this platform, so have to ask: this report
    is now almost 2 years old, can you say anything about whether
    the problem still persists with present versions of gcc?
    
    Thanks
      Wolfgang
Comment 3 Wolfgang Bangerth 2003-03-25 02:35:35 UTC
State-Changed-From-To: feedback->analyzed
State-Changed-Why: Still a problem. Can at least be moved into "analyzed" state
    this way.
    
    Thanks for the quick feedback! W.
Comment 4 Dan Nicolaescu 2003-05-03 16:16:52 UTC
From: Dan Nicolaescu <dann@ics.uci.edu>
To: bangerth@dealii.org
Cc: gcc-gnats@gcc.gnu.org
Subject: Re: c/2462: "restrict" implementation bug
Date: Sat, 03 May 2003 16:16:52 -0700

 With the tweaks below the code in this PR can be added to the
 GCC testsuite in case somebody wants to do that.
 
 /* { dg-do link } */
 
 #include <stdlib.h>
 
 int *  __restrict__ d;
 int *  __restrict__ g;
 int *  __restrict__ h;
 
 struct two_intp {
   int *first;
   int *second;
 };
 
 extern void link_error(void);
 
 /* a malloced pointer, local restricted vars and a global restricted var */
 int*
 foo (int *a, int *b, int *c, int n)
 {
   int i;
   int *f;
   f = (int*) malloc (n * sizeof (int));
 
 
   {
     int * __restrict__ p = a;
     int * __restrict__ q = b;
     int * __restrict__ r = c;
    
     for (i = 0;i < n; i++)
       {
         p[i] = 1;
         q[i] = 1;
         r[i] = 1;
         f[i] = p[i] + q[i];
 
         if (f[i] != 2)
           link_error ();
 
         d[i] = p[i] + r[i];
 
         if (d[i] != 2)
           link_error ();
 
         f[i] += p[i] + q[i];
 
         if (f[i] != 4)
           link_error ();
       }
 
   }
   return f;
 }
 
 
 /* a malloced pointer and a restricted global var */
 int*
 bar (int *  __restrict__ b,
      int *  __restrict__ c, int n)
 {
   int i;
 
   int * f = (int*) malloc (n * sizeof (int));
  
   for (i = 0;i < n; i++)
     {
       b[i] = 1;
       c[i] = 1;
       
       f[i] = b[i] + c[i];
 
       if (f[i] != 2)
         link_error ();
 
       if ((b[i] != 1) || (c[i] != 1))
           link_error ();
       
       d[i] = b[i] + c[i];
       
       if (d[i] != 2)
         link_error ();
       
       f[i] += b[i] + c[i];
 
       if (f[i] != 4)
         link_error ();
       
     }
   return f;
 }
 
 
 /* 2 malloced pointers */
 struct two_intp
 foobar (int *  __restrict__ b,
         int *  __restrict__ c, int n)
 {
   int i;
 
   struct two_intp retval;
 
   int * f = (int*) malloc (n * sizeof (int));
   int * ff = (int*) malloc (n * sizeof (int));
  
   for (i = 0;i < n; i++)
     {
       b[i] = 1;
       c[i] = 1;
 
       f[i] = b[i] + c[i];
 
       if (f[i] != 2)
         link_error ();
       
       ff[i] = b[i] + c[i];
 
       if (ff[i] != 2)
         link_error ();
 
       f[i] += b[i] + c[i];
 
       if (f[i] != 4)
         link_error ();
 
     }
   retval.first = f;
   retval.second = ff;
   return retval;
 }
 
 /* 2 restricted global vars */
 int*
 baz (int *  __restrict__ b,
      int *  __restrict__ c, int n)
 {
   int i;
   for (i = 0;i < n; i++)
     {
       b[i] = 1;
       c[i] = 1;
       g[i] = b[i] + c[i];
       
       if (g[i] != 2)
         link_error ();
       
       d[i] = b[i] + c[i];
 
       if (d[i] != 2)
         link_error ();
 
       g[i] += b[i] + c[i];
 
       if (g[i] != 4)
         link_error ();
     }
   return g;
 }
 
 int main (void)
 {
 
   int *a,  *b,  *c, *f, n;
   struct two_intp twoints;
   f = foo (a, b, c, n);
   f = bar (b, c, n);
   twoints = foobar (b, c, n);
   f = baz (b, c, n);
   return 0;
 }
Comment 5 Andrew Pinski 2007-05-22 02:09:27 UTC
The pointer-plus branch helps with the heap-allocated memory: may_alias gets less confused by it. It does not fully fix this bug, but it does help.
Comment 6 Uroš Bizjak 2009-06-25 08:58:00 UTC
Oops...
Comment 7 Richard Biener 2009-06-25 10:28:19 UTC
With the new restrict implementation baz() works and all the rest would work
as well if the calls to link_error () would not cause the malloced memory
to be clobbered.  The artifact here is that malloced memory is considered
global (we are not allowed to remove stores to it).

But this is all unrelated to restrict support which should be properly
fixed now.
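A minimal illustration of the point above, with hypothetical names throughout: once malloc'd memory is treated as global (modelled here by storing the pointer in escaped_ptr), any call whose body the compiler cannot see, such as link_error (), might write through it, so a value stored before the call has to be reloaded afterwards.

```c
#include <stdlib.h>

/* Hypothetical: models a malloc'd pointer that the compiler treats as
   reachable from global state.  */
float *escaped_ptr;

/* Stands in for an external call such as link_error ().  It is defined
   here only so the sketch runs; when only a declaration is visible, the
   compiler must assume the call may write through escaped_ptr.  */
void
opaque_call (void)
{
  if (escaped_ptr != NULL)
    escaped_ptr[0] = 99.0f;
}

float
reload_after_call (void)
{
  float *f = malloc (sizeof *f);
  if (f == NULL)
    return -1.0f;

  escaped_ptr = f;
  f[0] = 1.0f;
  opaque_call ();
  /* f[0] must be read from memory again: the call may have changed it.  */
  float result = f[0];

  escaped_ptr = NULL;
  free (f);
  return result;
}
```

Here the call really does overwrite f[0], so reload_after_call () returns 99.0f; a compiler that assumed the call could not touch malloc'd memory would have returned the stale 1.0f.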
Comment 8 Dan Nicolaescu 2009-06-25 15:31:40 UTC
(In reply to comment #7)
> With the new restrict implementation baz() works and all the rest would work
> as well if the calls to link_error () would not cause the malloced memory
> to be clobbered.  The artifact here is that malloced memory is considered
> global (we are not allowed to remove stores to it).

The intention for link_error was to just make it easier to write a test, not to prohibit optimization.
Please feel free to adjust the code accordingly.

Comment 9 Steven Bosscher 2010-07-13 10:48:34 UTC
Restrict has been implemented anew for GCC 4.6.  Does that fix this bug?
Comment 10 Richard Biener 2010-07-13 11:12:43 UTC
(In reply to comment #9)
> Restrict has been implemented anew for GCC 4.6.  Does that fix this bug?

It was reimplemented in 4.5; see comment #7 for the status of this bug.
Comment 11 Steven Bosscher 2010-07-13 11:28:13 UTC
We have a separate bug for malloced memory. So this bug is FIXED.