Bug 16913 - [4.2/4.3/4.4 Regression] restrict does not make a difference
Summary: [4.2/4.3/4.4 Regression] restrict does not make a difference
Status: RESOLVED DUPLICATE of bug 14187
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.0.0
: P5 minor
Target Milestone: 4.2.5
Assignee: Not yet assigned to anyone
URL:
Keywords: alias, missed-optimization
Depends on: 14187 14192 16306
Blocks: 33272
  Show dependency treegraph
 
Reported: 2004-08-07 11:41 UTC by Martin Reinecke
Modified: 2008-10-01 14:33 UTC (History)
11 users (show)

See Also:
Host:
Target:
Build:
Known to work: 3.4.2
Known to fail: 4.0.0 4.0.4
Last reconfirmed: 2006-10-24 12:43:05


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Reinecke 2004-08-07 11:41:52 UTC
for the following C code the current mainline produces worse code than
the current 3.4 branch. Mainline doesn't seem to recognize that the pointers
a and b may not alias.

#define N 200

void foo ( float * restrict a, float * restrict b, int n, int j)
  {
  int i;
  for( i=0; i<n; ++i )
    a[i] = (b[j+N/4] + b[j-N/4]) * 0.5f;
  }

Compiling with mainline using -O3 -fstrict-alilasing -std=c99 -S produces
the following assembler for the loop:

.L4:
	flds	(%eax)
	fadds	(%ecx)
	fmul	%st(1), %st
	fstps	(%esi,%edx,4)
	incl	%edx
	cmpl	%edx, %ebx
	jg	.L4


The current 3.4 version produces: 
.L5:
	fsts	(%ecx,%eax,4)
	incl	%eax
	cmpl	%edx, %eax
	jl	.L5

In this case, all floating point computations are hoisted out of the loop.
Comment 1 Andrew Pinski 2004-08-07 22:18:16 UTC
Really this a RTL problem but this is also a missed-optimization from the tree level too.

Confirmed a regression.
Comment 2 Jie Zhang 2004-08-18 03:33:05 UTC
I'm investigating this. It seems that gcc failed to get the correct alias set
for b[j+N/4]. Current get_alias_set () uses find_base_decl () to find the base
pointer that the address pointer expression depends on. But when the pointer
expression has been gimplified, find_base_decl () cannot follow the ud chains to
find the true base pointer. My initial thought is replace find_base_decl with
something that can walk ud chains to find the true base pointer.

And When I looked tts t04.generic dump

foo (a, b, n, j)
{
  unsigned int i.0;
  unsigned int T.1;
  float * restrict T.2;
  float * restrict T.3;
  unsigned int j.4;
  unsigned int T.5;
  float * restrict T.6;
  float * restrict T.7;
  float * restrict T.8;
  float T.9;
  float * restrict T.10;
  float T.11;
  float T.12;
  float T.13;
  int i;

  i = 0;
  goto <D1042>;
  <D1041>:;
  i.0 = (unsigned int)i;
  T.1 = i.0 * 4;
  T.2 = (float * restrict)T.1;
  T.3 = T.2 + a;
  j.4 = (unsigned int)j;
  T.5 = j.4 * 4;
  T.6 = (float * restrict)T.5;
  T.7 = T.6 + b;
  T.8 = T.7 + 200B;
  T.9 = *T.8;
  j.4 = (unsigned int)j;
  T.5 = j.4 * 4;
  T.6 = (float * restrict)T.5;
  T.7 = T.6 + b;
  T.10 = T.7 - 200B;
  T.11 = *T.10;
  T.12 = T.9 + T.11;
  T.13 = T.12 * 5.0e-1;
  *T.3 = T.13;
  i = i + 1;
  <D1042>:;
  if (i < n)
    {
      goto <D1041>;
    }
  else
    {
      goto <D1043>;
    }
  <D1043>:;
}

I found it was rather weird that "T.1" was first cast to (float * restrict)
before it was added with "a". I think it's  better to directly add the index to
"a" without the cast. I asked this on the gcc mailing list with a simpler case
<http://gcc.gnu.org/ml/gcc/2004-08/msg00700.html>.
Comment 3 Martin Reinecke 2004-08-18 08:22:10 UTC
BTW, the original testcase is part of a more comprehensive test given by
Arch Robison in
http://www.cuj.com/documents/s=8050/cuj9907robison/
Maybe the whole program given in Listing 1 should be added to the gcc testsuite
to see how well gcc deals with restricted pointers.
What do you think?
Comment 4 Jie Zhang 2004-08-19 03:12:32 UTC
Yeah, it should be added into testsuite. But I don't know how to add such
optimization test.

And I find it will be difficult to write something to replace find_base_decl ()
if we cast the index to pointer before add it to the base pointer. I'm thinking
how to remove the cast, but I'm not familiar with the C type system in GCC.

Could someone give some comments before I go ahead? Thanks.
Comment 5 Andrew Pinski 2004-11-22 07:12:48 UTC
The way to fix this is before going out of SSA to have another HOST_WIDE_INT in INDIRECT_REF to say 
the aliasing set for that tree.
And also add this patch which looks at the use defs:
Index: alias.c
===============================================================
====
RCS file: /cvs/gcc/gcc/gcc/alias.c,v
retrieving revision 1.246
diff -u -p -r1.246 alias.c
--- alias.c     11 Nov 2004 23:08:55 -0000      1.246
+++ alias.c     22 Nov 2004 07:08:29 -0000
@@ -347,6 +347,15 @@ find_base_decl (tree t)
 
   if (t == 0 || t == error_mark_node || ! POINTER_TYPE_P (TREE_TYPE (t)))
     return 0;
+    
+  if (TREE_CODE (t) == SSA_NAME)
+    {
+      d0 = SSA_NAME_DEF_STMT (t);
+      if (TREE_CODE (d0) == MODIFY_EXPR)
+       return find_base_decl (TREE_OPERAND (d0, 1));
+      else
+        return 0;
+    }
 
   /* If this is a declaration, return it.  */
   if (DECL_P (t))

Currently if I use the following optimizations I get the optimization:
-fno-tree-pre -fno-tree-loop-im -fno-tree-dominator-opts -fno-ivopts
The reason why PRE and loop-im is needed is because we would pull out the b[j] out of the loop.
the reason why dominator-opts is need is because we would combine b[j] for the INDIRECT's.
The reason why -fno-ivopts is needed is because we would change a[i] into *iv.opt
Comment 6 Steven Bosscher 2005-02-10 10:50:58 UTC
The real problem here is that the tree alias analyses do not take full 
advantage of 'restrict'.  There are more PRs about this, and it is also *the* 
major source of regressions in a well-known commercial benchmark. 
 
But this is not fixable for GCC 4.0, it's too late for that now I think. 
 
Comment 7 Steven Bosscher 2005-02-10 10:55:19 UTC
Add some dependencies to other restrict-related problem reports. 
Comment 8 Mark Mitchell 2005-10-30 22:47:06 UTC
Marked as P5; I don't think missed-optimizations depending on "restrict" are release-critical.
Comment 9 Gabriel Dos Reis 2007-01-18 03:01:52 UTC
Won't fix for GCC-4.0.x
Comment 10 Andrew Pinski 2007-06-11 00:47:03 UTC
There are a couple of issues here, first pointer_plus improves the aliasing set issue, but then PRE comes around and messes it up because it does not add pointer types which have DECL_BASED_ON_RESTRICT_P/DECL_GET_RESTRICT_BASE setup correctly.  

Disabling PRE on powerpc-linux-gnu (on the pointer_plus branch) is enough to get the RTL optimizers to optimize away the extra loads and we get for the inner loop:
.L3:
        stfsx 0,9,3
        addi 9,9,4
        bdnz .L3
Which is almost the best you can do :).



One more issue (for x86) is expand emits code that causes the rtl optimizers not to optimize well as they only look into loads in sets.  I don't know how to fix that issue without fixing restrict at the tree level.
Comment 11 Andrew Pinski 2007-07-04 17:02:00 UTC
>Which is almost the best you can do :).
Well except to use store with update but that is an IV-OPTs issue.
Comment 12 Joseph S. Myers 2008-07-04 16:32:36 UTC
Closing 4.1 branch.
Comment 13 Richard Biener 2008-10-01 14:33:29 UTC
This is a dup of PR14187 - we simply fail to implement restrict properly.

*** This bug has been marked as a duplicate of 14187 ***