Bug 49279 - [4.5 Regression] Optimization incorrectly presuming constant variable inside loop in g++ 4.5 and 4.6 with -O2 and -O3 for x86_64 targets
Summary: [4.5 Regression] Optimization incorrectly presuming constant variable inside ...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.6.0
: P2 major
Target Milestone: 4.5.4
Assignee: Richard Biener
URL:
Keywords: wrong-code
: 48764 (view as bug list)
Depends on: 48764
Blocks:
  Show dependency treegraph
 
Reported: 2011-06-03 21:23 UTC by Thiago Martins
Modified: 2012-01-03 14:05 UTC (History)
7 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.4.6, 4.6.2, 4.7.0
Known to fail: 4.5.4, 4.6.0
Last reconfirmed: 2011-06-04 16:37:22


Attachments
Testcase (reduced automatically using multidelta) (9.07 KB, text/x-c++src)
2011-06-03 21:23 UTC, Thiago Martins
Details
patch (845 bytes, patch)
2011-10-05 09:04 UTC, Jakub Jelinek
Details | Diff
CAST_RESTRICT removal (1.33 KB, patch)
2011-10-05 15:48 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Thiago Martins 2011-06-03 21:23:26 UTC
Created attachment 24427 [details]
Testcase (reduced automatically using multidelta)

GCC is apparently producing the wrong code with Eigen2 (a template-based linear algebra library) with optimization levels -O3 and -O2 for x86_64-unknown-linux-gnu targets. A reduced test case is provided that reproduces the error.

As I understand, the core of the problem is this loop (line 1132 of
the submitted test case):

for (; (ProcessFirstHalf ? i && i.index () < j : i); ++i)  {
   if (LhsIsSelfAdjoint) {
      int a = LhsIsRowMajor ? j : i.index ();
      int b = LhsIsRowMajor ? i.index () : j;
      Scalar v = i.value ();
      derived ().row (b) += ei_conj (v) * product.rhs ().row (a);        
  }
}

which is being translated into:

	movq	-8(%rsp), %rsi
	movq	(%rsi), %rbp
	addq	%rdx, %rbp
	movsd	0(%rbp), %xmm1   # <- %xmm1 is initialized here and 
.L5:                             #    no longer touched!
	leaq	0(,%rcx,8), %rsi
	leaq	4(%r8,%rcx,4), %r8
	movl	%r9d, %ecx
	jmp	.L8
.L13:                              # <-Loop here!!!
	movl	(%r8), %r10d
	addq	$4, %r8
.L8:
	movsd	0(%r13,%rsi), %xmm0
	movslq	%r10d, %r10
	addl	$1, %ecx
	mulsd	(%r12,%r10,8), %xmm0
	cmpl	%r11d, %ecx
	addsd	%xmm1, %xmm0
	movsd	%xmm0, 0(%rbp)   # <- % shouldn't %xmm1 be updated here?
	je	.L3              
	addq	$8, %rsi
	cmpl	%ecx, %r9d
	jle	.L13              # <- Loop ends 

the sum operation on line

 derived ().row (b) += ei_conj (v) * product.rhs ().row (a);

is apparently being performed by the instruction

 addsd	%xmm1, %xmm0

but the value of %xmm1 isn't being updated inside the loop!! Apparently the compiler is presuming derived ().row (b) is constant inside the loop, which is
evidently *not* true. Since the value of %xmm1 is never updated, the 
value of derived ().row (b) at the end of the loop is equal to the last 
ei_conj (v) * product.rhs ().row (a) result.

The bug was verified on gcc versions 4.5.2 and 4.6.0 with -O2 and -O3 switches. The compiler produces the correct code with -O0 and -O switches.

It is *NOT* present on the 4.4 branch (that is, 4.4 compiles the code correctly) for -O0, -0, -02 and -O3 switches. 

I suppose it is a regression of the 4.5 branch.

Command line (for gcc 4.6.0):
/opt/gnu/gcc-4.6/bin/g++ -v -save-temps -nostdinc -O3  testcase.a.cpp

Compiler output:
Using built-in specs.
COLLECT_GCC=/opt/gnu/gcc-4.6/bin/g++
COLLECT_LTO_WRAPPER=/opt/gnu/gcc-4.6/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../configure : (reconfigured) ../configure : (reconfigured) ../configure
Thread model: posix
gcc version 4.6.0 (GCC) 
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-nostdinc' '-O3' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
 /opt/gnu/gcc-4.6/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/cc1plus -E -quiet -nostdinc -v -iprefix /opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/ -D_GNU_SOURCE testcase.a.cpp -mtune=generic -march=x86-64 -O3 -fpch-preprocess -o testcase.a.ii
#include "..." search starts here:
#include <...> search starts here:
End of search list.
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-nostdinc' '-O3' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
 /opt/gnu/gcc-4.6/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/cc1plus -fpreprocessed testcase.a.ii -quiet -dumpbase testcase.a.cpp -mtune=generic -march=x86-64 -auxbase testcase.a -O3 -version -o testcase.a.s
GNU C++ (GCC) version 4.6.0 (x86_64-unknown-linux-gnu)
        compiled by GNU C version 4.6.0, GMP version 4.3.2, MPFR version 3.0.0-p8, MPC version 0.9
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C++ (GCC) version 4.6.0 (x86_64-unknown-linux-gnu)
        compiled by GNU C version 4.6.0, GMP version 4.3.2, MPFR version 3.0.0-p8, MPC version 0.9
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 3f3899c46d47b31a2bc0cb7f3d1408a6
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-nostdinc' '-O3' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
 as --64 -o testcase.a.o testcase.a.s
COMPILER_PATH=/opt/gnu/gcc-4.6/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/:/opt/gnu/gcc-4.6/bin/../libexec/gcc/
LIBRARY_PATH=/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/:/opt/gnu/gcc-4.6/bin/../lib/gcc/:/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/usr/lib/nvidia-current/:/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-nostdinc' '-O3' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
 /opt/gnu/gcc-4.6/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/collect2 --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib/../lib64/crt1.o /usr/lib/../lib64/crti.o /opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/crtbegin.o -L/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0 -L/opt/gnu/gcc-4.6/bin/../lib/gcc -L/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/nvidia-current -L/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/../../.. testcase.a.o -lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/crtend.o /usr/lib/../lib64/crtn.o

A test case was produced with the preprocessed output (generated from Eigen version 2.0.15) and automatically reduced using the multidelta tool. Testcase included.
Comment 1 H.J. Lu 2011-06-04 16:37:22 UTC
It is caused by revision 155360:

http://gcc.gnu.org/ml/gcc-cvs/2009-12/msg00504.html
Comment 2 Richard Biener 2011-06-06 09:05:21 UTC
Passes with -fno-strict-aliasing.  Presumably the alias-improvement branch
merge is the real issue.  Not sure if it isn't an issue with the testcase
which has

    inline const Derived & derived () const
    {
      return *static_cast < const Derived *>(this);
    } inline Derived & derived ()
    {
      return *static_cast < Derived * >(this);
    }

I wasn't able to follow the template instantiation quickly ... ;)

I will have a look.
Comment 3 Richard Biener 2011-06-06 13:00:44 UTC
Indeed PRE thinks that it is loop invariant, presumambly because of TBAA:

  D.7681_206 = MEM[(const double &)D.7674_199];
  D.7680_208 = D.7425_99 * D.7681_206;
  D.7663_216 = MEM[(const double &)SR.167_105];
  D.7664_217 = D.7680_208 + D.7663_216;
  MEM[(Scalar &)SR.167_105] = D.7664_217;

and MEM[(const double &)SR.167_105] is sought to be loop-invariant, not
aliasing MEM[(Scalar &)SR.167_105].

We do not see the must-alias because SR.167_105 is value-numbered to some
other SSA name and value_dies_in_block_x (after fixing another bug in that
function) does not do value-replacement on the stmt it queries
stmt_may_clobber_ref_p_1.  Then the oracle figures

  if (!ptr_derefs_may_alias_p (ptr1, ptr2))
    return false;

and both vars point to restrict tags (but different restrict tags).  So
this looks to me related to PR48764.

Of course this case is somewhat different in that we basically have

struct {
  double * restrict x;
} m;

double * restrict wrap () { return m.x; }

double x;
m.x = &x;
for (;;)
  wrap (&x) = wrap (&x) + 1;

and GCC thinks that the two restrict qualified pointers point to different
things.

See the restrict qualified m_data pointer in MapBase.  As you are
creating a new object via forceAligned -> force_aligned_impl.run
-> _convertToForceAligned you effectively are loading from two different
restrict qualified pointers.

I'll fixup the bug I noticed in value_dies_in_block_x, but it won't fix
this PR.
Comment 4 Richard Biener 2011-06-06 14:03:27 UTC
  # PT = nonlocal escaped
  D.7613_101 = MEM[(struct ei_matrix_storage *)this_30(D)];
  # PT = nonlocal escaped
  D.7616_104 = D.7613_101 + D.7582_90;
  # PT = nonlocal escaped { D.7934 } (restr)
  SR.167_105 = (const Scalar * restrict) D.7616_104;

  # PT = nonlocal escaped
  D.7671_127 = MEM[(struct ei_matrix_storage *)this_30(D)];
  D.7672_129 = D.7554_68 * 8;
  # PT = nonlocal escaped
  D.7674_130 = D.7671_127 + D.7672_129;
  # PT = nonlocal escaped { D.7936 } (restr)
  SR.169_131 = (const Scalar * restrict) D.7674_130;

so while the fix for PR48764 makes this_30 point to a single tag that
would make derived pointers derived it doesn't seem to work for this
case.  I have a variant that does.
Comment 5 Jakub Jelinek 2011-10-04 16:47:22 UTC
Looking at this (don't see how this is related to the other PR), I think the bug is either that we have these TYPE_RESTRICT casts (caused by storing a non-restricted pointer into a restricted field) in the IL at all, or that we are propagating the restrict tags through the restricted field which we have decided not to have a restrict tag for.  If it is a field that can have a restrict tag safely attached to it (restrict field in a structure argument, or DECL_BY_REFERENCE/restrict REFERENCE_TYPE argument pointed structure, or global variable field, or perhaps automatic variable without address taken, the casts for store to it should either not have (restr) at all, or at least should use the same restrict tag as the field.  But if the field can't have safely a restrict tag, if we make it up because of the casts for store to it, different
stores might be with different tags and if two different pointers alias and PTA doesn't figure it out, we have expressions based on the same restricted field
with different restrict tags and miscompile.

Now the question is how to safely find out what is a cast for store from other casts.  TYPE_RESTRICT casts from non-restrict pointers feeding one stmt which stores it into restricted field?  Or should we just say such casts shouldn't be in the IL?  Or should only TYPE_RESTRICT casts to SSA_NAMEs of non-artificial vars be considered?

In the testcase, all the __restrict casts are casts for store into some field that can't have safely a restrict tag attached:
  # PT = nonlocal escaped { D.7761 } (restr)
  data.11D.7427_92 = (const ScalarD.4717 * restrict) D.7426_91;
  # .MEMD.7374_151 = VDEF <.MEMD.7374_52>
  MEM[(struct MapBaseD.4571 *)&D.4884].m_dataD.4766 = data.11D.7427_92;
Comment 6 Jakub Jelinek 2011-10-04 16:57:36 UTC
Short testcase that is miscompiled:
struct S { int a; int *__restrict p; };

struct S *bar (struct S *);

int
foo (int *p, int *q, int z)
{
  struct S s, *t;
  s.a = 1;
  s.p = p;
  t = bar (&s);
  t->p = q;
  s.p[0] = 0;
  t->p[0] = 1;
  return s.p[0];
}

If bar returns the parameter it has been passed, s and *t are the same object, so it must return 1 instead of 0 that 4.6 as well as trunk return.
Comment 7 Jakub Jelinek 2011-10-05 08:09:24 UTC
BTW, the extra problematic casts aren't coming from the frontend, it is the gimplifier that is inserting them:
  /* Insert pointer conversions required by the middle-end that are not
     required by the frontend.  This fixes middle-end type checking for
     for example gcc.dg/redecl-6.c.  */
  if (POINTER_TYPE_P (TREE_TYPE (*to_p)))
    {
      STRIP_USELESS_TYPE_CONVERSION (*from_p);
      if (!useless_type_conversion_p (TREE_TYPE (*to_p), TREE_TYPE (*from_p)))
        *from_p = fold_convert_loc (loc, TREE_TYPE (*to_p), *from_p);
    }
Comment 8 Jakub Jelinek 2011-10-05 09:04:59 UTC
Created attachment 25419 [details]
patch

An incomplete patch to avoid those casts from the IL.
We'd need to adjust SRA to add casts if a restrict field is scalarized and a non-restrict pointer is stored into it.  It fixes this short testcase though.
It is not enough though, because FRE/PRE can see through the field store
and will create undesirable restrict casts:
  t_3->p = q_4(D);
...
  D.2099_6 = t_3->p;
becomes
  t_3->p = q_4(D);
...
  D.2099_6 = (int * restrict) q_4(D);
during fre1 (in this case it doesn't matter, as for s.p FRE can't see through
the pointer and thus it doesn't have a restrict tag).  So, either FRE/PRE (and whatever else) would need to give up on seeing through in these non-restrict -> restrict field cases, or it would be better instead of avoiding these casts
make non-restrict -> restrict casts from user code explicit (e.g. some different kind of gimple unary assignment), treat them more carefully during optimizations (yes, it would prevent some optimizations, but hopefully restrict is used only where it actually can give more improvements from successful disambiguation) and only handle those as CAST_RESTRICT and not random other !TYPE_RESTRICT -> TYPE_RESTRICT conversions in the IL which are there just for IL consistency
(we could even make such non-special !restrict -> restrict conversions useless).
Comment 9 rguenther@suse.de 2011-10-05 09:40:58 UTC
On Wed, 5 Oct 2011, jakub at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49279
> 
> --- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-05 09:04:59 UTC ---
> Created attachment 25419 [details]
>   --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25419
> patch
> 
> An incomplete patch to avoid those casts from the IL.
> We'd need to adjust SRA to add casts if a restrict field is scalarized and a
> non-restrict pointer is stored into it.  It fixes this short testcase though.
> It is not enough though, because FRE/PRE can see through the field store
> and will create undesirable restrict casts:
>   t_3->p = q_4(D);
> ...
>   D.2099_6 = t_3->p;
> becomes
>   t_3->p = q_4(D);
> ...
>   D.2099_6 = (int * restrict) q_4(D);
> during fre1 (in this case it doesn't matter, as for s.p FRE can't see through
> the pointer and thus it doesn't have a restrict tag).  So, either FRE/PRE (and
> whatever else) would need to give up on seeing through in these non-restrict ->
> restrict field cases, or it would be better instead of avoiding these casts
> make non-restrict -> restrict casts from user code explicit (e.g. some
> different kind of gimple unary assignment), treat them more carefully during
> optimizations (yes, it would prevent some optimizations, but hopefully restrict
> is used only where it actually can give more improvements from successful
> disambiguation) and only handle those as CAST_RESTRICT and not random other
> !TYPE_RESTRICT -> TYPE_RESTRICT conversions in the IL which are there just for
> IL consistency
> (we could even make such non-special !restrict -> restrict conversions
> useless).

I think rather than trying to remove those casts from the IL we should
think if preserving restrict optimizations for inlining is really
important (which is what the cast handling is for).  The cast handling,
compared to the existing handling of arguments and global vars, is
really sensitive to optimizations to be 'correct' (see the other PR
where we inline a restrict argument function twice and assign
different restrict tags during PTA to the two copies and do
wrong cross inlined-body optimizations).

So, for correctness we should consider removing the cast handling.
(My proposed fix for the other PR sofar was to assign a fake tag
to every pointer so propagation can see the two pointers (with
different restrict tag) are still based on each other).

Richard.
Comment 10 Richard Biener 2011-10-05 14:38:40 UTC
After talking about this for some time another idea came up.  Basically, assign
the restrict tags for parameters at gimplification time by gimplifying

foo (int * restrict p)
{

to

foo (int * restrict p)
{
  p = RESTRICT <p, tag>;

with tag coming from allocate_decl_uid ().  We would use these copies
as restrict tag sources using the specified UID.  Thus every inline
copy (and clone) would share them.

We could even expose this as __builtin_restrict (p, tag) (with a mapping
from user tags to uids) much similar to __builtin_assume_aligned.

The fortran frontend could use this instead of restrict qualification
for the descriptor->data loads.
Comment 11 Jakub Jelinek 2011-10-05 15:48:57 UTC
Created attachment 25423 [details]
CAST_RESTRICT removal

Attaching a test patch that just removed CAST_RESTRICT altogether, plus IRC discussion that lead to it.  The only testsuite regressions are Wobjsize-1.c and strlenopt-4gf.c which show an important security related problem - we probably shouldn't be folding builtins if DECL_INITIAL (fndecl) != NULL && DECL_DECLARED_INLINE_P (fndecl) && cfun && !cfun->after_inlining,
because then we happily fold e.g. char buf[2]; strcpy (buf, "abcd"); into
__builtin_memcpy even when strcpy is always_inline inline wrapper that calls
__builtin___strcpy_chk and would complain about the buffer overflow resp. add runtime checking.

jakub> IMHO something like RESTRICT_CAST_EXPR gimple stmts or handling as CAST_RESTRICT only non-restrict -> restrict assignments to non-artificial vars would be some form of smaller CAST_RESTRICT removal
jakub> with both, even if we forward propagate or CSE the rhs, we wouldn't create the undesirable restrict tags
richi> yeah
richi> it would also remove that awkward single non-useless pointer cast
jakub> while still restrict could be (somewhat) supported for inlined routines and even for cases where one writes { int *__restrict p = something; *p ... in the code
richi> of course we'd have to handle CAST_RESTRICT everywhere ...
richi> but currently it's very fragile
jakub> yes, it would be some work
richi> maybe try removing the cast handling first?
richi> and see how many people complain?
richi> it would be the solution for release branches anyway I guess
jakub> would be nice to see SPEC numbers with that removed first
richi> not a single restrict in SPEC ...
jakub> really?  Even in 2k6?
richi> really.
richi> test coverage is very bad for restrict :/
richi> we get some via fortran of course
richi> but those are for globals and params only
jakub> but with inlined fortran routines you get the restrict casts, right?
richi> but you get them like the case you want to remove - descriptor->data = (restrict *)x;
richi> but yes
jakub> yeah, those are definitely undesirable
richi> though we either get a decl in the caller (even better) or another restrict param passed throuhg in fortran
jakub> anyway, if you want, I can try the RESTRICT_CAST_EXPR (or CAST_RESTRICT_EXPR, better suggestion) approach
richi> so for fortran I believe we wouldn't lose anything
richi> I suppose only the frontends would insert those?  How would those be different in the case of inlining two copies?
jakub> I'd think the FEs as well as the inliner would insert those
jakub> not sure if I understood well the inlining two copies problem though
jakub> you have inline with __restrict parameter(s), inline it twice
richi> restrict pointers on arguments point to { restrict-tag NONLOCAL }, the disambiguator ignores NONLOCAL, but that's a problem if two restrict pointers are based on one that only points to NONLOCAL (like a non-restrict parameter from the caller)
richi> foo (int *p) { bar(p); bar(p); } bar (int * restrict p) { *p = 1; } would disambiguate the two stores to p
richi> that's the issue in the other PR
richi> needs quite some obfuscation to trigger of course ;)
jakub> and the problem was during scheduling?
jakub> or something that intermixed the bodies?
richi> yes, but I can imagine that you could trigger bogus CSE as well
richi> the alias info is clearly wrong
richi> my idea was to improve the based-on tracking by giving each pointer a distinct pointed-to tag
richi> instead of just NONLOCAL if we don't know
richi> of course that will increase the points-to set sizes even more
richi> so I don't like it too much
jakub> not sure I understand how that would help
richi> ISTR that issue is also why we require two restrict-tag pointers and don't disambiguate int * restrict p and int *q
richi> you'd get { TAG1 NONLOCAL ptr } { TAG2 NONLOCAL ptr } and thus no disambiguation
richi> see the bitmap_intersect_p check in pt_solutions_same_restrict_base
jakub> on the other side, restrict in C99 is very tight to the scope in which the cast occurs
jakub> so perhaps we could disambiguate through the explicit cast restricts only if in the right BLOCK?
richi> ick
richi> what's the "right block"?
richi> also that doesn't sound like a viable middle-end concept
jakub> not sure if it would be enough to just consider TREE_BLOCK of the RESTRICT_CAST_EXPR stmts
richi> we don't implement C99 restrict in the middle-end, we implement something that should capture it
jakub> "right block" in the sense the same block or one block is subblock of the other (transitively)
richi> you'd have to remember which RESTRICT_CAST_EXPR feeds a memory access
richi> at least PTA propagation won't be able to avoid leaking the restrict tags out of the blocks
jakub> so, if we have baz (int *__restrict p, int *__restrict q) { *p = 1; *q = 1; return *p; } inline, both the cast restricts would be the same block, so can be used to disambiguate
richi> but at disambiguation we don't see the casts but two memory accesses
jakub> if you inline it twice, we wouldn't disambiguate p' and p'' based accesses
jakub> if we can from the CAST_RESTRICT artificial heap var get to the TREE_BLOCK to which it is attached, we could use that...
richi> the heap var doesn't exist - it's just a UID ...
jakub> perhaps as proof of concept we could start with only testing TREE_BLOCKs for equality, not walking to the parents
jakub> but we can somewhere on the side map that UID to TREE_BLOCK, can't we?
richi> and we can't go from random UIDs to DECLs (and they don't exist)
richi> huh, sure we could
richi> it sounds awkward though, tailored to stupid C99
richi> btw, you don't know which bit is the restrict tag, so you'd need to iterate over all and check them all
jakub> bitmap which uids are cast restrict tags?
richi> if you have p1 { R1 R2 } p2 { R3 R4 } is it enough to have one pair that has matching blocks, say R1 and R4?
richi> or do all possible pairs to be useful for disambiguation?
jakub> not sure, when exactly you get two or more restrict tags in points-to set?  x = y ? p1 : p2  ?
richi> OTOH, the simpler patch that just removes the offending four lines that handle restrict casts sounds very very much more appealing to me ;)
richi> from PHIs for example, or from context-insensitive parts (when we go through memory or calls)
jakub> I can try to bootstrap that, wonder what the fallout in the testsuite would be
richi> nothing we can't XFAIL
jakub> sure, but the question is if we don't pessimize real-world code that way too much
richi> at least for fixing the two bugs on release branches that sounds best
richi> well, it opens up optimizing restrict vs. non-restrict qualified pointers
jakub> probably
jakub> btw, do we create restrict tags for fields of automatic aggregates which we can name?
richi> yes, I think so
jakub> without CAST_RESTRICT, I guess in my short testcase it would be fine if s.p was given a restrict tag (but apparently it doesn't get it)
richi> let me check
jakub> even if we take its address, just going through the pointer (t->p) shouldn't have restrict cast and thus not miscompile it
richi> ah no, we don't initialize them
richi> they don't point to anything initially
richi> so we assume we can compute precise points-to sets for them
jakub> probably not even with s = { 1, p }; in the source
richi> right, because we gimplify that
richi> if you make them local static they get restrict tags
jakub> well, local statics are like global
jakub> they live forever
richi> right
jakub> though, with the automatic vars after scheduling we might hit the same issue as with cast restrict - if code from different blocks overlaps
Comment 12 Jakub Jelinek 2011-10-05 15:52:49 UTC
(In reply to comment #10)
> with tag coming from allocate_decl_uid ().  We would use these copies
> as restrict tag sources using the specified UID.  Thus every inline
> copy (and clone) would share them.

Assuming we don't CSE over it (in particular, the LHS of such a builtin with
some non-restricted later pointers not based on it), perhaps it could work.
I wouldn't expose it to users, because how would users ensure uniqueness of the tag over the whole CU?  __COUNTER__ or something similar?  That's going to lead to bugs...

The FEs should probably add that for user __restrict variable initializers too.
Comment 13 Richard Biener 2011-10-06 08:07:59 UTC
(In reply to comment #12)
> (In reply to comment #10)
> > with tag coming from allocate_decl_uid ().  We would use these copies
> > as restrict tag sources using the specified UID.  Thus every inline
> > copy (and clone) would share them.
> 
> Assuming we don't CSE over it (in particular, the LHS of such a builtin with
> some non-restricted later pointers not based on it), perhaps it could work.
> I wouldn't expose it to users, because how would users ensure uniqueness of the
> tag over the whole CU?  __COUNTER__ or something similar?  That's going to lead
> to bugs...

Yeah, I suppose it's not trivial.  OTOH programs have to sort out global
symbol space as well, so ...

> The FEs should probably add that for user __restrict variable initializers too.

Yes.

One slight complication with the proposal arises when the restrict qualified
parameter is address-taken, you'd get

foo (int * restrict p)
{
  tem = p;
  tem = RESTRICT <tem, tag>;
  p = tem;
  bar (&p);
}

which would work for the purpose of making loads from p restrict, but the
load and the store above would not be identifiable as artificial (as opposed
to the RESTRICT statement), and thus may persist in the final code.

Not sure if we need to worry about this case though (similar issue
with address-taken pointers and __builtin_assume_aligned).  I'd just
add the RESTRICT casts for non-address-taken pointers.
Comment 14 Richard Biener 2011-10-06 08:09:51 UTC
(In reply to comment #11)
> Created attachment 25423 [details]
> CAST_RESTRICT removal
> 
> Attaching a test patch that just removed CAST_RESTRICT altogether, plus IRC
> discussion that lead to it.  The only testsuite regressions are Wobjsize-1.c
> and strlenopt-4gf.c which show an important security related problem - we
> probably shouldn't be folding builtins if DECL_INITIAL (fndecl) != NULL &&
> DECL_DECLARED_INLINE_P (fndecl) && cfun && !cfun->after_inlining,
> because then we happily fold e.g. char buf[2]; strcpy (buf, "abcd"); into
> __builtin_memcpy even when strcpy is always_inline inline wrapper that calls
> __builtin___strcpy_chk and would complain about the buffer overflow resp. add
> runtime checking.

The patch looks ok to me once we solved the folding issue (we probably have
to backport that as well then).
Comment 15 Jakub Jelinek 2011-10-06 16:38:35 UTC
Author: jakub
Date: Thu Oct  6 16:38:29 2011
New Revision: 179620

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=179620
Log:
	PR tree-optimization/49279
	* tree-ssa-structalias.c (find_func_aliases): Don't handle
	CAST_RESTRICT.
	* tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Allow
	restrict propagation.
	* tree-ssa.c (useless_type_conversion_p): Don't return false
	if TYPE_RESTRICT differs.

	* gcc.dg/tree-ssa/restrict-4.c: XFAIL.
	* gcc.c-torture/execute/pr49279.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr49279.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/tree-ssa/restrict-4.c
    trunk/gcc/tree-ssa-forwprop.c
    trunk/gcc/tree-ssa-structalias.c
    trunk/gcc/tree-ssa.c
Comment 16 Jakub Jelinek 2011-10-06 19:56:40 UTC
Author: jakub
Date: Thu Oct  6 19:56:32 2011
New Revision: 179633

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=179633
Log:
	PR tree-optimization/49279
	* tree-ssa-structalias.c (find_func_aliases): Don't handle
	CAST_RESTRICT.

	* gcc.c-torture/execute/pr49279.c: New test.

Added:
    branches/gcc-4_6-branch/gcc/testsuite/gcc.c-torture/execute/pr49279.c
Modified:
    branches/gcc-4_6-branch/gcc/ChangeLog
    branches/gcc-4_6-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_6-branch/gcc/tree-ssa-structalias.c
Comment 17 Richard Biener 2011-10-07 08:15:09 UTC
Sofar fixed on trunk and the 4.6 branch.
Comment 18 Michael Matz 2011-10-12 15:26:54 UTC
I'm working on the ADD_RESTRICT idea implementation now.
Comment 19 Richard Biener 2012-01-03 12:19:07 UTC
*** Bug 48764 has been marked as a duplicate of this bug. ***
Comment 20 Richard Biener 2012-01-03 13:54:48 UTC
Author: rguenth
Date: Tue Jan  3 13:54:44 2012
New Revision: 182846

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=182846
Log:
2012-01-03  Richard Guenther  <rguenther@suse.de>

	Backport from mainline
	2011-10-06  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/49279
	* tree-ssa-structalias.c (find_func_aliases): Don't handle
	CAST_RESTRICT.

	2011-10-06  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/49279
	* gcc.c-torture/execute/pr49279.c: New test.

Added:
    branches/gcc-4_5-branch/gcc/testsuite/gcc.c-torture/execute/pr49279.c
Modified:
    branches/gcc-4_5-branch/gcc/ChangeLog
    branches/gcc-4_5-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_5-branch/gcc/tree-ssa-structalias.c
Comment 21 Richard Biener 2012-01-03 14:05:36 UTC
Fixed.