Bug 85315 - missed range optimisation opportunity for derefences where index must be 0 or otherwise constrained
Summary: missed range optimisation opportunity for derefences where index must be 0 or...
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 8.0.1
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: VRP
  Show dependency treegraph
 
Reported: 2018-04-10 07:58 UTC by Vegard Nossum
Modified: 2020-11-26 07:26 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2018-04-10 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Vegard Nossum 2018-04-10 07:58:34 UTC
Input:

extern int x;
extern int a;
extern int b;

int f()
{
    int y = x;
    return *(&y + (a + b));
}

With -O3, trunk outputs:

f():
  movl x(%rip), %eax
  movl %eax, -4(%rsp)
  movl b(%rip), %eax
  addl a(%rip), %eax
  cltq
  movl -4(%rsp,%rax,4), %eax
  ret

Clang, on the other hand, infers that (a + b) == 0:

f(): # @f()
  movl x(%rip), %eax
  retq

From richi on IRC:

"""
we don't do this kind of optimization at the moment
a related one would be to place a if (a+b != 0) link_error (); after the memory access
similarly for array accesses an if (i not-in-range) link_error ()
thus, we do not derive ranges for address-computation parts [at dereference sites]
"""
Comment 1 Richard Biener 2018-04-10 09:22:19 UTC
Confirmed.
Comment 2 Jakub Jelinek 2018-04-10 09:38:28 UTC
Note for
int x;

...
  *(&x + (a + b))
if x is common we need to take -funconstrained-commons into account, similarly for vars that end with flexible array members or similar arrays.
Comment 3 Martin Sebor 2018-04-10 14:50:49 UTC
A simpler test case that shows the same optimization opportunity is:

  int f (int i)
  {
    extern int x;
    int y;
    return &y + i == &x;
  }

A test case that both Clang and GCC get wrong is below.  (This will be discussed at the next WG14 meeting -- see http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2222.htm#q23-can-one-do-comparison-between-pointers-to-objects-of-compatible-types-with-different-provenances-that-are-not-strictly-within-their-original-allocations)

  int g (int i)
  {
    int x, y, z;

    int *p = &x + i;
    int *q = &y;

    return p == q;
  }
Comment 4 rguenther@suse.de 2018-04-10 20:37:47 UTC
On April 10, 2018 4:50:49 PM GMT+02:00, "msebor at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85315
>
>Martin Sebor <msebor at gcc dot gnu.org> changed:
>
>           What    |Removed                     |Added
>----------------------------------------------------------------------------
>              CC|                            |msebor at gcc dot gnu.org
>
>--- Comment #3 from Martin Sebor <msebor at gcc dot gnu.org> ---
>A simpler test case that shows the same optimization opportunity is:
>
>  int f (int i)
>  {
>    extern int x;
>    int y;
>    return &y + i == &x;
>  }
>
>A test case that both Clang and GCC get wrong is below.  (This will be
>discussed at the next WG14 meeting -- see
>http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2222.htm#q23-can-one-do-comparison-between-pointers-to-objects-of-compatible-types-with-different-provenances-that-are-not-strictly-within-their-original-allocations)
>
>  int g (int i)
>  {
>    int x, y, z;
>
>    int *p = &x + i;
>    int *q = &y;
>
>    return p == q;
>  }

If you eventually expect a true result then please no - this should be undefined.
Comment 5 Martin Sebor 2018-04-10 21:20:13 UTC
(In reply to rguenther@suse.de from comment #4)
...
> If you eventually expect a true result then please no - this should be
> undefined.

The second test case in comment #4 is currently well-defined in C17 (by 6.5.9, p6) and requires the function to return true if x and y are adjacent and i is +/-1.  (I'm not defending the requirement.)

There is a proposal for C2X to make the result of the equality comparison between a past-the-end pointer and another pointer unspecified instead (N2219).  Would you prefer it to be undefined?  (Making it undefined would be a rather drastic departure from the historical requirement and likely objectionable to the authors.)

See http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2219.htm
(Scroll down to the Proposed Technical Corrigendum section at the end of the paper to read the proposed changes.)
Comment 6 rguenther@suse.de 2018-04-11 07:50:32 UTC
On Tue, 10 Apr 2018, msebor at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85315
> 
> --- Comment #5 from Martin Sebor <msebor at gcc dot gnu.org> ---
> (In reply to rguenther@suse.de from comment #4)
> ...
> > If you eventually expect a true result then please no - this should be
> > undefined.
> 
> The second test case in comment #4 is currently well-defined in C17 (by 6.5.9,
> p6) and requires the function to return true if x and y are adjacent and i is
> +/-1.  (I'm not defending the requirement.)
> 
> There is a proposal for C2X to make the result of the equality comparison
> between a past-the-end pointer and another pointer unspecified instead (N2219).
>  Would you prefer it to be undefined?  (Making it undefined would be a rather
> drastic departure from the historical requirement and likely objectionable to
> the authors.)

Yes, it should be undefined.  GCC treats it as undefined at the moment
because otherwise points-to analysis cannot be used to optimize pointer
equality tests at all which would be a major showstopper.

Also consider the following:

int main()
{
  int a, b = 0;
  int *p = &a+1;
  if (p == &b)
    *p = 1;
  return b;
}

is that now valid and required to return 1 if b appens to be adjacent
to a?  points-to says p points to a and thus cannot be used to modify
b.  And IIRC *(&a+1) isn't valid but if &a+1 == &b isn't undefined
but is required to return true if they are adjacent what sense does
it make to _not_ allow a pointer that is equal to &b to access b?

> See http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2219.htm
> (Scroll down to the Proposed Technical Corrigendum section at the end of the
> paper to read the proposed changes.)

It's not clear to me how "happens to immediately follow the first array 
object in the address space" not always contradicts "single provenance".
&a+1 and &b have different provenance, no?

Given GCC has leeway to pad declarations an implementation could
choose to always pad variables by at least one byte and thus
never making "happens to immediately follow the first array object in the 
address space" true?

What's the intended use of that special case?  Is it to make the
access via p above valid?  If not how is it useful to allow this
kind of comparison?  Users can always go via [u]intptr_t if they
want to compare pointers of different "provenance" so why's it
necessary to add an ability to do so without???  That just
needessly complicates implementations IMHO.

So that -f[no-]provenance stuff is bollocks IMHO, whoever invented
it must be making sth up on a sheet of paper rather than addressing
a real problem.
Comment 7 Martin Sebor 2018-04-11 16:37:17 UTC
I asked Peter about that yesterday.  The access to *p in your example is still meant to be undefined even under the proposed provenance rules.  Here's his response: http://www.open-std.org/jtc1/sc22/wg14/15051  It's not yet entirely clear to me how to derive this answer from the proposed wording but if it can't be it's presumably just a bug that will be fixed.

I think the intent of the "happens to immediately follow the first array 
object in the address space" was to allow implementations to return true for the (&a + i == &b) kind of equalities and the text just came out wrong (as a requirement rather than a permission).  The text was added in C99 but I haven't yet been able to find the paper that added it.

The implications of the "immediately follows the first array" sentence for equality having to return true would be at odds with the "single provenance" and so the result of the equality is only specified with -fno-provenance (otherwise it's unspecified).

Regarding your suggestion for the equality being undefined, since it's well-defined in C and  explicitly unspecified in C++, changing it to undefined in C would introduce an incompatibility with C++.  That would go against WG14's goal (part of C2X charter) to align more closely with C++.  To make it pass in WG14, C++ would most likely have to agree to change first.  But I'll bring it up during the meeting.

The rationale for the-f[no-]provenance option is discussed in N2119.  It sounds like the authors think it disabling the provenance rules might make for more intuitive behavior and help avoid bugs.  I'll also bring it at the meeting.
Comment 8 rguenther@suse.de 2018-04-12 08:09:50 UTC
On Wed, 11 Apr 2018, msebor at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85315
> 
> --- Comment #7 from Martin Sebor <msebor at gcc dot gnu.org> ---
> I asked Peter about that yesterday.  The access to *p in your example is still
> meant to be undefined even under the proposed provenance rules.  Here's his

Even with -fno-provenance?

> response: http://www.open-std.org/jtc1/sc22/wg14/15051  It's not yet entirely
> clear to me how to derive this answer from the proposed wording but if it can't
> be it's presumably just a bug that will be fixed.
> 
> I think the intent of the "happens to immediately follow the first array 
> object in the address space" was to allow implementations to return true for
> the (&a + i == &b) kind of equalities and the text just came out wrong (as a
> requirement rather than a permission).  The text was added in C99 but I haven't
> yet been able to find the paper that added it.

OK, even GCC might end up computing true if we don't optimize but
translate to a cmp instruction and var layout happens to be.  Like
for -O0 and

 int *p = &a + i;
 int *q = &b;
 if (p == q)
   ...

but at -O1 we'd always compute it as false.

> The implications of the "immediately follows the first array" sentence for
> equality having to return true would be at odds with the "single provenance"
> and so the result of the equality is only specified with -fno-provenance
> (otherwise it's unspecified).
> 
> Regarding your suggestion for the equality being undefined, since it's
> well-defined in C and  explicitly unspecified in C++, changing it to undefined
> in C would introduce an incompatibility with C++.  That would go against WG14's

I think for the compiler "unspecified" works as well - we can still
unconditionally optimize it to return false, correct?

> goal (part of C2X charter) to align more closely with C++.  To make it pass in
> WG14, C++ would most likely have to agree to change first.  But I'll bring it
> up during the meeting.
> 
> The rationale for the-f[no-]provenance option is discussed in N2119.  It sounds
> like the authors think it disabling the provenance rules might make for more
> intuitive behavior and help avoid bugs.  I'll also bring it at the meeting.

The issue I see is that implementing -fno-provenance is impossible
(well, we might be able to alias it to -O0 and disable all folding).
But I don't have time to read up its full specification/rationale right 
now.
Comment 9 Martin Sebor 2018-04-20 22:25:15 UTC
(In reply to rguenther@suse.de from comment #8)
> > I asked Peter about that yesterday.  The access to *p in your example is still
> > meant to be undefined even under the proposed provenance rules.  Here's his
> 
> Even with -fno-provenance?

With -fno-provenance the access would be valid under the proposed rules.

> I think for the compiler "unspecified" works as well - we can still
> unconditionally optimize it to return false, correct?

Correct.
Comment 10 Andrew Macleod 2020-11-17 17:42:18 UTC
OK, so whats the deal here. I can't really follow what the final request, or action is.

Is there a conclusion on what needs to be done? if anything?
Comment 11 Richard Biener 2020-11-18 07:37:36 UTC
(In reply to Andrew Macleod from comment #10)
> OK, so whats the deal here. I can't really follow what the final request, or
> action is.
> 
> Is there a conclusion on what needs to be done? if anything?

See the original description - the request was to derive ranges for
the offset operand in pointer arithmetic based on the size of the
object offsetted.  In the simple example a plain 'int' which is
(in GIMPLE) offsetted by 4 * (a + b) where we should derive that
a + b is zero.
Comment 12 Andrew Macleod 2020-11-18 14:55:44 UTC
Maybe I'm a little dense.

if we are presuming that  
  &x + (a + b) 
implies a + b == 0, then we also should assume that
 
  &x + a  implies a == 0

and if we can make those assumptions, then
&x + 1 is garbage because we can assume 1 == 0.

And if a and b are both unsigned, then I guess we can also assume a == b == MAX_UINT / 2 ?


Now, if we decided to actually do this...  I see IL:

<bb 2> :
  x.0_1 = x;
  y = x.0_1;
  a.1_2 = a;
  b.2_3 = b;
  _4 = a.1_2 + b.2_3;
  _5 = (long unsigned int) _4;
  _6 = _5 * 4;
  _7 = &y + _6;

The clear implications is that _6 == 0 in this expression?

If we implemented that in the operator_pointer_plus::op1_range routine, and then were to back substitute, we'd get
(_6)[0,0] = _5 * 4   -> _5 = [0,0]
(_5)[0,0] = (long unsigned int) _4;  -> _4 == [0,0]
(_4)[0,0] = a.1_2 + b.2_3   which gives us nothing additional...  Other than a potential relationship to track I suppose  a.1_2 == -B.2_3 for signed, but it would record that _4 is [0,0] when we calculate an outgoing range.

but regardless, its seems that another straightforward place to do this would be in statement folding?  Isn't the basic assumption:

_7 = &y + _6;
implies _6 is always 0, which would enable us to fold this to
_7 = &y
then _6 is unused and the other statements would ultimately just go away.

So why not make folding simply throw away the "+ _6" part because it is now being forced to be 0?  We can't really assume that it is [0,0], but then not use that information to optimize?
Comment 13 Jakub Jelinek 2020-11-18 15:17:04 UTC
(In reply to Andrew Macleod from comment #12)
> Maybe I'm a little dense.
> 
> if we are presuming that  
>   &x + (a + b) 
> implies a + b == 0, then we also should assume that

&x + (a + b) for scalar x doesn't imply a + b == 0, it implies a + b <= 1.
Only when it is dereferenced, i.e. (&x)[a + b] is accessed a + b has to be 0.
Comment 14 Martin Sebor 2020-11-18 15:46:04 UTC
Andrew, we discussed the same idea for built-in functions at Couldron.  For instance, in:

  void f (const char *s, int n)
  {
    char a[8];
    memcpy (a, s, n);     // n must be in [0, 8]
    if (n < 0 || n > 8)   // fold to false
      abort ();
  }
Comment 15 rguenther@suse.de 2020-11-18 19:05:48 UTC
On November 18, 2020 3:55:44 PM GMT+01:00, amacleod at redhat dot com <gcc-bugzilla@gcc.gnu.org> wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85315
>
>--- Comment #12 from Andrew Macleod <amacleod at redhat dot com> ---
>Maybe I'm a little dense.
>
>if we are presuming that  
>  &x + (a + b) 
>implies a + b == 0, then we also should assume that
>
>  &x + a  implies a == 0
>
>and if we can make those assumptions, then
>&x + 1 is garbage because we can assume 1 == 0.
>
>And if a and b are both unsigned, then I guess we can also assume a ==
>b ==
>MAX_UINT / 2 ?
>
>
>Now, if we decided to actually do this...  I see IL:
>
><bb 2> :
>  x.0_1 = x;
>  y = x.0_1;
>  a.1_2 = a;
>  b.2_3 = b;
>  _4 = a.1_2 + b.2_3;
>  _5 = (long unsigned int) _4;
>  _6 = _5 * 4;
>  _7 = &y + _6;
>
>The clear implications is that _6 == 0 in this expression?
>
>If we implemented that in the operator_pointer_plus::op1_range routine,
>and
>then were to back substitute, we'd get
>(_6)[0,0] = _5 * 4   -> _5 = [0,0]
>(_5)[0,0] = (long unsigned int) _4;  -> _4 == [0,0]
>(_4)[0,0] = a.1_2 + b.2_3   which gives us nothing additional...  Other
>than a
>potential relationship to track I suppose  a.1_2 == -B.2_3 for signed,
>but it
>would record that _4 is [0,0] when we calculate an outgoing range.
>
>but regardless, its seems that another straightforward place to do this
>would
>be in statement folding?  Isn't the basic assumption:
>
>_7 = &y + _6;
>implies _6 is always 0, which would enable us to fold this to
>_7 = &y
>then _6 is unused and the other statements would ultimately just go
>away.
>
>So why not make folding simply throw away the "+ _6" part because it is
>now
>being forced to be 0?  We can't really assume that it is [0,0], but
>then not
>use that information to optimize?

Well, clearly the zero case is degenerate but it extends to sth like int a[2] ; and &a + n. I guess you're already handling ARRAY_REF indices.
Comment 16 Andrew Macleod 2020-11-19 16:24:14 UTC
(In reply to Jakub Jelinek from comment #13)
> (In reply to Andrew Macleod from comment #12)
> > Maybe I'm a little dense.
> > 
> > if we are presuming that  
> >   &x + (a + b) 
> > implies a + b == 0, then we also should assume that
> 
> &x + (a + b) for scalar x doesn't imply a + b == 0, it implies a + b <= 1.
> Only when it is dereferenced, i.e. (&x)[a + b] is accessed a + b has to be 0.

OK. certain things about this still confuse me, but thats OK for now. we'll come back to them.

There seems to be 2 things at play here:
1) lhs = ptr + X
    has certain implications on X, it ptr is &scalar then X = [0, 1] * sizeof (&scalar)?  
2) if lhs is later de-referenced, then X is known to have been [0,0]?

We're ignoring the nonscalar cases of ptr for now, but Im guessing they are similar just the values for X are determined differently. 

1) a) could be handled with something like a previously mentioned range_after_stmt() API for operands which are affected by statements. 
   b)  It can also be impacted by op2_range() during a wind back, but would likely require some tweaking of gimple_range_calc_op2() to determine that 'ptr' satisfied the criteria of this scalar condition.   the a) solution would eliminate the necessity of this.

2) This one is a bit trickier.  We cant use the non-null property since &x + blah  will already make it non-null... so you are looking for an actual dereference of itself or an equivalence?   I can probably tweak the non-null property manager to indicate whether the non-null property also contains a dereference.. but I guess you would need to know that ALL paths contain a dereference.

We'd probably get most of what we're looking for if we simply check for a dereference of LHS in the same block?  and then assert that X is 0.

Is that he basic gist? 
Regardless, I'll come back to this eventually and get someone to clarify all the intricacies for me, IM just trying to understand the general requirements.