Bug 78655 - gcc doesn't exploit the fact that the result of pointer addition can not be nullptr
Summary: gcc doesn't exploit the fact that the result of pointer addition can not be n...
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 7.0
: P3 normal
Target Milestone: ---
Assignee: Andrew Macleod
URL:
Keywords: missed-optimization
Depends on: 22485
Blocks: VRP
  Show dependency treegraph
 
Reported: 2016-12-02 15:26 UTC by Ivan Sorokin
Modified: 2022-11-17 15:47 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-12-06 00:00:00


Attachments
testcase (337 bytes, text/plain)
2020-11-13 14:46 UTC, Andrew Macleod
Details
patch to fix the problem (998 bytes, patch)
2022-11-16 15:46 UTC, Andrew Macleod
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ivan Sorokin 2016-12-02 15:26:50 UTC
Consider the following piece of code:

#include <memory>

struct blob
{
    void* data;
    size_t size;
};

void uninitialized_copy(blob* first, blob* last, blob* current)
{
    for (; first != last; ++first, (void) ++current) {
        ::new (static_cast<void*>(current)) blob(*first);
    }
}

The nested loop generated for it by GCC 7 is the following:

.L4:
        testq   %rdx, %rdx
        je      .L3
        movdqu  (%rdi), %xmm0
        movups  %xmm0, (%rdx)
.L3:
        addq    $16, %rdi
        addq    $16, %rdx
        cmpq    %rdi, %rsi
        jne     .L4

As you can see after each iteration generated code checks if current is nullptr and omit calling the copy constructor if it is so.

Clang 3.9 doesn't exhibit such behavior. It translates the loop into:

.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        movups  (%rdi), %xmm0
        movups  %xmm0, (%rdx)
        addq    $16, %rdi
        addq    $16, %rdx
        cmpq    %rdi, %rsi
        jne     .LBB0_1

This optimization is valid, because if addition of pointer and integer results in nullptr, the integer was clearly out of bound of allocated memory block thus the addition causes undefined behavior.

Absence of this optimization affects std::uninitialized_copy and any functions that use it (for example std::vector<T>::push_back).

The issue can be reproduced with a simpler piece of code:

bool g(int* a)
{
    return (a + 10) != nullptr;
}

GCC 7:
g(int*):
        cmpq    $-40, %rdi
        setne   %al
        ret

Clang 3.9:
g(int*):                                 # @g(int*)
        movb    $1, %al
        retq

P.S. Another issue is that GCC used mismatched instructions to read from/to memory. movdqu -- is for integer data, movups -- is for single precision floating point. I don't know if it causes any stalls on modern CPUs, I heard that on older CPU writing register with an instruction of one type and reading with an instruction of another might causes stalls. Should I report a separate bug report issue for this?
Comment 1 Richard Biener 2016-12-06 10:44:11 UTC
Confirmed.  It might be possible to rely on this nowadays where we make sure not to transform uintptr_t arithmetic to pointer arithmetic in the IL.

I think that even nullptr + X is invalid.

But for the testcase we'd require threading the jump across the
backedge (FSM threading is not looking at get_ptr_nonnull ()).

VRP parts:

@@ -1145,6 +1147,13 @@ vrp_stmt_computes_nonzero (gimple *stmt,
        }
     }
 
+  if (flag_delete_null_pointer_checks
+      && is_gimple_assign (stmt)
+      && is_gimple_assign (stmt)
+      && gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR
+      && tree_expr_nonzero_p (gimple_assign_rhs2 (stmt)))
+    return true;
+
   return false;
 }

@@ -4873,7 +4854,13 @@ infer_value_range (gimple *stmt, tree op
        return false;
     }
 
-  if (infer_nonnull_range (stmt, op))
+  if (infer_nonnull_range (stmt, op)
+      /* PTR + X with X != 0 is invalid when PTR is a null pointer.  */
+      || (flag_delete_null_pointer_checks
+         && is_gimple_assign (stmt)
+         && gimple_assign_rhs1 (stmt) == op
+         && gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR
+         && tree_expr_nonzero_p (gimple_assign_rhs2 (stmt))))
     {
       *val_p = build_int_cst (TREE_TYPE (op), 0);
       *comp_code_p = NE_EXPR;


Of course I wonder why the check is here in the first place...  Is placement
new valid for nullptr?
Comment 2 Marc Glisse 2016-12-06 20:43:36 UTC
See also PR 35878.
Comment 3 Ivan Sorokin 2016-12-06 21:05:20 UTC
(In reply to Richard Biener from comment #1)

> Of course I wonder why the check is here in the first place...  Is placement
> new valid for nullptr?

I believe this check here is to allow placement new (std::nothrow) T() to work. I don't know about the standard wording though.
Comment 4 Ivan Sorokin 2016-12-06 22:03:33 UTC
(In reply to Richard Biener from comment #1)

> +  if (flag_delete_null_pointer_checks
> +      && is_gimple_assign (stmt)
> +      && is_gimple_assign (stmt)

Duplicate conjunct.

> +      && gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR
> +      && tree_expr_nonzero_p (gimple_assign_rhs2 (stmt)))
> +    return true;
> +
>    return false;
>  }

I thought a bit more about the issue. I think that not only the result of pointer-integer addition, but also its argument are values that could be proven non-zero. Consider the following example:

bool f(int* a)
{
  bool x = a == nullptr;
  a += 10;
  return x;
}

In this example x could be proven to be false. Because if it is true then (a+=10) causes undefined behavior. At the moment clang doesn't optimize this. I don't any real example when this is needed. Just the way this optimization can be made more widely applicable.
Comment 5 Jonathan Wakely 2016-12-07 11:06:02 UTC
(In reply to Richard Biener from comment #1)
> Of course I wonder why the check is here in the first place...  Is placement
> new valid for nullptr?

It used to be, but is now UB, since
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1748

(There's more background in the PR Marc mentioned, PR 35878).
Comment 6 Richard Biener 2018-08-20 14:01:40 UTC
Author: rguenth
Date: Mon Aug 20 14:01:05 2018
New Revision: 263662

URL: https://gcc.gnu.org/viewcvs?rev=263662&root=gcc&view=rev
Log:
2018-08-20  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/78655
	* tree-vrp.c (extract_range_from_binary_expr_1): Make
	pointer + offset nonnull if either operand is nonnull work.

	* gcc.dg/tree-ssa/evrp11.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/evrp11.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vrp.c
Comment 7 Richard Biener 2018-08-20 14:03:49 UTC
Note the original testcase is already optimized with GCC 8 due to the fix for PR35878 and enabling it for all standard modes.
Comment 8 Marc Glisse 2018-08-20 17:12:48 UTC
(just to put this somewhere)
We have multiple ways of doing pointer arithmetic in gcc. After the recent patch, we know that g returns nonnull, but we don't know it for f.

struct A{int a,b;};
int*f(A*p){return&p->b;}
int*g(A*p){return(int*)p+1;}
Comment 9 Andrew Macleod 2020-11-13 14:46:52 UTC
Created attachment 49556 [details]
testcase

(In reply to Marc Glisse from comment #8)
> (just to put this somewhere)
> We have multiple ways of doing pointer arithmetic in gcc. After the recent
> patch, we know that g returns nonnull, but we don't know it for f.
> 
> struct A{int a,b;};
> int*f(A*p){return&p->b;}
> int*g(A*p){return(int*)p+1;}

I tweaked this and made a testcase out of it. I think it is correct?  
We do know that both f and g are non-null now, as well as checks for when returning p->a for 0 offsets... So I think this is covered?

Furthermore,

bool f(int* a)
{
  bool x = a == nullptr;
  a += 10;
  return x;
}
turns into
a_1(D)  int * VARYING
    <bb 2> :
    x_2 = a_1(D) == 0B;
    a_3 = a_1(D) + 40;
    return x_2;

a_3 : int * [1B, +INF]

And from there, I don't see any way to determine that 'a_1' can't be nullptr.  we've lost whatever context nullptr is suppose to provide... its just a 0 now.
All we can see is that a_3 is non-null.
Comment 10 Andrew Macleod 2022-01-12 16:33:37 UTC
We currently get everything except the last tidbit.

a_1(D)  int * VARYING
    <bb 2> :
    x_2 = a_1(D) == 0B;
    a_3 = a_1(D) + 40;
    return x_2;

When we see
   a_3 = a_1(D) + 40;

Are we allowed to assume that a_1 is non-null, just like we do with *a_1? That seems a little dicier

Because if that is a valid assumption, we can handle pointer_plus the same as we do non-null pointer tracking of dereferences.  That alone might handle it.  We could also consider recalculating x_2 at the return location if need be.

If we cant make that assumption, then there isn't much else to do here as the rest of the PR is covered.
Comment 11 Jeffrey A. Law 2022-01-12 19:21:12 UTC
We can assume that the result of a POINTER_PLUS is non-null if either argument is non-null.  So X + constant is always non-null.  X + Y would be non-null if either X or Y is known to be non-null.

If we know that X + Y is non-null via some mechanism (for example the result was dereferenced), then we know that X and Y are non-null.
Comment 12 Richard Biener 2022-01-13 07:20:47 UTC
(In reply to Jeffrey A. Law from comment #11)
> We can assume that the result of a POINTER_PLUS is non-null if either
> argument is non-null.  So X + constant is always non-null.  X + Y would be
> non-null if either X or Y is known to be non-null.
> 
> If we know that X + Y is non-null via some mechanism (for example the result
> was dereferenced), then we know that X and Y are non-null.

Yes, that summarizes it.  Note this only applies when !targetm.addr_space.zero_address_valid.  There non-null plus a negative offset may yield null.

Btw, the old VRP did this already, so I wonder whether it isn't already handled correctly (minus the addr space thing maybe).
Comment 13 Andrew Macleod 2022-01-13 13:43:26 UTC
Probably.  rangers nonnull processing also invokes infer_nonnull_range () on the statement, so should also be picking it up.

The latter test case is really about recomputation then

    x_2 = a_1(D) == 0B;
    a_3 = a_1(D) + 40;
    return x_2;

When x_2 is defined, we don't know that it is non-null.  we only know its non-null if we were to recompute its value at the use in return x_2.

We can leave this for now.. I'll follow up with it when we revisit the nonnull processing and recomputation model for the next stage 1.
Comment 14 Andrew Macleod 2022-11-16 15:46:33 UTC
Created attachment 53910 [details]
patch to fix the problem

This patch fixes the PR, but exposes issues in other passes.  They introduce undefined behaviour by hoisting POINTER_PLUS expressions into blocks in which the operand is not known to be non-null.   (See PR 107709)
Until this can be audited fully, we put this patch on hold.
Comment 15 rguenther@suse.de 2022-11-17 07:47:28 UTC
On Wed, 16 Nov 2022, amacleod at redhat dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78655
> 
> Andrew Macleod <amacleod at redhat dot com> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|NEW                         |ASSIGNED
>            Assignee|unassigned at gcc dot gnu.org      |amacleod at redhat dot com
> 
> --- Comment #14 from Andrew Macleod <amacleod at redhat dot com> ---
> Created attachment 53910 [details]
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53910&action=edit
> patch to fix the problem
> 
> This patch fixes the PR, but exposes issues in other passes.  They introduce
> undefined behaviour by hoisting POINTER_PLUS expressions into blocks in which
> the operand is not known to be non-null.   (See PR 107709)
> Until this can be audited fully, we put this patch on hold.

In fact the patch doesnt 'exploit the fact that the result of
pointer addition can not be nullptr' but it exploits that if
ptr + CST is performed then 'ptr' cannot be nullptr(?)

That is, it assumes that there is a zero sized object at nullptr and
thus the pointer increment to outside (but not one after it) invokes UB.

That's much more aggressive than optimizing

  ptr + 4 != nullptr

to true and I'm not sure the reasoning follows.  I think at 'nullptr'
there is _no_ object and thus the restriction to pointer addition
doesn't apply at all - if a pointer does not refer to an object then
the rule that increment to outside of the object invokes UB doesn't 
apply.
Comment 16 Andrew Macleod 2022-11-17 14:35:48 UTC
(In reply to rguenther@suse.de from comment #15)
> On Wed, 16 Nov 2022, amacleod at redhat dot com wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78655
> > 
> > Andrew Macleod <amacleod at redhat dot com> changed:
> > 
> >            What    |Removed                     |Added
> > ----------------------------------------------------------------------------
> >              Status|NEW                         |ASSIGNED
> >            Assignee|unassigned at gcc dot gnu.org      |amacleod at redhat dot com
> > 
> > --- Comment #14 from Andrew Macleod <amacleod at redhat dot com> ---
> > Created attachment 53910 [details]
> >   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53910&action=edit
> > patch to fix the problem
> > 
> > This patch fixes the PR, but exposes issues in other passes.  They introduce
> > undefined behaviour by hoisting POINTER_PLUS expressions into blocks in which
> > the operand is not known to be non-null.   (See PR 107709)
> > Until this can be audited fully, we put this patch on hold.
> 
> In fact the patch doesnt 'exploit the fact that the result of
> pointer addition can not be nullptr' but it exploits that if
> ptr + CST is performed then 'ptr' cannot be nullptr(?)
> 

GCC already exploits the title of the PR, when the code represents it with that situation. 
The only case we don't get is if the code is reordered slightly as the testcase in comment 4:

bool f(int* a)
{
  bool x = a == nullptr;
  a += 10;
  return x;
}

That testcase requires this patch.  


> That is, it assumes that there is a zero sized object at nullptr and
> thus the pointer increment to outside (but not one after it) invokes UB.
> 
> That's much more aggressive than optimizing
> 
>   ptr + 4 != nullptr
> 
> to true and I'm not sure the reasoning follows.  I think at 'nullptr'
> there is _no_ object and thus the restriction to pointer addition
> doesn't apply at all - if a pointer does not refer to an object then
> the rule that increment to outside of the object invokes UB doesn't 
> apply.

It implements comment 11, and comment 12 confirmed that 11 was accurate.

(1)
>> X + Y would be non-null if either X or Y is known to be non-null.
so if Y is non-zero, X + Y is non-zero.
(2)
>> If we know that X + Y is non-null via some mechanism (for example the result was dereferenced), then we know that X and Y are non-null.
X + Y is non-null based on (2)

Going back to the code generated for this:

    x_2 = a_1(D) == 0B;
    a_3 = a_1(D) + 40;
    return x_2;

The only way to to return 1 with this sequence is for a_3 = a_1(D) + 40 to determine that a_1 is non-zero after this statement... which is what this patch does.
It then reapplies this knowledge as a recomputation of x_2 based on this knowledge after the defintion of a_3.

So yes, this is much more aggressive.  I suppose in theory we could close this PR and open a new one to cover the more aggressive version.
Comment 17 rguenther@suse.de 2022-11-17 14:47:50 UTC
On Thu, 17 Nov 2022, amacleod at redhat dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78655
> 
> --- Comment #16 from Andrew Macleod <amacleod at redhat dot com> ---
> (In reply to rguenther@suse.de from comment #15)
> > On Wed, 16 Nov 2022, amacleod at redhat dot com wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78655
> > > 
> > > Andrew Macleod <amacleod at redhat dot com> changed:
> > > 
> > >            What    |Removed                     |Added
> > > ----------------------------------------------------------------------------
> > >              Status|NEW                         |ASSIGNED
> > >            Assignee|unassigned at gcc dot gnu.org      |amacleod at redhat dot com
> > > 
> > > --- Comment #14 from Andrew Macleod <amacleod at redhat dot com> ---
> > > Created attachment 53910 [details]
> > >   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53910&action=edit
> > > patch to fix the problem
> > > 
> > > This patch fixes the PR, but exposes issues in other passes.  They introduce
> > > undefined behaviour by hoisting POINTER_PLUS expressions into blocks in which
> > > the operand is not known to be non-null.   (See PR 107709)
> > > Until this can be audited fully, we put this patch on hold.
> > 
> > In fact the patch doesnt 'exploit the fact that the result of
> > pointer addition can not be nullptr' but it exploits that if
> > ptr + CST is performed then 'ptr' cannot be nullptr(?)
> > 
> 
> GCC already exploits the title of the PR, when the code represents it with that
> situation. 
> The only case we don't get is if the code is reordered slightly as the testcase
> in comment 4:
> 
> bool f(int* a)
> {
>   bool x = a == nullptr;
>   a += 10;
>   return x;
> }
> 
> That testcase requires this patch.  
> 
> 
> > That is, it assumes that there is a zero sized object at nullptr and
> > thus the pointer increment to outside (but not one after it) invokes UB.
> > 
> > That's much more aggressive than optimizing
> > 
> >   ptr + 4 != nullptr
> > 
> > to true and I'm not sure the reasoning follows.  I think at 'nullptr'
> > there is _no_ object and thus the restriction to pointer addition
> > doesn't apply at all - if a pointer does not refer to an object then
> > the rule that increment to outside of the object invokes UB doesn't 
> > apply.
> 
> It implements comment 11, and comment 12 confirmed that 11 was accurate.
> 
> (1)
> >> X + Y would be non-null if either X or Y is known to be non-null.
> so if Y is non-zero, X + Y is non-zero.
> (2)
> >> If we know that X + Y is non-null via some mechanism (for example the result was dereferenced), then we know that X and Y are non-null.
> X + Y is non-null based on (2)

But (2) is IMHO false for 'nullptr + 4', we know from (1) that the result
is not nullptr but we don't know that _both_ are not null.  I don't see
how computing 'nullptr + 4' invokes UB.  Hmm, one might read that
into C17/6.5.6/8 "If both the pointer operand and the result point to
elements of the same array object, or one past the last element of an 
array object, the evaluation shall not produce an overflow; otherwise,
the behavior is undefined" - here nullptr and nullptr + 4 do not
point to elements of the same array object (nullptr doesn't point to
any object), so 'otherwise' applies and the behavior is undefined?
Comment 18 Andrew Macleod 2022-11-17 15:47:46 UTC
(In reply to rguenther@suse.de from comment #17)
> On Thu, 17 Nov 2022, amacleod at redhat dot com wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78655
> > 
> > --- Comment #16 from Andrew Macleod <amacleod at redhat dot com> ---
> > (In reply to rguenther@suse.de from comment #15)
> > > On Wed, 16 Nov 2022, amacleod at redhat dot com wrote:
> > > 
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78655
> > > > 
> > > > Andrew Macleod <amacleod at redhat dot com> changed:
> > > > 
> > > >            What    |Removed                     |Added
> > > > ----------------------------------------------------------------------------
> > > >              Status|NEW                         |ASSIGNED
> > > >            Assignee|unassigned at gcc dot gnu.org      |amacleod at redhat dot com
> > > > 
> > > > --- Comment #14 from Andrew Macleod <amacleod at redhat dot com> ---
> > > > Created attachment 53910 [details]
> > > >   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53910&action=edit
> > > > patch to fix the problem
> > > > 
> > > > This patch fixes the PR, but exposes issues in other passes.  They introduce
> > > > undefined behaviour by hoisting POINTER_PLUS expressions into blocks in which
> > > > the operand is not known to be non-null.   (See PR 107709)
> > > > Until this can be audited fully, we put this patch on hold.
> > > 
> > > In fact the patch doesnt 'exploit the fact that the result of
> > > pointer addition can not be nullptr' but it exploits that if
> > > ptr + CST is performed then 'ptr' cannot be nullptr(?)
> > > 
> > 
> > GCC already exploits the title of the PR, when the code represents it with that
> > situation. 
> > The only case we don't get is if the code is reordered slightly as the testcase
> > in comment 4:
> > 
> > bool f(int* a)
> > {
> >   bool x = a == nullptr;
> >   a += 10;
> >   return x;
> > }
> > 
> > That testcase requires this patch.  
> > 
> > 
> > > That is, it assumes that there is a zero sized object at nullptr and
> > > thus the pointer increment to outside (but not one after it) invokes UB.
> > > 
> > > That's much more aggressive than optimizing
> > > 
> > >   ptr + 4 != nullptr
> > > 
> > > to true and I'm not sure the reasoning follows.  I think at 'nullptr'
> > > there is _no_ object and thus the restriction to pointer addition
> > > doesn't apply at all - if a pointer does not refer to an object then
> > > the rule that increment to outside of the object invokes UB doesn't 
> > > apply.
> > 
> > It implements comment 11, and comment 12 confirmed that 11 was accurate.
> > 
> > (1)
> > >> X + Y would be non-null if either X or Y is known to be non-null.
> > so if Y is non-zero, X + Y is non-zero.
> > (2)
> > >> If we know that X + Y is non-null via some mechanism (for example the result was dereferenced), then we know that X and Y are non-null.
> > X + Y is non-null based on (2)
> 
> But (2) is IMHO false for 'nullptr + 4', we know from (1) that the result
> is not nullptr but we don't know that _both_ are not null.  I don't see
> how computing 'nullptr + 4' invokes UB.  Hmm, one might read that
> into C17/6.5.6/8 "If both the pointer operand and the result point to
> elements of the same array object, or one past the last element of an 
> array object, the evaluation shall not produce an overflow; otherwise,
> the behavior is undefined" - here nullptr and nullptr + 4 do not
> point to elements of the same array object (nullptr doesn't point to
> any object), so 'otherwise' applies and the behavior is undefined?

FWIW, that would be my take...  NULL + anything would be UB.  Assuming of course 0 is not an addressable value.