Consider the following piece of code: #include <memory> struct blob { void* data; size_t size; }; void uninitialized_copy(blob* first, blob* last, blob* current) { for (; first != last; ++first, (void) ++current) { ::new (static_cast<void*>(current)) blob(*first); } } The nested loop generated for it by GCC 7 is the following: .L4: testq %rdx, %rdx je .L3 movdqu (%rdi), %xmm0 movups %xmm0, (%rdx) .L3: addq $16, %rdi addq $16, %rdx cmpq %rdi, %rsi jne .L4 As you can see after each iteration generated code checks if current is nullptr and omit calling the copy constructor if it is so. Clang 3.9 doesn't exhibit such behavior. It translates the loop into: .LBB0_1: # =>This Inner Loop Header: Depth=1 movups (%rdi), %xmm0 movups %xmm0, (%rdx) addq $16, %rdi addq $16, %rdx cmpq %rdi, %rsi jne .LBB0_1 This optimization is valid, because if addition of pointer and integer results in nullptr, the integer was clearly out of bound of allocated memory block thus the addition causes undefined behavior. Absence of this optimization affects std::uninitialized_copy and any functions that use it (for example std::vector<T>::push_back). The issue can be reproduced with a simpler piece of code: bool g(int* a) { return (a + 10) != nullptr; } GCC 7: g(int*): cmpq $-40, %rdi setne %al ret Clang 3.9: g(int*): # @g(int*) movb $1, %al retq P.S. Another issue is that GCC used mismatched instructions to read from/to memory. movdqu -- is for integer data, movups -- is for single precision floating point. I don't know if it causes any stalls on modern CPUs, I heard that on older CPU writing register with an instruction of one type and reading with an instruction of another might causes stalls. Should I report a separate bug report issue for this?
Confirmed. It might be possible to rely on this nowadays where we make sure not to transform uintptr_t arithmetic to pointer arithmetic in the IL. I think that even nullptr + X is invalid. But for the testcase we'd require threading the jump across the backedge (FSM threading is not looking at get_ptr_nonnull ()). VRP parts: @@ -1145,6 +1147,13 @@ vrp_stmt_computes_nonzero (gimple *stmt, } } + if (flag_delete_null_pointer_checks + && is_gimple_assign (stmt) + && is_gimple_assign (stmt) + && gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR + && tree_expr_nonzero_p (gimple_assign_rhs2 (stmt))) + return true; + return false; } @@ -4873,7 +4854,13 @@ infer_value_range (gimple *stmt, tree op return false; } - if (infer_nonnull_range (stmt, op)) + if (infer_nonnull_range (stmt, op) + /* PTR + X with X != 0 is invalid when PTR is a null pointer. */ + || (flag_delete_null_pointer_checks + && is_gimple_assign (stmt) + && gimple_assign_rhs1 (stmt) == op + && gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR + && tree_expr_nonzero_p (gimple_assign_rhs2 (stmt)))) { *val_p = build_int_cst (TREE_TYPE (op), 0); *comp_code_p = NE_EXPR; Of course I wonder why the check is here in the first place... Is placement new valid for nullptr?
See also PR 35878.
(In reply to Richard Biener from comment #1) > Of course I wonder why the check is here in the first place... Is placement > new valid for nullptr? I believe this check here is to allow placement new (std::nothrow) T() to work. I don't know about the standard wording though.
(In reply to Richard Biener from comment #1) > + if (flag_delete_null_pointer_checks > + && is_gimple_assign (stmt) > + && is_gimple_assign (stmt) Duplicate conjunct. > + && gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR > + && tree_expr_nonzero_p (gimple_assign_rhs2 (stmt))) > + return true; > + > return false; > } I thought a bit more about the issue. I think that not only the result of pointer-integer addition, but also its argument are values that could be proven non-zero. Consider the following example: bool f(int* a) { bool x = a == nullptr; a += 10; return x; } In this example x could be proven to be false. Because if it is true then (a+=10) causes undefined behavior. At the moment clang doesn't optimize this. I don't any real example when this is needed. Just the way this optimization can be made more widely applicable.
(In reply to Richard Biener from comment #1) > Of course I wonder why the check is here in the first place... Is placement > new valid for nullptr? It used to be, but is now UB, since http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1748 (There's more background in the PR Marc mentioned, PR 35878).
Author: rguenth Date: Mon Aug 20 14:01:05 2018 New Revision: 263662 URL: https://gcc.gnu.org/viewcvs?rev=263662&root=gcc&view=rev Log: 2018-08-20 Richard Biener <rguenther@suse.de> PR tree-optimization/78655 * tree-vrp.c (extract_range_from_binary_expr_1): Make pointer + offset nonnull if either operand is nonnull work. * gcc.dg/tree-ssa/evrp11.c: New testcase. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/evrp11.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-vrp.c
Note the original testcase is already optimized with GCC 8 due to the fix for PR35878 and enabling it for all standard modes.
(just to put this somewhere) We have multiple ways of doing pointer arithmetic in gcc. After the recent patch, we know that g returns nonnull, but we don't know it for f. struct A{int a,b;}; int*f(A*p){return&p->b;} int*g(A*p){return(int*)p+1;}
Created attachment 49556 [details] testcase (In reply to Marc Glisse from comment #8) > (just to put this somewhere) > We have multiple ways of doing pointer arithmetic in gcc. After the recent > patch, we know that g returns nonnull, but we don't know it for f. > > struct A{int a,b;}; > int*f(A*p){return&p->b;} > int*g(A*p){return(int*)p+1;} I tweaked this and made a testcase out of it. I think it is correct? We do know that both f and g are non-null now, as well as checks for when returning p->a for 0 offsets... So I think this is covered? Furthermore, bool f(int* a) { bool x = a == nullptr; a += 10; return x; } turns into a_1(D) int * VARYING <bb 2> : x_2 = a_1(D) == 0B; a_3 = a_1(D) + 40; return x_2; a_3 : int * [1B, +INF] And from there, I don't see any way to determine that 'a_1' can't be nullptr. we've lost whatever context nullptr is suppose to provide... its just a 0 now. All we can see is that a_3 is non-null.
We currently get everything except the last tidbit. a_1(D) int * VARYING <bb 2> : x_2 = a_1(D) == 0B; a_3 = a_1(D) + 40; return x_2; When we see a_3 = a_1(D) + 40; Are we allowed to assume that a_1 is non-null, just like we do with *a_1? That seems a little dicier Because if that is a valid assumption, we can handle pointer_plus the same as we do non-null pointer tracking of dereferences. That alone might handle it. We could also consider recalculating x_2 at the return location if need be. If we cant make that assumption, then there isn't much else to do here as the rest of the PR is covered.
We can assume that the result of a POINTER_PLUS is non-null if either argument is non-null. So X + constant is always non-null. X + Y would be non-null if either X or Y is known to be non-null. If we know that X + Y is non-null via some mechanism (for example the result was dereferenced), then we know that X and Y are non-null.
(In reply to Jeffrey A. Law from comment #11) > We can assume that the result of a POINTER_PLUS is non-null if either > argument is non-null. So X + constant is always non-null. X + Y would be > non-null if either X or Y is known to be non-null. > > If we know that X + Y is non-null via some mechanism (for example the result > was dereferenced), then we know that X and Y are non-null. Yes, that summarizes it. Note this only applies when !targetm.addr_space.zero_address_valid. There non-null plus a negative offset may yield null. Btw, the old VRP did this already, so I wonder whether it isn't already handled correctly (minus the addr space thing maybe).
Probably. rangers nonnull processing also invokes infer_nonnull_range () on the statement, so should also be picking it up. The latter test case is really about recomputation then x_2 = a_1(D) == 0B; a_3 = a_1(D) + 40; return x_2; When x_2 is defined, we don't know that it is non-null. we only know its non-null if we were to recompute its value at the use in return x_2. We can leave this for now.. I'll follow up with it when we revisit the nonnull processing and recomputation model for the next stage 1.
Created attachment 53910 [details] patch to fix the problem This patch fixes the PR, but exposes issues in other passes. They introduce undefined behaviour by hoisting POINTER_PLUS expressions into blocks in which the operand is not known to be non-null. (See PR 107709) Until this can be audited fully, we put this patch on hold.
On Wed, 16 Nov 2022, amacleod at redhat dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78655 > > Andrew Macleod <amacleod at redhat dot com> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > Status|NEW |ASSIGNED > Assignee|unassigned at gcc dot gnu.org |amacleod at redhat dot com > > --- Comment #14 from Andrew Macleod <amacleod at redhat dot com> --- > Created attachment 53910 [details] > --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53910&action=edit > patch to fix the problem > > This patch fixes the PR, but exposes issues in other passes. They introduce > undefined behaviour by hoisting POINTER_PLUS expressions into blocks in which > the operand is not known to be non-null. (See PR 107709) > Until this can be audited fully, we put this patch on hold. In fact the patch doesnt 'exploit the fact that the result of pointer addition can not be nullptr' but it exploits that if ptr + CST is performed then 'ptr' cannot be nullptr(?) That is, it assumes that there is a zero sized object at nullptr and thus the pointer increment to outside (but not one after it) invokes UB. That's much more aggressive than optimizing ptr + 4 != nullptr to true and I'm not sure the reasoning follows. I think at 'nullptr' there is _no_ object and thus the restriction to pointer addition doesn't apply at all - if a pointer does not refer to an object then the rule that increment to outside of the object invokes UB doesn't apply.
(In reply to rguenther@suse.de from comment #15) > On Wed, 16 Nov 2022, amacleod at redhat dot com wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78655 > > > > Andrew Macleod <amacleod at redhat dot com> changed: > > > > What |Removed |Added > > ---------------------------------------------------------------------------- > > Status|NEW |ASSIGNED > > Assignee|unassigned at gcc dot gnu.org |amacleod at redhat dot com > > > > --- Comment #14 from Andrew Macleod <amacleod at redhat dot com> --- > > Created attachment 53910 [details] > > --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53910&action=edit > > patch to fix the problem > > > > This patch fixes the PR, but exposes issues in other passes. They introduce > > undefined behaviour by hoisting POINTER_PLUS expressions into blocks in which > > the operand is not known to be non-null. (See PR 107709) > > Until this can be audited fully, we put this patch on hold. > > In fact the patch doesnt 'exploit the fact that the result of > pointer addition can not be nullptr' but it exploits that if > ptr + CST is performed then 'ptr' cannot be nullptr(?) > GCC already exploits the title of the PR, when the code represents it with that situation. The only case we don't get is if the code is reordered slightly as the testcase in comment 4: bool f(int* a) { bool x = a == nullptr; a += 10; return x; } That testcase requires this patch. > That is, it assumes that there is a zero sized object at nullptr and > thus the pointer increment to outside (but not one after it) invokes UB. > > That's much more aggressive than optimizing > > ptr + 4 != nullptr > > to true and I'm not sure the reasoning follows. I think at 'nullptr' > there is _no_ object and thus the restriction to pointer addition > doesn't apply at all - if a pointer does not refer to an object then > the rule that increment to outside of the object invokes UB doesn't > apply. It implements comment 11, and comment 12 confirmed that 11 was accurate. (1) >> X + Y would be non-null if either X or Y is known to be non-null. so if Y is non-zero, X + Y is non-zero. (2) >> If we know that X + Y is non-null via some mechanism (for example the result was dereferenced), then we know that X and Y are non-null. X + Y is non-null based on (2) Going back to the code generated for this: x_2 = a_1(D) == 0B; a_3 = a_1(D) + 40; return x_2; The only way to to return 1 with this sequence is for a_3 = a_1(D) + 40 to determine that a_1 is non-zero after this statement... which is what this patch does. It then reapplies this knowledge as a recomputation of x_2 based on this knowledge after the defintion of a_3. So yes, this is much more aggressive. I suppose in theory we could close this PR and open a new one to cover the more aggressive version.
On Thu, 17 Nov 2022, amacleod at redhat dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78655 > > --- Comment #16 from Andrew Macleod <amacleod at redhat dot com> --- > (In reply to rguenther@suse.de from comment #15) > > On Wed, 16 Nov 2022, amacleod at redhat dot com wrote: > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78655 > > > > > > Andrew Macleod <amacleod at redhat dot com> changed: > > > > > > What |Removed |Added > > > ---------------------------------------------------------------------------- > > > Status|NEW |ASSIGNED > > > Assignee|unassigned at gcc dot gnu.org |amacleod at redhat dot com > > > > > > --- Comment #14 from Andrew Macleod <amacleod at redhat dot com> --- > > > Created attachment 53910 [details] > > > --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53910&action=edit > > > patch to fix the problem > > > > > > This patch fixes the PR, but exposes issues in other passes. They introduce > > > undefined behaviour by hoisting POINTER_PLUS expressions into blocks in which > > > the operand is not known to be non-null. (See PR 107709) > > > Until this can be audited fully, we put this patch on hold. > > > > In fact the patch doesnt 'exploit the fact that the result of > > pointer addition can not be nullptr' but it exploits that if > > ptr + CST is performed then 'ptr' cannot be nullptr(?) > > > > GCC already exploits the title of the PR, when the code represents it with that > situation. > The only case we don't get is if the code is reordered slightly as the testcase > in comment 4: > > bool f(int* a) > { > bool x = a == nullptr; > a += 10; > return x; > } > > That testcase requires this patch. > > > > That is, it assumes that there is a zero sized object at nullptr and > > thus the pointer increment to outside (but not one after it) invokes UB. > > > > That's much more aggressive than optimizing > > > > ptr + 4 != nullptr > > > > to true and I'm not sure the reasoning follows. I think at 'nullptr' > > there is _no_ object and thus the restriction to pointer addition > > doesn't apply at all - if a pointer does not refer to an object then > > the rule that increment to outside of the object invokes UB doesn't > > apply. > > It implements comment 11, and comment 12 confirmed that 11 was accurate. > > (1) > >> X + Y would be non-null if either X or Y is known to be non-null. > so if Y is non-zero, X + Y is non-zero. > (2) > >> If we know that X + Y is non-null via some mechanism (for example the result was dereferenced), then we know that X and Y are non-null. > X + Y is non-null based on (2) But (2) is IMHO false for 'nullptr + 4', we know from (1) that the result is not nullptr but we don't know that _both_ are not null. I don't see how computing 'nullptr + 4' invokes UB. Hmm, one might read that into C17/6.5.6/8 "If both the pointer operand and the result point to elements of the same array object, or one past the last element of an array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined" - here nullptr and nullptr + 4 do not point to elements of the same array object (nullptr doesn't point to any object), so 'otherwise' applies and the behavior is undefined?
(In reply to rguenther@suse.de from comment #17) > On Thu, 17 Nov 2022, amacleod at redhat dot com wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78655 > > > > --- Comment #16 from Andrew Macleod <amacleod at redhat dot com> --- > > (In reply to rguenther@suse.de from comment #15) > > > On Wed, 16 Nov 2022, amacleod at redhat dot com wrote: > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78655 > > > > > > > > Andrew Macleod <amacleod at redhat dot com> changed: > > > > > > > > What |Removed |Added > > > > ---------------------------------------------------------------------------- > > > > Status|NEW |ASSIGNED > > > > Assignee|unassigned at gcc dot gnu.org |amacleod at redhat dot com > > > > > > > > --- Comment #14 from Andrew Macleod <amacleod at redhat dot com> --- > > > > Created attachment 53910 [details] > > > > --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53910&action=edit > > > > patch to fix the problem > > > > > > > > This patch fixes the PR, but exposes issues in other passes. They introduce > > > > undefined behaviour by hoisting POINTER_PLUS expressions into blocks in which > > > > the operand is not known to be non-null. (See PR 107709) > > > > Until this can be audited fully, we put this patch on hold. > > > > > > In fact the patch doesnt 'exploit the fact that the result of > > > pointer addition can not be nullptr' but it exploits that if > > > ptr + CST is performed then 'ptr' cannot be nullptr(?) > > > > > > > GCC already exploits the title of the PR, when the code represents it with that > > situation. > > The only case we don't get is if the code is reordered slightly as the testcase > > in comment 4: > > > > bool f(int* a) > > { > > bool x = a == nullptr; > > a += 10; > > return x; > > } > > > > That testcase requires this patch. > > > > > > > That is, it assumes that there is a zero sized object at nullptr and > > > thus the pointer increment to outside (but not one after it) invokes UB. > > > > > > That's much more aggressive than optimizing > > > > > > ptr + 4 != nullptr > > > > > > to true and I'm not sure the reasoning follows. I think at 'nullptr' > > > there is _no_ object and thus the restriction to pointer addition > > > doesn't apply at all - if a pointer does not refer to an object then > > > the rule that increment to outside of the object invokes UB doesn't > > > apply. > > > > It implements comment 11, and comment 12 confirmed that 11 was accurate. > > > > (1) > > >> X + Y would be non-null if either X or Y is known to be non-null. > > so if Y is non-zero, X + Y is non-zero. > > (2) > > >> If we know that X + Y is non-null via some mechanism (for example the result was dereferenced), then we know that X and Y are non-null. > > X + Y is non-null based on (2) > > But (2) is IMHO false for 'nullptr + 4', we know from (1) that the result > is not nullptr but we don't know that _both_ are not null. I don't see > how computing 'nullptr + 4' invokes UB. Hmm, one might read that > into C17/6.5.6/8 "If both the pointer operand and the result point to > elements of the same array object, or one past the last element of an > array object, the evaluation shall not produce an overflow; otherwise, > the behavior is undefined" - here nullptr and nullptr + 4 do not > point to elements of the same array object (nullptr doesn't point to > any object), so 'otherwise' applies and the behavior is undefined? FWIW, that would be my take... NULL + anything would be UB. Assuming of course 0 is not an addressable value.