Created attachment 46289 [details] Source file to reproduce the problem In attachment is a small source file that seems to be miscompiled with "-O1 -finline-small-functions". Tested versions/environments where the issue appears: * GCC 7.3, Ubuntu 18.04.2, x86_64, "gcc -O1 -finline-small-functions test.i -o test" * GCC 8.3, Ubuntu 18.04.2, x86_64, "gcc -O1 -finline-small-functions test.i -o test" * GCC 8.3, Ubuntu 18.04.2, x86_64, "gcc -m32 -O1 -finline-small-functions test.i -o test" * GCC 9.0, Fedora 30, i686, "gcc -O2 test.i -o test" In a loop, a 4-byte (or larger) char array "in" is created, and then increasingly long prefixes initialized to zero. A small inlinable function "set_one_on_stack" is invoked during the loop that should have no effect (it sets a local variable "buf" to one in a roundabout way), but apparently the "buf" variable is given the same stack location as the caller's "in" variable, overwriting the latter. When compiled incorrectly, an unexpected assertion occurs. The test file is a reduced version of an issue observed on some platforms in the Bitcoin Core unit tests. See https://github.com/bitcoin/bitcoin/issues/14580 for more details.
Created attachment 46290 [details] Un-preprocessed test case C file Adding the original .c file as well to demonstrate the problem on different platforms (the .i file is for GCC 8.3, x86_64, -m64).
/*Below's the compiler testsuite code with no headers. Here's the assembly difference (because the difference is very small. The one on the left works (-O1 only) while the one on the right has `-O1 -finline-small-functions`): https://www.diffchecker.com/hAYLeoPV I've also tested it on macOS 10.14 with GCC 8, it failed too. In addition, on Aarch64 - Raspberry/Raspbian using GCC 6 and GCC 9: GCC 9 also failed on ARM, but GCC 6 didn't cause assertion fail, thus it's clearly a 7/8/9/10 regression.*/ /* { dg-do run } */ /* { dg-options "-O2" } */ void __attribute__ ((noinline)) set_one (unsigned char* ptr) { *ptr = 1; } int __attribute__ ((noinline)) check_nonzero (unsigned char const* in, unsigned int len) { for (unsigned int i = 0; i < len; ++i) { if (in[i] != 0) return 1; } return 0; } void set_one_on_stack () { unsigned char buf[1]; set_one (buf); } int main () { for (int i = 0; i < 5; ++i) { unsigned char in[4]; for (int j = 0; j < i; ++j) { in[j] = 0; set_one_on_stack (); } if (check_nonzero (in, i)) { __builtin_abort (); } } } //~ MCCCS (DesWurstes)
Partition 0: size 4 align 8 in buf That is wrong, it should have been two seperate ones. Confirmed with: gcc version 9.0.1 20190310 (experimental) [master revision 449a19898aa:0239598e3c8:8fe074cf790f632b22e59c24f102e528407bb04e] (GCC) On aarch64-linux-gnu, I don't see any changes to the expand sources after that point. I could not reproduce it with Ubuntu's 7.3.0 but can with a non modified version of GCC 7.3.0 on mips64-linux-gnu: Partition 0: size 4 align 8 in buf
Sth wrong with liveness analysis. There's obvious liveness of IN from BB2: ;; basic block 2, loop depth 0 ;; pred: ENTRY _30 = (unsigned long) ∈ ivtmp.30_29 = _30 + 1; goto <bb 5>; [100.00%] ;; succ: 5 to ;; basic block 7, loop depth 1 ;; pred: 5 in ={v} {CLOBBER}; i_10 = i_6 + 1; if (i_10 != 5) goto <bb 3>; [80.00%] else goto <bb 8>; [20.00%] ;; succ: 3 ;; 8 but maybe that being at different loop depth somehow confuses the algorithm in fact having it there seems odd to me but the address-taking in BB2 is the result of IVOPTs hoisting. The CLOBBER doesn't effect hoisting the address but RTL expansion liveness compute splits 'in' into multiple logical instances at the CLOBBER which _does_ make the addresses effectively different ...
I think before ivopts this looks fine, we have: <bb 3> [local count: 858993460]: # j_20 = PHI <0(10), j_13(11)> in[j_20] = 0; set_one (&buf); buf ={v} {CLOBBER}; j_13 = j_20 + 1; if (i_6 > j_20) goto <bb 11>; [80.00%] else goto <bb 4>; [20.00%] in the inner loop and <bb 7> [local count: 214748365]: in ={v} {CLOBBER}; i_10 = i_6 + 1; ivtmp_22 = ivtmp_4 - 1; if (ivtmp_22 != 0) goto <bb 8>; [80.00%] else goto <bb 9>; [20.00%] near the end of the outer loop. The problem is that ivopts turns the in[j_20] access into <bb 2> [local count: 42959980]: _30 = (unsigned int) ∈ ivtmp.27_29 = _30 + 1; goto <bb 5>; [100.00%] <bb 10> [local count: 171798691]: ivtmp.21_16 = (unsigned int) ∈ _23 = (unsigned int) ∈ <bb 3> [local count: 858993460]: # ivtmp.21_21 = PHI <ivtmp.21_16(10), ivtmp.21_3(11)> _18 = (void *) ivtmp.21_21; MEM[base: _18, offset: 0B] = 0; set_one (&buf); buf ={v} {CLOBBER}; which is still fine, for the life analysis we still see in as live in bb 10 and therefore in bb3 too. But later on dom3 changes <bb 3> [local count: 171798691]: - ivtmp.21_16 = (unsigned int) ∈ - _23 = (unsigned int) ∈ + ivtmp.21_16 = _30; + _23 = _30; and now &in is live clearly only in bb2 before the outer loop for the life analysis purposes. Now, if we kill that if (gimple_clobber_p (stmt)) { tree lhs = gimple_assign_lhs (stmt); size_t *v; /* Nested function lowering might introduce LHSs that are COMPONENT_REFs. */ if (!VAR_P (lhs)) continue; if (DECL_RTL_IF_SET (lhs) == pc_rtx && (v = decl_to_stack_part->get (lhs))) bitmap_clear_bit (work, *v); } you've mentioned, I'm afraid we lose all the stack slot sharing possibilities, or at least most of them. So I'm afraid we need to do something smarter. Either we need to track during the life analysis algorithm what variables SSA_NAMEs can point to (even for non-pointers, as in this testcase ivopts uses an integral value (unsigned int) &in and then later casts to integers), or could we use the alias analysis points to information for that? For this testcase, it would be enough to walk for referenced pointer SSA_NAMEs through their points to variables and mark those variables as also seen in the IL.
We'd probably need to change decl_to_stack_part from hash_map<tree, size_t> to hash_map<unsigned, size_t> that would just map DECL_UIDs which have DECL_RTL_IF_SET equal to pc_rtx to the partition numbers, rather than from decl, or have some other mapping from DECL_UIDs that appear in the points to info vars back to the decls.
No, this is not a problem in the stack slot sharing algorithm, but rather in the input. As presented to expand, and only showing the important parts, and reordering some BBs to make the flow more obvious: ;; basic block 2, loop depth 0 ;; pred: ENTRY _30 = (unsigned long) ∈ ivtmp.27_29 = _30 + 1; goto <bb 5>; [100.00%] So, 'in' becomes "live" here, it's address in _30 and _29. Fallthrough to bb5, which also uses in, but otherwise is uninteresting, except that it can reach only BBs 6 and 7: ;; basic block 5, loop depth 1 ... _2 = check_zero (&in, _31); if (_2 != 0) goto <bb 7>; [99.96%] else goto <bb 6>; [0.04%] bb6 is a no-return block, hence uninteresting. bb7 _is_ interesting in that it clobbers in: ;; basic block 7, loop depth 1 ;; pred: 5 in ={v} {CLOBBER}; if (i_11 != 5) goto <bb 8>; [83.33%] else goto <bb 9>; [16.67%] Note that the semantics of the clobber is not only that the former contents are destroyed, but rather that the very storage associated with the clobbered entity is gone. So, from now on, any pointers into 'in', and memory accesses into 'in' are invalid. Nevertheless the flow from bb7 goes to bb 8 and 9, the latter being the return block, so: ;; basic block 8, loop depth 1 ;; pred: 7 if (i_11 > 0) goto <bb 3>; [100.00%] else goto <bb 4>; [0.00%] and here we finally have a path into bb3, which is the other interesting one: ;; basic block 3, loop depth 2 # ivtmp.20_6 = PHI <_30(8), ivtmp.20_18(3)> .... BOOM! .... Here _30 is used, and ... _4 = (void *) ivtmp.20_6; MEM[base: _4, offset: 0B] = 0; ... even written into ... That's invalid. _30 is associated with an object that is clobbered and gone ... set_one (&buf); buf ={v} {CLOBBER}; ivtmp.20_18 = ivtmp.20_6 + 1; ... and as the MEM[] write can't have possibly been into 'in' (as that is invalid, as 'in' didn't exist at the MEM access), it's okay and sensible to allocate 'in' and 'buf' into the same memory. It seems to be a common misconception of what the clobber is really about. I designed it to mean what I wrote above, the storage associated with the clobbered entities is gone after the clobber (not merely it's former contents!). But ivopts or dom extends the lifetime of 'in' (by moving an address-taken earlier) and hence the lifetime of its storage, but without doing anything about the clobber (and hence not _really_ extending the lifetime). That doesn't work. It's basically a mirrored transformation of the usual take-address-of-local and access it out of it's declared scope, just that here the _start_ of the supposed lifetime is moved out of the declared scope, not the end.
(In reply to Michael Matz from comment #7) > No, this is not a problem in the stack slot sharing algorithm, but rather in > the input. As presented to expand, and only showing the important parts, > and reordering some BBs to make the flow more obvious: > > ;; basic block 2, loop depth 0 > ;; pred: ENTRY > _30 = (unsigned long) ∈ > ivtmp.27_29 = _30 + 1; > goto <bb 5>; [100.00%] > > So, 'in' becomes "live" here, it's address in _30 and _29. Fallthrough to > bb5, > which also uses in, but otherwise is uninteresting, except that it can reach > only BBs 6 and 7: > > ;; basic block 5, loop depth 1 > ... > _2 = check_zero (&in, _31); > if (_2 != 0) > goto <bb 7>; [99.96%] > else > goto <bb 6>; [0.04%] > > bb6 is a no-return block, hence uninteresting. bb7 _is_ interesting in that > it clobbers in: > > ;; basic block 7, loop depth 1 > ;; pred: 5 > in ={v} {CLOBBER}; > if (i_11 != 5) > goto <bb 8>; [83.33%] > else > goto <bb 9>; [16.67%] > > Note that the semantics of the clobber is not only that the former contents > are destroyed, but rather that the very storage associated with the clobbered > entity is gone. So, from now on, any pointers into 'in', and memory accesses > into 'in' are invalid. Nevertheless the flow from bb7 goes to bb 8 and 9, > the latter being the return block, so: > > ;; basic block 8, loop depth 1 > ;; pred: 7 > if (i_11 > 0) > goto <bb 3>; [100.00%] > else > goto <bb 4>; [0.00%] > > and here we finally have a path into bb3, which is the other interesting one: > > ;; basic block 3, loop depth 2 > # ivtmp.20_6 = PHI <_30(8), ivtmp.20_18(3)> > > .... BOOM! .... Here _30 is used, and ... > > _4 = (void *) ivtmp.20_6; > MEM[base: _4, offset: 0B] = 0; > > ... even written into ... That's invalid. _30 is associated with an > object that is clobbered and gone ... > > set_one (&buf); > buf ={v} {CLOBBER}; > ivtmp.20_18 = ivtmp.20_6 + 1; > > ... and as the MEM[] write can't have possibly been into 'in' (as that is > invalid, as 'in' didn't exist at the MEM access), it's okay and sensible to > allocate 'in' and 'buf' into the same memory. > > It seems to be a common misconception of what the clobber is really about. > I designed it to mean what I wrote above, the storage associated with the > clobbered entities is gone after the clobber (not merely it's former > contents!). > > But ivopts or dom extends the lifetime of 'in' (by moving an address-taken > earlier) and hence the lifetime of its storage, but without doing anything > about the clobber (and hence not _really_ extending the lifetime). That > doesn't work. > It's basically a mirrored transformation of the usual take-address-of-local > and access it out of it's declared scope, just that here the _start_ of the > supposed lifetime is moved out of the declared scope, not the end. While I spotted this "issue" as well I respectfully disagree about the semantics. Not because they are not sound but because of the implementation challenge resolving around address-takens being values with no data dependence on the clobbers. I thought the liveness algorithm would be able to handle this situation. If not we need to prune clobbers that became "invalid" somehow. Not sure how - the address value never "dies", so we'd need to compute liveness of the things the address escapes to - SSA names are easy, calls - well, we have to give up. This possiby defeats the whole idea of doing stack sharing analysis (I remember kernel/fortran testcases passing the stack slots by reference to a function). Now if we would want to try forcing the original semantics we need a verifier verifying the IL state against bogus transforms - but then we would have a way to prune the invalid ones.
Created attachment 46312 [details] gcc10-pr90348.patch Untested patch that implements what was written in #c5. I agree that without further changes to the IL, determining if one can hoist addresses of local variables or not is going to be hard, would require computing the variable life info in each pass that would do something similar. On the other side, admittedly such hoisting results in worse code generation because if the address is hoisted earlier than where it used to be live before, then there will be more stack conflicts.
(In reply to Jakub Jelinek from comment #9) > Created attachment 46312 [details] > gcc10-pr90348.patch > > Untested patch that implements what was written in #c5. I agree that > without further changes to the IL, determining if one can hoist addresses of > local variables or not is going to be hard, would require computing the > variable life info in each pass that would do something similar. On the > other side, admittedly such hoisting results in worse code generation > because if the address is hoisted earlier than where it used to be live > before, then there will be more stack conflicts. Ick. You need to handle pt->anything and pt->escaped (walk the escaped solution) as well. And !SSA_NAME_PTR_INFO (op) is of course the same as pt->anything. I see it quickly degrading given the last PTA compute is far far away...
(In reply to Richard Biener from comment #10) > (In reply to Jakub Jelinek from comment #9) > > Created attachment 46312 [details] > > gcc10-pr90348.patch > > > > Untested patch that implements what was written in #c5. I agree that > > without further changes to the IL, determining if one can hoist addresses of > > local variables or not is going to be hard, would require computing the > > variable life info in each pass that would do something similar. On the > > other side, admittedly such hoisting results in worse code generation > > because if the address is hoisted earlier than where it used to be live > > before, then there will be more stack conflicts. > > Ick. You need to handle pt->anything and pt->escaped (walk the escaped > solution) as well. And !SSA_NAME_PTR_INFO (op) is of course the same > as pt->anything. Do I? I thought I don't. The thing is, I believe the partitioning only cares about where (originally in the source) the automatic variables are referenced. Now, if I have say: __attribute__((noipa)) type *foo (type *x) { return x; } or similar, in order for a variable to be live before the clobber, it needs to escape in the area where the variable is live, we don't care what we do with whatever it returned, unless we actually hoist such escaping calls before that region. If we can say for: for (...) { unsigned char v[10]; unsigned char *p = foo (v); *p = 1; unsigned char w[10]; bar (w); } hoist the p = foo (v); call before the loop, then indeed we are in big trouble. But if we can't, the attached patch was just meant as a shorthand for trying to do a dataflow propagation of the can a pointer SSA_NAME point to variable xyz, if we don't consider escaping (to other functions and to global state). What I'm trying to "undo" is just the hoisting of the address loads, not hoisting of those address loads plus function calls in which it escapes or where that address is saved to memory. If I have to consider pt->anything and pt->escaped, then it will be as useless for the variable conflicts as is removing the important clearing of the bitmap bit on clobber stmt, we won't share stack slots pretty much at all.
(In reply to Jakub Jelinek from comment #11) > before that region. If we can say for: > for (...) > { > unsigned char v[10]; > unsigned char *p = foo (v); > *p = 1; > unsigned char w[10]; > bar (w); > } > hoist the p = foo (v); call before the loop, then indeed we are in big > trouble. This is effectively what the testcase is doing (just that 'foo' is no call, but a normal address expression), so yes, we can do that, and yes we are in big trouble :-/ > If I have to consider pt->anything and pt->escaped, then it will be as > useless for the variable conflicts as is removing the important clearing of > the bitmap bit on clobber stmt, we won't share stack slots pretty much at > all. Yeah; if we don't want to patch the specific situation for this testcase (which might be okayish, we haven't seen this problem very often over the last years), but want to really fix it we might have to take more involved means like doing stack slot sharing before gimplification and rewriting the IL to reflect this. Or give up on sharing (which isn't a good idea). Gnah.
(In reply to Jakub Jelinek from comment #11) > (In reply to Richard Biener from comment #10) > > (In reply to Jakub Jelinek from comment #9) > > > Created attachment 46312 [details] > > > gcc10-pr90348.patch > > > > > > Untested patch that implements what was written in #c5. I agree that > > > without further changes to the IL, determining if one can hoist addresses of > > > local variables or not is going to be hard, would require computing the > > > variable life info in each pass that would do something similar. On the > > > other side, admittedly such hoisting results in worse code generation > > > because if the address is hoisted earlier than where it used to be live > > > before, then there will be more stack conflicts. > > > > Ick. You need to handle pt->anything and pt->escaped (walk the escaped > > solution) as well. And !SSA_NAME_PTR_INFO (op) is of course the same > > as pt->anything. > > Do I? > I thought I don't. The thing is, I believe the partitioning only cares > about where (originally in the source) the automatic variables are > referenced. > Now, if I have say: > __attribute__((noipa)) type *foo (type *x) { return x; } > or similar, in order for a variable to be live before the clobber, it needs > to escape in the area where the variable is live, we don't care what we do > with whatever it returned, unless we actually hoist such escaping calls > before that region. If we can say for: > for (...) > { > unsigned char v[10]; > unsigned char *p = foo (v); > *p = 1; > unsigned char w[10]; > bar (w); > } > hoist the p = foo (v); call before the loop, then indeed we are in big > trouble. > But if we can't, the attached patch was just meant as a shorthand for trying > to do a dataflow propagation of the can a pointer SSA_NAME point to variable > xyz, if we don't consider escaping (to other functions and to global state). > What I'm trying to "undo" is just the hoisting of the address loads, not > hoisting of those address loads plus function calls in which it escapes or > where that address is saved to memory. > > If I have to consider pt->anything and pt->escaped, then it will be as > useless for the variable conflicts as is removing the important clearing of > the bitmap bit on clobber stmt, we won't share stack slots pretty much at > all. The point about !SSA_NAME_PTR_INFO is that passes might clear it, replace such SSA name with a new one, forgetting to copy it, etc. The point about anything is that points-to analysis might just have given up (consider the provenance thing where we might end up with anything for going through some uintptr arithmetic), the point about escaped is that this is just a placeholder for a set of variables and points-to tends to shrink sets by removing bits also in escaped. Also global = ptr; ptr2 = global; will have escaped in the sets but not necessarily the bit of the original decl. So my comments are just about correctly interpreting points-to data.
(In reply to Michael Matz from comment #12) > (In reply to Jakub Jelinek from comment #11) > > before that region. If we can say for: > > for (...) > > { > > unsigned char v[10]; > > unsigned char *p = foo (v); > > *p = 1; > > unsigned char w[10]; > > bar (w); > > } > > hoist the p = foo (v); call before the loop, then indeed we are in big > > trouble. > > This is effectively what the testcase is doing (just that 'foo' is no call, > but a normal address expression), so yes, we can do that, and yes we are in > big > trouble :-/ Well, that p = foo (v); or store of memory is something that will have (unless const call, indeed) a vuse or even a vdef and the alias oracle should usually consider it to be conflicting with the clobber stmt, so I'd hope it isn't hoisted in that case, while for the pure address arithmetics there is nothing that can easily stop the hoisting. Now, if there isn't really an easy way out of this and given how rarely this has been reported (I think we have this PR and PR61203, don't remember anything else), the options can be do nothing, or do something simple that isn't that expensive, will fix this testcase and while not perfect solution will still allow variable sharing in the general case almost as often as before, or change the IL representation so that such hoisting or sinking of addresses is not possible.
Educating people about -fstack-reuse is also a possibility, thus leave the issue to workarounds like that, experimenting with full rewrites that are obviously not back-portable.
Even before IVOPTs we have an access to in[] "after" the clobber. I think the fact that there are multiple instances of in is not well represented and for the testcase at hand jump-threading essentially exposes two instances of in[] with the second being live across a backedge (DOM rotates the loop), crossing liveness with the instance live on loop entry. I believe in this situation even simple CSE can wreck our liveness analysis if you'd consider addr1 = ∈ in ={v} CLOBBER; addr2 = ∈ ... where the address might be easily exposed in a testcase and the CSE triggering after some unrolling. Ideas to do stack-slot sharing very early and expose it in the IL will have quite an impact throughout the compilation and would not allow sharing of stack slots only exposed after inlining for example (to be reliable the stack slot sharing would need to happen as early as gimple lowering where we remove BINDs and make all variables function-scope).
I think part of the problem is trying to make "deaths" explicit via CLOBBERs without making "births" also explicit in the IR. Doing both would have allowed a lifetime verifier that checks that births dominate all references to a variable and likewise deaths/clobbers postdominate all references, which would presumably catch this early and make the IR more rigorous overall.
(In reply to Alexander Monakov from comment #17) > I think part of the problem is trying to make "deaths" explicit via CLOBBERs > without making "births" also explicit in the IR. Yes, that's basically the crux (or rather, that our births (first mention of a decl name) don't conflict with address-taking or access), and without changing this we're bound to heap hacks onto hacks to make it all work. So, ideally we'd represent the births of local decls as well (i.e. "allocate" them), represent all accesses indirectly via pointers, and "free" them at deaths. Optionally, very late in the pipeline, we can rewrite alloc/free pairs whose pointers don't escape into direct accesses to decls again (not sure if that's worthwhile at all; the 'free/alloc' statements would be virtual anyway, and we would handle only decls that must lie in memory this way). With that we'd have to amend some optimizations nevertheless, as those alloc/free pairs would hinder optimizations (no moving of instructions accessing the locals over them), so they would have to be moved sometimes in order to allow them (if deemed profitable).
(In reply to Michael Matz from comment #18) > represent all accesses indirectly via pointers Would that be necessary in presence of a verifier that ensures that all references are dominated by births?
On January 23, 2020 6:00:02 PM GMT+01:00, "amonakov at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> wrote: >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90348 > >--- Comment #19 from Alexander Monakov <amonakov at gcc dot gnu.org> >--- >(In reply to Michael Matz from comment #18) >> represent all accesses indirectly via pointers > >Would that be necessary in presence of a verifier that ensures that all >references are dominated by births? No, but it would automatically ensure this via the dependence.
GCC 8.4.0 has been released, adjusting target milestone.
GCC 8 branch is being closed.
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:551aa75778a4c5165d9533cd447c8fc822f583e1 commit r12-7044-g551aa75778a4c5165d9533cd447c8fc822f583e1 Author: Richard Biener <rguenther@suse.de> Date: Wed Feb 2 14:24:39 2022 +0100 Add CLOBBER_EOL to mark storage end-of-life clobbers This adds a flag to CONSTRUCTOR nodes indicating that for clobbers this marks the end-of-life of storage as opposed to just ending the lifetime of the object that occupied it. The dangling pointer diagnostics uses CLOBBERs but is confused by those emitted by the C++ frontend for example which emits them for the second purpose at the start of CTORs. The issue is also appearant for aarch64 in PR104092. Distinguishing the two cases is also necessary for the PR90348 fix. Since I'm going to add another flag I added an enum clobber_flags and a defaulted argument to build_clobber plus a convenient way to query the enum from the CTOR tree and specify it for gimple_clobber_p. Since 'CLOBBER' is already taken and I needed a name for the unspecified clobber we have now I used 'CLOBBER_UNDEF'. 2022-02-03 Richard Biener <rguenther@suse.de> PR middle-end/90348 PR middle-end/104092 gcc/ * tree-core.h (clobber_kind): New enum. (tree_base::u::bits::address_space): Document use in CONSTRUCTORs. * tree.h (CLOBBER_KIND): Add. (build_clobber): Add clobber kind argument, defaulted to CLOBBER_UNDEF. * tree.cc (build_clobber): Likewise. * gimple.h (gimple_clobber_p): New overload with specified kind. * tree-streamer-in.cc (streamer_read_tree_bitfields): Stream CLOBBER_KIND. * tree-streamer-out.cc (streamer_write_tree_bitfields): Likewise. * tree-pretty-print.cc (dump_generic_node): Mark EOL CLOBBERs. * gimplify.cc (gimplify_bind_expr): Build storage end-of-life clobbers with CLOBBER_EOL. (gimplify_target_expr): Likewise. * tree-inline.cc (expand_call_inline): Likewise. * tree-ssa-ccp.cc (insert_clobber_before_stack_restore): Likewise. * gimple-ssa-warn-access.cc (pass_waccess::check_stmt): Only treat CLOBBER_EOL clobbers as ending lifetime of storage. gcc/lto/ * lto-common.cc (compare_tree_sccs_1): Compare CLOBBER_KIND. gcc/testsuite/ * gcc.dg/pr87052.c: Adjust.
GCC 9 branch is being closed
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
The testcase in comment#2 still reproduces on the GCC 11 branch but no longer with GCC 12+ (on x86_64-linux, -O2).
GCC 10 branch is being closed.
> GCC 11 branch but no longer with GCC 12+ So is this fixed and can be closed? (Also, the title could be edited then to remove gcc 12-14)
On Wed, 19 Jul 2023, gnu.ojxq8 at dralias dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90348 > > maic <gnu.ojxq8 at dralias dot com> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |gnu.ojxq8 at dralias dot com > > --- Comment #29 from maic <gnu.ojxq8 at dralias dot com> --- > > GCC 11 branch but no longer with GCC 12+ > > So is this fixed and can be closed? (Also, the title could be edited then to > remove gcc 12-14) No it's not fixed, just the testcase doesn't expose the issue anymore.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:1251d3957de04dc9b023a23c09400217e13deadb commit r14-7274-g1251d3957de04dc9b023a23c09400217e13deadb Author: Jakub Jelinek <jakub@redhat.com> Date: Tue Jan 16 11:49:34 2024 +0100 cfgexpand: Workaround CSE of ADDR_EXPRs in VAR_DECL partitioning [PR113372] The following patch adds a quick workaround to bugs in VAR_DECL partitioning. The problem is that there is no dependency between ADDR_EXPRs of local decls and CLOBBERs of those vars, so VN can CSE uses of ADDR_EXPRs (including ivopts integral variants thereof), which can break add_scope_conflicts discovery of what variables are actually used in certain region. E.g. we can have ivtmp.40_3 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B]; ... uses of ivtmp.40_3 ... bitint.6 ={v} {CLOBBER(eos)}; ... ivtmp.28_43 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B]; ... uses of ivtmp.28_43 before VN (such as dom3), which the add_scope_conflicts code identifies as 2 independent uses of bitint.6 variable (which is correct), but then VN determines ivtmp.28_43 is the same as ivtmp.40_3 and just uses ivtmp.40_3 even in the second region; at that point add_scope_conflict thinks the bitint.6 variable is not used in that region anymore. The following patch does a simple single def-stmt check for such ADDR_EXPRs (rather than say trying to do a full propagation of what SSA_NAMEs can contain ADDR_EXPRs of local variables), which seems to workaround all 4 PRs. In addition to this patch I've used the attached one to gather statistics on the total size of all variable partitions in a function and seems besides the new testcases nothing is really affected compared to no patch (I've actually just modified the patch to == OMP_SCAN instead of == ADDR_EXPR, so it looks the same except that it never triggers). The comparison wasn't perfect because I've only gathered BITS_PER_WORD, main_input_filename (did some replacement of build directories and /tmp/ccXXXXXX names of LTO to make it more similar between the two bootstraps/regtests), current_function_name and the total size of all variable partitions if any, because I didn't record e.g. the optimization options and so e.g. torture tests which iterate over options could have different partition sizes even in one compiler when BITS_PER_WORD, main_input_filename and current_function_name are all equal. So had to write an awk script to check if the first triple in the second build appeared in the first one and the quadruple in the second build appeared in the first one too, otherwise print result and that only triggered in the new tests. Also, the cc1plus binary according to objdump -dr is identical between the two builds except for the ADDR_EXPR vs. OMP_SCAN constant in the two spots. 2024-01-16 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113372 PR middle-end/90348 PR middle-end/110115 PR middle-end/111422 * cfgexpand.cc (add_scope_conflicts_2): New function. (add_scope_conflicts_1): Use it. * gcc.dg/torture/bitint-49.c: New test. * gcc.c-torture/execute/pr90348.c: New test. * gcc.c-torture/execute/pr110115.c: New test. * gcc.c-torture/execute/pr111422.c: New test.
The releases/gcc-13 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:432708c306838fe1444da0df7d629a60468c0c73 commit r13-8383-g432708c306838fe1444da0df7d629a60468c0c73 Author: Jakub Jelinek <jakub@redhat.com> Date: Tue Jan 16 11:49:34 2024 +0100 cfgexpand: Workaround CSE of ADDR_EXPRs in VAR_DECL partitioning [PR113372] The following patch adds a quick workaround to bugs in VAR_DECL partitioning. The problem is that there is no dependency between ADDR_EXPRs of local decls and CLOBBERs of those vars, so VN can CSE uses of ADDR_EXPRs (including ivopts integral variants thereof), which can break add_scope_conflicts discovery of what variables are actually used in certain region. E.g. we can have ivtmp.40_3 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B]; ... uses of ivtmp.40_3 ... bitint.6 ={v} {CLOBBER(eos)}; ... ivtmp.28_43 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B]; ... uses of ivtmp.28_43 before VN (such as dom3), which the add_scope_conflicts code identifies as 2 independent uses of bitint.6 variable (which is correct), but then VN determines ivtmp.28_43 is the same as ivtmp.40_3 and just uses ivtmp.40_3 even in the second region; at that point add_scope_conflict thinks the bitint.6 variable is not used in that region anymore. The following patch does a simple single def-stmt check for such ADDR_EXPRs (rather than say trying to do a full propagation of what SSA_NAMEs can contain ADDR_EXPRs of local variables), which seems to workaround all 4 PRs. In addition to this patch I've used the attached one to gather statistics on the total size of all variable partitions in a function and seems besides the new testcases nothing is really affected compared to no patch (I've actually just modified the patch to == OMP_SCAN instead of == ADDR_EXPR, so it looks the same except that it never triggers). The comparison wasn't perfect because I've only gathered BITS_PER_WORD, main_input_filename (did some replacement of build directories and /tmp/ccXXXXXX names of LTO to make it more similar between the two bootstraps/regtests), current_function_name and the total size of all variable partitions if any, because I didn't record e.g. the optimization options and so e.g. torture tests which iterate over options could have different partition sizes even in one compiler when BITS_PER_WORD, main_input_filename and current_function_name are all equal. So had to write an awk script to check if the first triple in the second build appeared in the first one and the quadruple in the second build appeared in the first one too, otherwise print result and that only triggered in the new tests. Also, the cc1plus binary according to objdump -dr is identical between the two builds except for the ADDR_EXPR vs. OMP_SCAN constant in the two spots. 2024-01-16 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113372 PR middle-end/90348 PR middle-end/110115 PR middle-end/111422 * cfgexpand.cc (add_scope_conflicts_2): New function. (add_scope_conflicts_1): Use it. * gcc.c-torture/execute/pr90348.c: New test. * gcc.c-torture/execute/pr110115.c: New test. * gcc.c-torture/execute/pr111422.c: New test. (cherry picked from commit 1251d3957de04dc9b023a23c09400217e13deadb)
Worked around for GCC 13.3+ and 14+.
The releases/gcc-12 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:170c2bba7cb85b3ac9380a7d5a1c6d82b3c6aa63 commit r12-10506-g170c2bba7cb85b3ac9380a7d5a1c6d82b3c6aa63 Author: Jakub Jelinek <jakub@redhat.com> Date: Tue Jan 16 11:49:34 2024 +0100 cfgexpand: Workaround CSE of ADDR_EXPRs in VAR_DECL partitioning [PR113372] The following patch adds a quick workaround to bugs in VAR_DECL partitioning. The problem is that there is no dependency between ADDR_EXPRs of local decls and CLOBBERs of those vars, so VN can CSE uses of ADDR_EXPRs (including ivopts integral variants thereof), which can break add_scope_conflicts discovery of what variables are actually used in certain region. E.g. we can have ivtmp.40_3 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B]; ... uses of ivtmp.40_3 ... bitint.6 ={v} {CLOBBER(eos)}; ... ivtmp.28_43 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B]; ... uses of ivtmp.28_43 before VN (such as dom3), which the add_scope_conflicts code identifies as 2 independent uses of bitint.6 variable (which is correct), but then VN determines ivtmp.28_43 is the same as ivtmp.40_3 and just uses ivtmp.40_3 even in the second region; at that point add_scope_conflict thinks the bitint.6 variable is not used in that region anymore. The following patch does a simple single def-stmt check for such ADDR_EXPRs (rather than say trying to do a full propagation of what SSA_NAMEs can contain ADDR_EXPRs of local variables), which seems to workaround all 4 PRs. In addition to this patch I've used the attached one to gather statistics on the total size of all variable partitions in a function and seems besides the new testcases nothing is really affected compared to no patch (I've actually just modified the patch to == OMP_SCAN instead of == ADDR_EXPR, so it looks the same except that it never triggers). The comparison wasn't perfect because I've only gathered BITS_PER_WORD, main_input_filename (did some replacement of build directories and /tmp/ccXXXXXX names of LTO to make it more similar between the two bootstraps/regtests), current_function_name and the total size of all variable partitions if any, because I didn't record e.g. the optimization options and so e.g. torture tests which iterate over options could have different partition sizes even in one compiler when BITS_PER_WORD, main_input_filename and current_function_name are all equal. So had to write an awk script to check if the first triple in the second build appeared in the first one and the quadruple in the second build appeared in the first one too, otherwise print result and that only triggered in the new tests. Also, the cc1plus binary according to objdump -dr is identical between the two builds except for the ADDR_EXPR vs. OMP_SCAN constant in the two spots. 2024-01-16 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113372 PR middle-end/90348 PR middle-end/110115 PR middle-end/111422 * cfgexpand.cc (add_scope_conflicts_2): New function. (add_scope_conflicts_1): Use it. * gcc.c-torture/execute/pr90348.c: New test. * gcc.c-torture/execute/pr110115.c: New test. * gcc.c-torture/execute/pr111422.c: New test. (cherry picked from commit 1251d3957de04dc9b023a23c09400217e13deadb)
Should be worked around for 12.4+ too.
The releases/gcc-11 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:c845244dd7afae95689b8dd02a62c18932441583 commit r11-11490-gc845244dd7afae95689b8dd02a62c18932441583 Author: Jakub Jelinek <jakub@redhat.com> Date: Tue Jan 16 11:49:34 2024 +0100 cfgexpand: Workaround CSE of ADDR_EXPRs in VAR_DECL partitioning [PR113372] The following patch adds a quick workaround to bugs in VAR_DECL partitioning. The problem is that there is no dependency between ADDR_EXPRs of local decls and CLOBBERs of those vars, so VN can CSE uses of ADDR_EXPRs (including ivopts integral variants thereof), which can break add_scope_conflicts discovery of what variables are actually used in certain region. E.g. we can have ivtmp.40_3 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B]; ... uses of ivtmp.40_3 ... bitint.6 ={v} {CLOBBER(eos)}; ... ivtmp.28_43 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B]; ... uses of ivtmp.28_43 before VN (such as dom3), which the add_scope_conflicts code identifies as 2 independent uses of bitint.6 variable (which is correct), but then VN determines ivtmp.28_43 is the same as ivtmp.40_3 and just uses ivtmp.40_3 even in the second region; at that point add_scope_conflict thinks the bitint.6 variable is not used in that region anymore. The following patch does a simple single def-stmt check for such ADDR_EXPRs (rather than say trying to do a full propagation of what SSA_NAMEs can contain ADDR_EXPRs of local variables), which seems to workaround all 4 PRs. In addition to this patch I've used the attached one to gather statistics on the total size of all variable partitions in a function and seems besides the new testcases nothing is really affected compared to no patch (I've actually just modified the patch to == OMP_SCAN instead of == ADDR_EXPR, so it looks the same except that it never triggers). The comparison wasn't perfect because I've only gathered BITS_PER_WORD, main_input_filename (did some replacement of build directories and /tmp/ccXXXXXX names of LTO to make it more similar between the two bootstraps/regtests), current_function_name and the total size of all variable partitions if any, because I didn't record e.g. the optimization options and so e.g. torture tests which iterate over options could have different partition sizes even in one compiler when BITS_PER_WORD, main_input_filename and current_function_name are all equal. So had to write an awk script to check if the first triple in the second build appeared in the first one and the quadruple in the second build appeared in the first one too, otherwise print result and that only triggered in the new tests. Also, the cc1plus binary according to objdump -dr is identical between the two builds except for the ADDR_EXPR vs. OMP_SCAN constant in the two spots. 2024-01-16 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113372 PR middle-end/90348 PR middle-end/110115 PR middle-end/111422 * cfgexpand.c (add_scope_conflicts_2): New function. (add_scope_conflicts_1): Use it. * gcc.c-torture/execute/pr90348.c: New test. * gcc.c-torture/execute/pr110115.c: New test. * gcc.c-torture/execute/pr111422.c: New test. (cherry picked from commit 1251d3957de04dc9b023a23c09400217e13deadb)
Worked around for 11.5 as well.
The master branch has been updated by Sam James <sjames@gcc.gnu.org>: https://gcc.gnu.org/g:df09173e355f30089b97090b19c095907242b35e commit r15-4803-gdf09173e355f30089b97090b19c095907242b35e Author: Sam James <sam@gentoo.org> Date: Thu Oct 31 03:36:23 2024 +0000 testsuite: add testcase for fixed PR106073 This was fixed by r12-8835-ge8d5f3a1b5a583 which surely made it latent but richi points out it was likely an instance of PR90348. -fstack-reuse continues to be a menace, so let's add the testcase. gcc/testsuite/ChangeLog: PR middle-end/90348 PR tree-optimization/106073 * gcc.dg/pr106073.c: New test.