gcc at -O3 produced the wrong code. Compiler explorer: https://godbolt.org/z/3b4v478TG $ cat a.c int printf(const char *, ...); int a, b; int *c = &b; unsigned d; char e; int f=1; int i(int k, char *l) { if (k < 6) return a; l[0] = l[1] = l[k - 1] = 8; return 0; } int m(int k) { char g[11]; int h = i(k, g); return h; } int main() { for (; b < 8; b = b + 1) ; int j; int *n[8]; j = 0; for (;18446744073709551608U + m(*c) + *c + j < 2; j++){ n[j] = &f; } for (; e <= 4; e++) d = *n[0] == f; printf("%d\n", d); } $ $ gcc -O0 a.c && ./a.out 1 $ gcc -O3 a.c && ./a.out Segmentation fault $ gcc -O3 -fwrapv a.c && ./a.out Segmentation fault $ gcc -fsanitize=address,undefined a.c && ./a.out 1 $
Confirmed. We segfault at *n[0], also happens with -O2 -funswitch-loops, still happens with -O3 -fno-unswitch-loops.
Hmm, -fstack-reuse=none helps...
There's a missed optimization. We have # PT = { D.2843 } _44 = &g + _43; ... *_44 = 8; g ={v} {CLOBBER(eol)}; ... *_44 = 8; g ={v} {CLOBBER(eol)}; ... *_44 = 8; g ={v} {CLOBBER(eol)}; I guess the clobber doesn't kill the ref according to stmt_kills_ref_p, we'd have to special-case singleton points-to sets here. Optimizing the stores would avoid the bogus sharing of g and n.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:1251d3957de04dc9b023a23c09400217e13deadb commit r14-7274-g1251d3957de04dc9b023a23c09400217e13deadb Author: Jakub Jelinek <jakub@redhat.com> Date: Tue Jan 16 11:49:34 2024 +0100 cfgexpand: Workaround CSE of ADDR_EXPRs in VAR_DECL partitioning [PR113372] The following patch adds a quick workaround to bugs in VAR_DECL partitioning. The problem is that there is no dependency between ADDR_EXPRs of local decls and CLOBBERs of those vars, so VN can CSE uses of ADDR_EXPRs (including ivopts integral variants thereof), which can break add_scope_conflicts discovery of what variables are actually used in certain region. E.g. we can have ivtmp.40_3 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B]; ... uses of ivtmp.40_3 ... bitint.6 ={v} {CLOBBER(eos)}; ... ivtmp.28_43 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B]; ... uses of ivtmp.28_43 before VN (such as dom3), which the add_scope_conflicts code identifies as 2 independent uses of bitint.6 variable (which is correct), but then VN determines ivtmp.28_43 is the same as ivtmp.40_3 and just uses ivtmp.40_3 even in the second region; at that point add_scope_conflict thinks the bitint.6 variable is not used in that region anymore. The following patch does a simple single def-stmt check for such ADDR_EXPRs (rather than say trying to do a full propagation of what SSA_NAMEs can contain ADDR_EXPRs of local variables), which seems to workaround all 4 PRs. In addition to this patch I've used the attached one to gather statistics on the total size of all variable partitions in a function and seems besides the new testcases nothing is really affected compared to no patch (I've actually just modified the patch to == OMP_SCAN instead of == ADDR_EXPR, so it looks the same except that it never triggers). The comparison wasn't perfect because I've only gathered BITS_PER_WORD, main_input_filename (did some replacement of build directories and /tmp/ccXXXXXX names of LTO to make it more similar between the two bootstraps/regtests), current_function_name and the total size of all variable partitions if any, because I didn't record e.g. the optimization options and so e.g. torture tests which iterate over options could have different partition sizes even in one compiler when BITS_PER_WORD, main_input_filename and current_function_name are all equal. So had to write an awk script to check if the first triple in the second build appeared in the first one and the quadruple in the second build appeared in the first one too, otherwise print result and that only triggered in the new tests. Also, the cc1plus binary according to objdump -dr is identical between the two builds except for the ADDR_EXPR vs. OMP_SCAN constant in the two spots. 2024-01-16 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113372 PR middle-end/90348 PR middle-end/110115 PR middle-end/111422 * cfgexpand.cc (add_scope_conflicts_2): New function. (add_scope_conflicts_1): Use it. * gcc.dg/torture/bitint-49.c: New test. * gcc.c-torture/execute/pr90348.c: New test. * gcc.c-torture/execute/pr110115.c: New test. * gcc.c-torture/execute/pr111422.c: New test.
The releases/gcc-13 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:432708c306838fe1444da0df7d629a60468c0c73 commit r13-8383-g432708c306838fe1444da0df7d629a60468c0c73 Author: Jakub Jelinek <jakub@redhat.com> Date: Tue Jan 16 11:49:34 2024 +0100 cfgexpand: Workaround CSE of ADDR_EXPRs in VAR_DECL partitioning [PR113372] The following patch adds a quick workaround to bugs in VAR_DECL partitioning. The problem is that there is no dependency between ADDR_EXPRs of local decls and CLOBBERs of those vars, so VN can CSE uses of ADDR_EXPRs (including ivopts integral variants thereof), which can break add_scope_conflicts discovery of what variables are actually used in certain region. E.g. we can have ivtmp.40_3 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B]; ... uses of ivtmp.40_3 ... bitint.6 ={v} {CLOBBER(eos)}; ... ivtmp.28_43 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B]; ... uses of ivtmp.28_43 before VN (such as dom3), which the add_scope_conflicts code identifies as 2 independent uses of bitint.6 variable (which is correct), but then VN determines ivtmp.28_43 is the same as ivtmp.40_3 and just uses ivtmp.40_3 even in the second region; at that point add_scope_conflict thinks the bitint.6 variable is not used in that region anymore. The following patch does a simple single def-stmt check for such ADDR_EXPRs (rather than say trying to do a full propagation of what SSA_NAMEs can contain ADDR_EXPRs of local variables), which seems to workaround all 4 PRs. In addition to this patch I've used the attached one to gather statistics on the total size of all variable partitions in a function and seems besides the new testcases nothing is really affected compared to no patch (I've actually just modified the patch to == OMP_SCAN instead of == ADDR_EXPR, so it looks the same except that it never triggers). The comparison wasn't perfect because I've only gathered BITS_PER_WORD, main_input_filename (did some replacement of build directories and /tmp/ccXXXXXX names of LTO to make it more similar between the two bootstraps/regtests), current_function_name and the total size of all variable partitions if any, because I didn't record e.g. the optimization options and so e.g. torture tests which iterate over options could have different partition sizes even in one compiler when BITS_PER_WORD, main_input_filename and current_function_name are all equal. So had to write an awk script to check if the first triple in the second build appeared in the first one and the quadruple in the second build appeared in the first one too, otherwise print result and that only triggered in the new tests. Also, the cc1plus binary according to objdump -dr is identical between the two builds except for the ADDR_EXPR vs. OMP_SCAN constant in the two spots. 2024-01-16 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113372 PR middle-end/90348 PR middle-end/110115 PR middle-end/111422 * cfgexpand.cc (add_scope_conflicts_2): New function. (add_scope_conflicts_1): Use it. * gcc.c-torture/execute/pr90348.c: New test. * gcc.c-torture/execute/pr110115.c: New test. * gcc.c-torture/execute/pr111422.c: New test. (cherry picked from commit 1251d3957de04dc9b023a23c09400217e13deadb)
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:ab25eef36400e8c1d28e3ed059c5f95a38b45f17 commit r15-517-gab25eef36400e8c1d28e3ed059c5f95a38b45f17 Author: Richard Biener <rguenther@suse.de> Date: Wed May 15 13:06:30 2024 +0200 middle-end/111422 - wrong stack var coalescing, handle PHIs The gcc.c-torture/execute/pr111422.c testcase after installing the sink pass improvement reveals that we also need to handle _65 = &g + _58; _44 = &g + _43; # _59 = PHI <_65, _44> *_59 = 8; g = {v} {CLOBBER(eos)}; ... n[0] = &f; *_59 = 8; g = {v} {CLOBBER(eos)}; where we fail to see the conflict between n and g after the first clobber of g. Before the sinking improvement there was a conflict recorded on a path where _65/_44 are unused, so the real conflict was missed but the fake one avoided the miscompile. The following handles PHI defs in add_scope_conflicts_2 which fixes the issue. PR middle-end/111422 * cfgexpand.cc (add_scope_conflicts_2): Handle PHIs by recursing to their arguments.
The releases/gcc-12 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:170c2bba7cb85b3ac9380a7d5a1c6d82b3c6aa63 commit r12-10506-g170c2bba7cb85b3ac9380a7d5a1c6d82b3c6aa63 Author: Jakub Jelinek <jakub@redhat.com> Date: Tue Jan 16 11:49:34 2024 +0100 cfgexpand: Workaround CSE of ADDR_EXPRs in VAR_DECL partitioning [PR113372] The following patch adds a quick workaround to bugs in VAR_DECL partitioning. The problem is that there is no dependency between ADDR_EXPRs of local decls and CLOBBERs of those vars, so VN can CSE uses of ADDR_EXPRs (including ivopts integral variants thereof), which can break add_scope_conflicts discovery of what variables are actually used in certain region. E.g. we can have ivtmp.40_3 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B]; ... uses of ivtmp.40_3 ... bitint.6 ={v} {CLOBBER(eos)}; ... ivtmp.28_43 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B]; ... uses of ivtmp.28_43 before VN (such as dom3), which the add_scope_conflicts code identifies as 2 independent uses of bitint.6 variable (which is correct), but then VN determines ivtmp.28_43 is the same as ivtmp.40_3 and just uses ivtmp.40_3 even in the second region; at that point add_scope_conflict thinks the bitint.6 variable is not used in that region anymore. The following patch does a simple single def-stmt check for such ADDR_EXPRs (rather than say trying to do a full propagation of what SSA_NAMEs can contain ADDR_EXPRs of local variables), which seems to workaround all 4 PRs. In addition to this patch I've used the attached one to gather statistics on the total size of all variable partitions in a function and seems besides the new testcases nothing is really affected compared to no patch (I've actually just modified the patch to == OMP_SCAN instead of == ADDR_EXPR, so it looks the same except that it never triggers). The comparison wasn't perfect because I've only gathered BITS_PER_WORD, main_input_filename (did some replacement of build directories and /tmp/ccXXXXXX names of LTO to make it more similar between the two bootstraps/regtests), current_function_name and the total size of all variable partitions if any, because I didn't record e.g. the optimization options and so e.g. torture tests which iterate over options could have different partition sizes even in one compiler when BITS_PER_WORD, main_input_filename and current_function_name are all equal. So had to write an awk script to check if the first triple in the second build appeared in the first one and the quadruple in the second build appeared in the first one too, otherwise print result and that only triggered in the new tests. Also, the cc1plus binary according to objdump -dr is identical between the two builds except for the ADDR_EXPR vs. OMP_SCAN constant in the two spots. 2024-01-16 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113372 PR middle-end/90348 PR middle-end/110115 PR middle-end/111422 * cfgexpand.cc (add_scope_conflicts_2): New function. (add_scope_conflicts_1): Use it. * gcc.c-torture/execute/pr90348.c: New test. * gcc.c-torture/execute/pr110115.c: New test. * gcc.c-torture/execute/pr111422.c: New test. (cherry picked from commit 1251d3957de04dc9b023a23c09400217e13deadb)
The releases/gcc-11 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:c845244dd7afae95689b8dd02a62c18932441583 commit r11-11490-gc845244dd7afae95689b8dd02a62c18932441583 Author: Jakub Jelinek <jakub@redhat.com> Date: Tue Jan 16 11:49:34 2024 +0100 cfgexpand: Workaround CSE of ADDR_EXPRs in VAR_DECL partitioning [PR113372] The following patch adds a quick workaround to bugs in VAR_DECL partitioning. The problem is that there is no dependency between ADDR_EXPRs of local decls and CLOBBERs of those vars, so VN can CSE uses of ADDR_EXPRs (including ivopts integral variants thereof), which can break add_scope_conflicts discovery of what variables are actually used in certain region. E.g. we can have ivtmp.40_3 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B]; ... uses of ivtmp.40_3 ... bitint.6 ={v} {CLOBBER(eos)}; ... ivtmp.28_43 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B]; ... uses of ivtmp.28_43 before VN (such as dom3), which the add_scope_conflicts code identifies as 2 independent uses of bitint.6 variable (which is correct), but then VN determines ivtmp.28_43 is the same as ivtmp.40_3 and just uses ivtmp.40_3 even in the second region; at that point add_scope_conflict thinks the bitint.6 variable is not used in that region anymore. The following patch does a simple single def-stmt check for such ADDR_EXPRs (rather than say trying to do a full propagation of what SSA_NAMEs can contain ADDR_EXPRs of local variables), which seems to workaround all 4 PRs. In addition to this patch I've used the attached one to gather statistics on the total size of all variable partitions in a function and seems besides the new testcases nothing is really affected compared to no patch (I've actually just modified the patch to == OMP_SCAN instead of == ADDR_EXPR, so it looks the same except that it never triggers). The comparison wasn't perfect because I've only gathered BITS_PER_WORD, main_input_filename (did some replacement of build directories and /tmp/ccXXXXXX names of LTO to make it more similar between the two bootstraps/regtests), current_function_name and the total size of all variable partitions if any, because I didn't record e.g. the optimization options and so e.g. torture tests which iterate over options could have different partition sizes even in one compiler when BITS_PER_WORD, main_input_filename and current_function_name are all equal. So had to write an awk script to check if the first triple in the second build appeared in the first one and the quadruple in the second build appeared in the first one too, otherwise print result and that only triggered in the new tests. Also, the cc1plus binary according to objdump -dr is identical between the two builds except for the ADDR_EXPR vs. OMP_SCAN constant in the two spots. 2024-01-16 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113372 PR middle-end/90348 PR middle-end/110115 PR middle-end/111422 * cfgexpand.c (add_scope_conflicts_2): New function. (add_scope_conflicts_1): Use it. * gcc.c-torture/execute/pr90348.c: New test. * gcc.c-torture/execute/pr110115.c: New test. * gcc.c-torture/execute/pr111422.c: New test. (cherry picked from commit 1251d3957de04dc9b023a23c09400217e13deadb)
Fixed for 11.5 as well.
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>: https://gcc.gnu.org/g:0014a858a14b825818d6b557c3d5193f85790bde commit r15-6655-g0014a858a14b825818d6b557c3d5193f85790bde Author: Andrew Pinski <quic_apinski@quicinc.com> Date: Fri Nov 15 20:22:03 2024 -0800 cfgexpand: Rewrite add_scope_conflicts_2 to use cache and look back further [PR111422] After fixing loop-im to do the correct overflow rewriting for pointer types too. We end up with code like: ``` _9 = (unsigned long) &g; _84 = _9 + 18446744073709551615; _11 = _42 + _84; _44 = (signed char *) _11; ... *_44 = 10; g ={v} {CLOBBER(eos)}; ... n[0] = &f; *_44 = 8; g ={v} {CLOBBER(eos)}; ``` Which was not being recongized by the scope conflicts code. This was because it only handled one level walk backs rather than multiple ones. This fixes the issue by having a cache which records all references to addresses of stack variables. Unlike the previous patch, this only records and looks at addresses of stack variables. The cache uses a bitmap and uses the index as the bit to look at. PR middle-end/117426 PR middle-end/111422 gcc/ChangeLog: * cfgexpand.cc (struct vars_ssa_cache): New class. (vars_ssa_cache::vars_ssa_cache): New constructor. (vars_ssa_cache::~vars_ssa_cache): New deconstructor. (vars_ssa_cache::create): New method. (vars_ssa_cache::exists): New method. (vars_ssa_cache::add_one): New method. (vars_ssa_cache::update): New method. (vars_ssa_cache::dump): New method. (add_scope_conflicts_2): Factor mostly out to vars_ssa_cache::operator(). New cache argument. Walk the bitmap cache for the stack variables addresses. (vars_ssa_cache::operator()): New method factored out from add_scope_conflicts_2. Rewrite to be a full walk of all operands and use a worklist. (add_scope_conflicts_1): Add cache new argument for the addr cache. Just call add_scope_conflicts_2 for the phi result instead of calling for the uses and don't call walk_stmt_load_store_addr_ops for phis. Update call to add_scope_conflicts_2 to add cache argument. (add_scope_conflicts): Add cache argument and update calls to add_scope_conflicts_1. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr117426-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>