Created attachment 48361 [details] Test case program The attached program crashes when it has been compiled with GCC 10 and -O2. With -O3 or -Os it returns an incorrect value instead. There are no warnings with `-Wall -Wextra` or `-fanalyzer`, and strangely enough `-fsanitize=undefined` hides the crash without reporting any errors. -fno-aggressive-loop-optimizations also makes the crash go away which made me suspect that the code is wrong somehow, but `frama-c` insists that it's correct and should always return 0. If there's a bug in there, I'm not seeing it. :( Steps to reproduce: ---- $ gcc -O2 test.c -o bug $ ./bug Segmentation fault (core dumped) ---- GCC version: ---- $ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/10.0.1/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: ./configure Thread model: posix Supported LTO compression algorithms: zlib gcc version 10.0.1 20200423 (experimental) (GCC) ---- It appears to work fine with GCC 7, I tested it with: ---- $ gcc-7 -v Using built-in specs. COLLECT_GCC=gcc-7 COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.5.0-3ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) ----
Created attachment 48363 [details] Improved testcase
Confirmed, ASAN is also fine. Started with r10-1892-gb9ef6a2e04bfd013.
Looks like we're running past the end of the array with the stores in the loop.
+ int cstore_31; + int cstore_32; <bb 2> [local count: 114863530]: goto <bb 7>; [100.00%] <bb 3> [local count: 1014686026]: _1 = (long unsigned int) sum_a_7; _2 = _1 * 8; _3 = input_21(D) + _2; _4 = *_3; if (_4 == 0B) goto <bb 15>; [5.50%] else goto <bb 4>; [94.50%] <bb 4> [local count: 958878296]: if (sum_a_7 <= 1) - goto <bb 5>; [28.10%] + goto <bb 6>; [28.10%] else - goto <bb 6>; [71.90%] + goto <bb 5>; [71.90%] - <bb 5> [local count: 269444804]: - arr[sum_a_7] = 1; + <bb 5> [local count: 689433492]: + cstore_32 = MEM <int[2]> [(void *)&arr][sum_a_7]; <bb 6> [local count: 958878296]: + # cstore_31 = PHI <1(4), cstore_32(5)> + MEM <int[2]> [(void *)&arr][sum_a_7] = cstore_31; sum_a_23 = sum_a_7 + 1; done by cselim looks just plain wrong, there is no dominating load from that memory, so even when the variable is an automatic variable, there is no guarantee it won't be out of bounds and thus crash already on the load, or just modify random unrelated memory.
All the PR89430 testcases have such a load.
THe whole point of that change is to not require a dominating load if the object comes from the stack. We conditionally load from the same location, then have a PHI which selects that loaded value or "1" and we store the result. That's precisely what is supposed to be happening. The loop in question looks like: ;; basic block 3, loop depth 1 ;; pred: 2 ;; 6 # ivtmp.13_9 = PHI <0(2), ivtmp.13_7(6)> _4 = MEM[base: in_21(D), index: ivtmp.13_9, step: 8, offset: 0B]; if (_4 == 0B) goto <bb 13>; [5.50%] else goto <bb 4>; [94.50%] ;; succ: 13 ;; 4 ;; basic block 4, loop depth 1 ;; pred: 3 if (ivtmp.13_9 != 2) goto <bb 6>; [28.10%] else goto <bb 5>; [71.90%] ;; succ: 6 ;; 5 ;; basic block 5, loop depth 1 ;; pred: 4 cstore_32 = MEM[symbol: arr, index: ivtmp.13_9, step: 4, offset: 0B]; ;; succ: 6 ;; basic block 6, loop depth 1 ;; pred: 5 ;; 4 # cstore_31 = PHI <cstore_32(5), 1(4)> MEM[symbol: arr, index: ivtmp.13_9, step: 4, offset: 0B] = cstore_31; _40 = (unsigned int) ivtmp.13_9; _38 = _40 + 1; _37 = (int) _38; ivtmp.13_7 = ivtmp.13_9 + 1; sum_a_27 = (int) ivtmp.13_7; if (n_16(D) > sum_a_27) goto <bb 3>; [94.50%] else goto <bb 7>; [5.50%] ;; succ: 3 ;; 7 You can see the load from the same stock slot in bb5, the selecting PHI in bb6 and the store in bb6. THe test in bb4 looks weird and is the source of the problem I believe.
(In reply to Jeffrey A. Law from comment #6) > THe whole point of that change is to not require a dominating load if the > object comes from the stack. Yeah, but I find that flawed. One can do it after performing get_ref_base_and_extent and verifying the offset is const and the whole access must be within the object, or one could do it with VR verification again verifying the access is fully within the object, but otherwise it can't be done and one needs to find dominating load, just the noload needs to handle also ARRAY_REFs etc., not only MEM_REFs.
In particular tree_could_trap_p woudl consider the load possibly trapping due to the variable indexing but the patch seems to override that which I agree is bogus. I think we need to revert it and re-implement for GCC 11. All testcases contain a dominating load in the condition and what we _can_ do is treat loads of stack variables the same as stores as far as non-trappingness is concerned.
(In reply to Richard Biener from comment #8) > In particular tree_could_trap_p woudl consider the load possibly trapping > due to the variable indexing but the patch seems to override that which > I agree is bogus. I think we need to revert it and re-implement for GCC 11. > All testcases contain a dominating load in the condition and what we _can_ > do is treat loads of stack variables the same as stores as far as > non-trappingness is concerned. Actually the non-trapping machinery only considers plain SSA name dereferences so there's much more to extend. Definitely nothing for GCC 10.
Yeah. add_or_mark_expr could be extended to handle more complex addresses (perhaps get_inner_reference and hash on the decl + offset expression and taking into account the bitpos/bitsize then? Further testcase. Both foo and bar are miscompiled, baz is fine because arr[7] is known to be always within the object bounds. __attribute__((noipa)) int foo (int n) { int arr[16], s = 0; for (int i = 0; i < n; i++) { if (i < 16) arr[i] = i; } for (int i = 0; i < 16; i++) s += arr[i]; return s; } __attribute__((noipa)) int bar (int n, int x, unsigned long y, unsigned long z) { int arr[16], s = 0; arr[4] = 42; for (int i = 0; i < n; i++) { if (x == (i & 0x25)) arr[y] = i; } return arr[z]; } __attribute__((noipa)) int baz (int n, int x, unsigned long z) { int arr[16], s = 0; arr[12] = 42; for (int i = 0; i < n; i++) { if (x == (i & 0x25)) arr[7] = i; } return arr[z]; } int main () { if (foo (10374) != 15 * 16 / 2) __builtin_abort (); if (bar (25, 0x25, (unsigned long) 0xdeadbeefbeefdeadULL, 4) != 42) __builtin_abort (); if (bar (25, 4, 17, 17) != 22) __builtin_abort (); if (baz (25, 0x25, 12) != 42) __builtin_abort (); if (baz (25, 4, 7) != 22) __builtin_abort (); if (baz (25, 4, 12) != 42) __builtin_abort (); return 0; }
If we don't want to revert the change completely, could we perhaps do: --- gcc/tree-ssa-phiopt.c.jj 2020-03-19 10:23:50.542872359 +0100 +++ gcc/tree-ssa-phiopt.c 2020-04-24 10:54:10.341716841 +0200 @@ -2237,10 +2237,26 @@ cond_store_replacement (basic_block midd whose value is not available readily, which we want to avoid. */ if (!nontrap->contains (lhs)) { - /* If LHS is a local variable without address-taken, we could - always safely move down the store. */ - tree base = get_base_address (lhs); - if (!auto_var_p (base) || TREE_ADDRESSABLE (base)) + /* If LHS is an access to a local variable without address-taken + or its part and the access is provably within the bounds of the + local variable, we could always safely move down the store. */ + HOST_WIDE_INT offset, size, decl_size; + bool reverse; + tree base = get_ref_base_and_extent_hwi (lhs, &offset, &size, + &reverse); + if (base == NULL_TREE || !auto_var_p (base) || TREE_ADDRESSABLE (base)) + return false; + if (!DECL_SIZE (base) + || !tree_fits_shwi_p (DECL_SIZE (base))) + return false; + decl_size = tree_to_shwi (DECL_SIZE (base)); + if (offset < 0 + || size < 0 + || decl_size < 0 + || offset >= decl_size + || size > decl_size + || ((unsigned HOST_WIDE_INT) offset + size + > (unsigned HOST_WIDE_INT) decl_size)) return false; } + xfail the tests from the PR89430 because they all need the nontrap ARRAY_REF etc. handling? With that, in the above testcase baz is still cselim optimized , but foo and bar are not. bar is still miscompiled by some other optimization though (and GCC 9 didn't do that), so we have some other regression.
On Fri, 24 Apr 2020, jakub at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94734 > > --- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> --- > If we don't want to revert the change completely, could we perhaps do: > --- gcc/tree-ssa-phiopt.c.jj 2020-03-19 10:23:50.542872359 +0100 > +++ gcc/tree-ssa-phiopt.c 2020-04-24 10:54:10.341716841 +0200 > @@ -2237,10 +2237,26 @@ cond_store_replacement (basic_block midd > whose value is not available readily, which we want to avoid. */ > if (!nontrap->contains (lhs)) > { > - /* If LHS is a local variable without address-taken, we could > - always safely move down the store. */ > - tree base = get_base_address (lhs); > - if (!auto_var_p (base) || TREE_ADDRESSABLE (base)) > + /* If LHS is an access to a local variable without address-taken > + or its part and the access is provably within the bounds of the > + local variable, we could always safely move down the store. */ > + HOST_WIDE_INT offset, size, decl_size; > + bool reverse; > + tree base = get_ref_base_and_extent_hwi (lhs, &offset, &size, > + &reverse); > + if (base == NULL_TREE || !auto_var_p (base) || TREE_ADDRESSABLE (base)) > + return false; > + if (!DECL_SIZE (base) > + || !tree_fits_shwi_p (DECL_SIZE (base))) > + return false; > + decl_size = tree_to_shwi (DECL_SIZE (base)); > + if (offset < 0 > + || size < 0 > + || decl_size < 0 > + || offset >= decl_size > + || size > decl_size > + || ((unsigned HOST_WIDE_INT) offset + size > + > (unsigned HOST_WIDE_INT) decl_size)) > return false; > } That should be roughly equivalent to if (!nontrap->contains (lhs) || !tree_could_trap_p (lhs)) which for ARRAY_REFs checks in_array_bounds_p.
Even better.
(In reply to Jakub Jelinek from comment #10) > bar is still miscompiled by some other optimization though > (and GCC 9 didn't do that), so we have some other regression. Sorry for the false alarm, the testcase was buggy, 17 is out of bounds. Let's sed -i -e 's/17, 17/15, 15/' in the testcase.
(In reply to Jakub Jelinek from comment #13) > Even better. Note none of the committed testcases would be handled with this. There's also the issue of store data races (not sure if the notrap handling is correct there, it looks like not) - for auto variables that would not be an issue (but would need to be checked extra). A dominating store to the same location isn't enough to prove validity either since that might be inside a different protected region.
Created attachment 48366 [details] gcc10-pr94734.patch So like this? The store data races thing can be covered by the non-addressable auto var checks.
Created attachment 48367 [details] gcc10-pr94734.patch Updated patch.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:cf39dccf9284d2fd9f9aa7050760adea110c8d88 commit r10-7952-gcf39dccf9284d2fd9f9aa7050760adea110c8d88 Author: Jakub Jelinek <jakub@redhat.com> Date: Sat Apr 25 00:10:01 2020 +0200 cselim: Don't assume it is safe to cstore replace a store to a local variable with unknown offset [PR94734] As the new testcase shows, it is not safe to assume we can optimize a conditional store into an automatic non-addressable var, we can do it only if we can prove that the unconditional load or store actually will not be outside of the boundaries of the variable. If the offset and size are constant, we can, but this is already all checked in !tree_could_trap_p, otherwise we really need to check for a dominating unconditional store, or for the specific case of automatic non-addressable variables, it is enough if there is a dominating load (that is what those 4 testcases have). tree-ssa-phiopt.c has some infrastructure for this already, see the add_or_mark_expr method etc., but right now it handles only MEM_REFs with SSA_NAME first operand and some integral offset. So, I think it can be for GCC11 extended to handle other memory references, possibly up to just doing get_inner_reference and hasing based on the base, offset expressions and bit_offset and bit_size, and have also a special case that for !TREE_ADDRESSABLE automatic variables it could ignore whether something is a load or store because the local stack should be always writable. But it feels way too dangerous to do this this late for GCC10, so this patch just restricts the optimization to the safe case (where lhs doesn't trap), and on Richi's request also ignores TREE_ADDRESSABLE bit if flag_store_data_races, because my understanding the reason for TREE_ADDRESSABLE check is that we want to avoid introducing store data races (if address of an automatic var escapes, some other thread could be accessing it concurrently). 2020-04-25 Jakub Jelinek <jakub@redhat.com> Richard Biener <rguenther@suse.de> PR tree-optimization/94734 PR tree-optimization/89430 * tree-ssa-phiopt.c: Include tree-eh.h. (cond_store_replacement): Return false if an automatic variable access could trap. If -fstore-data-races, don't return false just because an automatic variable is addressable. * gcc.dg/tree-ssa/pr89430-1.c: Add xfail. * gcc.dg/tree-ssa/pr89430-2.c: Add xfail. * gcc.dg/tree-ssa/pr89430-5.c: Add xfail. * gcc.dg/tree-ssa/pr89430-6.c: Add xfail. * gcc.c-torture/execute/pr94734.c: New test.
Fixed.