This is reduced from https://github.com/JuliaLang/julia/issues/18162. It appears that GCC miscompiles LLVM when targeting i686-w64-mingw32. This behavior was observed on at least GCC 5.4.0 and 6.1.0. GCC 4.9 was confirmed to not have this problem. To reproduce: ``` git clone https://github.com/llvm-mirror/llvm (cd llvm && git checkout release_37) mkdir llvm-build (cd llvm-build && \ ../llvm/configure --build=x86_64-linux-gnu --host=i686-w64-mingw32 --enable-optimized CC=i686-w64-mingw32-gcc CXX="i686-w64-mingw32-g++ -g1 -funwind-tables -fasynchronous-unwind-tables" --enable-shared && \ make -j8) cp llvm-build/Release+Asserts/bin/opt.exe llvm-build/Release+Asserts/bin/LLVM-3.7.dll . cp ~/cross-w32/i686-w64-mingw32/lib/libgcc_s_sjlj-1.dll ~/cross-w32/i686-w64-mingw32/lib/libstdc++-6.dll . wine ./opt.exe -slp-vectorizer -S ~/llvm/test/Transforms/SLPVectorizer/X86/vector.ll ``` (-g and unwind tables are only to make debugging easier, also happens without) The problem appears to be the following: ``` Stopped in function __ZN12_GLOBAL__N_113SLPVectorizer18tryToVectorizePairEPN4llvm5ValueES3_RNS_7BoUpSLPE.part.774 0x00000000025cf853<+51>: leal 24(%esp), %eax 0x00000000025cf857<+55>: calll -2284 => 0x00000000025cf85c<+60>: subl $12, %esp 0x00000000025cf85f<+63>: addl $44, %esp 0x00000000025cf862<+66>: retl $12 ``` This code jumps to an invalid address once it hits the `retl`. I believe the `subl 12` is incorrect, perhaps supposed to clean up a corresponding retl $12` in the callee, when there actually isn't one [^1]. Moreover, stepping back one instruction (to the end of function being called here) shows the stack intact and a plausible backtrace, lending further credibility to this theory: ``` 1|debug > rsi Stopped in function __ZN12_GLOBAL__N_113SLPVectorizer18tryToVectorizeListEN4llvm8ArrayRefIPNS1_5ValueEEERNS_7BoUpSLPES5_b.constprop.793 0x00000000025cf051<+225>: popl %edi 0x00000000025cf052<+226>: popl %ebp => 0x00000000025cf053<+227>: retl 0x00000000025cf054<+228>: addl $4, %ebx 0x00000000025cf057<+231>: cmpl %ebx, %esi 1|debug > bt [1] __ZN12_GLOBAL__N_113SLPVectorizer18tryToVectorizeListEN4llvm8ArrayRefIPNS1_5ValueEEERNS_7BoUpSLPES5_b.constprop.793 [2] __ZN12_GLOBAL__N_113SLPVectorizer18tryToVectorizePairEPN4llvm5ValueES3_RNS_7BoUpSLPE.part.774 [3] __ZN12_GLOBAL__N_113SLPVectorizer22vectorizeChainsInBlockEPN4llvm10BasicBlockERNS_7BoUpSLPE [4] __ZN12_GLOBAL__N_113SLPVectorizer13runOnFunctionERN4llvm8FunctionE.part.786 [5] __ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE [6] __ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE [7] __ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE [8] Most likely <Unknown Module> ``` (full assembly of that function is here: https://gist.github.com/Keno/cb45c3f1e925c6ae3484ca98f7300a4a) Lastly, looking at the CFI of the crashing frame shows that it indeed expected the stack pointer to have been adjusted up by 12 bytes after the call: ``` Stopped in function __ZN12_GLOBAL__N_113SLPVectorizer18tryToVectorizePairEPN4llvm5ValueES3_RNS_7BoUpSLPE.part.774 0x00000000025cf853<+51>: leal 24(%esp), %eax 0x00000000025cf857<+55>: calll -2284 => 0x00000000025cf85c<+60>: subl $12, %esp 0x00000000025cf85f<+63>: addl $44, %esp 0x00000000025cf862<+66>: retl $12 1|debug > cfi DW_CFA_def_cfa %esp 4 DW_CFA_offset %eip 1 DW_CFA_nop DW_CFA_nop -------------- DW_CFA_advance_loc 3 (=> 3) DW_CFA_def_cfa_offset 48 DW_CFA_advance_loc 57 (=> 60) DW_CFA_def_cfa_offset 36 DW_CFA_advance_loc 3 (=> 63) -------------- DW_CFA_def_cfa_offset 48 DW_CFA_advance_loc 3 (=> 66) DW_CFA_def_cfa_offset 4 ``` [^1]: such cleanup happens when calling __ZN12_GLOBAL__N_113SLPVectorizer18tryToVectorizePairEPN4llvm5ValueES3_RNS_7BoUpSLPE.part.774 ``` 1|debug > f 3 Stopped in function __ZN12_GLOBAL__N_113SLPVectorizer22vectorizeChainsInBlockEPN4llvm10BasicBlockERNS_7BoUpSLPE 0x00000000025d0727<+3319>: movl %ecx, 8(%esp) 0x00000000025d072b<+3323>: movl 40(%esp), %ecx => 0x00000000025d072f<+3327>: calll -3860 0x00000000025d0734<+3332>: subl $12, %esp 0x00000000025d0737<+3335>: testb %al, %al ``` Please let me know if I can provide any further information.
Try -fno-lifetime-dse and/or -fno-delete-null-pointer-checks.
Still broken adding both options to the compile.
Created attachment 39588 [details] Dockerfile
I managed to bisect this to SVN revision 217587 by Martin Jambor <mjambor@suse.cz> using the attached Dockerfile and bisect-run.sh. This was cross-compiling from openSUSE 42.1 which happened to have most of the build-deps easily available but should be reproducible with a cross-compiler most places you can get wine.
Created attachment 39589 [details] bisect-run.sh
Created attachment 39590 [details] bisect-log.txt
Try -fno-derivization (I think that is the spelling).
`git grep -n deriviz` returns nothing anywhere in the gcc source code and -fno-derivization gives an error: unrecognized command line option.
How can we help get this moving towards resolution? This has kept us stuck on GCC 4.9, which is getting increasingly problematic. We can attempt to reduce this to "minimal working piece of opt.exe with gcc 4.9" vs "broken equivalent piece with gcc 5+," but given the IPA nature of r217587 I'm not sure how that will go.
Interesting to see stack issues after IPA-CP changes. Please try compiling with -fno-devirtualize If it does not help, try -fno-ipa-cp (or both). If any of the above does help, adding it to just the file in which the miscompiled function is should help, so please verify that. If devirtualization causes the weird behavior, trying what -fsanitize-undefined-trap-on-error does might also shed some light on it. After we have identified the miscompiled source file and confirmed that an IPA option causes it, we may try to minimize the testcase or inspect dumps (-fump-tree-release_ssa -fdump-ipa-all-details -fdump-tree-fixup_cfg4) for suspect behavior. Also, we will need to confirm this is indeed a compiler bug and not source code with undefined behavior.
Thank you! -fno-devirtualize on its own did not help. -fno-ipa-cp on its own did fix the problem. Adding -fno-ipa-cp only when compiling lib/Transforms/Vectorize/SLPVectorizer.cpp was enough to fix the problem. Adding -fsanitize-undefined-trap-on-error as well for that same file didn't make a difference. I can post dump output for compiling that same file, if that's what you meant. In terms of whether or not this is caused by undefined behavior in the source being compiled, the commit https://github.com/llvm-mirror/llvm/commit/225dd82d634ca277 made a small modification to the file in question that prevented the problem from manifesting. If anything jumps out at you as being definitely undefined behavior prior to that commit, then we could look into carrying that patch, but it's very strange that this only happens with this one particular target. That commit was even intended to not be changing any behavior except when a newly-added feature is used.
Output of dump options uploaded here (8.2 MB file): https://github.com/tkelman/docker-gcc-bisect/raw/07f6fa56e2f6d60ff90613b9c036f830fb8a422a/LLVMVectorize.tar.gz
So although I have not gone as far building my own cross-compiler and just used the gcc 6 based one from OpenSUSE, I managed to reproduce the issue myself and can confirm that just marking method SLPVectorizer::tryToVectorizeList with noclone attribute avoids the issue. But, from the logs I gather that the only reason why IPA-CP did that was to remove the unused this pointer. This makes me wonder whether this somehow could push its callers and the callee out of sync regarding what ABI they use for the call? Apart from that, I have no idea what else could be wrong. I have quickly skimmed through the diff of the gimple dumps of the function before and after cloning and did not see any meaningful difference (lot's of UIDs were different so something might have escaped me but since this confirmed my observations from the IPA-CP dump, I doubt it). One more thing, IPA-CP does this (clone to remove a parameter when there are no real constants coming from all callers) only since r231540 (i.e. gcc 6), so you should not really see the issue with gcc 5, at least not because of this.
I'll look into whether the same flag changes the behavior in the same way on gcc 5. Using the opensuse repo's current cross compiler version, I reduced SLPVectorizer.cpp down to the attached version. It's maybe so small that there's UB making it useless to look at, but replacing SLPVectorizer.cpp with this copy reproduces the known-good behavior with -fno-ipa-cp and the segfault without.
Created attachment 40608 [details] smaller copy of SLPVectorizer.cpp that reproduces issue
Please check whether the following patch (it is against trunk should apply on top of GCC 6 too) fixes the issue for you. It can potentially open a whole can of worms so I want to make sure it helps before taking it further. Even if it does, I would also like to ask you to run the whole gcc testsuite (as many languages as you can) with and without the patch on i686-w64-mingw32 and compare the results before I propose it on the mailing list. diff --git a/gcc/cgraph.c b/gcc/cgraph.c index ef2dc50282c..29ec61b3d0d 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -1425,7 +1425,7 @@ cgraph_edge::redirect_call_stmt_to_callee (void) new_stmt = chkp_copy_call_skip_bounds (new_stmt); gimple_call_set_fndecl (new_stmt, e->callee->decl); - gimple_call_set_fntype (new_stmt, gimple_call_fntype (e->call_stmt)); + gimple_call_set_fntype (new_stmt, TREE_TYPE (e->callee->decl)); if (gimple_vdef (new_stmt) && TREE_CODE (gimple_vdef (new_stmt)) == SSA_NAME)
OK, so it turns out I botched my testing and the patch from the previous commit cause PR57330 and we would need something like the patch below (with a big fat comment why the condition is necessary, if we go this way at all, that is). Still, I need someone to test this really helps to fix the problem and to do a testsuite run. [PR 77333] Set clone call fntype to callee type 2017-02-03 Martin Jambor <mjambor@suse.cz> PR target/77333 * cgraph.c (redirect_call_stmt_to_callee): Set call stmt fntype to type of the decl if lhs has compatible type. diff --git a/gcc/cgraph.c b/gcc/cgraph.c index ef2dc50282c..b91de6788c7 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -1425,7 +1425,15 @@ cgraph_edge::redirect_call_stmt_to_callee (void) new_stmt = chkp_copy_call_skip_bounds (new_stmt); gimple_call_set_fndecl (new_stmt, e->callee->decl); - gimple_call_set_fntype (new_stmt, gimple_call_fntype (e->call_stmt)); + + tree lhs = gimple_call_lhs (new_stmt); + tree decl_restype = TREE_TYPE (TREE_TYPE (e->callee->decl)); + if (lhs + && (VOID_TYPE_P (decl_restype) + || !useless_type_conversion_p (TREE_TYPE (lhs), decl_restype))) + gimple_call_set_fntype (new_stmt, gimple_call_fntype (e->call_stmt)); + else + gimple_call_set_fntype (new_stmt, TREE_TYPE (e->callee->decl)); if (gimple_vdef (new_stmt) && TREE_CODE (gimple_vdef (new_stmt)) == SSA_NAME)
Sorry for the delay in getting to trying out your patch. Yes, it looks like it does fix the test case here when applied to the gcc 6 branch. Can running the test suite provide meaningful information from a cross-compile build?
The patch also applies and fixes the problem on the gcc-5-branch. I haven't tried with trunk.
Created attachment 40864 [details] A small self-contained testcase I have finally managed to put together a small self-contained testcase. Compile with -O2 -fno-ipa-sra to trigger the bug (under Wine), add -fno-ipa-cp to make it disappear.
I have proposed two ways to fix this on the mailing list: https://gcc.gnu.org/ml/gcc-patches/2017-03/msg00535.html (I have not yet looked at how much back-portable they are but I do not expect any issues.)
Author: jamborm Date: Thu Mar 30 13:51:02 2017 New Revision: 246589 URL: https://gcc.gnu.org/viewcvs?rev=246589&root=gcc&view=rev Log: [PR 77333] Fixup fntypes of gimple calls of clones 2017-03-30 Martin Jambor <mjambor@suse.cz> PR ipa/77333 * cgraph.h (cgraph_build_function_type_skip_args): Declare. * cgraph.c (redirect_call_stmt_to_callee): Set gimple fntype so that it reflects the signature changes performed at the callee side. * cgraphclones.c (build_function_type_skip_args): Make public, renamed to cgraph_build_function_type_skip_args. (build_function_decl_skip_args): Adjust call to the above function. testsuite/ * g++.dg/ipa/pr77333.C: New test. Added: trunk/gcc/testsuite/g++.dg/ipa/pr77333.C Modified: trunk/gcc/ChangeLog trunk/gcc/cgraph.c trunk/gcc/cgraph.h trunk/gcc/cgraphclones.c trunk/gcc/testsuite/ChangeLog
Fixed on trunk so far. I will prepare & test backports.
Author: jamborm Date: Tue Apr 11 13:23:48 2017 New Revision: 246838 URL: https://gcc.gnu.org/viewcvs?rev=246838&root=gcc&view=rev Log: [PR 77333] Fixup fntypes of gimple calls of clones 2017-04-11 Martin Jambor <mjambor@suse.cz> Backport from mainline 2017-03-30 Martin Jambor <mjambor@suse.cz> PR ipa/77333 * cgraph.h (cgraph_build_function_type_skip_args): Declare. * cgraph.c (redirect_call_stmt_to_callee): Set gimple fntype so that it reflects the signature changes performed at the callee side. * cgraphclones.c (build_function_type_skip_args): Make public, renamed to cgraph_build_function_type_skip_args. (build_function_decl_skip_args): Adjust call to the above function. testsuite/ * g++.dg/ipa/pr77333.C: New test. Added: branches/gcc-6-branch/gcc/testsuite/g++.dg/ipa/pr77333.C Modified: branches/gcc-6-branch/gcc/ChangeLog branches/gcc-6-branch/gcc/cgraph.c branches/gcc-6-branch/gcc/cgraph.h branches/gcc-6-branch/gcc/cgraphclones.c branches/gcc-6-branch/gcc/testsuite/ChangeLog
Author: jamborm Date: Tue Apr 11 13:31:16 2017 New Revision: 246839 URL: https://gcc.gnu.org/viewcvs?rev=246839&root=gcc&view=rev Log: [PR 77333] Fixup fntypes of gimple calls of clones 2017-04-11 Martin Jambor <mjambor@suse.cz> Backport from mainline 2017-03-30 Martin Jambor <mjambor@suse.cz> PR ipa/77333 * cgraph.h (cgraph_build_function_type_skip_args): Declare. * cgraph.c (redirect_call_stmt_to_callee): Set gimple fntype so that it reflects the signature changes performed at the callee side. * cgraphclones.c (build_function_type_skip_args): Make public, renamed to cgraph_build_function_type_skip_args. (build_function_decl_skip_args): Adjust call to the above function. testsuite/ * g++.dg/ipa/pr77333.C: New test. Added: branches/gcc-5-branch/gcc/testsuite/g++.dg/ipa/pr77333.C Modified: branches/gcc-5-branch/gcc/ChangeLog branches/gcc-5-branch/gcc/cgraph.c branches/gcc-5-branch/gcc/cgraph.h branches/gcc-5-branch/gcc/cgraphclones.c branches/gcc-5-branch/gcc/testsuite/ChangeLog
Fixed everywhere.