Hi, myself and others have been trying to track down problems with compiling (and using) wine64 with GCC-4.8.0. ~ Optimization level -O2 - which is usually the default optimization level for building wine, breaks wine64. Obviously, with gcc-4.7.2/3 -O2 works just fine. Here is the Wine bug report; http://bugs.winehq.org/show_bug.cgi?id=33307 Austin English (Wine-dev) ran a bisect on GCC to try to narrow down the problems. here is his relevant post on that, regarding this bug report; So the bad commit is: commit c8010b803d34fa7e096747067e33c650b36ecc06 Author: bernds <bernds@138bc75d-0d04-0410-961f-82ee72b054a4> Date: Mon May 14 12:08:31 2012 +0000 * attribs.c (decl_attributes): Avoid emitting a warning if ATTR_FLAG_BUILT_IN. * doc/rtl.texi (CALL_INSN_FUNCTION_USAGE): Use lowercase for rtx codes. Document meaning of sets inside CALL_INSN_FUNCTION_USAGE. * c-family/c-common.c (DEF_ATTR_STRING): Define and undefine as necessary. * builtin-attrs.def (DEF_ATTR_FOR_STRING): Define. Use it to define a string "1". (ATTR_RET1_NOTHROW_NONNULL_LEAF): New attr definition. * builtins.def (BUILT_IN_MEMCPY, BUILT_IN_MEMMOVE, BUILT_IN_MEMSET, BUILT_IN_STRCPY): Use it for these functions. * postreload.c (reload_combine): Deal with SETs inside CALL_INSN_FUNCTION_USAGE. * caller-save.c (setup_save_areas, save_call_clobbered_regs): Look for REG_RETURNED notes and use a cheap restore if possible. * ira-int.h (struct ira_allocno): New member cheap_calls_crossed_num. (ALLOCNO_CHEAP_CALLS_CROSSED_NUM): New macro. * ira-build.c (ira_create_allocno, create_cap_allocno, propagate_allocno_info, propagate_some_info_from_allocno, copy_info_to_removed_store_destination, ira_flattening): Handle it. * ira-lives.c (pseudo_regno_single_word_and_live_p, find_call_crossed_cheap_reg): New static functions. (process_bb_node_lives): Look for SETs in CALL_INSN_FUNCTION_USAGE, and set ALLOCNO_CHEAP_CALLS_CROSSED_NUM if possible. Also make a REG_RETURNED note in that case. * ira.c (setup_reg_renumber): Change assert to allow cases where allocnos only cross calls for which they are cheap to restore. * ira-costs.c (ira_tune_allocno_costs): Compare ALLOCNO_CALLS_CROSSED_NUM to ALLOCNO_CHEAP_CALLS_CROSSED_NUM rather than 0. * reg-notes.def (REG_RETURNED): New note. * cse.c (cse_insn): Likewise. * sched-deps.c (sched_analyze_insn): Likewise. * expr.c (init_block_move_fn): Set a "fn spec" attribute. * calls.c (decl_return_flags): New static function. (expand_call): Generate a SET in CALL_INSN_FUNCTION_USAGE for functions that return one of their arguments. * lto/lto-lang.c (handle_fnspec_attribute): New static function. (lto_attribute_table): Add "fn spec". (DEF_ATTR_STRING): Define and undefine along with the other macros. * regcprop.c (struct kill_set_value_data): New. (kill_set_value): Interpret data as a pointer to such a struct. Do nothing if the caller wants the register to be ignored. (copyprop_hardreg_forward_1): Handle SETs in CALL_INSN_FUNCTION_USAGE. testsuite/ * gcc.target/i386/retarg.c: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@187459 138bc75d-0d04-0410-961f-82ee72b054a4 after this commit, wine will fail to build (with default -O2 settings, gcc will crash with an internal error). The compile itself is fixed by a later commit: commit d73df2920c77116fd88e03cd95dd352f16511a3f Author: bernds <bernds@138bc75d-0d04-0410-961f-82ee72b054a4> Date: Mon May 21 21:37:01 2012 +0000 PR rtl-optimization/53373 * caller-save.c (save_call_clobbered_regs): Look into a possible PARALLEL manually rather than using single_set on a call insn. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@187745 138bc75d-0d04-0410-96 after this, wine will compile fine, but segfaults on launch. A quick search on http://gcc.gnu.org/bugzilla/ doesn't show anything for that commit. If someone else would file a bug upstream with this info, it would be appreciated. ______ If i can provide any more info - please let me know, or even better ~ post the the WineHQ bug report that i have linked to above ~ as Austin or possibly another Wine-Developer may be able to help further. thanks. Jordan
IIRC this is usually due to mis writing of memcpy/memset where you return null rather than the dest (first argument).
(In reply to comment #1) > IIRC this is usually due to mis writing of memcpy/memset where you return null > rather than the dest (first argument). See PR56881.
To narrow down the problem more you can cherry-pick d73df2920c77116fd88e03cd95dd352f16511a3f to after the commit that broke build and continue bi-secting?
Austin English said "I did...that's what led to the original commit showing the segfault. IOW: $ git reset --hard c8010b803d34fa7e096747067e33c650b36ecc06 # build gcc/wine - gcc fails to build wine, internal compiler error $ git show d73df2920c77116fd88e03cd95dd352f16511a3f | patch -p1 # build gcc/wine - wine segfaults on launch" So bisecting seems complete. Also, Kirill Smirnov reports that building wine using just -O2 runs into this bug for him, but building wine with "-O2 -fno-builtin-memcpy" doesn't. Next step may be to bisect wine's source code to see which file or files misbehave when built with -O2 but without -fno-builtin-memcpy.
(In reply to comment #4) > Next step may be to bisect wine's source code to see which file or files > misbehave when built with -O2 but without -fno-builtin-memcpy. Actually it should be easier than that. Look at the implementation of memcpy that wine has included in it and see if it returns the correct value.
You'd think... but I didn't find any obvious memcpy replacement. I spent some time bisecting the wine source yesterday. There appear to be at least three or four affected sites. I bisected one of the affected files with #pragma GCC optimize("-O2") ... #pragma GCC optimize("-O1") and the offending memcpy() there seems to be http://source.winehq.org/source/dlls/kernel32/process.c#L1316 There are plenty of clues to chase down, hope I have time to spend on it sometime soon.
It seems gcc over-optimizes series of memcpy() function calls one after another. The piece of code does not work: memcpy( buffer, DIR_Windows, len * sizeof(WCHAR) ); memcpy( buffer + len, default_syswow64W, sizeof(default_syswow64W) There is a wrapper around memcpy() called memcpy_unaligned() to avoid builtin/inlining. And these pieces of code work: memcpy( buffer, DIR_Windows, len * sizeof(WCHAR) ); memcpy_unaligned( buffer + len, default_syswow64W, sizeof(default_syswow64W) ); and memcpy_unaligned( buffer, DIR_Windows, len * sizeof(WCHAR) ); memcpy( buffer + len, default_syswow64W, sizeof(default_syswow64W) ); I'm sorry for copy-and-pasting wine code as is, I tried but failed to create a refined test case. So this case is opposite as previously suggested: the memcpy_unaligned() wrapper is OK, but native memcpy() is failing.
The important question is what that memcpy( buffer, DIR_Windows, len * sizeof(WCHAR) ); memcpy( buffer + len, default_syswow64W, sizeof(default_syswow64W) ); compiles to. If it is ... call memcpy leal (%rax, ...), %rdi ! or similar, the important is that buffer is preserved in return value of the previous memcpy call ... call memcpy Then if it doesn't work, you need to look at whatever memcpy implementation you are calling and see whether it correctly returns the first argument it has been passed to it in all cases.
>... whatever memcpy implementation you are calling and see whether it correctly returns the first argument it has been passed to it in all cases. Fails (gcc version of memcpy): __builtin_memcpy( buffer, DIR_Windows, len * sizeof(WCHAR) ); __builtin_memcpy( buffer + len, default_syswow64W, sizeof(default_syswow64W) ); Works (glibc version of memcpy): memcpy( buffer, DIR_Windows, len * sizeof(WCHAR) ); memcpy( buffer + len, default_syswow64W, sizeof(default_syswow64W) );
But __builtin_memcpy isn't necessarily the inline memcpy code, it can very well be a library call too. Anyway, this bugreport doesn't have a preprocessed source attached to it, nor list of all gcc options to compile it, so there is nothing to look at.
I'm sorry I cannot reproduce invalid behaviour within a refined test case. Instead I can provide commented asm dump from wine. This block of code works: the returned value (rax register) is used as a pointer to destination buffer. // memcpy( buffer, DIR_Windows, len * sizeof(WCHAR) ); mov rsi,QWORD PTR [rip+0x3e80aa] # 455c30 <DIR_Windows> mov rdx,rbx mov rdi,rax call 26000 <memcpy@plt> // HERE rax points to destination buffer and rdi is corrupted by memcpy mov rcx,rax // memcpy( buffer + len, default_syswow64W, sizeof(default_syswow64W) ); mov rax,QWORD PTR [rip+0x33d95] # a1930 <default_syswow64W.21831> xor edx,edx mov QWORD PTR [rcx+rbx*1],rax mov rax,QWORD PTR [rip+0x33d90] # a1938<default_syswow64W.21831+0x8> mov QWORD PTR [rip+0x3e8071],rcx # 455c20 <DIR_SysWow64> mov QWORD PTR [rcx+rbx*1+0x8],rax This block of code does not work: the returned value (rax) is immediately overwritten and rdi (corrupted by memcpy()) is used as a destination buffer. // memcpy( buffer, DIR_Windows, len * sizeof(WCHAR) ); mov rsi,QWORD PTR [rip+0x3e296a] # 455d10 <DIR_Windows> mov rdi,rax mov rdx,rbx call 26060 <memcpy@plt> // memcpy( buffer + len, default_syswow64W, sizeof(default_syswow64W) ); // HERE rdi is corrupted my GLIBC memcpy and rax is going to be overwritten mov rax,QWORD PTR [rip+0x2e698] # a1a50 <default_syswow64W.21831> xor edx,edx mov rcx,rdi mov QWORD PTR [rdi+rbx*1],rax I'm not sure whether registers with arguments must be kept intact after function returns, but it seems the bug is found - in gcc or glibc.
I' sorry, forgot to mention compiler flags: -O2 -g
We need at least preprocessed source of the failing code, produced with gcc -E.
Created attachment 29929 [details] gcc -E output. Attached gcc -E output. Lines around 22354 are being investigated.
Ah, ok, I can see it. To look at 22354 it helps to sed -i 's/^# .*$//', then look for 22354 in the dump. Seems the code is still correct at *.ce3 pass, Bernd's optimization kicks in during RA time before that and the pseudo holding buffer is assigned to %rdi before the call memcpy and assumed to live in %rax right after the call. But cprop_hardreg breaks this, changes the %rax after the memcpy call into %rdi, even when %rdi is call clobbered register, and even the call pattern contains (clobber (reg:DI 5 di)) and REG_DEAD note for the same register.
Reduced testcase (non-runtime, but one can see that %rdi which should be clobbered by the memcpy call is used immediately after the memcpy call). Probably wouldn't be too hard to turn this into an executable testcase, by adding some noinline/noclone attributes, define bar, baz functions, and in main initialize b. const unsigned short *b, *c; unsigned bar (void); unsigned short *baz (unsigned long); void __attribute__ ((ms_abi)) test (void) { unsigned d; unsigned short *e; if ((d = bar ())) { e = baz (d * sizeof (unsigned short) + 20); __builtin_memcpy (e, b, d * sizeof (unsigned short)); c = e; } } The ms_abi attribute seems to be essential for this, so perhaps something is broken in the ms ABI support or when mixing the two ABIs?
Runtime testcase for -O2, works with -O2 -fno-cprop-registers. It doesn't fail always, but around in 50% of cases, and heavily depends on what glibc is in use and what CPU too (as contemporary glibc's have IFUNC memcpy and select implementations based on cpuid). #define N 2001 unsigned short *b, *c, *d; __attribute__ ((noinline, noclone)) unsigned bar (void) { asm volatile ("" : : : "memory"); return N; } __attribute__ ((noinline, noclone)) unsigned short * baz (unsigned long x) { if (x != N * sizeof (unsigned short) + 20) __builtin_abort (); asm volatile ("" : : : "memory"); return d; } __attribute__ ((ms_abi, noinline, noclone)) foo (void) { unsigned d; unsigned short *e; if ((d = bar ())) { e = baz (d * sizeof (unsigned short) + 20); __builtin_memcpy (e, b, d * sizeof (unsigned short)); c = e; } } int main () { unsigned short a[2 * N]; int i; for (i = 0; i < 2 * N; i++) a[i] = i + 1; b = a; d = a + N; asm volatile ("" : : : "memory"); foo (); for (i = 0; i < N; i++) if (a[i] != i + 1 || a[i + N] != i + 1) __builtin_abort (); if (c != a + N) __builtin_abort (); return 0; }
Ah, so the issue is related to the fact that %rdi/%rsi aren't call clobbered in ms_abi, but are call clobbered in the sysv abi. The CALL_INSN pattern has clobbers for rsi/rdi, but copyprop_hardreg_forward_1 processes those early, perhaps then the CALL_INSN_FUNCTION_USAGE changes something and relies on regs_invalidated_by_call processing to invalidate whatever is necessary.
Created attachment 29936 [details] gcc49-pr57003.patch Untested fix. While we kill_clobbered_value early, for SET in CALL_INSN_FUNCTION_USAGE we add a value equivalence. In standard x86_64 ABI %rdi is invalidated by the call and present in the hard reg set to invalidate, so it is removed again, and perhaps if there is some target where the first argument is passed in non-call-clobbered register, the equivalence is correct. So, what the patch does is in this case apply the clobbers again. Bernd, does this look reasonable to you as the author of those changes?
Patch is OK, thanks Jakub - you were too fast for me on this one.
Author: jakub Date: Thu Apr 25 21:49:22 2013 New Revision: 198320 URL: http://gcc.gnu.org/viewcvs?rev=198320&root=gcc&view=rev Log: PR rtl-optimization/57003 * regcprop.c (copyprop_hardreg_forward_1): If ksvd.ignore_set_reg, call note_stores with kill_clobbered_value callback again after killing regs_invalidated_by_call. * gcc.target/i386/pr57003.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr57003.c Modified: trunk/gcc/ChangeLog trunk/gcc/regcprop.c trunk/gcc/testsuite/ChangeLog Author: jakub Date: Thu Apr 25 21:50:26 2013 New Revision: 198321 URL: http://gcc.gnu.org/viewcvs?rev=198321&root=gcc&view=rev Log: PR rtl-optimization/57003 * regcprop.c (copyprop_hardreg_forward_1): If ksvd.ignore_set_reg, call note_stores with kill_clobbered_value callback again after killing regs_invalidated_by_call. * gcc.target/i386/pr57003.c: New test. Added: branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr57003.c Modified: branches/gcc-4_8-branch/gcc/ChangeLog branches/gcc-4_8-branch/gcc/regcprop.c branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
Confirming: the attached patch fixes the problem with wine. Thank you!
My r215428 change regressed this PR again. The difference is: --- pr57003.s 2014-10-03 15:08:24.000000000 +0200 +++ pr57003_.s 2014-10-03 15:08:19.000000000 +0200 @@ -78,7 +78,7 @@ leaq -20(%rbx), %rdx movq %rax, %rdi call memcpy - movq %rdi, c(%rip) + movq %rax, c(%rip) .L8: movaps (%rsp), %xmm6 movaps 16(%rsp), %xmm7 @@ -321,5 +321,5 @@ .byte 0xb .align 8 .LEFDE7: - .ident "GCC: (GNU) 5.0.0 20141002 (experimental) [trunk revision 215797]" + .ident "GCC: (GNU) 4.9.2 20141001 (prerelease) [gcc-4_9-branch revision 215749]" .section .note.GNU-stack,"",@progbits So, gcc-5.0 does not detect that %rdi is clobbered in ELF ABI. The runtime failure happens only on CentOS5 (and not in Fedora20), which supports findings in Comment #17. The difference is, that previously we emit memcpy call as: #(call_insn:TI 24 23 27 3 (set (reg:DI 0 ax) # (call (mem:QI (symbol_ref:DI ("memcpy") [flags 0x41] <function_decl 0x7fce6f586438 memcpy>) [0 memcpy S1 A8]) # (const_int 0 [0]))) pr57003.c:32 661 {*call_value} # (expr_list:REG_DEAD (reg:DI 5 di) # (expr_list:REG_DEAD (reg:DI 4 si) # (expr_list:REG_DEAD (reg:DI 1 dx) # (expr_list:REG_UNUSED (reg:DI 0 ax) # (expr_list:REG_RETURNED (reg/v/f:DI 2 cx [orig:87 e ] [87]) # (expr_list:REG_CALL_DECL (symbol_ref:DI ("memcpy") [flags 0x41] <function_decl 0x7fce6f586438 memcpy>) # (expr_list:REG_EH_REGION (const_int 0 [0]) # (nil)))))))) # (expr_list (clobber (reg:TI 52 xmm15)) # (expr_list (clobber (reg:TI 51 xmm14)) # (expr_list (clobber (reg:TI 50 xmm13)) # (expr_list (clobber (reg:TI 49 xmm12)) # (expr_list (clobber (reg:TI 48 xmm11)) # (expr_list (clobber (reg:TI 47 xmm10)) # (expr_list (clobber (reg:TI 46 xmm9)) # (expr_list (clobber (reg:TI 45 xmm8)) # (expr_list (clobber (reg:TI 28 xmm7)) # (expr_list (clobber (reg:TI 27 xmm6)) # (expr_list (clobber (reg:DI 5 di)) # (expr_list (clobber (reg:DI 4 si)) # (expr_list:DI (set (reg:DI 0 ax) # (reg:DI 5 di)) # (expr_list:DI (use (reg:DI 5 di)) # (expr_list:DI (use (reg:DI 4 si)) # (expr_list:DI (use (reg:DI 1 dx)) # (nil)))))))))))))))))) which is alternate, but equivalent form of what was generated previously: #(call_insn:TI 24 23 27 3 (parallel [ # (set (reg:DI 0 ax) # (call (mem:QI (symbol_ref:DI ("memcpy") [flags 0x41] <function_decl 0x7fd91824a800 memcpy>) [0 memcpy S1 A8]) # (const_int 0 [0]))) # (unspec [ # (const_int 0 [0]) # ] UNSPEC_MS_TO_SYSV_CALL) # (clobber (reg:DI 4 si)) # (clobber (reg:DI 5 di)) # (clobber (reg:TI 27 xmm6)) # (clobber (reg:TI 28 xmm7)) # (clobber (reg:TI 45 xmm8)) # (clobber (reg:TI 46 xmm9)) # (clobber (reg:TI 47 xmm10)) # (clobber (reg:TI 48 xmm11)) # (clobber (reg:TI 49 xmm12)) # (clobber (reg:TI 50 xmm13)) # (clobber (reg:TI 51 xmm14)) # (clobber (reg:TI 52 xmm15)) # ]) pr57003.c:32 652 {*call_value_rex64_ms_sysv} # (expr_list:REG_DEAD (reg:DI 5 di) # (expr_list:REG_DEAD (reg:DI 4 si) # (expr_list:REG_DEAD (reg:DI 1 dx) # (expr_list:REG_RETURNED (reg/v/f:DI 2 cx [orig:87 e ] [87]) # (expr_list:REG_EH_REGION (const_int 0 [0]) # (nil)))))) # (expr_list:DI (set (reg:DI 0 ax) # (reg:DI 5 di)) # (expr_list:DI (use (reg:DI 5 di)) # (expr_list:DI (use (reg:DI 4 si)) # (expr_list:DI (use (reg:DI 1 dx)) # (nil)))))) It looks that Jakub's patch, proposed in Comment #21 doesn't cover alternative form, so it doesn't record clobbers in the alternative form properly. Reopened as 5.0 regression.
(In reply to Uroš Bizjak from comment #23) > The difference is, that previously we emit memcpy call as: Slip of the tongue, this should read: ... that now we emit memcpy call as:
Patch in testing: --cut here-- Index: regcprop.c =================================================================== --- regcprop.c (revision 215852) +++ regcprop.c (working copy) @@ -1030,6 +1030,12 @@ copyprop_hardreg_forward_1 (basic_block bb, struct assume the value in it is still live. */ if (ksvd.ignore_set_reg) note_stores (PATTERN (insn), kill_clobbered_value, vd); + for (exp = CALL_INSN_FUNCTION_USAGE (insn); exp; exp = XEXP (exp, 1)) + { + rtx x = XEXP (exp, 0); + if (GET_CODE (x) == CLOBBER) + kill_value (SET_DEST (x), vd); + } } /* Notice stores. */ --cut here--
Additional patch at [1] [1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00332.html
Shouldn't it be guarded by if (ksvd.ignored_set_reg) too?
(In reply to Jakub Jelinek from comment #27) > Shouldn't it be guarded by if (ksvd.ignored_set_reg) too? Yes, and the published patch implements just that.
Author: uros Date: Thu Oct 9 06:36:08 2014 New Revision: 216026 URL: https://gcc.gnu.org/viewcvs?rev=216026&root=gcc&view=rev Log: PR rtl-optimization/57003 * regcprop.c (copyprop_hardreg_forward_1): If ksvd.ignore_set_reg, also check CALL_INSN_FUNCTION_USAGE for clobbers again after killing regs_invalidated_by_call. Modified: trunk/gcc/ChangeLog trunk/gcc/regcprop.c
Author: uros Date: Thu Oct 9 08:22:23 2014 New Revision: 216030 URL: https://gcc.gnu.org/viewcvs?rev=216030&root=gcc&view=rev Log: Backport from mainline 2014-10-09 Uros Bizjak <ubizjak@gmail.com> PR rtl-optimization/57003 * regcprop.c (copyprop_hardreg_forward_1): If ksvd.ignore_set_reg, also check CALL_INSN_FUNCTION_USAGE for clobbers again after killing regs_invalidated_by_call. Modified: branches/gcc-4_9-branch/gcc/ChangeLog branches/gcc-4_9-branch/gcc/regcprop.c
Author: uros Date: Thu Oct 9 09:05:37 2014 New Revision: 216035 URL: https://gcc.gnu.org/viewcvs?rev=216035&root=gcc&view=rev Log: Backport from mainline 2014-10-09 Uros Bizjak <ubizjak@gmail.com> PR rtl-optimization/57003 * regcprop.c (copyprop_hardreg_forward_1): If ksvd.ignore_set_reg, also check CALL_INSN_FUNCTION_USAGE for clobbers again after killing regs_invalidated_by_call. Modified: branches/gcc-4_8-branch/gcc/ChangeLog branches/gcc-4_8-branch/gcc/regcprop.c
Again fixed everywhere.