Created attachment 33118 [details] testcase from glibc trunk
gcc-4.8 -S bug-887141_pthread_create.i -m32 -std=gnu99 -fgnu89-inline -O2 -fmerge-all-constants -frounding-math -fPIC -mpreferred-stack-boundary=4 -fverbose-asm -da -fdump-tree-all -g
sched2 moves the load from 20(%esp) up across the spill.
__nptl_setxid:
...
.LBB347:
        .loc 1 1174 0
        movl    80(%esp), %eax  # cmdp, tmp189
        movl    20(%esp), %esi  # %sfp, result      <---- bogus location
.LVL184:
        movl    (%eax), %eax    # cmdp_33(D)->syscall_no, cmdp_33(D)->syscall_no
        movl    %eax, 20(%esp)  # cmdp_33(D)->syscall_no, %sfp
.LVL185:
        movl    80(%esp), %eax  # cmdp, tmp191
        movl    4(%eax), %edi   # cmdp_33(D)->id, cmdp_33(D)->id
        movl    8(%eax), %ecx   # cmdp_33(D)->id, cmdp_33(D)->id
        movl    12(%eax), %edx  # cmdp_33(D)->id, cmdp_33(D)->id
                                                    <---- moved from here
        movl    %esi, %eax      # result, result
#APP
# 1174 "allocatestack.c" 1
        xchgl %ebx, %edi
        int $0x80
        xchgl %edi, %ebx
before sched2 everything looks ok (apart from odd debug-insn with asm):
(code_label 308 344 309 40 194 "" [1 uses])
(note 309 308 531 40 [bb 40] NOTE_INSN_BASIC_BLOCK)
(insn 531 309 310 40 (set (reg:SI 0 ax [189])
        (mem/f/c:SI (plus:SI (reg/f:SI 7 sp)
                (const_int 80 [0x50])) [4 cmdp+0 S4 A32])) allocatestack.c:1174 89 {*movsi_internal}
     (nil))
(insn 310 531 532 40 (set (reg:SI 0 ax [orig:137 cmdp_33(D)->syscall_no ] [137])
        (mem:SI (reg:SI 0 ax [189]) [2 cmdp_33(D)->syscall_no+0 S4 A32])) allocatestack.c:1174 89 {*movsi_internal}
     (nil))
(insn 532 310 533 40 (set (mem/c:SI (plus:SI (reg/f:SI 7 sp)
                (const_int 20 [0x14])) [54 %sfp+-12 S4 A32])
        (reg:SI 0 ax [orig:137 cmdp_33(D)->syscall_no ] [137])) allocatestack.c:1174 89 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 0 ax [orig:137 cmdp_33(D)->syscall_no ] [137])
        (nil)))
(insn 533 532 311 40 (set (reg:SI 0 ax [191])
        (mem/f/c:SI (plus:SI (reg/f:SI 7 sp)
                (const_int 80 [0x50])) [4 cmdp+0 S4 A32])) allocatestack.c:1174 89 {*movsi_internal}
     (nil))
(insn 311 533 312 40 (set (reg:SI 5 di [orig:138 cmdp_33(D)->id ] [138])
        (mem:SI (plus:SI (reg:SI 0 ax [191])
                (const_int 4 [0x4])) [5 cmdp_33(D)->id+0 S4 A32])) allocatestack.c:1174 89 {*movsi_internal}
     (nil))
(insn 312 311 313 40 (set (reg:SI 2 cx [orig:139 cmdp_33(D)->id+4 ] [139])
        (mem:SI (plus:SI (reg:SI 0 ax [192])
                (const_int 8 [0x8])) [5 cmdp_33(D)->id+4 S4 A32])) allocatestack.c:1174 89 {*movsi_internal}
     (nil))
(insn 313 312 314 40 (set (reg:SI 1 dx [orig:140 cmdp_33(D)->id+8 ] [140])
        (mem:SI (plus:SI (reg:SI 0 ax [193])
                (const_int 12 [0xc])) [5 cmdp_33(D)->id+8 S4 A32])) allocatestack.c:1174 89 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 0 ax [193])
        (nil)))
(note 314 313 316 40 NOTE_INSN_DELETED)
(debug_insn 316 314 477 40 (var_location:SI resultvar (asm_operands/v:SI ("xchgl %%ebx, %%edi int $0x80 xchgl %%edi, %%ebx ") ("=a") 0 [
            (mem/c:SI (plus:SI (reg/f:SI 7 sp)
                    (const_int 20 [0x14])) [54 %sfp+-12 S4 A32])
            (reg:SI 5 di [orig:138 cmdp_33(D)->id ] [138])
            (reg:SI 2 cx [orig:139 cmdp_33(D)->id+4 ] [139])
            (reg:SI 1 dx [orig:140 cmdp_33(D)->id+8 ] [140])
        ]
         [
            (asm_input:SI ("0") (null):0)
            (asm_input:SI ("D") (null):0)
            (asm_input:SI ("c") (null):0)
            (asm_input:SI ("d") (null):0)
        ]
         [] allocatestack.c:1174)) allocatestack.c:1174 -1
     (nil))
(insn 477 316 536 40 (set (reg/v:SI 4 si [orig:60 result ] [60])
        (mem/c:SI (plus:SI (reg/f:SI 7 sp)
                (const_int 20 [0x14])) [54 %sfp+-12 S4 A32])) allocatestack.c:1174 89 {*movsi_internal}
     (nil))
(insn 536 477 317 40 (set (reg/v:SI 0 ax [orig:60 result ] [60])
        (reg/v:SI 4 si [orig:60 result ] [60])) allocatestack.c:1174 89 {*movsi_internal}
     (expr_list:REG_DEAD (reg/v:SI 4 si [orig:60 result ] [60])
        (nil)))
(insn 317 536 537 40 (parallel [
            (set (reg/v:SI 0 ax [orig:60 result ] [60])
                (asm_operands/v:SI ("xchgl %%ebx, %%edi int $0x80 xchgl %%edi, %%ebx
Auto-reducing (matching the bogus assembler pattern).
Created attachment 33122 [details] autoreduced testcase Autoreduced testcase.
Reduced testcase also reproduces the bug with 4.9 and trunk (but not 4.7):
        movl    64(%esp), %eax  # cmdp, tmp206
        movl    4(%esp), %esi   # %sfp, result
.LVL36:
        movl    (%eax), %eax    # cmdp_30(D)->syscall_no, cmdp_30(D)->syscall_no
        movl    %eax, 4(%esp)   # cmdp_30(D)->syscall_no, %sfp
.LVL37:
        movl    64(%esp), %eax  # cmdp, tmp208
        movl    4(%eax), %edi   # cmdp_30(D)->id, cmdp_30(D)->id
        movl    8(%eax), %ecx   # cmdp_30(D)->id, cmdp_30(D)->id
        movl    12(%eax), %edx  # cmdp_30(D)->id, cmdp_30(D)->id
        movl    %esi, %eax      # result, result
#APP
# 75 "bug-887141_pthread_create.1.min.i" 1
        xchgl %ebx, %edi
        int $0x80
        xchgl %edi, %ebx
Note that 4.7 doesn't spill %eax but generates
        movl    0(%ebp), %esi   # cmdp_18(D)->syscall_no, result
.LVL45:
        movl    4(%ebp), %edi   # cmdp_18(D)->id, cmdp_18(D)->id
        movl    8(%ebp), %ecx   # cmdp_18(D)->id, cmdp_18(D)->id
        movl    12(%ebp), %edx  # cmdp_18(D)->id, cmdp_18(D)->id
        movl    %esi, %eax      # result,
#APP
# 75 "bug-887141_pthread_create.1.min.i" 1
        xchgl %ebx, %edi
        int $0x80
        xchgl %edi, %ebx
which would be an IRA issue?
Created attachment 33123 [details] more reduced
On trunk, the bug reproduces with the following slightly more manually reduced testcase and -O2 -m32 -g (so even without -fPIC).
;;   --- Region Dependences --- b 12 bb 0
;;      insn  code    bb   dep  prio  cost   reservation
;;      ----  ----    --   ---  ----  ----   -----------
...
;;      239    90    12     1     5     1   athlon-direct,athlon-agu,athlon-store : 127 123n 122nm 240
...
;;      122    -1    12     7     0     0   nothing : 124 123nm 243 216m
;;      216    90    12     0     5     3   athlon-direct,athlon-load : 127 123nm 243
...
which maps to:
(insn 239 116 240 12 (set (mem/c:SI (reg/f:SI 7 sp) [11 %sfp+-16 S4 A32])
        (reg:SI 0 ax [orig:127 cmdp_14(D)->syscall_no ] [127])) bug-887141_pthread_create.1.min.i:77 90 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 0 ax [orig:127 cmdp_14(D)->syscall_no ] [127])
        (nil)))
...
(debug_insn 122 120 216 12 (var_location:SI resultvar (asm_operands/v:SI ("xchgl %%ebx, %%edi int $0x80 xchgl %%edi, %%ebx ") ("=a") 0 [
            (mem/c:SI (reg/f:SI 7 sp) [11 %sfp+-16 S4 A32])
            (reg:SI 5 di [orig:128 cmdp_14(D)->id ] [128])
            (reg:SI 2 cx [orig:129 cmdp_14(D)->id+4 ] [129])
            (reg:SI 1 dx [orig:130 cmdp_14(D)->id+8 ] [130])
        ]
         [
            (asm_input:SI ("0") bug-887141_pthread_create.1.min.i:77)
            (asm_input:SI ("D") bug-887141_pthread_create.1.min.i:77)
            (asm_input:SI ("c") bug-887141_pthread_create.1.min.i:77)
            (asm_input:SI ("d") bug-887141_pthread_create.1.min.i:77)
        ]
         [] bug-887141_pthread_create.1.min.i:77)) bug-887141_pthread_create.1.min.i:77 -1
     (nil))
(insn 216 122 243 12 (set (reg/v:SI 4 si [orig:84 result ] [84])
        (mem/c:SI (reg/f:SI 7 sp) [11 %sfp+-16 S4 A32])) bug-887141_pthread_create.1.min.i:77 90 {*movsi_internal}
     (nil))
insn 123 is the real asm. Not sure if the dependence of 239 via 122 to 216 is supposed to prevent scheduling 216 before 239. If so, then dependence information is correct. The only forward dependence to 216 is really from the debug insn. But then:
;;              dependencies resolved: insn 238
;;              tick updated: insn 238 into ready
;;              dependencies resolved: insn 216
;;              tick updated: insn 216 into ready
;;      Advanced a state.
;;              Ready list after queue_to_ready:    216:67:prio=5  238:59:prio=11
what? 216 is already ready?
;;      Ready list (t =   0):    216:67:prio=5  238:59:prio=11
;;        0-->   b  0: i 238    ax=[sp+0x30]    :athlon-direct,athlon-load
;;              dependencies resolved: insn 116
;;              Ready-->Q: insn 116: queued for 3 cycles (change queue index).
;;              tick updated: insn 116 into queue with cost=3
;;              Ready list after ready_sort:    216:67:prio=5
;;      Ready list (t =   0):    216:67:prio=5
;;        0-->   b  0: i 216    si=[sp]         :athlon-direct,athlon-load
;;              resetting: debug insn 122
yeah, so we reset the debug insn. But ignored the indirect dependence from 239.
Now, of course I'm lost in the scheduler code, not knowing how it is intended to work with debug-insns. As a band-aid fix I'd simply never generate debug_insns with asms ...
It wouldn't be that much of a band-aid: we can't do anything reasonable with an asm in a debug_insn anyway, since there is no way to express it in DWARF4 or in the upcoming DWARF version.
It's combine combining 120 into 123 and on the way via propagate_for_debug replacing reg:SI 126 with the asm in the debug_insn.
(insn 120 119 122 15 (parallel [
            (set (reg:SI 126 [ resultvar ])
                (asm_operands/v:SI ("xchgl %%ebx, %%edi int $0x80 xchgl %%edi, %%ebx
...
(debug_insn 122 120 123 15 (var_location:SI resultvar (reg:SI 126 [ resultvar ])) bug-887141_pthread_create.1.min.i:77 -1
     (nil))
(insn 123 122 124 15 (set (reg/v:SI 84 [ result ])
        (reg:SI 126 [ resultvar ])) bug-887141_pthread_create.1.min.i:77 90 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 126 [ resultvar ])
        (nil)))
So to avoid generating debug-insns with asm_operands loc maybe do
Index: valtrack.c
===================================================================
--- valtrack.c  (revision 212580)
+++ valtrack.c  (working copy)
@@ -197,6 +197,12 @@ propagate_for_debug (rtx insn, rtx last,
       next = NEXT_INSN (insn);
       if (DEBUG_INSN_P (insn))
         {
+          if (GET_CODE (src) == ASM_OPERANDS)
+            {
+              INSN_VAR_LOCATION_LOC (insn) = gen_rtx_UNKNOWN_VAR_LOC ();
+              df_insn_rescan (insn);
+              continue;
+            }
           loc = simplify_replace_fn_rtx (INSN_VAR_LOCATION_LOC (insn),
                                          dest, propagate_for_debug_subst, &p);
           if (loc == INSN_VAR_LOCATION_LOC (insn))
which "fixes" the bug.
The following fixes it as well, in the scheduler.
Index: gcc/sched-deps.c
===================================================================
--- gcc/sched-deps.c    (revision 212580)
+++ gcc/sched-deps.c    (working copy)
@@ -2713,7 +2713,8 @@ sched_analyze_2 (struct deps_desc *deps,
       break;
 
     case PREFETCH:
-      if (PREFETCH_SCHEDULE_BARRIER_P (x))
+      if (PREFETCH_SCHEDULE_BARRIER_P (x)
+          && !DEBUG_INSN_P (insn))
         reg_pending_barrier = TRUE_BARRIER;
       /* Prefetch insn contains addresses only.  So if the prefetch
          address has no registers, there will be no dependencies on
@@ -2750,7 +2751,8 @@ sched_analyze_2 (struct deps_desc *deps,
            Consider for instance a volatile asm that changes the fpu rounding
            mode.  An insn should not be moved across this even if it only uses
            pseudo-regs because it might give an incorrectly rounded result.  */
-        if (code != ASM_OPERANDS || MEM_VOLATILE_P (x))
+        if ((code != ASM_OPERANDS || MEM_VOLATILE_P (x))
+            && !DEBUG_INSN_P (insn))
           reg_pending_barrier = TRUE_BARRIER;
 
         /* For all ASM_OPERANDS, we must traverse the vector of input operands.
we then have
;;   --- Region Dependences --- b 12 bb 0
;;      insn  code    bb   dep  prio  cost   reservation
;;      ----  ----    --   ---  ----  ----   -----------
;;      239    90    12     1     5     1   athlon-direct,athlon-agu,athlon-store : 127 123nm 216n 240
...
;;      122    -1    12     3     0     0   nothing : 124 123nm
;;      216    90    12     1     5     3   athlon-direct,athlon-load : 127 123nm 243
so the false forward dependence of the store to the debug-insn is gone and instead a proper dependence on the load is there.
Testing the fix.
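The effect of the two dependence tables can be mimicked with a small toy model. This is a sketch only and not GCC code: `toy_block`, `toy_analyze` and `toy_ready_at_start` are hypothetical names, and the rule that an edge coming from a debug insn never holds back a real insn is an assumption inferred from the "resetting: debug insn 122" line in the scheduler log quoted in this thread. The model tracks a single spill slot and ignores anti-dependences.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in GCC. */
enum toy_kind { TOY_STORE, TOY_LOAD, TOY_DEBUG_ASM };

#define TOY_MAX 8

struct toy_block {
    size_t n;                       /* number of insns, in program order */
    enum toy_kind kind[TOY_MAX];
    bool dep[TOY_MAX][TOY_MAX];     /* dep[i][j]: insn j must wait for insn i */
};

/* Record dependences between insns that all access the same spill slot.
   With debug_is_barrier set, a debug insn wrapping a volatile asm acts as
   a barrier: it depends on everything before it and flushes the pending
   store, so a later load depends only on the debug insn.  */
static void toy_analyze(struct toy_block *b, bool debug_is_barrier)
{
    int pending_store = -1;
    int last_barrier = -1;

    assert(b->n <= TOY_MAX);
    for (size_t i = 0; i < b->n; i++)
        for (size_t j = 0; j < b->n; j++)
            b->dep[i][j] = false;

    for (size_t j = 0; j < b->n; j++) {
        if (last_barrier >= 0)
            b->dep[last_barrier][j] = true;
        if (b->kind[j] == TOY_DEBUG_ASM && debug_is_barrier) {
            for (size_t i = 0; i < j; i++)
                b->dep[i][j] = true;
            last_barrier = (int)j;
            pending_store = -1;     /* the barrier flushes pending stores */
        } else if (b->kind[j] == TOY_STORE) {
            pending_store = (int)j;
        } else if (pending_store >= 0) {
            /* A load or a non-barrier debug insn reads the slot.  */
            b->dep[pending_store][j] = true;
        }
    }
}

/* Assumed rule: a real insn is held back only by other real insns; an
   edge from a debug insn is dropped and the debug insn is reset instead.  */
static bool toy_ready_at_start(const struct toy_block *b, size_t j)
{
    for (size_t i = 0; i < b->n; i++)
        if (b->dep[i][j] && b->kind[i] != TOY_DEBUG_ASM)
            return false;
    return true;
}
```

For a block holding a store, a debug asm and a load in that order (standing in for insns 239, 122 and 216), `toy_analyze(&b, true)` leaves the load ready before the store has issued, which is the miscompile reported here. `toy_analyze(&b, false)` gives the load a direct dependence on the store, matching the `216n` entry in the second table.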
Author: rguenth
Date: Thu Jul 17 07:47:19 2014
New Revision: 212738
URL: https://gcc.gnu.org/viewcvs?rev=212738&root=gcc&view=rev
Log:
2014-07-17 Richard Biener <rguenther@suse.de>
    PR rtl-optimization/61801
    * sched-deps.c (sched_analyze_2): For ASM_OPERANDS and ASM_INPUT
    don't set reg_pending_barrier if it appears in a debug-insn.
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/sched-deps.c
Author: rguenth
Date: Thu Jul 17 07:48:49 2014
New Revision: 212739
URL: https://gcc.gnu.org/viewcvs?rev=212739&root=gcc&view=rev
Log:
2014-07-17 Richard Biener <rguenther@suse.de>
    PR rtl-optimization/61801
    * sched-deps.c (sched_analyze_2): For ASM_OPERANDS and ASM_INPUT
    don't set reg_pending_barrier if it appears in a debug-insn.
Modified:
    branches/gcc-4_9-branch/gcc/ChangeLog
    branches/gcc-4_9-branch/gcc/sched-deps.c
Author: rguenth
Date: Thu Jul 17 07:49:44 2014
New Revision: 212740
URL: https://gcc.gnu.org/viewcvs?rev=212740&root=gcc&view=rev
Log:
2014-07-17 Richard Biener <rguenther@suse.de>
    PR rtl-optimization/61801
    * sched-deps.c (sched_analyze_2): For ASM_OPERANDS and ASM_INPUT
    don't set reg_pending_barrier if it appears in a debug-insn.
Modified:
    branches/gcc-4_8-branch/gcc/ChangeLog
    branches/gcc-4_8-branch/gcc/sched-deps.c
Fixed.
*** Bug 61904 has been marked as a duplicate of this bug. ***
Here's a simple testcase for this issue (triggers for 4.8.3 and 4.9.1):
markus@x4 linux % cat exit.i
int a, b, c;
void fn1 () {
  int d;
  if (fn2 () && !0) {
    b = ( {
      int e;
      fn3 ();
      switch (0)
      default:
        asm volatile("" : "=a"(e) : "0"(a), ""(0));
      e;
    });
    d = b;
  }
  c = d;
}
markus@x4 linux % gcc -fcompare-debug -O2 -c exit.i
gcc: error: exit.i: -fcompare-debug failure (length)
markus@x4 linux % gcc -fcompare-debug -Os -c exit.i
gcc: error: exit.i: -fcompare-debug failure (length)
markus@x4 linux %
Author: rguenth
Date: Mon Jul 28 07:54:08 2014
New Revision: 213111
URL: https://gcc.gnu.org/viewcvs?rev=213111&root=gcc&view=rev
Log:
2014-07-28 Richard Biener <rguenther@suse.de>
    PR rtl-optimization/61801
    * gcc.target/i386/pr61801.c: New testcase.
Added:
    trunk/gcc/testsuite/gcc.target/i386/pr61801.c
Modified:
    trunk/gcc/testsuite/ChangeLog
Author: rguenth
Date: Mon Jul 28 07:54:57 2014
New Revision: 213112
URL: https://gcc.gnu.org/viewcvs?rev=213112&root=gcc&view=rev
Log:
2014-07-28 Richard Biener <rguenther@suse.de>
    PR rtl-optimization/61801
    * gcc.target/i386/pr61801.c: New testcase.
Added:
    branches/gcc-4_9-branch/gcc/testsuite/gcc.target/i386/pr61801.c
Modified:
    branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
Author: rguenth
Date: Mon Jul 28 07:59:22 2014
New Revision: 213113
URL: https://gcc.gnu.org/viewcvs?rev=213113&root=gcc&view=rev
Log:
2014-07-28 Richard Biener <rguenther@suse.de>
    PR rtl-optimization/61801
    * gcc.target/i386/pr61801.c: New testcase.
Added:
    branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr61801.c
Modified:
    branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
Author: rguenth
Date: Mon Jul 28 09:01:54 2014
New Revision: 213119
URL: https://gcc.gnu.org/viewcvs?rev=213119&root=gcc&view=rev
Log:
2014-07-28 Richard Biener <rguenther@suse.de>
    PR rtl-optimization/61801
    * gcc.target/i386/pr61801.c: Fix testcase.
Modified:
    branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_9-branch/gcc/testsuite/gcc.target/i386/pr61801.c
Author: rguenth
Date: Mon Jul 28 09:02:23 2014
New Revision: 213120
URL: https://gcc.gnu.org/viewcvs?rev=213120&root=gcc&view=rev
Log:
2014-07-28 Richard Biener <rguenther@suse.de>
    PR rtl-optimization/61801
    * gcc.target/i386/pr61801.c: Fix testcase.
Modified:
    branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr61801.c
Author: rguenth
Date: Mon Jul 28 09:02:39 2014
New Revision: 213121
URL: https://gcc.gnu.org/viewcvs?rev=213121&root=gcc&view=rev
Log:
2014-07-28 Richard Biener <rguenther@suse.de>
    PR rtl-optimization/61801
    * gcc.target/i386/pr61801.c: Fix testcase.
Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/pr61801.c
Author: jakub
Date: Wed Aug 6 08:40:19 2014
New Revision: 213652
URL: https://gcc.gnu.org/viewcvs?rev=213652&root=gcc&view=rev
Log:
    PR rtl-optimization/61801
    * gcc.target/i386/pr61801.c: Rewritten.
Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/pr61801.c
Author: jakub
Date: Wed Aug 6 08:44:05 2014
New Revision: 213653
URL: https://gcc.gnu.org/viewcvs?rev=213653&root=gcc&view=rev
Log:
    PR rtl-optimization/61801
    * gcc.target/i386/pr61801.c: Rewritten.
Modified:
    branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_9-branch/gcc/testsuite/gcc.target/i386/pr61801.c
Author: jakub
Date: Wed Aug 6 08:50:12 2014
New Revision: 213654
URL: https://gcc.gnu.org/viewcvs?rev=213654&root=gcc&view=rev
Log:
    PR rtl-optimization/61801
    * gcc.target/i386/pr61801.c: Rewritten.
Modified:
    branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr61801.c