Summary: | [4.4/4.5/4.6 Regression] Overlooked dependency causes wrong scheduling, wrong code | ||
---|---|---|---|
Product: | gcc | Reporter: | Kaveh Ghazi <ghazi> |
Component: | rtl-optimization | Assignee: | Alexandre Oliva <aoliva> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | amonakov, aoliva, bernds, ebotcazou, gcc-bugs, jakub, rguenth, sje, steven |
Priority: | P2 | Keywords: | alias, wrong-code |
Version: | 4.4.4 | ||
Target Milestone: | 4.4.6 | ||
Host: | Target: | ia64-unknown-linux-gnu | |
Build: | Known to work: | 4.3.5 | |
Known to fail: | 4.4.4, 4.5.0, 4.6.0 | Last reconfirmed: | 2010-07-13 09:43:15 |
Attachments: |
debug log
Updated patch, still awaiting review |
Description
Kaveh Ghazi
2010-03-23 17:34:38 UTC
4.4.4 ia64 results with error: http://gcc.gnu.org/ml/gcc-testresults/2010-03/msg01631.html 4.5.0 ia64 results with error: http://gcc.gnu.org/ml/gcc-testresults/2010-03/msg01997.html This sounds like an aliasing issue which is only exposed on ia64. I'll have a look. Confirmed with r162278. Scheduling after reload (which is bundling for the ia64 backend) seems to cause this: $ ./xgcc -B. -O2 vector-2.c -fno-inline -fPIC ; echo Testing ; ./a.out Testing Aborted $ ./xgcc -B. -O2 -fno-schedule-insns2 vector-2.c -fno-inline -fPIC -fschedule-insns ; echo Testing ; ./a.out Testing $ Goes away with selective scheduling: $ ./xgcc -B. -O2 -fselective-scheduling2 vector-2.c -fno-inline -fPIC ; echo Testing ; ./a.out Testing $ Makes this a scheduler bug. It looks like a store is scheduled wrong. Slightly reduced test case: ---------------------- 8< --------------------#define vector __attribute__((vector_size(16) )) vector int f1(vector int t, int a) { ((int*)&t)[1] = a; return t; } int main(void) { vector int a = {0, 0, 0, 0}; vector int c = {0, 1, 0, 0}; vector int a0; a0 = f1(a, 1); if (memcmp (&a0, &c, sizeof(a0))) __builtin_abort (); return 0; } ---------------------- 8< -------------------- Compiled at "-O2 -fno-inline -fpic" this will abort. The assembler output for f1 is this: 6 .global f1# 7 .type f1#, @function 8 .proc f1# 9 f1: 10 .prologue 11 .body 12 .mmi 13 mov r15 = r12 14 nop 0 15 mov r14 = r12 16 ;; 17 .mmi 18 st8 [r15] = r32, 8 19 ;; 20 st8 [r15] = r33, -4 21 nop 0 22 .mii 23 ld8 r8 = [r14], 8 24 nop 0 25 ;; 26 nop 0 27 .mmb 28 ld8 r9 = [r14] 29 st4 [r15] = r34 30 br.ret.sptk.many b0 31 .endp f1# The store on line 29 looked suspicious because the three stores (lines 18, 20, and 29) are together in the unscheduled version (i.e. with -fno-schedule-insns2, lines 15, 17, and 19): 6 .global f1# 7 .type f1#, @function 8 .proc f1# 9 f1: 10 .prologue 11 .body 12 mov r15 = r12 13 mov r14 = r12 14 ;; 15 st8 [r15] = r32, 8 16 ;; 17 st8 [r15] = r33, -4 18 ;; 19 st4 [r15] = r34 20 ld8 r8 = [r14], 8 21 ;; 22 ld8 r9 = [r14] 23 br.ret.sptk.many b0 24 ;; 25 .endp f1# The abort goes away if I hack the assembly manually with an extra bundle to move the three stores back together: .global f1# .global f1# .type f1#, @function .type f1#, @function .proc f1# .proc f1# f1: f1: .prologue .prologue .body .body .mmi .mmi mov r15 = r12 mov r15 = r12 nop 0 nop 0 mov r14 = r12 mov r14 = r12 ;; ;; .mmi .mmi st8 [r15] = r32, 8 st8 [r15] = r32, 8 ;; ;; st8 [r15] = r33, -4 st8 [r15] = r33, -4 nop 0 nop 0 > ;; > .mii > st4 [r15] = r34 > nop 0 > nop 0 .mii .mii ld8 r8 = [r14], 8 ld8 r8 = [r14], 8 nop 0 nop 0 ;; ;; nop 0 nop 0 .mmb .mmb ld8 r9 = [r14] ld8 r9 = [r14] st4 [r15] = r34 | nop 0 br.ret.sptk.many b0 br.ret.sptk.many b0 .endp f1# .endp f1# I am not an ia64 expert, but I see no reason why moving the store is a bad idea. Will have to look at the RTL, and if that doesn't help I'll leave this to a target specialist. Smaller hand-changed assembly without new bundle (left aborts, right does not): .global f1# .global f1# .type f1#, @function .type f1#, @function .proc f1# .proc f1# f1: f1: .prologue .prologue .body .body .mmi .mmi mov r15 = r12 mov r15 = r12 nop 0 nop 0 mov r14 = r12 mov r14 = r12 ;; ;; .mmi .mmi st8 [r15] = r32, 8 st8 [r15] = r32, 8 ;; ;; st8 [r15] = r33, -4 st8 [r15] = r33, -4 nop 0 nop 0 .mii | ;; > .mmb > st4 [r15] = r34 > ;; ld8 r8 = [r14], 8 ld8 r8 = [r14], 8 nop 0 < ;; ;; nop 0 nop 0 .mmb .mmb ld8 r9 = [r14] ld8 r9 = [r14] st4 [r15] = r34 | nop 0 br.ret.sptk.many b0 br.ret.sptk.many b0 .endp f1# .endp f1# The only thing that seems to matter, is that "st4 [r15] = r34" comes before "ld8 r9 = [r14]". Ah, and since both r14 and r15 are initially copies of r12, they point to the same memory area (modulo auto-increments/decrements). So indeed an alias thing. Offending insns that are scheduled in the wrong order: (insn:TI 28 48 9 2 vector-2.c:7 (set (reg:DI 9 r9 [+8 ]) (mem/c/i:DI (reg/f:DI 14 r14 [351]) [2 t+8 S8 A64])) 5 {movdi_internal} (expr_list:REG_DEAD (reg/f:DI 14 r14 [351]) (nil))) (insn 9 28 18 2 vector-2.c:5 (set (mem/s/j/c:SI (reg/f:DI 15 r15 [343]) [2 t+4 S4 A32]) (reg:SI 114 r34 [ a ])) 4 {movsi_internal} (expr_list:REG_DEAD (reg:SI 114 r34 [ a ]) (expr_list:REG_DEAD (reg/f:DI 15 r15 [343]) (nil)))) So the MEMs are: load from "(mem/c/i:DI (reg/f:DI 14 r14 [351]) [2 t+8 S8 A64])" store to "(mem/s/j/c:SI (reg/f:DI 15 r15 [343]) [2 t+4 S4 A32])" There is no dependency of insn 28 on insn 9, even though this is a rather obvious read-after-write dependency. ;; ====================================================== ;; -- basic block 2 from 30 to 39 -- after reload ;; ====================================================== ;; --------------- forward dependences: ------------ ;; --- EBB Dependences --- from bb2 to bb2 ;; insn code bb dep prio cost reservation ;; ---- ---- -- --- ---- ---- ----------- ;; 30 5 2 0 3 1 2_A : 39 9 36 35 ;; 29 5 2 0 2 1 2_A : 39 28 21 ;; 35 5 2 1 2 1 2_M_only_um23 : 39 21 9 36 ;; 36 5 2 2 1 1 2_M_only_um23 : 39 28 9 ;; 9 4 2 3 0 1 2_M_only_um23 : 39 ;; 21 5 2 2 1 1 2_M_only_um01 : 39 18 28 ;; 28 5 2 3 0 1 2_M_only_um01 : 39 18 ;; 18 -1 2 2 0 0 nothing : 39 ;; 39 334 2 8 0 0 2_B : It's probably worth noting that the disambiguated MEMs are of different widths: > load from "(mem/c/i:DI (reg/f:DI 14 r14 [351]) [2 t+8 S8 A64])" > store to "(mem/s/j/c:SI (reg/f:DI 15 r15 [343]) [2 t+4 S4 A32])" (btw, IIUC the above mems do not alias indeed, and the problem is in disambiguating the above store and a load from (mem/c/i:DI (post_inc:DI (reg/f:DI 14 r14 [351])) [2 t+0 S8 A128]) ) Also, with -fno-strict-aliasing the failure vanishes. Re. comment 9: Well, the order of *this* store and *this* load is the difference between the test case failing or passing. So I do not think the problem is between this load and another store. Before sched2 (or actually, .mach on ia64): "vector-2.c.213r.compgotos" (notes, NOPs, bundle markers removed): ;; Function f1 (f1) 30 r15:DI=r12:DI 29 r14:DI=r12:DI 35 [r15:DI++]=r32:DI 36 [post r15:DI+=0xfffffffffffffffc]=r33:DI 9 [r15:DI]=r34:SI 21 r8:DI=[r14:DI++] 28 r9:DI=[r14:DI] 18 use r8:TI 39 {return;use b0:DI;} After .sched2: "vector-2.c.215r.mach" (notes, NOPs, bundle markers removed): 30 r15:DI=r12:DI 29 r14:DI=r12:DI 35 [r15:DI++]=r32:DI 36 [post r15:DI+=0xfffffffffffffffc]=r33:DI 21 r8:DI=[r14:DI++] 28 r9:DI=[r14:DI] 9 [r15:DI]=r34:SI 18 use r8:TI 39 {return;use b0:DI;} Note that the only real change in the scheduling is that insn 9 is moved after insn 21 and insn 28. insn 9 is the store "[r15:DI]=r34:SI" that later expands to the st4 instruction. insn 28 and insn 9 in non-slim form, with assembler output: //(insn:TI 28 48 9 2 vector-2.c:7 (set (reg:DI 9 r9 [+8 ]) // (mem/c/i:DI (reg/f:DI 14 r14 [351]) [2 t+8 S8 A64])) // 5 {movdi_internal} (expr_list:REG_DEAD (reg/f:DI 14 r14 [351]) // (nil))) ld8 r9 = [r14] // 28 movdi_internal/5 //(insn 9 28 18 2 vector-2.c:5 (set (mem/s/j/c:SI (reg/f:DI 15 r15 [343]) // [2 t+4 S4 A32]) // (reg:SI 114 r34 [ a ])) 4 {movsi_internal} // (expr_list:REG_DEAD (reg:SI 114 r34 [ a ]) // (expr_list:REG_DEAD (reg/f:DI 15 r15 [343]) // (nil)))) st4 [r15] = r34 // 9 movsi_internal/6 Bug 35658 is another case where a store is scheduled after a load that depends on the store. It may well be that bug 35658 and this bug are the same issue. (In reply to comment #10) > Re. comment 9: Well, the order of *this* store and *this* load is the > difference between the test case failing or passing. So I do not think the > problem is between this load and another store. To clarify, I'm saying that the problem is in moving insn 9 (the store) past insn 21, not past insn 28. Your simplified dumps in comment #10 support that (r15 is incremented and decremented, r14 is post-modified; thus, both insn 21 and insn 9 touch memory[r12]). Insn 28 is not relevant to the miscompilation. The options "-O2 -fno-inline" is enough to make the test case of comment #5 fail. This bug is not specific to -fpic / -fPIC, inlining of f1 just hides the bug for the non-PIC case. Works with "gcc-4.3 (Debian 4.3.5-1) 4.3.5", both with -fpic and with -fno-inline. That makes this bug a regression. CC:RM - who happens to know a thing or two about the alias analysis code also. OK, I think I finally understand what Alexander tried to explain, and I've annotated the code. Alexander, does this look right to you? f1: // vector int f1(vector int t) .mmi mov r15 = r12 // 30: r12 = @temp1 mov r14 = r12 // 29: r14 = @temp1 addl r16 = 1, r0 // 34: r16 = 1 ;; .mmi st8 [r15] = r32, 8 // 36: temp1[0:1] = t[0:1], r15=@temp[2] ;; st8 [r15] = r33, -4 // 37: temp1[2:3] = t[2:3], r15=@temp[1] nop 0 .mii ld8 r8 = [r14], 8 // 21: r8 = temp[0:1] nop 0 ;; nop 0 .mmb ld8 r9 = [r14] // 28: r9 = temp[2:3] st4 [r15] = r16 // 9: temp[1] = 1 br.ret.sptk.many b0 // 40: return r8:r9 .endp f1# (In reply to comment #16) > OK, I think I finally understand what Alexander tried to explain, and I've > annotated the code. Alexander, does this look right to you? Yes, thanks. Created attachment 21277 [details]
debug log
I think the problem is that fixed_scalar_and_varying_struct_p is called with a VALUE address, i.e. not expanded. See attached debug log.
x_addr is a VALUE that has no locs: Breakpoint 4, true_dependence (mem=0x20000000005ddf68, mem_mode=VOIDmode, x=0x20000000005ddfb0, varies=0x2000000000496720) at ../../trunk/gcc/alias.c:2330 2330 if (MEM_VOLATILE_P (x) && MEM_VOLATILE_P (mem)) (gdb) cont Continuing. Breakpoint 11, get_addr (x=0x6000000000316e28) at ../../trunk/gcc/alias.c:1726 1726 if (GET_CODE (x) != VALUE) (gdb) up #1 0x400000000036eb50 in true_dependence (mem=0x20000000005ddf68, mem_mode=SImode, x=0x20000000005ddfb0, varies=0x2000000000496720) at ../../trunk/gcc/alias.c:2367 2367 x_addr = get_addr (x_addr); (gdb) down #0 get_addr (x=0x6000000000316e28) at ../../trunk/gcc/alias.c:1726 1726 if (GET_CODE (x) != VALUE) (gdb) next 1728 v = CSELIB_VAL_PTR (x); (gdb) 1729 if (v) (gdb) 1731 for (l = v->locs; l; l = l->next) (gdb) p v->locs $72 = (struct elt_loc_list *) 0x0 (gdb) p debug_rtx(x) (value:DI 10:10 @0x6000000000316e28/0x6000000000316d10) $73 = void (gdb) So get_addr just returns the VALUE. Is it *ever* OK for get_addr to return a VALUE rtx? It seems to me this should never happen, and we should assert that. Since this bug only triggers if cselib is used, the bug affects schedule_ebbs only. The other schedulers are !use_cselib schedulers. (Even sel-sched apparently does not use cselib, that's surprising!) OTOH, this bug can probably be triggered with -fsched2-use-superblocks on any target. (In reply to comment #20) > (Even sel-sched apparently does not use cselib, that's surprising!) Offtopic: yes, using cselib in sel-sched is not quite straightforward, since we need it to work on arbitrary regions (as I understand cselib is designed to work on EBBs). (In reply to comment #19) > x_addr is a VALUE that has no locs: That happens because it's an autoincrement, and cselib_subst_to_values just creates an empty value. It seems to me that we simply need to add a VALUE case to rtx_varies_p. Can you test that? So the scheduler uses cselib to get a better view of the address, but cselib doesn't actually give a better address. And the solution is to just give up in that case? It seems to me that if cselib doesn't give a better address than XEXP(MEM,0) then the original address should be used. It should be possible to do better in cselib_subst_to_values - for POST_* we could look up the value of the inner expression, and for PRE_* we could probably construct a PLUS of some kind. That would be an enhancement, but it's unrelated to the bug. Alex Oliva posted some patches to make cselib handle autoinc stuff. No idea whether http://gcc.gnu.org/ml/gcc-patches/2010-03/msg01038.html is the latest version or if he has a newer one. (In reply to comment #25) > Alex Oliva posted some patches to make cselib handle autoinc stuff. > No idea whether http://gcc.gnu.org/ml/gcc-patches/2010-03/msg01038.html > is the latest version or if he has a newer one. Doesn't look bad, but I dislike the name change to _addr variants for functions that look up not just addresses. I think we should just fix up all the callers to pass in a mode. I'm also not sure we need a global cselib_record_autoinc flag - why not just do that always? Shouldn't we do something else when hashing PRE_MODIFY? > Shouldn't we do something else when hashing PRE_MODIFY?
I don't know, what else do you have in mind?
I'm posting an updated patch that implements your other suggestions momentarily.
Created attachment 22957 [details]
Updated patch, still awaiting review
The patch to fix this bug, posted on Sept 21, is still awaiting review :-(
This updated version fixes just a whitespace difference that caused it to fail to apply.
On Thu, 13 Jan 2011, aoliva at gcc dot gnu.org wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43494
>
> --- Comment #28 from Alexandre Oliva <aoliva at gcc dot gnu.org> 2011-01-13 10:31:21 UTC ---
> Created attachment 22957 [details]
> --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22957
> Updated patch, still awaiting review
>
> The patch to fix this bug, posted on Sept 21, is still awaiting review :-(
>
> This updated version fixes just a whitespace difference that caused it to fail
> to apply.
I suppose pinging it on the ML is better than pinging it here.
I'll review Alexandre's patch later today. Author: aoliva Date: Thu Feb 3 06:04:04 2011 New Revision: 169782 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169782 Log: PR debug/43092 PR rtl-optimization/43494 * rtl.h (for_each_inc_dec_fn): New type. (for_each_inc_dec): Declare. * rtlanal.c (struct for_each_inc_dec_ops): New type. (for_each_inc_dec_find_inc_dec): New fn. (for_each_inc_dec_find_mem): New fn. (for_each_inc_dec): New fn. * dse.c (struct insn_size): Remove. (replace_inc_dec, replace_inc_dec_mem): Remove. (emit_inc_dec_insn_before): New fn. (check_for_inc_dec): Use it, along with for_each_inc_dec. (canon_address): Pass mem modes to cselib_lookup. * cselib.h (cselib_lookup): Add memmode argument. Adjust callers. (cselib_lookup_from_insn): Likewise. (cselib_subst_to_values): Likewise. * cselib.c (find_slot_memmode): New var. (cselib_find_slot): New fn. Use it instead of htab_find_slot_with_hash everywhere. (entry_and_rtx_equal_p): Use find_slot_memmode. (autoinc_split): New fn. (rtx_equal_for_cselib_p): Rename and implement in terms of... (rtx_equal_for_cselib_1): ... this. Take memmode, pass it on. Deal with autoinc. Special-case recursion into MEMs. (cselib_hash_rtx): Likewise. (cselib_lookup_mem): Infer pmode from address mode. Distinguish address and MEM modes. (cselib_subst_to_values): Add memmode, pass it on. Deal with autoinc. (cselib_lookup): Add memmode argument, pass it on. (cselib_lookup_from_insn): Add memmode. (cselib_invalidate_rtx): Discard obsolete push_operand handling. (struct cselib_record_autoinc_data): New. (cselib_record_autoinc_cb): New fn. (cselib_record_sets): Use it, along with for_each_inc_dec. Pass MEM mode to cselib_lookup. Reset autoinced REGs here instead of... (cselib_process_insn): ... here. * var-tracking.c (replace_expr_with_values, use_type): Pass MEM mode to cselib_lookup. (add_uses): Likewise, also to cselib_subst_to_values. (add_stores): Likewise. * sched-deps.c (add_insn_mem_dependence): Pass mode to cselib_subst_to_values. (sched_analyze_1, sched_analyze_2): Likewise. Adjusted. * gcse.c (do_local_cprop): Adjusted. * postreload.c (reload_cse_simplify_set): Adjusted. (reload_cse_simplify_operands): Adjusted. * sel-sched-dump (debug_mem_addr_value): Pass mode. Modified: trunk/gcc/ChangeLog trunk/gcc/cselib.c trunk/gcc/cselib.h trunk/gcc/dse.c trunk/gcc/gcse.c trunk/gcc/postreload.c trunk/gcc/rtl.h trunk/gcc/rtlanal.c trunk/gcc/sched-deps.c trunk/gcc/sel-sched-dump.c trunk/gcc/var-tracking.c Fixed I am reopening this bug because I am getting a failure of gcc.dg/torture/vector-2.c that started when this patch was checked in. I think this is the same test that used to be gcc.c-torture/execute/vector-2.c. I have a cut down test case that fails (prints 'one') when compiled with -O1 -fno-guess-branch-probability starting at r169782. The -fno-guess-branch-probability is needed to generate code that uses auto-inc and trigger the bug. This fails on IA64 Linux and HP-UX. It does not require the -fpic flag. Let me know if I should open a new bug instead of re-opening this one. extern int printf(const char *, ...); extern void exit(int); struct __attribute__((packed)) B { unsigned short j : 1, k : 11; }; struct B sB; void testB (void) { int i; unsigned int mask, v, a, r; struct B x; sB.k = 750; r = 750; x.j = sB.j = 0; printf("%d %d\n", (int) x.j, (int) sB.j); printf("%d %d\n", (int) sB.k, (int) r); if (x.j != sB.j || sB.k != r) printf ("one\n"); } int main (void) { testB (); exit (0); } Maybe you're hitting PR47614? I think it is PR47614, I tried the first patch in that defect report and it fixed my test case. . |