seen when build scipy on s390x with the current GCC 8 branch. According to the Debian report [1] this is tracked down to miscompilation of one Fortran file with -funroll-loops. The Ubuntu report [2] has a "standalone" python test case. [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=915738 [2] https://bugs.launchpad.net/ubuntu/+source/python-scipy/+bug/1811798
I guess we need something better than a python testcase.
Yeah, like that problematic source, all gfortran options used to compile that, any needed modules too + stubbed whatever it calls and whatever is needed in MAIN__ or main to reproduce, ideally with minimal dependencies. If you know the exact problematic routine, see in the debugger how many times it is called and in which invocation of the routine it misbehaves and try to capture on which arguments it is called. If you don't know the exact problematic routine, one can e.g. play with assembly bisection between -funroll-loops and no -funroll-loops, rename .L* labels in one of them so that it can be merged by hand more easily. I think gfortran doesn't have optimize (0) attribute yet.
I've tried building scipy 1.1.0 from github on a Fedora installation. The build already uses -funroll-loops. But I couldn't reproduce the problem with the resulting binary. gcc version 8.0.1 20180324 Aurelien already tracked it down to a miscompilation of scipy/optimize/minpack/qrsolv.f This source file appears to contain just a single function (qrsolv) which is not too big. I think I can work with that after being able to reproduce the problem. As Jakub mentioned the exact compiler cmdline would be good.
I'm able to reproduce the problem now and will try to have a look.
Created attachment 45586 [details] qrsolv-reduc.f the miscompiled fortran file autoreduced from scipy
Created attachment 45587 [details] A C wrapper to call the qrsolv function in the fortran snippet
gfortran -O3 -march=zEC12 -funroll-loops -fpie qrsolv-reduc.f -c gcc qrsolv-caller.c -c gcc qrsolv-caller.o qrsolv-reduc.o -o t r265191 ./t 1.359429 r265193 ./t 0.000000
The r265193 patch was found via reghunt. However, it just reveals an underlying issue. The problem can also be seen with mainline. The miscompile happens in the following loop: do 110 j = 1, n if (sdiag(j) .eq. zero .and. nsing .eq. n) nsing = j - 1 if (nsing .lt. n) wa(j) = zero 110 continue The problem appears to be rather related to ifcvt. ifcvt generates a load on condition for the sdiag(j) .eq. zero comparison by inserting insns: 2480, 2481, 2482: 265.ce2 (insn 915 918 916 88 (set (reg:DF 590 [ MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ]) (mem:DF (reg/v/f:DI 239 [ sdiag ]) [2 MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B]+0 S8 A64])) "min.qrsolv.f":51 1289 {*movdf_64dfp} (nil)) (insn 916 915 2480 88 (set (reg:CCZ 33 %cc) (compare:CCZ (reg:DF 590 [ MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ]) (const_double:DF 0.0 [0x0.0p+0]))) "min.qrsolv.f":51 1255 {*cmpdf_ccs} (expr_list:REG_DEAD (reg:DF 590 [ MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ]) (nil))) (insn 2480 916 2481 88 (set (reg:SI 733) (const_int 0 [0])) 1274 {*movsi_zarch} (nil)) (insn 2481 2480 2482 88 (set (reg:CCZ 33 %cc) (compare:CCZ (reg:DF 590 [ MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ]) (const_double:DF 0.0 [0x0.0p+0]))) 1255 {*cmpdf_ccs} (nil)) (insn 2482 2481 927 88 (set (reg/v:SI 109 [ nsing ]) (if_then_else:SI (ne (reg:CCZ 33 %cc) (const_int 0 [0])) (reg/v:SI 109 [ nsing ]) (reg:SI 733))) 1676 {*movsicc} (nil)) (note 927 2482 928 88 NOTE_INSN_DELETED) (jump_insn 928 927 932 88 (parallel [ (set (pc) (if_then_else (le (reg:SI 320 [ _444 ]) (reg/v:SI 109 [ nsing ])) (label_ref:DI 943) (pc))) (clobber (reg:CC 33 %cc)) ]) "min.qrsolv.f":52 1260 {*cmp_and_br_signed_si} (expr_list:REG_UNUSED (reg:CC 33 %cc) (int_list:REG_BR_PROB 536870916 (nil))) -> 943) In the backend we have that interesting splitter which triggers for the old and now obsolete compare in insn 916 (define_split [(set (match_operand 0 "cc_reg_operand") (compare (match_operand:FP 1 "register_operand") (match_operand:FP 2 "const0_operand")))] "TARGET_HARD_FLOAT && REG_P (operands[1]) && dead_or_set_p (insn, operands[1])" [(parallel [(set (match_dup 0) (match_dup 3)) (clobber (match_dup 1))])] { /* s390_match_ccmode requires the compare to have the same CC mode as the CC destination register. */ operands[3] = gen_rtx_COMPARE (GET_MODE (operands[0]), operands[1], operands[2]); }) 268.split1 insn 916 -> insn 2677 The REG_DEAD note becomes a clobber due to that (insn 915 918 2677 105 (set (reg:DF 590 [ MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ]) (mem:DF (reg/v/f:DI 239 [ sdiag ]) [2 MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B]+0 S8 A64])) "min.qrsolv.f":51 1289 {*movdf_64dfp} (nil)) (insn 2677 915 2480 105 (parallel [ (set (reg:CCZ 33 %cc) (compare:CCZ (reg:DF 590 [ MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ]) (const_double:DF 0.0 [0x0.0p+0]))) (clobber (reg:DF 590 [ MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ])) ]) "min.qrsolv.f":51 -1 (nil)) (insn 2480 2677 2481 105 (set (reg:SI 733) (const_int 0 [0])) 1274 {*movsi_zarch} (nil)) (insn 2481 2480 2482 105 (set (reg:CCZ 33 %cc) (compare:CCZ (reg:DF 590 [ MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ]) (const_double:DF 0.0 [0x0.0p+0]))) 1255 {*cmpdf_ccs} (nil)) 294.cprop_hardreg appears to mess up things: REG_DEAD note in insn 2677 does not appear to fit the used reg but CC is unused now (insn 3107 927 915 121 (set (reg:DI 3 %r3 [1060]) (mem/f/c:DI (plus:DI (reg/f:DI 15 %r15) (const_int 520 [0x208])) [3 sdiag+0 S8 A64])) "min.qrsolv.f":51 1270 {*movdi_64} (nil)) (insn 915 3107 2677 121 (set (reg:DF 19 %f6 [orig:590 MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ] [590]) (mem:DF (reg:DI 3 %r3 [1060]) [2 MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B]+0 S8 A64])) "min.qrsolv.f":51 1289 {*movdf_64dfp} (expr_list:REG_DEAD (reg:DI 3 %r3 [1060]) (nil))) (insn 2677 915 3108 121 (parallel [ (set (reg:CCZ 33 %cc) (compare:CCZ (reg:DF 19 %f6 [orig:590 MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ] [590]) (const_double:DF 0.0 [0x0.0p+0]))) (clobber (reg:DF 19 %f6 [orig:590 MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ] [590])) ]) "min.qrsolv.f":51 1250 {*cmpdf_ccs_0} (expr_list:REG_DEAD (reg:DF 16 %f0 [orig:590 MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ] [590]) (expr_list:REG_UNUSED (reg:CCZ 33 %cc) (nil)))) (insn 3108 2677 2481 121 (set (reg:DF 20 %f1 [1061]) (const_double:DF 0.0 [0x0.0p+0])) 1289 {*movdf_64dfp} (nil)) (insn 2481 3108 2480 121 (set (reg:CCZ 33 %cc) (compare:CCZ (reg:DF 16 %f0 [orig:590 MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ] [590]) (reg:DF 20 %f1 [1061]))) 1255 {*cmpdf_ccs} (expr_list:REG_DEAD (reg:DF 16 %f0 [orig:590 MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ] [590]) (nil))) 295.rtl_dce consequently removes insn 3107, 915, 2677 (insn 3108 927 2481 121 (set (reg:DF 20 %f1 [1061]) (const_double:DF 0.0 [0x0.0p+0])) 1289 {*movdf_64dfp} (nil)) (insn 2481 3108 2480 121 (set (reg:CCZ 33 %cc) (compare:CCZ (reg:DF 16 %f0 [orig:590 MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ] [590]) (reg:DF 20 %f1 [1061]))) 1255 {*cmpdf_ccs} (expr_list:REG_DEAD (reg:DF 16 %f0 [orig:590 MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ] [590]) (nil))) (insn 2480 2481 2482 121 (set (reg:SI 0 %r0 [733]) (const_int 0 [0])) 1274 {*movsi_zarch} (expr_list:REG_EQUIV (const_int 0 [0]) (nil))) (insn 2482 2480 928 121 (set (reg/v:SI 2 %r2 [orig:109 nsing ] [109]) (if_then_else:SI (ne (reg:CCZ 33 %cc) (const_int 0 [0])) (reg/v:SI 2 %r2 [orig:109 nsing ] [109]) (reg:SI 0 %r0 [733]))) 1676 {*movsicc} (expr_list:REG_DEAD (reg:CCZ 33 %cc) (expr_list:REG_DEAD (reg:SI 3 %r3 [733]) (nil)))) To my understanding the data flow is already broken after ifcvt inserted the new compare while leaving the REG_DEAD note in the old. The new compare already appears to read a dead register here and this appears to trigger the rest of the unfortunate events.
Created attachment 45588 [details] experimental patch That patch appears to fix the problem for me.
I admit I have just a vague recollection of this, but I thought since df has been added, usually if a pass wants REG_DEAD notes, it needs to df_note_add_problem () and df_analyze should rebuild the REG_DEAD/REG_UNUSED notes.
So, to me this looks like a backend bug, using dead_or_set_p in a splitter when the split passes don't really compute the note problem. Seems s390 is the only backend that does this, other backends use dead_or_set_p either only in peephole2s (which is fine, peephole2 pass starts with df_set_flags (DF_LR_RUN_DCE); df_note_add_problem (); df_analyze (); and even when many targets don't use df_or_set_p, they do use peep2_dead*), or (cris) in delayed branch scheduling (I believe that doesn't guarantee that either). Can't what you are doing in the splitters be done in define_peephole2 instead?
Segher on IRC says that removing REG_DEAD notes that aren't valid is the right thing, so paging others what they think.
> Segher on IRC says that removing REG_DEAD notes that aren't valid is the > right thing, so paging others what they think. Definitely not, passes are not required to maintain REG_DEAD/REG_UNUSED notes, it's the exclusive job of DF and we're better not open this can of worms.
(In reply to Jakub Jelinek from comment #11) > ... Can't what you are doing in the splitters be done in > define_peephole2 instead? Not that easy unfortunately. peephole2 will run after reload. So the FP constant ok 0.0 will already be reloaded into a register first or pushed into literal pool. The point of doing the transformation is to avoid this.
(In reply to Andreas Krebbel from comment #14) > (In reply to Jakub Jelinek from comment #11) > > ... Can't what you are doing in the splitters be done in > > define_peephole2 instead? > > Not that easy unfortunately. peephole2 will run after reload. So the FP > constant ok 0.0 will already be reloaded into a register first or pushed > into literal pool. The point of doing the transformation is to avoid this. But it is still invalid. If those splitters are essentially to you and worth slowing the compiler on s390*, then you could e.g. add a custom target pass right before split1, where you'd just df_note_addr_problem (); df_analyze (); and thus ensured that during the splitting you could use the REG_DEAD/REG_UNUSED notes safely. In the splitters you'd likely need to use current_pass to verify you are in the pass for which you've done this.
I'll commit a patch which just removes the splitter for now. I'll try to come up with a nicer testcase.
(In reply to Andreas Krebbel from comment #16) > I'll commit a patch which just removes the splitter for now. I'll try to > come up with a nicer testcase. All 3 s390 splitters that do this?
Author: krebbel Date: Tue Feb 5 17:14:11 2019 New Revision: 268550 URL: https://gcc.gnu.org/viewcvs?rev=268550&root=gcc&view=rev Log: S/390: Remove load and test fp splitter gcc/ChangeLog: 2019-02-05 Andreas Krebbel <krebbel@linux.ibm.com> PR target/88856 * config/s390/s390.md: Remove load and test FP splitter. Modified: trunk/gcc/ChangeLog trunk/gcc/config/s390/s390.md
Author: krebbel Date: Tue Feb 5 17:17:00 2019 New Revision: 268551 URL: https://gcc.gnu.org/viewcvs?rev=268551&root=gcc&view=rev Log: S/390: Remove load and test fp splitter gcc/ChangeLog: 2019-02-05 Andreas Krebbel <krebbel@linux.ibm.com> Backport from mainline 2019-02-05 Andreas Krebbel <krebbel@linux.ibm.com> PR target/88856 * config/s390/s390.md: Remove load and test FP splitter. Modified: branches/gcc-8-branch/gcc/ChangeLog branches/gcc-8-branch/gcc/config/s390/s390.md
Author: krebbel Date: Tue Feb 5 17:19:26 2019 New Revision: 268552 URL: https://gcc.gnu.org/viewcvs?rev=268552&root=gcc&view=rev Log: S/390: Remove load and test fp splitter gcc/ChangeLog: 2019-02-05 Andreas Krebbel <krebbel@linux.ibm.com> Backport from mainline 2019-02-05 Andreas Krebbel <krebbel@linux.ibm.com> PR target/88856 * config/s390/s390.md: Remove load and test FP splitter. Modified: branches/gcc-7-branch/gcc/ChangeLog branches/gcc-7-branch/gcc/config/s390/s390.md
(In reply to Jakub Jelinek from comment #17) > (In reply to Andreas Krebbel from comment #16) > > I'll commit a patch which just removes the splitter for now. I'll try to > > come up with a nicer testcase. > > All 3 s390 splitters that do this? I've only removed the load and test splitter for now. The other two are only used for access register setters. There is only that one user in Glibc and we have it that way since the very beginning. I will revisit these for GCC >9 but would rather leave them in for now.
What we could do there is remove the first of those two splitters, remove the && !dead_or_set_p (insn, operands[1]) test from the second, and add peephole2 that would transform (set (access part 1) (subreg:SI (match_dup 1) low)) (set (match_dup 1) (rotate:DI (match_dup 1) (const_int 32))) (set (access part 2) (subreg:SI (match_dup 1) low)) with a lshiftrt instead of rotate if reg 1 is dead after the third insn (assuming rotate is more expensive as right shift, if it is the same expensive/same size, then having the two splitters makes no sense). The last rotate should have been removed by DCE already if it was truly dead (though, of course, if it for some reason isn't yet, you could have another peephole2 for that too).
That said, the regression is fixed now.