Hi, [This bug was initially submitted to the Debian BTS at http://bugs.debian.org/466948 -- at the request of Debian's gcc maintainer, I am also sending it here] There is apparently a miscompilation of CERNLIB code in the "kernlib" library on ia64 with gfortran-4.3 at optimization level -O2 or higher. (Oddly, compiling with -O2 -funroll-loops does not expose the problem, which is why I didn't notice it earlier.) I've verified that this problem occurs on ia64 with gfortran-4.3 versions 4.3-20080202-1, 4.3-20080219-1 and 4.3.0. For the most recent trial: (sid)kmccarty@merulo:~$ gfortran-4.3 -v Using built-in specs. Target: ia64-linux-gnu Configured with: ../src/configure linux gnu Thread model: posix gcc version 4.3.1 20080309 (prerelease) (Debian 4.3.0-1) Please see the complete test case I've provided at http://people.debian.org/~kmccarty/tlsc-test-case.tar.gz [Note, the files libkernlib.a and libkerngent.a in the tarball were compiled on ia64 from CERNLIB source code, which is GPL; it is available from the Debian source package of "cernlib" version 2006.dfsg.2-10] [Note 2, I have updated this test case tarball to the gfortran version noted above since I originally submitted this bug to the Debian BTS.] The file tlsc.F in the test case is where the optimization bug happens. I saved files tlsc.f, tlsc.s and tlsc.o into subdirectories of the test case for three different combinations of compile flags: -O2 (causes miscompiled code) in obj-fail subdir -O0 (seems fine) -O2 -funroll-loops (seems fine) both in obj-success subdir. Also the output of the test case, saved as output.txt in each subdir. You can regenerate all of these files with these three combinations of compiler flags quickly with "make output". The presence or absence of -fno-automatic on the gfortran command line (I omitted it in all three cases) appears to make no difference. best regards, Kevin McCarty
Does it work with gcc 4.2?
(In reply to comment #1) > Does it work with gcc 4.2? Yes, it does, at least with this version: (sid)kmccarty@merulo:~$ gfortran-4.2 -v Using built-in specs. Target: ia64-linux-gnu Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --disable-libmudflap --disable-libssp --with-system-libunwind --enable-checking=release --build=ia64-linux-gnu --host=ia64-linux-gnu --target=ia64-linux-gnu Thread model: posix gcc version 4.2.3 (Debian 4.2.3-2) The test case works in all cases with gfortran 4.2, regardless of whether I use libkernlib.a and libkerngent.a compiled with gfortran 4.2 or 4.3. (But please let me know if you need copies of these libraries built with gfortran 4.2 anyway.) I also just noticed, surprisingly, that with gfortran 4.3, if I compile the test program with -O3 or with -O3 -funroll-loops, the test succeeds. So oddly enough it is ONLY with -O2 (without -funroll-loops) that the test fails. (Again, this behavior is independent of which gfortran version I use to build libkernlib.a and libkerngent.a.) Please also let me know if you need the intermediate .f and .o files built with gfortran-4.3 -O3.
Possibly a scheduling issue.
I wonder if this is related to PR target/35695, the floating point division bug that Jim Wilson fixed. Could you try it with ToT or with the latest 4.3 branch, both of which have Jim's fix in them.
(In reply to comment #4) > I wonder if this is related to PR target/35695, the floating point division bug > that Jim Wilson fixed. Could you try it with ToT or with the latest 4.3 > branch, both of which have Jim's fix in them. I tried with the gcc-4_3-branch from Subversion today, but it doesn't change things: plain -O2 fails but both -O2 -funroll-loops and -O1 succeed on the test case. For the record: (sid)kmccarty@merulo:~$ ~/gcc-4.3-branch/bin/gfortran -v Using built-in specs. Target: ia64-unknown-linux-gnu Configured with: ./configure --enable-languages=c,fortran --prefix=/home/kmccarty/gcc-4.3-branch --enable-shared --with-mpfr=/home/kmccarty/gcc-4.3-branch Thread model: posix gcc version 4.3.1 20080521 (prerelease) (GCC)
Mark, you made this P5, but ia64-linux is a secondary platform. P3 again to get it on the radar. But not higher priority because we don't have exactly what I would call a testcase.
I agree on both points: (1) I should not have marked this P5, and (2) we do not yet have a test case. Marked as "WAITING".
I looked at this bug and I can reproduce it using the precompiled archives from the link. I have not tried to get the CERN sources to create a small 'real' test case. I noticed that the bug does not happen on ToT and found that the test started working correctly with version 133081. The patch submitted in this version is for PR 34677 which claims to be a missed optimization issue as opposed to a codegen bug fix so I am not sure if the bug is really fixed by that change or if that change just masks the problem like -funroll-loops does. 2008-03-10 Richard Guenther <rguenther@suse.de> PR tree-optimization/34677 * tree-ssa-pre.c (modify_expr_node_pool): Remove. (poolify_tree): Likewise. (modify_expr_template): Likewise. (poolify_modify_stmt): Likewise. (insert_fake_stores): Handle all component-ref style stores in addition to INDIRECT_REF. Also handle complex types. Do not poolify the inserted load. (realify_fake_stores): Do not rebuild the tree but only make it a SSA_NAME copy. (init_pre): Remove initialzation of modify_expr_template. Do not allocate modify_expr_node_pool. (fini_pre): Do not free modify_expr_node_pool.
The only code-generation changing part of the patch is (insert_fake_stores): Handle all component-ref style stores in addition to INDIRECT_REF. Also handle complex types. but that only fixes a missed-optimization, so it probably makes the problem latent on the trunk.
4.3.1 is being released, adjusting target milestone.
PROGRAM PR35659 DIMENSION A(1000), B(1010), AUX(8), IPIV(8), X(16) COMMON /TLSDIM/ M1,M,N,L,IER COMMON /SLATE/ V1,V2,IAR(24),DUM(14) DATA A/0, 1, 0, 0, 1, 0, 2, 0, 1, 0, 0, 0, 1, 0.200000003, 1 0.0399999991, 0.00800000038, 1, 0.400000006, 0.159999996, 2 0.064000003, 1, 0.600000024, 0.360000014, 0.216000006, 1, 3 0.800000012, 0.639999986, 0.512000024, 1, 1, 1, 1, 968*0./ DATA B/1, 2, 1, 1.22140002, 1.49179995, 1.82210004, 4 2.22550011, 2.7183001, 0, 0, 1000*0./ M1 = 2 M = 8 N = 4 L = 1 IER = 0 V1 = 0 V2 = 1.40129846e-45 IAR(:) = 538976288 DUM(:) = 1.35631564e-19 CALL TLSC(A,B,AUX,IPIV,1.,X) IF (ABS(X(1) - 0.99785352).GE.0.1) CALL ABORT IF (ABS(X(2) - 1.0).GE.0.1) CALL ABORT IF (ABS(X(3) - 0.50107324).GE.0.1) CALL ABORT IF (ABS(X(4) - 0.21670136).GE.0.1) CALL ABORT END SUBROUTINE TLSMSQ (B,L,M,F) COMMON /SLATE/ DUM(38),I,JB DIMENSION B(*) F = 0. JB = 1 DO 10 I=1,M F = F + B(JB)*B(JB) 10 JB = JB + L RETURN END SUBROUTINE TLSWOP (A,AD,N,NR) COMMON /SLATE/ DUM(37),H,I,JA DIMENSION A(*), AD(*) JA = 1 DO 10 I=1,NR H = A(JA) A(JA) = AD(JA) AD(JA) = H 10 JA = JA + N RETURN END SUBROUTINE TLUK (A,IASEP,NR,SIG,BETA) COMMON /SLATE/ DUM(37),I,JA,LL DIMENSION A(*) SIG= 0. JA = 1 LL = 0 DO 10 I=1,NR IF (A(JA).EQ.0.) GO TO 10 LL = I SIG= SIG + A(JA)* A(JA) 10 JA = JA + IASEP NR = LL IF (NR.EQ.0) RETURN SIG = SIGN (SQRT (SIG),A(1)) BETA = A(1) + SIG A(1) = BETA BETA = 1. / (SIG * BETA) RETURN END SUBROUTINE TLSTEP (A,B,IASEP,IBSEP,NR,NC,BETA) COMMON /SLATE/ DUM(34),H,I,IB,J,JA,JB DIMENSION A(*), B(*) IB = 0 DO 30 J=1,NC IB = IB + 1 H = 0. JA = 1 JB = IB DO 10 I=1,NR H = H + A(JA) * B(JB) JA = JA +IASEP 10 JB = JB + IBSEP H = H * BETA JA = 1 JB = IB DO 20 I=1,NR B(JB) = B(JB) - A(JA) * H JA = JA +IASEP 20 JB = JB + IBSEP 30 CONTINUE RETURN END together with tlsc.f can work as a testcase, and reproduces the problem with current 4.3 branch.
Even smaller reproducer: PROGRAM PR35659 DIMENSION A(1000), B(1010), AUX(8), IPIV(8), X(16) COMMON /TLSDIM/ M1,M,N,L,IER COMMON /SLATE/ V1,V2,IAR(24),DUM(14) DATA A/0, 1, 0, 0, 1, 0, 2, 0, 1, 0, 0, 0, 1, 0.200000003, 1 0.0399999991, 0.00800000038, 1, 0.400000006, 0.159999996, 2 0.064000003, 1, 0.600000024, 0.360000014, 0.216000006, 1, 3 0.800000012, 0.639999986, 0.512000024, 1, 1, 1, 1, 968*0./ DATA B/1, 2, 1, 1.22140002, 1.49179995, 1.82210004, 4 2.22550011, 2.7183001, 0, 0, 1000*0./ M1 = 2 M = 8 N = 4 L = 1 IER = 0 V1 = 0 V2 = 1.40129846e-45 IAR(:) = 538976288 DUM(:) = 1.35631564e-19 CALL TLSC(A,B,AUX,IPIV,1.,X) END SUBROUTINE TLSMSQ (B,L,M,F) DIMENSION B(*) IF (M.NE.2) CALL ABORT STOP 0 END SUBROUTINE TLSWOP (A,AD,N,NR) DIMENSION A(*), AD(*) CALL ABORT END SUBROUTINE TLUK (A,IASEP,NR,SIG,BETA) DIMENSION A(*) CALL ABORT END SUBROUTINE TLSTEP (A,B,IASEP,IBSEP,NR,NC,BETA) DIMENSION A(*), B(*) CALL ABORT END The miscompiled TLSC calls the first TLSMSQ routine with 8.0 rather than 2.0 as the 3rd argument.
And the miscompiled tlsc.f inline (compile with just -O2): SUBROUTINE TLSC (A,B,AUX,IPIV,EPS,X) COMMON /TLSDIM/ M1,M,N,L,IER COMMON /SLATE/ BETA,H,I,IB,IB1,ID,ID1,IEND,II,IST,J,JA,JB,JK + ,JST,K,KPIV,KR,KST,KT,K1,LV,MR,M11,NK,NR,PIV,PIVT + ,SIG,DUM(11) DIMENSION A(*), AUX(*), B(*), IPIV(*), X(*) IF (N.GT.M.OR.M1.GT.N) GO TO 90 K1 = MAX (N,L) IER = 1 DO 5 K=1,N 5 IPIV(K) = K IST = - N JB = 1 - L M11 = M1 + 1 MR = M1 DO 50 K=1,N IF (K.GT.M1) MR = M IST = IST + N + 1 JB = JB + L LV = MR - K + 1 PIV = 0. ID = IST - N DO 20 J=K,N IF (K.EQ.1 .OR. K.EQ.M11) GO TO 10 PIVT = AUX(J) - A(ID)*A(ID) GO TO 15 10 I = ID + N IF (LV .EQ. 1) GO TO 12 CALL TLSMSQ (A(I),N,LV,PIVT) GO TO 15 12 PIVT= A(I)*A(I) 15 AUX(J) = PIVT ID = ID + 1 IF (PIVT*EPS.LE.PIV) GO TO 20 PIV = PIVT KPIV = J 20 CONTINUE I = KPIV - K IF (I.LE.0) GO TO 25 H = AUX(K) AUX(K) = AUX(KPIV) AUX(KPIV) = H ID = IST + I NR = M - K + 1 CALL TLSWOP (A(IST),A(ID),N,NR) 25 CALL TLUK (A(IST),N,LV,SIG,BETA) IF (LV.EQ.0) GO TO 90 J = K1 + K AUX(J)=-SIG IF (K.GE.N) GO TO 30 NK = N - K IF (LV.EQ.1) GO TO 27 CALL TLSTEP (A(IST),A(IST+1),N,N,LV,NK,BETA) GO TO 30 27 DO 28 J=1,NK JST = IST + J 28 A(JST) = A(JST)*(1.-BETA*A(IST)**2) 30 IB = (K-1) * L + 1 IF (LV.EQ.1) GO TO 32 CALL TLSTEP (A(IST),B(IB),N,L,LV,L,BETA) GO TO 34 32 DO 33 J=1,L JST = IB + J - 1 33 B(JST) = B(JST)*(1.-BETA*A(IST)**2) 34 IPIV(KPIV) = IPIV(K) IPIV(K) = KPIV IF (K.GT.M1) GO TO 50 DO 45 I=M11,M ID1 = IST + (I-K)*N IF (A(ID1).EQ.0) GO TO 45 H = - A(ID1)/SIG A(ID1) = H ID1 = ID1 + 1 ID = IST + 1 DO 35 J=1,NK A(ID1) = A(ID1) - H*A(ID) ID1 = ID1 + 1 35 ID = ID + 1 IB1 = 1 + (I-1)*L IB = JB DO 40 J=1,L B(IB1) = B(IB1) - H*B(IB) IB1 = IB1 + 1 40 IB = IB + 1 45 CONTINUE 50 CONTINUE IER = N * IER KT = N JK = (N-1) * L K = K1 + N PIV = 1./AUX(K) DO 55 K=1,L JK = JK + 1 55 X(JK) = PIV * B(JK) KR = N - 1 IF (KR.LE.0) GO TO 70 JST = KR * (N+1) + 2 DO 65 J=1,KR JST = JST - N - 1 IEND= (KR-J+1) * N K = K1 + KR - J + 1 PIV = 1./AUX(K) KST = K-K1 ID = IPIV(KST)-KST KST = (KR-J) * L DO 65 K=1,L KST = KST + 1 H=B(KST) II = KST DO 60 I=JST,IEND II = II + L 60 H = H - A(I) * X(II) II = KST + ID *L X(KST) = X(II) X(II) = PIV * H 65 CONTINUE 70 IST = KT*L DO 80 J=1,L IST = IST + 1 H = 0. JA = IST IF (M.LE.KT) GO TO 80 NR = M - KT IF (NR.EQ.1) GO TO 75 CALL TLSMSQ (B(IST),L,NR,H) GO TO 80 75 H = B(IST)*B(IST) 80 AUX(J) = H RETURN 90 IER = -1001 RETURN END The problem is that slate.k (aka prephitmp.78) is read before it is stored, so it has the 0x20202020 value instead of 1. At tlsc.f.198r.compgotos the code still looks correct: (insn 1857 1434 1443 8 tlsc.f:16 (set (reg:SI 14 r14 [1392]) (const_int 1 [0x1])) 4 {*movsi_internal} (expr_list:REG_EQUIV (const_int 1 [0x1]) (nil))) (insn 1443 1857 242 8 tlsc.f:16 (set (mem/s/c:SI (post_modify:DI (reg/f:DI 53 r59 [1516]) (plus:DI (reg/f:DI 53 r59 [1516]) (const_int -60 [0xffffffffffffffc4]))) [2 slate.k+0 S4 A32]) (reg:SI 14 r14 [1392])) 4 {*movsi_internal} (expr_list:REG_INC (reg/f:DI 53 r59 [1516]) (expr_list:REG_EQUAL (const_int 1 [0x1]) (nil)))) ... (insn 197 228 247 8 tlsc.f:17 (set (reg:DI 18 r18) (zero_extend:DI (mem/s/c:SI (reg/f:DI 38 r44 [1517]) [2 slate.k+0 S4 A32]))) 103 {zero_extendsidi2} (expr_list:REG_EQUAL (mem/s/c:SI (const:DI (plus:DI (symbol_ref:DI ("slate_") <var_decl 0x2000000003c54c80 slate>) (const_int 60 [0x3c]))) [2 slate.k+0 S4 A32]) (nil))) (insn 247 197 198 8 tlsc.f:22 (set (reg:SI 19 r19 [665]) (minus:SI (reg:SI 20 r20 [orig:560 D.775 ] [560]) (reg:SI 32 r38 [orig:529 prephitmp.68 ] [529]))) 165 {subsi3} (nil)) (insn 198 247 253 8 tlsc.f:17 (set (reg:BI 262 p6 [626]) (le:BI (reg:SI 18 r18 [orig:522 prephitmp.78 ] [522]) (reg:SI 22 r22 [orig:459 prephitmp.258 ] [459]))) 298 {*cmpsi_normal} (nil)) but tlsc.f.200r.mach is already wrong: (insn:TI 197 2150 1443 8 tlsc.f:17 (set (reg:DI 18 r18) (zero_extend:DI (unspec:SI [ (mem/s/c:SI (reg/f:DI 38 r44 [1517]) [2 slate.k+0 S4 A32]) ] 40))) 17 {zero_extendsidi2_advanced} (expr_list:REG_EQUAL (mem/s/c:SI (const:DI (plus:DI (symbol_ref:DI ("slate_") <var_decl 0x2000000003c54c80 slate>) (const_int 60 [0x3c]))) [2 slate.k+0 S4 A32]) (nil))) (insn 1443 197 1427 8 tlsc.f:16 (set (mem/s/c:SI (post_modify:DI (reg/f:DI 53 r59 [1516]) (plus:DI (reg/f:DI 53 r59 [1516]) (const_int -60 [0xffffffffffffffc4]))) [2 slate.k+0 S4 A32]) (reg:SI 14 r14 [1392])) 4 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 14 r14 [1392]) (expr_list:REG_INC (reg/f:DI 53 r59 [1516]) (expr_list:REG_EQUAL (const_int 1 [0x1]) (nil))))) ... (insn:TI 198 2148 223 8 tlsc.f:17 (set (reg:BI 262 p6 [626]) (le:BI (reg:SI 18 r18 [orig:522 prephitmp.78 ] [522]) (reg:SI 22 r22 [orig:459 prephitmp.258 ] [459]))) 298 {*cmpsi_normal} (nil)) Note that the scheduler swapped the slate_.k = 1 store with prephitmp.78 = slate_.k read. The same can be seen in the assembly: ld4.a r18 = [r44] //, slate.k [.LBE19:] .loc 1 16 0 st4 [r59] = r14, -60 // slate.k, tmp1392 where r44 == r59. (gdb) disas $pc $pc+16 Dump of assembler code from 0x4000000000000ec0 to 0x4000000000000ed0: 0x4000000000000ec0 <tlsc+512>: [MMI] ld4.a r18=[r44] 0x4000000000000ec1 <tlsc+513>: st4 [r59]=r14,-60 0x4000000000000ec2 <tlsc+514>: adds r16=48,r41 End of assembler dump. (gdb) p/x $r44 $3 = 0x6000000000004e4c (gdb) p/x $r59 $4 = 0x6000000000004e4c insn 197 is the only ld4.a in tlsc_ routine.
I have no idea why is speculation even attempted here (it doesn't make any sense, the pointer is surely non-NULL, it points to a global variable), and apparently nothing checks whether it is safe to move over the speculative load over the store (at least, I've put a breakpoint on nonoverlapping_memrefs_p and {{,canon_}true,anti,output}_dependence and none of them hit with any MEMs with r44 or POST_MODIFY r59, -60. Maxim, speculation is your baby, could you please have a look?
Wrong-code bug on secondary arch.
I can't reproduce the error with today mainline. When I put in one file 'PROGRAM PR35659' and 'SUBROUTINE TLSC (A,B,AUX,IPIV,EPS,X)' and compile it with any optimization level I get the same "STOP 0" message. Am I doing something wrong? I can't spot the problem in the dumps you posted at debian.org. tlsc.s has a single example of data speculation which seems to be fine. Scheduler speculatively moves ld4.a r18 = [r44] before st4 [r59] = r14, -60 and then also speculates several uses of r18: cmp4.ge p6, p7 = r22, r18 ... (p7) ld4 r14 = [r62] ... (p7) st4 [r77] = r14 then it checks the speculation: chk.a.clr r18, .L69 and recovers if speculation failed: .L69: .mmi ld4 r18 = [r44] ;; cmp4.ge p6, p7 = r22, r18 nop 0 ;; .mmi nop 0 (p7) ld4 r14 = [r62] nop 0 ;; .mib (p7) st4 [r77] = r14 nop 0 br .L70 Anyway, can you help me reproduce the issue, so I can take a closer look?
Subject: Re: [4.3/4.4 Regression] Miscompiled code with -O2 (but not with -O2 -funroll-loops) on ia64 mkuvyrkov at gcc dot gnu dot org writes: > Anyway, can you help me reproduce the issue, so I can take a closer look? please email me a ssh key, if access to a Debian machine would help.
You can reproduce it with a cross compiler too. Just use 4.3 branch (IMHO unrelated changes on the trunk made this bug latent) and don't combine everything into the same file. The #c12 program+routines have to be in one file, #c13 in a different one. The latter is miscompiled, if you are on ia64, you can compile the former with any optimization options, the latter with -O2, link, run. If you have just a cross compiler, just compile the #c13 source and inspect the assembly (look at ld4.a: addl r41 = @ltoff(slate_#), r1 ... adds r14 = 60, r41 ... .mmi mov r44 = r14 mov r50 = r15 mov r59 = r14 ... ! Instructions that don't modify r44 nor r59, no labels .mmi ld4.a r18 = [r44] ! *r44 aka slate_.k is uninitialized here, 0x20202020 st4 [r59] = r14, -60 ! This stores 1 into slate_.k adds r16 = 48, r41 and look at the *.compgotos dump that right before scheduling that slate_.k = 1 preceeded the prephitmp.78 (== r18) = slate_.k load, while after scheduling it is the other way around. The alias set looks correct for both MEMs (4) and MEM_EXPR/MEM_OFFSET etc. too, so IMHO if alias.c was asked if the two MEMs can overlap, it would certainly say so.
To be more precise, the problem is in speculating conditional store: ld4.a r18 = [r44] st4 [r59] = r14, -60 ... cmp4.ge p6, p7 = r22, r18 (p7) ld4 r14 = [r62] ... (p7) st4 [r77] = r14 chk.a.clr r18, .L69 .L70: ... .L69: ld4 r18 = [r44] ;; cmp4.ge p6, p7 = r22, r18 ;; (p7) ld4 r14 = [r62] ;; (p7) st4 [r77] = r14 br .L70 Now, ld4.a returns the stale value in r18 (0x20202020), $r22 is 2 and so in the code before chk.a.clr p7 is 1, so [r62] is loaded into r14 and stored into [r77] (i.e. MR = M in IF (K.GT.M1) MR = M is executed). Then chk.a.clr realizes the [r44] memory changed, so branches to .L69 to do the speculated stuff again. This time r18 is 1, so p7 will be 0 and MR = M is not done. But this really doesn't and can't undo the st4 [r77] = r14 store that already happened before chk.a.clr and wasn't supposed to happen.
Subject: Re: [4.3/4.4 Regression] Miscompiled code with -O2 (but not with -O2 -funroll-loops) on ia64 jakub at gcc dot gnu dot org wrote: > ------- Comment #19 from jakub at gcc dot gnu dot org 2008-06-26 08:41 ------- > To be more precise, the problem is in speculating conditional store: > ld4.a r18 = [r44] > st4 [r59] = r14, -60 > ... > cmp4.ge p6, p7 = r22, r18 > (p7) ld4 r14 = [r62] > ... > (p7) st4 [r77] = r14 > chk.a.clr r18, .L69 > .L70: > ... > .L69: > ld4 r18 = [r44] > ;; > cmp4.ge p6, p7 = r22, r18 > ;; > (p7) ld4 r14 = [r62] > ;; > (p7) st4 [r77] = r14 > br .L70 > > Now, ld4.a returns the stale value in r18 (0x20202020), $r22 is 2 and so in the > code before chk.a.clr p7 is 1, so [r62] is loaded into r14 and stored into > [r77] > (i.e. MR = M in IF (K.GT.M1) MR = M is executed). > Then chk.a.clr realizes the [r44] memory changed, so branches to .L69 to do the > speculated stuff again. This time r18 is 1, so p7 will be 0 and MR = M is not > done. But this really doesn't and can't undo the st4 [r77] = r14 store that > already happened before chk.a.clr and wasn't supposed to happen. Oh, now I see the problem. I need some time to check what the best fix would be. Probably, the fix will be to exclude predicated instructions from speculation set, but I'd like to consider alternatives. Thanks, Maxim
Assign to self
Maxim, have you had time to look at this bug? Given that it is generating bad code and is in 4.3.0 and 4.3.1 I was wondering if it will be fixed for 4.3.2.
(In reply to comment #22) > Maxim, have you had time to look at this bug? Given that it is generating bad > code and is in 4.3.0 and 4.3.1 I was wondering if it will be fixed for 4.3.2. Sorry for the delay. I posted the fix at http://gcc.gnu.org/ml/gcc-patches/2008-08/msg00112.html. I would appreciate if someone could test the cumulative patch of three fixes for ia64 speculation support or provide a single-file executable testcase for this bug. Here are links to two other bugfixes for ia64 speculation support: http://gcc.gnu.org/ml/gcc-patches/2008-08/msg00110.html http://gcc.gnu.org/ml/gcc-patches/2008-08/msg00111.html
I bootstrapped the 3 patches on mainline and 4.3 branch and verified that they fix the problem that is reported on the 4.3 branch with the patches.
Subject: Bug 35659 Author: mkuvyrkov Date: Wed Aug 6 06:23:47 2008 New Revision: 138759 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=138759 Log: PR target/35659 * haifa-sched.c (sched_insn_is_legitimate_for_speculation_p): Move ... * sched-deps.c (sched_insn_is_legitimate_for_speculation_p): ... here. Don't allow predicated instructions for data speculation. * sched-int.h (sched_insn_is_legitimate_for_speculation_p): Move declaration. Modified: trunk/gcc/ChangeLog trunk/gcc/haifa-sched.c trunk/gcc/sched-deps.c trunk/gcc/sched-int.h
Subject: Bug 35659 Author: mkuvyrkov Date: Wed Aug 6 06:34:18 2008 New Revision: 138762 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=138762 Log: Backport from mainline: 2008-08-06 Maxim Kuvyrkov <maxim@codesourcery.com> PR target/35659 * haifa-sched.c (sched_insn_is_legitimate_for_speculation_p): Move ... * sched-deps.c (sched_insn_is_legitimate_for_speculation_p): ... here. Don't allow predicated instructions for data speculation. * sched-int.h (sched_insn_is_legitimate_for_speculation_p): Move declaration. Modified: branches/gcc-4_3-branch/gcc/ChangeLog branches/gcc-4_3-branch/gcc/haifa-sched.c branches/gcc-4_3-branch/gcc/sched-deps.c branches/gcc-4_3-branch/gcc/sched-int.h
Should be fixed on both trunk and 4_3-branch.
Subject: Bug 35659 Author: bonzini Date: Fri Feb 6 07:33:05 2009 New Revision: 143980 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=143980 Log: 2009-02-06 Paolo Bonzini <bonzini@gnu.org> PR tree-optimization/35659 * tree-ssa-sccvn.c (vn_constant_eq, vn_reference_eq, vn_nary_op_eq vn_phi_eq): Shortcut if hashcode does not match. (vn_reference_op_compute_hash): Do not call iterative_hash_expr for NULL operands. * tree-ssa-pre.c (pre_expr_hash): Look at hashcode if available, and avoid iterative_hash_expr. (FOR_EACH_VALUE_ID_IN_SET): New. (value_id_compare): Remove. (sorted_array_from_bitmap_set): Use FOR_EACH_VALUE_ID_IN_SET to sort expressions by value id. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-ssa-pre.c trunk/gcc/tree-ssa-sccvn.c