Revision 128239 causes some random libgomp failures on Linux/ia64: FAIL: libgomp.c/loop-1.c execution test FAIL: libgomp.c/nestedfn-4.c execution test FAIL: libgomp.c/omp-loop01.c execution test FAIL: libgomp.c/ordered-2.c execution test FAIL: libgomp.c/pr26943-4.c execution test FAIL: libgomp.c/pr29947-1.c execution test FAIL: libgomp.c/pr29947-2.c execution test FAIL: libgomp.c++/copyin-1.C -O0 execution test FAIL: libgomp.c++/copyin-1.C -O2 execution test FAIL: libgomp.c++/copyin-1.C -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test FAIL: libgomp.c++/copyin-2.C -O3 -fomit-frame-pointer -funroll-loops execution test FAIL: libgomp.c++/copyin-2.C -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test FAIL: libgomp.c++/copyin-2.C -Os execution test FAIL: libgomp.c++/loop-1.C -O execution test
Apply this patch: --- gcc/opts.c.small 2007-09-09 12:56:26.000000000 -0700 +++ gcc/opts.c 2007-09-11 07:13:15.000000000 -0700 @@ -822,7 +822,9 @@ decode_options (unsigned int argc, const if (optimize >= 2) { +#if 0 flag_inline_small_functions = 1; +#endif flag_thread_jumps = 1; flag_crossjumping = 1; flag_optimize_sibling_calls = 1; [hjl@gnu-13 applied]$ fixes the random failures in libgomp. Maybe this bug is related to PR 33383.
Subject: Re: [4.3 Regression] Revision 128239 causes libgomp failure Hi, I´ve just tested ia64-linux and x86_64-linux on our testers and both are clean WRT libgomp: === libgomp Summary === # of expected passes 496 it might be some sort of race condition (I sometimes see libgomp failures appear and go such as: libgomp.fortran/omp_parse3.f90 -O0 execution test One obvious difference is that we now consider functions produced by OMP lowering inlinable. This can be avoided by: Index: omp-low.c =================================================================== *** omp-low.c (revision 128412) --- omp-low.c (working copy) *************** expand_omp_parallel (struct omp_region * *** 2589,2594 **** --- 2589,2595 ---- /* Inform the callgraph about the new function. */ DECL_STRUCT_FUNCTION (child_fn)->curr_properties = cfun->curr_properties; + DECL_UNINLINABLE (child_fn) = true; cgraph_add_new_function (child_fn, true); /* Fix the callgraph edges for child_cfun. Those for cfun will be I am not sure if the functions was uninlinable by decision or just because someone forgot to play with DECL_INLINE, but can you test this if it makes any difference? I can not think of any other difference this patch sould cause especially with -finline-functions. Jakub, is libgomp safe wrt inlining? Honza
(In reply to comment #2) > Subject: Re: [4.3 Regression] Revision 128239 causes libgomp failure > > Hi, > I´ve just tested ia64-linux and x86_64-linux on our testers and both are > clean WRT libgomp: > === libgomp Summary === > > # of expected passes 496 > it might be some sort of race condition (I sometimes see libgomp > failures appear and go such as: Have you compared the times to take for "make check" on libgomp between revision 128238 and HEAD? With C libgomp tests only, revision 128238 takes less than 15 seconds on a 1.5GHz ia64 machine. Revision 128239 takes more than 20 minutes plus random failures. > > libgomp.fortran/omp_parse3.f90 -O0 execution test > > One obvious difference is that we now consider functions produced by OMP > lowering inlinable. This can be avoided by: > > Index: omp-low.c > =================================================================== > *** omp-low.c (revision 128412) > --- omp-low.c (working copy) > *************** expand_omp_parallel (struct omp_region * > *** 2589,2594 **** > --- 2589,2595 ---- > /* Inform the callgraph about the new function. */ > DECL_STRUCT_FUNCTION (child_fn)->curr_properties > = cfun->curr_properties; > + DECL_UNINLINABLE (child_fn) = true; > cgraph_add_new_function (child_fn, true); > > /* Fix the callgraph edges for child_cfun. Those for cfun will be > It makes no differences to me.
Subject: Re: [4.3 Regression] Revision 128239 causes libgomp failure > > Have you compared the times to take for "make check" on libgomp between > revision 128238 and HEAD? With C libgomp tests only, revision 128238 takes > less than 15 seconds on a 1.5GHz ia64 machine. Revision 128239 takes more > than 20 minutes plus random failures. Interesting. At the moment I can't access any ia64 machine directly, just test via our mail based interface that won't let me time the test. Are those failures an timeouts? It might be some sort of deadlocking... Honza
(In reply to comment #4) > > Interesting. At the moment I can't access any ia64 machine directly, > just test via our mail based interface that won't let me time the test. > Are those failures an timeouts? It might be some sort of deadlocking... > All failures are timeout.
Cannot reproduce this with r128425 + PR32337 + PR32338 fix (8way ia64, make -j16 -k check): Native configuration is ia64-unknown-linux-gnu === gcc tests === Running target unix FAIL: gcc.c-torture/execute/mayalias-2.c compilation, -O3 -g (internal compiler error) UNRESOLVED: gcc.c-torture/execute/mayalias-2.c execution, -O3 -g FAIL: gcc.c-torture/execute/mayalias-3.c compilation, -O3 -g (internal compiler error) UNRESOLVED: gcc.c-torture/execute/mayalias-3.c execution, -O3 -g FAIL: gcc.dg/cpp/_Pragma3.c (test for excess errors) XPASS: gcc.dg/cpp/cmdlne-dI-M.c scan-file (^|\\\\n)cmdlne-dI-M.*:[^\\\\n]*cmdlne-dI-M.c XPASS: gcc.dg/cpp/cmdlne-dM-M.c scan-file (^|\\\\n)cmdlne-dM-M[^\\\\n]*:[^\\\\n]*cmdlne-dM-M.c FAIL: gcc.dg/cpp/direct2s.c (test for excess errors) FAIL: gcc.dg/builtin-apply4.c execution test FAIL: gcc.dg/pr30643.c scan-assembler-not undefined WARNING: gcc.dg/torture/fp-int-convert-float128-timode.c -O0 compilation failed to produce executable WARNING: gcc.dg/torture/fp-int-convert-float128-timode.c -O1 compilation failed to produce executable WARNING: gcc.dg/torture/fp-int-convert-float128-timode.c -O2 compilation failed to produce executable WARNING: gcc.dg/torture/fp-int-convert-float128-timode.c -O3 -fomit-frame-pointer compilation failed to produce executable WARNING: gcc.dg/torture/fp-int-convert-float128-timode.c -O3 -g compilation failed to produce executable WARNING: gcc.dg/torture/fp-int-convert-float128-timode.c -Os compilation failed to produce executable FAIL: gcc.dg/torture/fp-int-convert-float128.c -O0 (test for excess errors) WARNING: gcc.dg/torture/fp-int-convert-float128.c -O0 compilation failed to produce executable FAIL: gcc.dg/torture/fp-int-convert-float128.c -O1 (test for excess errors) WARNING: gcc.dg/torture/fp-int-convert-float128.c -O1 compilation failed to produce executable FAIL: gcc.dg/torture/fp-int-convert-float128.c -O2 (test for excess errors) WARNING: gcc.dg/torture/fp-int-convert-float128.c -O2 compilation failed to produce executable FAIL: gcc.dg/torture/fp-int-convert-float128.c -O3 -fomit-frame-pointer (test for excess errors) WARNING: gcc.dg/torture/fp-int-convert-float128.c -O3 -fomit-frame-pointer compilation failed to produce executable FAIL: gcc.dg/torture/fp-int-convert-float128.c -O3 -g (test for excess errors) WARNING: gcc.dg/torture/fp-int-convert-float128.c -O3 -g compilation failed to produce executable FAIL: gcc.dg/torture/fp-int-convert-float128.c -Os (test for excess errors) WARNING: gcc.dg/torture/fp-int-convert-float128.c -Os compilation failed to produce executable XPASS: gcc.dg/tree-ssa/loop-1.c scan-assembler-times foo 5 FAIL: gcc.dg/tree-ssa/ssa-fre-4.c scan-tree-dump Replaced \\(char\\) .*with XPASS: gcc.dg/tree-ssa/update-threading.c scan-tree-dump-times Invalid sum 0 FAIL: gcc.dg/vect/vect-104.c scan-tree-dump-times vectorized 1 loops 0 FAIL: gcc.dg/vect/vect-iv-6.c scan-tree-dump-times vectorized 1 loops 1 FAIL: gcc.dg/vect/vect-outer-2.c scan-tree-dump-times OUTER LOOP VECTORIZED 1 FAIL: gcc.dg/vect/vect-outer-2a.c scan-tree-dump-times OUTER LOOP VECTORIZED 1 FAIL: gcc.dg/vect/vect-outer-2c.c scan-tree-dump-times OUTER LOOP VECTORIZED 1 FAIL: gcc.dg/vect/vect-outer-4c.c scan-tree-dump-times OUTER LOOP VECTORIZED 1 FAIL: gcc.dg/vect/vect-outer-5.c scan-tree-dump-times zero step in outer loop. 1 FAIL: gcc.dg/vect/vect-strided-store-u16-i4.c scan-tree-dump-times vectorized 1 loops 1 FAIL: gcc.dg/vect/vect-vfa-slp.c scan-tree-dump-times vectorized 1 loops 1 ERROR: gcc.dg/vect/slp-10.c: error executing dg-final: syntax error in target selector "target !{vect_int_mult}" UNRESOLVED: gcc.dg/vect/slp-10.c: error executing dg-final: syntax error in target selector "target !{vect_int_mult}" FAIL: gcc.dg/vect/slp-15.c scan-tree-dump-times vectorizing stmts using SLP 2 FAIL: gcc.dg/vect/slp-25.c scan-tree-dump-times Alignment of access forced using peeling 2 FAIL: gcc.dg/vect/slp-28.c scan-tree-dump-times vectorized 1 loops 1 FAIL: gcc.dg/vect/slp-28.c scan-tree-dump-times vectorizing stmts using SLP 1 FAIL: gcc.dg/vect/slp-3.c scan-tree-dump-times vectorized 3 loops 1 FAIL: gcc.dg/vect/slp-3.c scan-tree-dump-times vectorizing stmts using SLP 3 ERROR: gcc.dg/vect/slp-33.c: error executing dg-final: syntax error in target selector "target !{vect_int_mult}" UNRESOLVED: gcc.dg/vect/slp-33.c: error executing dg-final: syntax error in target selector "target !{vect_int_mult}" FAIL: gcc.dg/vect/no-vfa-pr29145.c scan-tree-dump-times vectorized 0 loops 2 FAIL: gcc.dg/vect/no-vfa-pr29145.c scan-tree-dump-times vectorized 1 loops 1 FAIL: gcc.dg/vect/no-vfa-vect-depend-1.c scan-tree-dump-times vectorized 1 loops 1 FAIL: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times OUTER LOOP VECTORIZED. 1 FAIL: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times OUTER LOOP VECTORIZED. 1 FAIL: gcc.dg/vect/no-scevccp-outer-18.c scan-tree-dump-times OUTER LOOP VECTORIZED. 1 FAIL: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times OUTER LOOP VECTORIZED. 1 FAIL: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times OUTER LOOP VECTORIZED. 1 === gcc Summary === # of expected passes 44377 # of unexpected failures 36 # of unexpected successes 4 # of expected failures 207 # of unresolved testcases 4 # of untested testcases 35 # of unsupported tests 314 /tmp/jakub/gcc/obj/gcc/xgcc version 4.3.0 20070912 (experimental) (GCC) === gfortran tests === Running target unix FAIL: gfortran.dg/do_3.F90 -O3 -fomit-frame-pointer execution test FAIL: gfortran.dg/do_3.F90 -O3 -fomit-frame-pointer -funroll-loops execution test FAIL: gfortran.dg/do_3.F90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test FAIL: gfortran.dg/do_3.F90 -O3 -g execution test === gfortran Summary === # of expected passes 21091 # of unexpected failures 4 # of expected failures 17 # of unsupported tests 21 /tmp/jakub/gcc/obj/gcc/testsuite/gfortran/../../gfortran version 4.3.0 20070912 (experimental) (GCC) === g++ tests === Running target unix XPASS: g++.dg/tree-ssa/copyprop-1.C scan-tree-dump-not .* = [^>;]*; FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn === g++ Summary === # of expected passes 15848 # of unexpected failures 7 # of unexpected successes 1 # of expected failures 86 # of unsupported tests 129 /tmp/jakub/gcc/obj/gcc/testsuite/g++/../../g++ version 4.3.0 20070912 (experimental) (GCC) === objc tests === Running target unix === objc Summary === # of expected passes 1810 # of expected failures 7 # of unsupported tests 25 /tmp/jakub/gcc/obj/gcc/xgcc version 4.3.0 20070912 (experimental) (GCC) === libffi tests === Running target unix === libffi Summary === # of expected passes 1108 # of unsupported tests 8 === libgomp tests === Running target unix === libgomp Summary === # of expected passes 1607 === libjava tests === Running target unix FAIL: getlocalvartable output FAIL: Throw_3 -O3 output - source compiled test FAIL: Throw_3 -O3 -findirect-dispatch output - source compiled test === libjava Summary === # of expected passes 2547 # of unexpected failures 3 === libmudflap tests === Running target unix FAIL: libmudflap.c++/pass41-frag.cxx execution test ... Many failures, but this was make -j16 -k check on 8way ia64 and libmudflap tests are horribly sensitive to loadavg. FAIL: libmudflap.cth/pass40-frag.c (-O3) output pattern test === libmudflap Summary === # of expected passes 1493 # of unexpected failures 333 === libstdc++ tests === Running target unix FAIL: 22_locale/moneypunct_byname/named_equivalence.cc (test for excess errors) WARNING: 22_locale/moneypunct_byname/named_equivalence.cc compilation failed to produce executable XPASS: 26_numerics/headers/cmath/c99_classification_macros_c.cc (test for excess errors) XPASS: 27_io/fpos/14320-1.cc execution test === libstdc++ Summary === # of expected passes 5169 # of unexpected failures 1 # of unexpected successes 2 # of expected failures 56 # of unsupported tests 8 Compiler version: 4.3.0 20070912 (experimental) (GCC) Platform: ia64-unknown-linux-gnu
BTW, when you say -fno-inline-small-functions cures this for you, is the problem on libgomp side or in the generated gomp tests? I.e. if you build libgomp with the default -O2 -finline-small-functions and run make check in libgomp with RUNTESTFLAGS=--target_board=unix/-fno-inline-small-functions do the problems go away? Or vice versa, build libgomp built with -O2 -fno-inline-small-functions vs. RUNTEST=--target_board=unix/-finline-small-functions ?
(In reply to comment #6) > Cannot reproduce this with r128425 + PR32337 + PR32338 fix (8way ia64, make > -j16 -k check): > From http://gcc.gnu.org/ml/gcc-testresults/2007-09/msg00573.html === libgomp tests === Running target unix FAIL: libgomp.c/pr26943-2.c execution test FAIL: libgomp.c/pr26943-4.c execution test FAIL: libgomp.c/pr29947-1.c execution test FAIL: libgomp.fortran/omp_parse3.f90 -O0 execution test Have you checked the times it takes to run "make check" on libgomp? The problem is random failure with timeout. "make check" may pass, but takes much longer to finish.
(In reply to comment #7) > BTW, when you say -fno-inline-small-functions cures this for you, is the > problem > on libgomp side or in the generated gomp tests? I.e. if you build libgomp Apparently not. From http://gcc.gnu.org/ml/gcc-testresults/2007-09/msg00581.html === libgomp tests === Running target unix FAIL: libgomp.c/pr26943-2.c execution test FAIL: libgomp.fortran/omp_parse2.f90 -O0 execution test FAIL: libgomp.fortran/omp_parse2.f90 -O2 execution test FAIL: libgomp.fortran/omp_parse2.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test FAIL: libgomp.fortran/omp_parse3.f90 -O0 execution test === libgomp Summary === # of expected passes 1602 # of unexpected failures 5 The nature of this bug is random failure.
The whole testsuite takes roughly 10 minutes: head -n2 libgomp.log; tail -n4 libgomp.log Test Run By root on Wed Sep 12 10:47:12 2007 Native configuration is ia64-unknown-linux-gnu === libgomp Summary === # of expected passes 1607 runtest completed at Wed Sep 12 10:57:39 2007 and just RUNTESTFLAGS=c.exp takes around 30sec, no failures here (stock RHEL5). Can you try to run make check with say libgomp.so* from some older build before you started seeing this or from 4.2? Knowing whether your libgomp was miscompiled or if the testcases themselves are miscompiled is really crucial to finding out what's going on.
(In reply to comment #10) > you started seeing this or from 4.2? Knowing whether your libgomp was > miscompiled or if the testcases themselves are miscompiled is really crucial > to finding out what's going on. > I verified that it was libgomp which was miscompiled. BTW, I am running RHEL4 U5.
Can you see if e.g. building whole libgomp with -O0 fixes it? If yes, can you do a binary search which of the sources is miscompiled? Thanks.
(In reply to comment #12) > Can you see if e.g. building whole libgomp with -O0 fixes it? > If yes, can you do a binary search which of the sources is miscompiled? > Thanks. > When I replaced bar.o with the bad one, C only "make check" went from 13 seconds to 84 seconds.
Can you please attach bad and good bar.s (with -fverbose-asm) and also -ftree-dump-optimized dump from both?
Created attachment 14197 [details] Good bar.s and bar.c.117t.optimized generated by revision 128238
Created attachment 14198 [details] Bad bar.s and bar.c.121t.optimized generated by revision 128239
gomp_barrier_wait_end in bar.c.121t.optimized has __asm__ __volatile__("break 0x100000":"=r" r15, "=r" out0, "=r" out1, "=r" out2, "=r" out3:"r" r15, "r" out0, "r" out1, "r" out2, "r" out3:"b6", "f15", "f14", "f13", "f12", "f11", "f10", "f9", "f8", "f7", "f6", "p15", "p14", "p13", "p12", "p11", "p10", "p9", "p8", "p7", "p6", "r31", "r30", "r29", "r28", "r27", "r26", "r25", "r24", "r23", "r22", "r21", "r20", "r19", "r18", "r17", "r16", "r14", "r13", "r12", "r11", "r9", "r3", "r2", "out7", "out6", "out5", "out4", "r10", "r8", "memory"); But r15, out0, out1, out2 and out3 aren't initialized at all.
Revision 128239 miscompiled static inline void sys_futex0(int *addr, int op, int val) { register long out0 asm ("out0") = (long) addr; register long out1 asm ("out1") = op; register long out2 asm ("out2") = val; register long out3 asm ("out3") = 0; register long r15 asm ("r15") = SYS_futex; __asm __volatile ("break 0x100000" : "=r"(r15), "=r"(out0), "=r"(out1), "=r"(out2), "=r"(out3) : "r"(r15), "r"(out0), "r"(out1), "r"(out2), "r"(out3) : "memory", "r8", "r10", "out4", "out5", "out6", "out7", /* Non-stacked integer registers, minus r8, r10, r15. */ "r2", "r3", "r9", "r11", "r12", "r13", "r14", "r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23", "r24", "r25", "r26", "r27", "r28", "r29", "r30", "r31", /* Predicate registers. */ "p6", "p7", "p8", "p9", "p10", "p11", "p12", "p13", "p14", "p15", /* Non-rotating fp registers. */ "f6", "f7", "f8", "f9", "f10", "f11", "f12", "f13", "f14", "f15", /* Branch registers. */ "b6"); }
Here is self-contained source: void sys_futex0(int *addr, int op, int val) { register long out0 asm ("out0") = (long) addr; register long out1 asm ("out1") = op; register long out2 asm ("out2") = val; register long out3 asm ("out3") = 0; register long r15 asm ("r15") = 1230; __asm __volatile ("break 0x100000" : "=r"(r15), "=r"(out0), "=r"(out1), "=r"(out2), "=r"(out3) : "r"(r15), "r"(out0), "r"(out1), "r"(out2), "r"(out3) : "memory", "r8", "r10", "out4", "out5", "out6", "out7", /* Non-stacked integer registers, minus r8, r10, r15. */ "r2", "r3", "r9", "r11", "r12", "r13", "r14", "r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23", "r24", "r25", "r26", "r27", "r28", "r29", "r30", "r31", /* Predicate registers. */ "p6", "p7", "p8", "p9", "p10", "p11", "p12", "p13", "p14", "p15", /* Non-rotating fp registers. */ "f6", "f7", "f8", "f9", "f10", "f11", "f12", "f13", "f14", "f15", /* Branch registers. */ "b6"); } vs. void sys_futex0(int *addr, int op, int val) { register long out0 asm ("out0") = (long) addr; register long out1 asm ("out1") = op; register long out2 asm ("out2") = val; register long out3 asm ("out3") = 0; register long r15 asm ("r15") = 1230; __asm __volatile ("break 0x100000" : : "r"(r15), "r"(out0), "r"(out1), "r"(out2), "r"(out3) : "memory", "r8", "r10", "out4", "out5", "out6", "out7", /* Non-stacked integer registers, minus r8, r10, r15. */ "r2", "r3", "r9", "r11", "r12", "r13", "r14", "r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23", "r24", "r25", "r26", "r27", "r28", "r29", "r30", "r31", /* Predicate registers. */ "p6", "p7", "p8", "p9", "p10", "p11", "p12", "p13", "p14", "p15", /* Non-rotating fp registers. */ "f6", "f7", "f8", "f9", "f10", "f11", "f12", "f13", "f14", "f15", /* Branch registers. */ "b6"); } The former is miscompiled by sdse1 pass: Deleted dead store '# STORES: { out0D.1189 } out0D.1189 ={v} addr.0D.1194_2' Deleted dead store '# STORES: { out1D.1190 } out1D.1190 ={v} out1.1D.1195_4' Deleted dead store '# STORES: { out2D.1191 } out2D.1191 ={v} out2.2D.1196_6' Deleted dead store '# STORES: { out3D.1192 } out3D.1192 ={v} 0' Deleted dead store '# STORES: { r15D.1193 } r15D.1193 ={v} 1230' sys_futex0 (addrD.1184, opD.1185, valD.1186) { register long intD.2 r15D.1193 __asm__ (*r15); register long intD.2 out3D.1192 __asm__ (*out3); register long intD.2 out2D.1191 __asm__ (*out2); register long intD.2 out1D.1190 __asm__ (*out1); register long intD.2 out0D.1189 __asm__ (*out0); long intD.2 out2.2D.1196; long intD.2 out1.1D.1195; long intD.2 addr.0D.1194; # BLOCK 2 # PRED: ENTRY (fallthru,exec) addr.0D.1194_2 = (long intD.2) addrD.1184_1(D); out1.1D.1195_4 = (long intD.2) opD.1185_3(D); out2.2D.1196_6 = (long intD.2) valD.1186_5(D); # STORES: { out0D.1189 out1D.1190 out2D.1191 out3D.1192 r15D.1193 } __asm__ __volatile__("break 0x100000":"=r" r15D.1193, "=r" out0D.1189, "=r" out1D.1190, "=r" out2D.1191, "=r" out3D.1192:"r" r1 5D.1193, "r" out0D.1189, "r" out1D.1190, "r" out2D.1191, "r" out3D.1192:"b6", "f15", "f14", "f13", "f12", "f11", "f10", "f9", "f8 ", "f7", "f6", "p15", "p14", "p13", "p12", "p11", "p10", "p9", "p8", "p7", "p6", "r31", "r30", "r29", "r28", "r27", "r26", "r25", "r24", "r23", "r22", "r21", "r20", "r19", "r18", "r17", "r16", "r14", "r13", "r12", "r11", "r9", "r3", "r2", "out7", "out6", "ou t5", "out4", "r10", "r8", "memory"); return; # SUCC: EXIT while the latter is not: ;; Function sys_futex0 (sys_futex0) sys_futex0 (addrD.1184, opD.1185, valD.1186) { register long intD.2 r15D.1193 __asm__ (*r15); register long intD.2 out3D.1192 __asm__ (*out3); register long intD.2 out2D.1191 __asm__ (*out2); register long intD.2 out1D.1190 __asm__ (*out1); register long intD.2 out0D.1189 __asm__ (*out0); long intD.2 out2.2D.1196; long intD.2 out1.1D.1195; long intD.2 addr.0D.1194; # BLOCK 2 # PRED: ENTRY (fallthru,exec) addr.0D.1194_2 = (long intD.2) addrD.1184_1(D); # STORES: { out0D.1189 } out0D.1189 ={v} addr.0D.1194_2; out1.1D.1195_4 = (long intD.2) opD.1185_3(D); # STORES: { out1D.1190 } out1D.1190 ={v} out1.1D.1195_4; out2.2D.1196_6 = (long intD.2) valD.1186_5(D); # STORES: { out2D.1191 } out2D.1191 ={v} out2.2D.1196_6; # STORES: { out3D.1192 } out3D.1192 ={v} 0; # STORES: { r15D.1193 } r15D.1193 ={v} 1230; # LOADS: { out0D.1189 out1D.1190 out2D.1191 out3D.1192 r15D.1193 } __asm__ __volatile__("break 0x100000"::"r" r15D.1193, "r" out0D.1189, "r" out1D.1190, "r" out2D.1191, "r" out3D.1192:"b6", "f15 ", "f14", "f13", "f12", "f11", "f10", "f9", "f8", "f7", "f6", "p15", "p14", "p13", "p12", "p11", "p10", "p9", "p8", "p7", "p6", " r31", "r30", "r29", "r28", "r27", "r26", "r25", "r24", "r23", "r22", "r21", "r20", "r19", "r18", "r17", "r16", "r14", "r13", "r12 ", "r11", "r9", "r3", "r2", "out7", "out6", "out5", "out4", "r10", "r8", "memory"); return; # SUCC: EXIT Seems if the same reg is both input and output it is just among STORED_SYMS, not LOADED_SYMS.
append_vuse has: static inline void append_vuse (tree var) { tree sym; if (TREE_CODE (var) != SSA_NAME) { tree mpt; var_ann_t ann; /* If VAR belongs to a memory partition, use it instead of VAR. */ mpt = memory_partition (var); if (mpt) var = mpt; /* Don't allow duplicate entries. */ ann = get_var_ann (var); if (ann->in_vuse_list || ann->in_vdef_list) return; ann->in_vuse_list = true; sym = var; } else sym = SSA_NAME_VAR (var); VEC_safe_push (tree, heap, build_vuses, var); bitmap_set_bit (build_loads, DECL_UID (sym)); } For VAR_DECL with DECL_HARD_REGISTER set, if it is only in the input list of asm, it is added as vuse/load, if it is only in output list of am, it is added as vdef/store. But the above /* Don't allow duplicate entries. */ means that if there is "=r" (x) : "r" (x) with VAR_DECL DECL_HARD_REGISTER x, it is only marked as store/vdef, not as load/vuse. No matter whether it is right or wrong in tree-ssa-operands.c, I think sdse should just avoid removing VAR_DECL DECL_HARD_REGISTER store elimination.
Subject: Re: [4.3 Regression] Revision 128239 causes libgomp failure The failure occurs in the sdse (simple dead store elimination) pass. It looks like it is confused by an extended asm statement with input/output operands. It thinks these operands are not read by the asm. If the only use of a variable is inside such an asm, then sdse will think that the variable is never read, and optimize away stores to it. I haven't tried debugging sdse yet to see exactly what is going wrong. Jim
Subject: Re: [4.3 Regression] Revision 128239 causes libgomp failure jakub at gcc dot gnu dot org wrote: > No matter whether it is right or wrong in tree-ssa-operands.c, I think > sdse should just avoid removing VAR_DECL DECL_HARD_REGISTER store elimination. This isn't a volatile register, nor a global register variable, so there should be no problem performing dead store elimination on it. I think tree-ssa-operands.c needs to be fixed. I'm not a tree-ssa expert, but the fix needed in tree-ssa-operands.c seems simple enough. The append_vuse function does two things, it keeps a list of v_uses, and it keeps a list of loads. We don't want a v_use if we already have a v_def, but we still need to mark it as a load. We can do that with a little rearrangement of the code. Like this. This works for the testcase, but has had no other testing as yet. 2007-09-12 James E. Wilson <wilson@specifix.com> * tree-ssa-operands.c (append_vuse): If ann->in_vdef_list true, then set build_loads before returning. Index: tree-ssa-operands.c =================================================================== --- tree-ssa-operands.c (revision 128394) +++ tree-ssa-operands.c (working copy) @@ -1164,8 +1164,15 @@ /* Don't allow duplicate entries. */ ann = get_var_ann (var); - if (ann->in_vuse_list || ann->in_vdef_list) + if (ann->in_vuse_list) return; + else if (ann->in_vdef_list) + { + /* We don't want a vuse if we already have a vdef, but we must + still put this in build_loads. */ + bitmap_set_bit (build_loads, DECL_UID (var)); + return; + } ann->in_vuse_list = true; sym = var;
(In reply to comment #22) > > This works for the testcase, but has had no other testing as yet. > > 2007-09-12 James E. Wilson <wilson@specifix.com> > > * tree-ssa-operands.c (append_vuse): If ann->in_vdef_list true, > then set build_loads before returning. > > Index: tree-ssa-operands.c > =================================================================== > --- tree-ssa-operands.c (revision 128394) > +++ tree-ssa-operands.c (working copy) > @@ -1164,8 +1164,15 @@ > > /* Don't allow duplicate entries. */ > ann = get_var_ann (var); > - if (ann->in_vuse_list || ann->in_vdef_list) > + if (ann->in_vuse_list) > return; > + else if (ann->in_vdef_list) > + { > + /* We don't want a vuse if we already have a vdef, but we must > + still put this in build_loads. */ > + bitmap_set_bit (build_loads, DECL_UID (var)); > + return; > + } > > ann->in_vuse_list = true; > sym = var; > It works for C tests in libgomp. I am bootstrapping on Linux/ia64 now. Thanks.
(In reply to comment #23) > (In reply to comment #22) > > > 2007-09-12 James E. Wilson <wilson@specifix.com> > > > > * tree-ssa-operands.c (append_vuse): If ann->in_vdef_list true, > > then set build_loads before returning. > > > > It works for C tests in libgomp. I am bootstrapping on Linux/ia64 now. Thanks. > It seems fine on Linux/ia64, Linux/ia32 and Linux/x86-64.
Thanks a lot for tracking down this issue. The patch looks correct to me, can you please send it into gcc-patches? Honza
(In reply to comment #25) > Thanks a lot for tracking down this issue. The patch looks correct to me, can > you please send it into gcc-patches? > > Honza > Posted at http://gcc.gnu.org/ml/gcc-patches/2007-09/msg01178.html
Subject: Bug 33389 Author: hjl Date: Thu Sep 13 16:33:49 2007 New Revision: 128469 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=128469 Log: 2007-09-13 James E. Wilson <wilson@specifix.com> PR tree-optimization/33389 * tree-ssa-operands.c (append_vuse): If ann->in_vdef_list true, then set build_loads before returning. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-ssa-operands.c
Thanks, Jim.