Bug 33389 - [4.3 Regression] Revision 128239 causes libgomp failure
Summary: [4.3 Regression] Revision 128239 causes libgomp failure
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.3.0
: P3 blocker
Target Milestone: 4.3.0
Assignee: Not yet assigned to anyone
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Keywords:
Depends on: 33383
Blocks:
  Show dependency treegraph
 
Reported: 2007-09-11 13:21 UTC by H.J. Lu
Modified: 2007-09-14 11:32 UTC (History)
4 users (show)

See Also:
Host:
Target: ia64-unknown-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
Good bar.s and bar.c.117t.optimized generated by revision 128238 (6.66 KB, application/octet-stream)
2007-09-12 17:10 UTC, H.J. Lu
Details
Bad bar.s and bar.c.121t.optimized generated by revision 128239 (6.40 KB, application/octet-stream)
2007-09-12 17:11 UTC, H.J. Lu
Details

Note You need to log in before you can comment on or make changes to this bug.
Description H.J. Lu 2007-09-11 13:21:24 UTC
Revision 128239 causes some random libgomp failures on Linux/ia64:

FAIL: libgomp.c/loop-1.c execution test
FAIL: libgomp.c/nestedfn-4.c execution test
FAIL: libgomp.c/omp-loop01.c execution test
FAIL: libgomp.c/ordered-2.c execution test
FAIL: libgomp.c/pr26943-4.c execution test
FAIL: libgomp.c/pr29947-1.c execution test
FAIL: libgomp.c/pr29947-2.c execution test
FAIL: libgomp.c++/copyin-1.C  -O0  execution test
FAIL: libgomp.c++/copyin-1.C  -O2  execution test
FAIL: libgomp.c++/copyin-1.C  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
FAIL: libgomp.c++/copyin-2.C  -O3 -fomit-frame-pointer -funroll-loops  execution test
FAIL: libgomp.c++/copyin-2.C  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
FAIL: libgomp.c++/copyin-2.C  -Os  execution test
FAIL: libgomp.c++/loop-1.C  -O  execution test
Comment 1 H.J. Lu 2007-09-11 16:35:31 UTC
Apply this patch:

--- gcc/opts.c.small    2007-09-09 12:56:26.000000000 -0700
+++ gcc/opts.c  2007-09-11 07:13:15.000000000 -0700
@@ -822,7 +822,9 @@ decode_options (unsigned int argc, const
 
   if (optimize >= 2)
     {
+#if 0
       flag_inline_small_functions = 1;
+#endif
       flag_thread_jumps = 1;
       flag_crossjumping = 1;
       flag_optimize_sibling_calls = 1;
[hjl@gnu-13 applied]$ 

fixes the random failures in libgomp. Maybe this bug is related to
PR 33383.
Comment 2 Jan Hubicka 2007-09-12 12:01:48 UTC
Subject: Re:  [4.3 Regression]  Revision 128239 causes libgomp failure

Hi,
I´ve just tested ia64-linux and x86_64-linux on our testers and both are
clean WRT libgomp:
                === libgomp Summary ===

		# of expected passes            496
it might be some sort of race condition (I sometimes see libgomp
failures appear and go such as:

libgomp.fortran/omp_parse3.f90  -O0  execution test

One obvious difference is that we now consider functions produced by OMP
lowering inlinable.  This can be avoided by:

Index: omp-low.c
===================================================================
*** omp-low.c	(revision 128412)
--- omp-low.c	(working copy)
*************** expand_omp_parallel (struct omp_region *
*** 2589,2594 ****
--- 2589,2595 ----
        /* Inform the callgraph about the new function.  */
        DECL_STRUCT_FUNCTION (child_fn)->curr_properties
  	= cfun->curr_properties;
+       DECL_UNINLINABLE (child_fn) = true;
        cgraph_add_new_function (child_fn, true);
  
        /* Fix the callgraph edges for child_cfun.  Those for cfun will be

I am not sure if the functions was uninlinable by decision or just
because someone forgot to play with DECL_INLINE, but can you test this
if it makes any difference?
I can not think of any other difference this patch sould cause
especially with -finline-functions.

Jakub, is libgomp safe wrt inlining?

Honza
Comment 3 H.J. Lu 2007-09-12 13:54:56 UTC
(In reply to comment #2)
> Subject: Re:  [4.3 Regression]  Revision 128239 causes libgomp failure
> 
> Hi,
> I´ve just tested ia64-linux and x86_64-linux on our testers and both are
> clean WRT libgomp:
>                 === libgomp Summary ===
> 
>                 # of expected passes            496
> it might be some sort of race condition (I sometimes see libgomp
> failures appear and go such as:

Have you compared the times to take for "make check" on libgomp between
revision 128238 and HEAD? With C libgomp tests only, revision 128238 takes
less than 15 seconds on a 1.5GHz ia64 machine. Revision 128239 takes more
than 20 minutes plus random failures.

> 
> libgomp.fortran/omp_parse3.f90  -O0  execution test
> 
> One obvious difference is that we now consider functions produced by OMP
> lowering inlinable.  This can be avoided by:
> 
> Index: omp-low.c
> ===================================================================
> *** omp-low.c   (revision 128412)
> --- omp-low.c   (working copy)
> *************** expand_omp_parallel (struct omp_region *
> *** 2589,2594 ****
> --- 2589,2595 ----
>         /* Inform the callgraph about the new function.  */
>         DECL_STRUCT_FUNCTION (child_fn)->curr_properties
>         = cfun->curr_properties;
> +       DECL_UNINLINABLE (child_fn) = true;
>         cgraph_add_new_function (child_fn, true);
> 
>         /* Fix the callgraph edges for child_cfun.  Those for cfun will be
> 

It makes no differences to me.
Comment 4 Jan Hubicka 2007-09-12 14:16:47 UTC
Subject: Re:  [4.3 Regression]  Revision 128239 causes libgomp failure

> 
> Have you compared the times to take for "make check" on libgomp between
> revision 128238 and HEAD? With C libgomp tests only, revision 128238 takes
> less than 15 seconds on a 1.5GHz ia64 machine. Revision 128239 takes more
> than 20 minutes plus random failures.

Interesting.  At the moment I can't access any ia64 machine directly,
just test via our mail based interface that won't let me time the test.
Are those failures an timeouts?  It might be some sort of deadlocking...

Honza
Comment 5 H.J. Lu 2007-09-12 14:24:25 UTC
(In reply to comment #4)

> 
> Interesting.  At the moment I can't access any ia64 machine directly,
> just test via our mail based interface that won't let me time the test.
> Are those failures an timeouts?  It might be some sort of deadlocking...
> 

All failures are timeout.

Comment 6 Jakub Jelinek 2007-09-12 14:35:22 UTC
Cannot reproduce this with r128425 + PR32337 + PR32338 fix (8way ia64, make -j16 -k check):

Native configuration is ia64-unknown-linux-gnu

                === gcc tests ===


Running target unix
FAIL: gcc.c-torture/execute/mayalias-2.c compilation,  -O3 -g  (internal compiler error)
UNRESOLVED: gcc.c-torture/execute/mayalias-2.c execution,  -O3 -g 
FAIL: gcc.c-torture/execute/mayalias-3.c compilation,  -O3 -g  (internal compiler error)
UNRESOLVED: gcc.c-torture/execute/mayalias-3.c execution,  -O3 -g 
FAIL: gcc.dg/cpp/_Pragma3.c (test for excess errors)
XPASS: gcc.dg/cpp/cmdlne-dI-M.c scan-file (^|\\\\n)cmdlne-dI-M.*:[^\\\\n]*cmdlne-dI-M.c
XPASS: gcc.dg/cpp/cmdlne-dM-M.c scan-file (^|\\\\n)cmdlne-dM-M[^\\\\n]*:[^\\\\n]*cmdlne-dM-M.c
FAIL: gcc.dg/cpp/direct2s.c (test for excess errors)
FAIL: gcc.dg/builtin-apply4.c execution test
FAIL: gcc.dg/pr30643.c scan-assembler-not undefined
WARNING: gcc.dg/torture/fp-int-convert-float128-timode.c  -O0  compilation failed to produce executable
WARNING: gcc.dg/torture/fp-int-convert-float128-timode.c  -O1  compilation failed to produce executable
WARNING: gcc.dg/torture/fp-int-convert-float128-timode.c  -O2  compilation failed to produce executable
WARNING: gcc.dg/torture/fp-int-convert-float128-timode.c  -O3 -fomit-frame-pointer  compilation failed to produce executable
WARNING: gcc.dg/torture/fp-int-convert-float128-timode.c  -O3 -g  compilation failed to produce executable
WARNING: gcc.dg/torture/fp-int-convert-float128-timode.c  -Os  compilation failed to produce executable
FAIL: gcc.dg/torture/fp-int-convert-float128.c  -O0  (test for excess errors)
WARNING: gcc.dg/torture/fp-int-convert-float128.c  -O0  compilation failed to produce executable
FAIL: gcc.dg/torture/fp-int-convert-float128.c  -O1  (test for excess errors)
WARNING: gcc.dg/torture/fp-int-convert-float128.c  -O1  compilation failed to produce executable
FAIL: gcc.dg/torture/fp-int-convert-float128.c  -O2  (test for excess errors)
WARNING: gcc.dg/torture/fp-int-convert-float128.c  -O2  compilation failed to produce executable
FAIL: gcc.dg/torture/fp-int-convert-float128.c  -O3 -fomit-frame-pointer  (test for excess errors)
WARNING: gcc.dg/torture/fp-int-convert-float128.c  -O3 -fomit-frame-pointer  compilation failed to produce executable
FAIL: gcc.dg/torture/fp-int-convert-float128.c  -O3 -g  (test for excess errors)
WARNING: gcc.dg/torture/fp-int-convert-float128.c  -O3 -g  compilation failed to produce executable
FAIL: gcc.dg/torture/fp-int-convert-float128.c  -Os  (test for excess errors)
WARNING: gcc.dg/torture/fp-int-convert-float128.c  -Os  compilation failed to produce executable
XPASS: gcc.dg/tree-ssa/loop-1.c scan-assembler-times foo 5
FAIL: gcc.dg/tree-ssa/ssa-fre-4.c scan-tree-dump Replaced \\(char\\) .*with 
XPASS: gcc.dg/tree-ssa/update-threading.c scan-tree-dump-times Invalid sum 0
FAIL: gcc.dg/vect/vect-104.c scan-tree-dump-times vectorized 1 loops 0
FAIL: gcc.dg/vect/vect-iv-6.c scan-tree-dump-times vectorized 1 loops 1
FAIL: gcc.dg/vect/vect-outer-2.c scan-tree-dump-times OUTER LOOP VECTORIZED 1
FAIL: gcc.dg/vect/vect-outer-2a.c scan-tree-dump-times OUTER LOOP VECTORIZED 1
FAIL: gcc.dg/vect/vect-outer-2c.c scan-tree-dump-times OUTER LOOP VECTORIZED 1
FAIL: gcc.dg/vect/vect-outer-4c.c scan-tree-dump-times OUTER LOOP VECTORIZED 1
FAIL: gcc.dg/vect/vect-outer-5.c scan-tree-dump-times zero step in outer loop. 1
FAIL: gcc.dg/vect/vect-strided-store-u16-i4.c scan-tree-dump-times vectorized 1 loops 1
FAIL: gcc.dg/vect/vect-vfa-slp.c scan-tree-dump-times vectorized 1 loops 1
ERROR: gcc.dg/vect/slp-10.c: error executing dg-final: syntax error in target selector "target !{vect_int_mult}"
UNRESOLVED: gcc.dg/vect/slp-10.c: error executing dg-final: syntax error in target selector "target !{vect_int_mult}"
FAIL: gcc.dg/vect/slp-15.c scan-tree-dump-times vectorizing stmts using SLP 2
FAIL: gcc.dg/vect/slp-25.c scan-tree-dump-times Alignment of access forced using peeling 2
FAIL: gcc.dg/vect/slp-28.c scan-tree-dump-times vectorized 1 loops 1
FAIL: gcc.dg/vect/slp-28.c scan-tree-dump-times vectorizing stmts using SLP 1
FAIL: gcc.dg/vect/slp-3.c scan-tree-dump-times vectorized 3 loops 1
FAIL: gcc.dg/vect/slp-3.c scan-tree-dump-times vectorizing stmts using SLP 3
ERROR: gcc.dg/vect/slp-33.c: error executing dg-final: syntax error in target selector "target !{vect_int_mult}"
UNRESOLVED: gcc.dg/vect/slp-33.c: error executing dg-final: syntax error in target selector "target !{vect_int_mult}"
FAIL: gcc.dg/vect/no-vfa-pr29145.c scan-tree-dump-times vectorized 0 loops 2
FAIL: gcc.dg/vect/no-vfa-pr29145.c scan-tree-dump-times vectorized 1 loops 1
FAIL: gcc.dg/vect/no-vfa-vect-depend-1.c scan-tree-dump-times vectorized 1 loops 1
FAIL: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times OUTER LOOP VECTORIZED. 1
FAIL: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times OUTER LOOP VECTORIZED. 1
FAIL: gcc.dg/vect/no-scevccp-outer-18.c scan-tree-dump-times OUTER LOOP VECTORIZED. 1
FAIL: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times OUTER LOOP VECTORIZED. 1
FAIL: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times OUTER LOOP VECTORIZED. 1

                === gcc Summary ===

# of expected passes            44377
# of unexpected failures        36
# of unexpected successes       4
# of expected failures          207
# of unresolved testcases       4
# of untested testcases         35
# of unsupported tests          314
/tmp/jakub/gcc/obj/gcc/xgcc  version 4.3.0 20070912 (experimental) (GCC) 

                === gfortran tests ===


Running target unix
FAIL: gfortran.dg/do_3.F90  -O3 -fomit-frame-pointer  execution test
FAIL: gfortran.dg/do_3.F90  -O3 -fomit-frame-pointer -funroll-loops  execution test
FAIL: gfortran.dg/do_3.F90  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
FAIL: gfortran.dg/do_3.F90  -O3 -g  execution test

                === gfortran Summary ===

# of expected passes            21091
# of unexpected failures        4
# of expected failures          17
# of unsupported tests          21
/tmp/jakub/gcc/obj/gcc/testsuite/gfortran/../../gfortran  version 4.3.0 20070912 (experimental) (GCC) 

                === g++ tests ===


Running target unix
XPASS: g++.dg/tree-ssa/copyprop-1.C scan-tree-dump-not .* = [^>;]*;
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn

                === g++ Summary ===

# of expected passes            15848
# of unexpected failures        7
# of unexpected successes       1
# of expected failures          86
# of unsupported tests          129
/tmp/jakub/gcc/obj/gcc/testsuite/g++/../../g++  version 4.3.0 20070912 (experimental) (GCC) 

                === objc tests ===


Running target unix

                === objc Summary ===

# of expected passes            1810
# of expected failures          7
# of unsupported tests          25
/tmp/jakub/gcc/obj/gcc/xgcc  version 4.3.0 20070912 (experimental) (GCC) 

                === libffi tests ===


Running target unix

                === libffi Summary ===

# of expected passes            1108
# of unsupported tests          8
                === libgomp tests ===


Running target unix

                === libgomp Summary ===

# of expected passes            1607
                === libjava tests ===


Running target unix
FAIL: getlocalvartable output
FAIL: Throw_3 -O3 output - source compiled test
FAIL: Throw_3 -O3 -findirect-dispatch output - source compiled test

                === libjava Summary ===

# of expected passes            2547
# of unexpected failures        3
                === libmudflap tests ===


Running target unix
FAIL: libmudflap.c++/pass41-frag.cxx execution test
...
Many failures, but this was make -j16 -k check on 8way ia64
and libmudflap tests are horribly sensitive to loadavg.
FAIL: libmudflap.cth/pass40-frag.c (-O3) output pattern test

                === libmudflap Summary ===

# of expected passes            1493
# of unexpected failures        333
                === libstdc++ tests ===


Running target unix
FAIL: 22_locale/moneypunct_byname/named_equivalence.cc (test for excess errors)
WARNING: 22_locale/moneypunct_byname/named_equivalence.cc compilation failed to produce executable
XPASS: 26_numerics/headers/cmath/c99_classification_macros_c.cc (test for excess errors)
XPASS: 27_io/fpos/14320-1.cc execution test

                === libstdc++ Summary ===

# of expected passes            5169
# of unexpected failures        1
# of unexpected successes       2
# of expected failures          56
# of unsupported tests          8

Compiler version: 4.3.0 20070912 (experimental) (GCC) 
Platform: ia64-unknown-linux-gnu

Comment 7 Jakub Jelinek 2007-09-12 14:39:39 UTC
BTW, when you say -fno-inline-small-functions cures this for you, is the problem
on libgomp side or in the generated gomp tests?  I.e. if you build libgomp
with the default -O2 -finline-small-functions and run
make check in libgomp with RUNTESTFLAGS=--target_board=unix/-fno-inline-small-functions
do the problems go away?  Or vice versa, build libgomp built with
-O2 -fno-inline-small-functions vs. RUNTEST=--target_board=unix/-finline-small-functions
?
Comment 8 H.J. Lu 2007-09-12 14:40:43 UTC
(In reply to comment #6)
> Cannot reproduce this with r128425 + PR32337 + PR32338 fix (8way ia64, make
> -j16 -k check):
> 

From

http://gcc.gnu.org/ml/gcc-testresults/2007-09/msg00573.html

		=== libgomp tests ===


Running target unix
FAIL: libgomp.c/pr26943-2.c execution test
FAIL: libgomp.c/pr26943-4.c execution test
FAIL: libgomp.c/pr29947-1.c execution test
FAIL: libgomp.fortran/omp_parse3.f90  -O0  execution test

Have you checked the times it takes to run "make check" on libgomp? The
problem is random failure with timeout. "make check" may pass, but takes
much longer to finish.
Comment 9 H.J. Lu 2007-09-12 14:41:59 UTC
(In reply to comment #7)
> BTW, when you say -fno-inline-small-functions cures this for you, is the
> problem
> on libgomp side or in the generated gomp tests?  I.e. if you build libgomp

Apparently not. From

http://gcc.gnu.org/ml/gcc-testresults/2007-09/msg00581.html

		=== libgomp tests ===


Running target unix
FAIL: libgomp.c/pr26943-2.c execution test
FAIL: libgomp.fortran/omp_parse2.f90  -O0  execution test
FAIL: libgomp.fortran/omp_parse2.f90  -O2  execution test
FAIL: libgomp.fortran/omp_parse2.f90  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
FAIL: libgomp.fortran/omp_parse3.f90  -O0  execution test

		=== libgomp Summary ===

# of expected passes		1602
# of unexpected failures	5

The nature of this bug is random failure.
Comment 10 Jakub Jelinek 2007-09-12 15:04:26 UTC
The whole testsuite takes roughly 10 minutes:
head -n2 libgomp.log; tail -n4 libgomp.log
Test Run By root on Wed Sep 12 10:47:12 2007
Native configuration is ia64-unknown-linux-gnu
                === libgomp Summary ===

# of expected passes            1607
runtest completed at Wed Sep 12 10:57:39 2007

and just RUNTESTFLAGS=c.exp takes around 30sec, no failures here (stock RHEL5).

Can you try to run make check with say libgomp.so* from some older build before
you started seeing this or from 4.2?  Knowing whether your libgomp was miscompiled or if the testcases themselves are miscompiled is really crucial
to finding out what's going on.
Comment 11 H.J. Lu 2007-09-12 15:36:13 UTC
(In reply to comment #10)

> you started seeing this or from 4.2?  Knowing whether your libgomp was
> miscompiled or if the testcases themselves are miscompiled is really crucial
> to finding out what's going on.
> 

I verified that it was libgomp which was miscompiled. BTW, I am running RHEL4 U5.
Comment 12 Jakub Jelinek 2007-09-12 15:48:19 UTC
Can you see if e.g. building whole libgomp with -O0 fixes it?
If yes, can you do a binary search which of the sources is miscompiled?
Thanks.
Comment 13 H.J. Lu 2007-09-12 16:51:51 UTC
(In reply to comment #12)
> Can you see if e.g. building whole libgomp with -O0 fixes it?
> If yes, can you do a binary search which of the sources is miscompiled?
> Thanks.
> 

When I replaced bar.o with the bad one, C only "make check" went from
13 seconds to 84 seconds.

Comment 14 Jakub Jelinek 2007-09-12 16:59:41 UTC
Can you please attach bad and good bar.s (with -fverbose-asm) and also
-ftree-dump-optimized dump from both?
Comment 15 H.J. Lu 2007-09-12 17:10:14 UTC
Created attachment 14197 [details]
Good bar.s and bar.c.117t.optimized generated by revision 128238
Comment 16 H.J. Lu 2007-09-12 17:11:22 UTC
Created attachment 14198 [details]
Bad bar.s and  bar.c.121t.optimized generated by revision 128239
Comment 17 H.J. Lu 2007-09-12 17:18:03 UTC
gomp_barrier_wait_end in bar.c.121t.optimized has

  __asm__ __volatile__("break 0x100000":"=r" r15, "=r" out0, "=r" out1, "=r" out2, "=r" out3:"r" r15, "r" out0, "r" out1, "r" out2, "r" out3:"b6", "f15", "f14", "f13", "f12", "f11", "f10", "f9", "f8", "f7", "f6", "p15", "p14", "p13", "p12", "p11", "p10", "p9", "p8", "p7", "p6", "r31", "r30", "r29", "r28", "r27", "r26", "r25", "r24", "r23", "r22", "r21", "r20", "r19", "r18", "r17", "r16", "r14", "r13", "r12", "r11", "r9", "r3", "r2", "out7", "out6", "out5", "out4", "r10", "r8", "memory"); 

But r15, out0, out1, out2 and out3 aren't initialized at all.
Comment 18 H.J. Lu 2007-09-12 17:20:09 UTC
Revision 128239 miscompiled

static inline void
sys_futex0(int *addr, int op, int val)
{
  register long out0 asm ("out0") = (long) addr;
  register long out1 asm ("out1") = op;
  register long out2 asm ("out2") = val;
  register long out3 asm ("out3") = 0;
  register long r15 asm ("r15") = SYS_futex;

  __asm __volatile ("break 0x100000"
        : "=r"(r15), "=r"(out0), "=r"(out1), "=r"(out2), "=r"(out3)
        : "r"(r15), "r"(out0), "r"(out1), "r"(out2), "r"(out3)
        : "memory", "r8", "r10", "out4", "out5", "out6", "out7",
          /* Non-stacked integer registers, minus r8, r10, r15.  */
          "r2", "r3", "r9", "r11", "r12", "r13", "r14", "r16", "r17", "r18",
          "r19", "r20", "r21", "r22", "r23", "r24", "r25", "r26", "r27",
          "r28", "r29", "r30", "r31",
          /* Predicate registers.  */
          "p6", "p7", "p8", "p9", "p10", "p11", "p12", "p13", "p14", "p15",
          /* Non-rotating fp registers.  */
          "f6", "f7", "f8", "f9", "f10", "f11", "f12", "f13", "f14", "f15",
          /* Branch registers.  */
          "b6");
}
Comment 19 Jakub Jelinek 2007-09-12 17:49:47 UTC
Here is self-contained source:
void
 sys_futex0(int *addr, int op, int val)
 {
   register long out0 asm ("out0") = (long) addr;
   register long out1 asm ("out1") = op;
   register long out2 asm ("out2") = val;
   register long out3 asm ("out3") = 0;
   register long r15 asm ("r15") = 1230;

   __asm __volatile ("break 0x100000"
         : "=r"(r15), "=r"(out0), "=r"(out1), "=r"(out2), "=r"(out3)
         : "r"(r15), "r"(out0), "r"(out1), "r"(out2), "r"(out3)
         : "memory", "r8", "r10", "out4", "out5", "out6", "out7",
           /* Non-stacked integer registers, minus r8, r10, r15.  */
           "r2", "r3", "r9", "r11", "r12", "r13", "r14", "r16", "r17", "r18",
           "r19", "r20", "r21", "r22", "r23", "r24", "r25", "r26", "r27",
           "r28", "r29", "r30", "r31",
           /* Predicate registers.  */
           "p6", "p7", "p8", "p9", "p10", "p11", "p12", "p13", "p14", "p15",
           /* Non-rotating fp registers.  */
           "f6", "f7", "f8", "f9", "f10", "f11", "f12", "f13", "f14", "f15",
           /* Branch registers.  */
           "b6");
 }
vs.
void
 sys_futex0(int *addr, int op, int val)
 {
   register long out0 asm ("out0") = (long) addr;
   register long out1 asm ("out1") = op;
   register long out2 asm ("out2") = val;
   register long out3 asm ("out3") = 0;
   register long r15 asm ("r15") = 1230;

   __asm __volatile ("break 0x100000"
         :
         : "r"(r15), "r"(out0), "r"(out1), "r"(out2), "r"(out3)
         : "memory", "r8", "r10", "out4", "out5", "out6", "out7",
           /* Non-stacked integer registers, minus r8, r10, r15.  */
           "r2", "r3", "r9", "r11", "r12", "r13", "r14", "r16", "r17", "r18",
           "r19", "r20", "r21", "r22", "r23", "r24", "r25", "r26", "r27",
           "r28", "r29", "r30", "r31",
           /* Predicate registers.  */
           "p6", "p7", "p8", "p9", "p10", "p11", "p12", "p13", "p14", "p15",
           /* Non-rotating fp registers.  */
           "f6", "f7", "f8", "f9", "f10", "f11", "f12", "f13", "f14", "f15",
           /* Branch registers.  */
           "b6");
 }

The former is miscompiled by sdse1 pass:
  Deleted dead store '# STORES:  { out0D.1189 }
out0D.1189 ={v} addr.0D.1194_2'
  Deleted dead store '# STORES:  { out1D.1190 }
out1D.1190 ={v} out1.1D.1195_4'
  Deleted dead store '# STORES:  { out2D.1191 }
out2D.1191 ={v} out2.2D.1196_6'
  Deleted dead store '# STORES:  { out3D.1192 }
out3D.1192 ={v} 0'
  Deleted dead store '# STORES:  { r15D.1193 }
r15D.1193 ={v} 1230'
sys_futex0 (addrD.1184, opD.1185, valD.1186)
{
  register long intD.2 r15D.1193 __asm__ (*r15);
  register long intD.2 out3D.1192 __asm__ (*out3);
  register long intD.2 out2D.1191 __asm__ (*out2);
  register long intD.2 out1D.1190 __asm__ (*out1);
  register long intD.2 out0D.1189 __asm__ (*out0);
  long intD.2 out2.2D.1196;
  long intD.2 out1.1D.1195;
  long intD.2 addr.0D.1194;

  # BLOCK 2
  # PRED: ENTRY (fallthru,exec)
  addr.0D.1194_2 = (long intD.2) addrD.1184_1(D);
  out1.1D.1195_4 = (long intD.2) opD.1185_3(D);
  out2.2D.1196_6 = (long intD.2) valD.1186_5(D);
  # STORES:  { out0D.1189 out1D.1190 out2D.1191 out3D.1192 r15D.1193 }
  __asm__ __volatile__("break 0x100000":"=r" r15D.1193, "=r" out0D.1189, "=r" out1D.1190, "=r" out2D.1191, "=r" out3D.1192:"r" r1
5D.1193, "r" out0D.1189, "r" out1D.1190, "r" out2D.1191, "r" out3D.1192:"b6", "f15", "f14", "f13", "f12", "f11", "f10", "f9", "f8
", "f7", "f6", "p15", "p14", "p13", "p12", "p11", "p10", "p9", "p8", "p7", "p6", "r31", "r30", "r29", "r28", "r27", "r26", "r25",
 "r24", "r23", "r22", "r21", "r20", "r19", "r18", "r17", "r16", "r14", "r13", "r12", "r11", "r9", "r3", "r2", "out7", "out6", "ou
t5", "out4", "r10", "r8", "memory");
  return;
  # SUCC: EXIT
while the latter is not:
;; Function sys_futex0 (sys_futex0)

sys_futex0 (addrD.1184, opD.1185, valD.1186)
{
  register long intD.2 r15D.1193 __asm__ (*r15);
  register long intD.2 out3D.1192 __asm__ (*out3);
  register long intD.2 out2D.1191 __asm__ (*out2);
  register long intD.2 out1D.1190 __asm__ (*out1);
  register long intD.2 out0D.1189 __asm__ (*out0);
  long intD.2 out2.2D.1196;
  long intD.2 out1.1D.1195;
  long intD.2 addr.0D.1194;

  # BLOCK 2
  # PRED: ENTRY (fallthru,exec)
  addr.0D.1194_2 = (long intD.2) addrD.1184_1(D);
  # STORES:  { out0D.1189 }
  out0D.1189 ={v} addr.0D.1194_2;
  out1.1D.1195_4 = (long intD.2) opD.1185_3(D);
  # STORES:  { out1D.1190 }
  out1D.1190 ={v} out1.1D.1195_4;
  out2.2D.1196_6 = (long intD.2) valD.1186_5(D);
  # STORES:  { out2D.1191 }
  out2D.1191 ={v} out2.2D.1196_6;
  # STORES:  { out3D.1192 }
  out3D.1192 ={v} 0;
  # STORES:  { r15D.1193 }
  r15D.1193 ={v} 1230;
  # LOADS:  { out0D.1189 out1D.1190 out2D.1191 out3D.1192 r15D.1193 }
  __asm__ __volatile__("break 0x100000"::"r" r15D.1193, "r" out0D.1189, "r" out1D.1190, "r" out2D.1191, "r" out3D.1192:"b6", "f15
", "f14", "f13", "f12", "f11", "f10", "f9", "f8", "f7", "f6", "p15", "p14", "p13", "p12", "p11", "p10", "p9", "p8", "p7", "p6", "
r31", "r30", "r29", "r28", "r27", "r26", "r25", "r24", "r23", "r22", "r21", "r20", "r19", "r18", "r17", "r16", "r14", "r13", "r12
", "r11", "r9", "r3", "r2", "out7", "out6", "out5", "out4", "r10", "r8", "memory");
  return;
  # SUCC: EXIT

Seems if the same reg is both input and output it is just among
STORED_SYMS, not LOADED_SYMS.
Comment 20 Jakub Jelinek 2007-09-12 18:26:39 UTC
append_vuse has:
static inline void
append_vuse (tree var)
{
  tree sym;

  if (TREE_CODE (var) != SSA_NAME)
    {
      tree mpt;
      var_ann_t ann;

      /* If VAR belongs to a memory partition, use it instead of VAR.  */
      mpt = memory_partition (var);
      if (mpt)
        var = mpt;

      /* Don't allow duplicate entries.  */
      ann = get_var_ann (var);
      if (ann->in_vuse_list || ann->in_vdef_list)
        return;

      ann->in_vuse_list = true;
      sym = var;
    }
  else
    sym = SSA_NAME_VAR (var);

  VEC_safe_push (tree, heap, build_vuses, var);
  bitmap_set_bit (build_loads, DECL_UID (sym));
}

For VAR_DECL with DECL_HARD_REGISTER set, if it is only in the input list of asm,
it is added as vuse/load, if it is only in output list of am, it is added as vdef/store.  But the above /* Don't allow duplicate entries.  */ means
that if there is "=r" (x) : "r" (x) with VAR_DECL DECL_HARD_REGISTER x, it is
only marked as store/vdef, not as load/vuse.

No matter whether it is right or wrong in tree-ssa-operands.c, I think
sdse should just avoid removing VAR_DECL DECL_HARD_REGISTER store elimination.
Comment 21 wilson@specifix.com 2007-09-12 19:30:55 UTC
Subject: Re:  [4.3 Regression]  Revision 128239
 causes libgomp failure

The failure occurs in the sdse (simple dead store elimination) pass.

It looks like it is confused by an extended asm statement with 
input/output operands.  It thinks these operands are not read by the 
asm.  If the only use of a variable is inside such an asm, then sdse 
will think that the variable is never read, and optimize away stores to 
it.  I haven't tried debugging sdse yet to see exactly what is going wrong.

Jim
Comment 22 wilson@specifix.com 2007-09-12 23:35:22 UTC
Subject: Re:  [4.3 Regression]  Revision 128239
 causes libgomp failure

jakub at gcc dot gnu dot org wrote:
> No matter whether it is right or wrong in tree-ssa-operands.c, I think
> sdse should just avoid removing VAR_DECL DECL_HARD_REGISTER store elimination.

This isn't a volatile register, nor a global register variable, so there 
should be no problem performing dead store elimination on it.  I think 
tree-ssa-operands.c needs to be fixed.

I'm not a tree-ssa expert, but the fix needed in tree-ssa-operands.c 
seems simple enough.  The append_vuse function does two things, it keeps 
a list of v_uses, and it keeps a list of loads.  We don't want a v_use 
if we already have a v_def, but we still need to mark it as a load.  We 
can do that with a little rearrangement of the code.  Like this.

This works for the testcase, but has had no other testing as yet.

2007-09-12  James E. Wilson  <wilson@specifix.com>

         * tree-ssa-operands.c (append_vuse): If ann->in_vdef_list true,
         then set build_loads before returning.

Index: tree-ssa-operands.c
===================================================================
--- tree-ssa-operands.c (revision 128394)
+++ tree-ssa-operands.c (working copy)
@@ -1164,8 +1164,15 @@

        /* Don't allow duplicate entries.  */
        ann = get_var_ann (var);
-      if (ann->in_vuse_list || ann->in_vdef_list)
+      if (ann->in_vuse_list)
         return;
+      else if (ann->in_vdef_list)
+       {
+         /* We don't want a vuse if we already have a vdef, but we must
+            still put this in build_loads.  */
+         bitmap_set_bit (build_loads, DECL_UID (var));
+         return;
+       }

        ann->in_vuse_list = true;
        sym = var;
Comment 23 H.J. Lu 2007-09-13 00:13:57 UTC
(In reply to comment #22)
>
> This works for the testcase, but has had no other testing as yet.
> 
> 2007-09-12  James E. Wilson  <wilson@specifix.com>
> 
>          * tree-ssa-operands.c (append_vuse): If ann->in_vdef_list true,
>          then set build_loads before returning.
> 
> Index: tree-ssa-operands.c
> ===================================================================
> --- tree-ssa-operands.c (revision 128394)
> +++ tree-ssa-operands.c (working copy)
> @@ -1164,8 +1164,15 @@
> 
>         /* Don't allow duplicate entries.  */
>         ann = get_var_ann (var);
> -      if (ann->in_vuse_list || ann->in_vdef_list)
> +      if (ann->in_vuse_list)
>          return;
> +      else if (ann->in_vdef_list)
> +       {
> +         /* We don't want a vuse if we already have a vdef, but we must
> +            still put this in build_loads.  */
> +         bitmap_set_bit (build_loads, DECL_UID (var));
> +         return;
> +       }
> 
>         ann->in_vuse_list = true;
>         sym = var;
> 

It works for C tests in libgomp. I am bootstrapping on Linux/ia64 now. Thanks.
Comment 24 H.J. Lu 2007-09-13 04:03:06 UTC
(In reply to comment #23)
> (In reply to comment #22)
>
> > 2007-09-12  James E. Wilson  <wilson@specifix.com>
> > 
> >          * tree-ssa-operands.c (append_vuse): If ann->in_vdef_list true,
> >          then set build_loads before returning.
> > 
> 
> It works for C tests in libgomp. I am bootstrapping on Linux/ia64 now. Thanks.
> 

It seems fine on Linux/ia64, Linux/ia32 and Linux/x86-64.
Comment 25 Jan Hubicka 2007-09-13 11:27:22 UTC
Thanks a lot for tracking down this issue.  The patch looks correct to me, can you please send it into gcc-patches?

Honza
Comment 26 H.J. Lu 2007-09-13 16:09:27 UTC
(In reply to comment #25)
> Thanks a lot for tracking down this issue.  The patch looks correct to me, can
> you please send it into gcc-patches?
> 
> Honza
> 

Posted at

http://gcc.gnu.org/ml/gcc-patches/2007-09/msg01178.html
Comment 27 hjl@gcc.gnu.org 2007-09-13 16:34:00 UTC
Subject: Bug 33389

Author: hjl
Date: Thu Sep 13 16:33:49 2007
New Revision: 128469

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=128469
Log:
2007-09-13  James E. Wilson  <wilson@specifix.com>

	PR tree-optimization/33389
	* tree-ssa-operands.c (append_vuse): If ann->in_vdef_list true,
	then set build_loads before returning.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-operands.c

Comment 28 Jakub Jelinek 2007-09-14 11:32:38 UTC
Thanks, Jim.