Since end of May, quite a few libstdc++-v3 testcases, stressing atomicity.h are failing on multi-processor ia64 machines. See, for instance, in: http://gcc.gnu.org/ml/gcc-testresults/2005-11/msg00411.html FAIL: 22_locale/locale/cons/12658_thread-1.cc execution test FAIL: 22_locale/locale/cons/12658_thread-2.cc execution test Those tests *never* fail in 4_0-branch, which doesn't use the builtins, and never did in mainline before the below of mine (and a simultaneous one to the compiler, which emptied ia64intrin.h) 2005-05-25 Paolo Carlini <pcarlini@suse.de> * config/cpu/ia64/atomicity.h: Do not include ia64intrin.h. My analysis is that __sync_fetch_and_add is somehow misbehaving on ia64. I will try to investigate further the actual assembly, but I don't make promises, sorry.
(In reply to comment #0) > Those tests *never* fail in 4_0-branch, which doesn't use the builtins, and > never did in mainline before the below of mine (and a simultaneous one to > the compiler, which emptied ia64intrin.h) Whereas it *is* true that those tests never fail in 4_0-branch, the second part is not strictly true: sometimes, in mainline the tests failed also before that date. I would say, much less consistently. This is all I know, for now.
Hmm you said in: http://gcc.gnu.org/ml/libstdc++/2005-11/msg00149.html That was really a glibc bug. And actually 4.0 and before uses the builtins for ia64, this is where the builtins came from in the first place.
(In reply to comment #2) > Hmm you said in: > http://gcc.gnu.org/ml/libstdc++/2005-11/msg00149.html > > That was really a glibc bug. Exactly *was*. Ehi, do you think I'm stupid? Of course in the meanwhile I have checked that the problem is present in machines equipped with glibc2.3.5 and is not, on the same machines, for 4_0-branch. > And actually 4.0 and before uses the builtins for ia64, this is where the > builtins came from in the first place. Exactly. But, given the glibc bug above, masking the gcc behavior well into 2005 for some of our machines, at least, isn't possible that when we switched to builtins everywhere something got broken? Alternately, the ia64 builtins themselves can be defective, but that seems much less likely to me, because we are talking about a very consistent behavior for relatively simple usages of one single mutex and one single atomic counter.
(In reply to comment #3) > .............................................Alternately, the ia64 builtins > themselves can be defective, but that seems much less likely to me, because > we are talking about a very consistent behavior for relatively simple usages > of one single mutex and one single atomic counter. And this is also inconsistent with 4_0-branch, where we have no problems, never during the last 6 months, at least.
Some additional details from testresults: the bulk of the builtin atomics changes went in around mid of April, the ia64 specific bits, on April, 14. All the results that Andreas sent at the beginning of the month (for instance april, 7, 8, 9) are fine. Then, at the end of the month (for instance, april, 22, 23, 24) the results are not clean anymore. Lately, never less than 2 failures, often 4, in threading tests.
Actually, I'm pretty much worried by this issue: whatever is at fault, as a matter of fact, for this target the library testsuite shows a severe misbehavior in a MT environment. Raising the priority/severity seems an appropriate action.
I'm marking this as a 4.1 Regression: certainly it is. If people can figure a way to workaround it at the library level, I'm ok with that.
I have got additional evidence that __sync_fetch_and_add does not work correctly. If, for example, in this code, stressed in the testcase 12658_thread-1.cc, and using atomic operations: const locale& locale::operator=(const locale& __other) throw() { __other._M_impl->_M_add_reference(); _M_impl->_M_remove_reference(); _M_impl = __other._M_impl; return *this; } I wrap everything in the locale_cons mutex then the testcase doesn't fail anymore.
I should note that ld.acq+cmpxchg.rel is a full barrier. as noted in ia64.c so a mf is not required. See also http://www.ussg.iu.edu/hypermail/linux/kernel/0205.1/0226.html
(In reply to comment #9) > I should note that > ld.acq+cmpxchg.rel is a full barrier. > as noted in ia64.c so a mf is not required. Did you read the last message from Alexander on the libstdc++ mailing list, in particular the last paragraph?
(In reply to comment #9) > I should note that > ld.acq+cmpxchg.rel is a full barrier. > as noted in ia64.c so a mf is not required. If that is really the issue, then http://gcc.gnu.org/ml/gcc-patches/2005-05/msg00872.html casued it but I really doubt it from reading that linux kernel message.
(In reply to comment #10) > Did you read the last message from Alexander on the libstdc++ mailing list, in > particular the last paragraph? Yes but I trust RTH better than Alexander on this issue.
(In reply to comment #12) > Yes but I trust RTH better than Alexander on this issue. And if you read the message which I pointed to: The semaphore instructions also have implicit ordering. So if I am reading this wrong, then there is something else wrong.
I'm not sure that specific patch is the culprit, because (see Comment #5) the threaded testcases started failing before that date. I would like to see the assembly before and after that date. In any case, it would be nice if Richard Henderson could analyze a bit this PR. Also relevant: http://gcc.gnu.org/ml/libstdc++/2005-11/msg00206.html http://gcc.gnu.org/ml/libstdc++/2005-11/msg00207.html
See also: http://www.linuxbase.org/spec/refspecs/IA64-softdevman-vol1.pdf Page73. Which is where the comment in comment #13. So If I read table 4-20 currently then the memory access will always be ordered in that loop. And if it started before that date, then there is something wrong as that is when the major difference between 4.0 and 4.1 started.
Alex's explanation can't be all of it. I hacked the compiler to emit two "mf" instructions before and after the sequence here, and it made no difference to the two tests failing. I really have no idea what's going on.
(In reply to comment #16) > Alex's explanation can't be all of it. I hacked the compiler to emit two > "mf" instructions before and after the sequence here, and it made no > difference to the two tests failing. I really have no idea what's going on. Thanks for your experiments. I'll try to help further, come up with a reduced piece of code actually failing, investigate whether something else important went in mid of April, maybe. I don't have much more, right now :(
Dammit, lemme try again; I patched a different tree than I tested.
Created attachment 10287 [details] Slightly hacked, self contained test (for use outside the testsuite)
Created attachment 10288 [details] Slightly hacked, self contained test (for use outside the testsuite)
Just in case, those are ready to use outside the testsuite... For me, especially the second one, fails very quickly and consistently on 4-way and 16-way.
I think we need to understand this problem before we release. It might be that we downgrade the priority if it turns out to be a problem outside our control.
Created attachment 10289 [details] mf hack For the record, here's the aforementioned hack. And it does appear to change the behaviour of one of the testsuite tests, but it doesn't fix both of them. I'll claim, then, that it didn't fix either of them, but merely changed the timing. Feel free to tell me what's still not right, but it seems like, with a hammer this big, that it isn't the synchronization that's the problem.
(In reply to comment #23) > Created an attachment (id=10289) [edit] > mf hack > > For the record, here's the aforementioned hack. Ok, thanks. Over the WE I'll play a bit with it. Among other things, I want to check its effects on other tests, like the thread/pthread*.cc, and also for a different version of the library, which uses more refcounting in the string class and for me exposed many more troubles during the last days.
(In reply to comment #23) > And it does appear to change the behaviour of one of the testsuite tests, > but it doesn't fix both of them. I'll claim, then, that it didn't fix either > of them, but merely changed the timing. Confirmed here. Actually, for me, the "improvement" is barely noticeable. It's something else...
Created attachment 10297 [details] Reduced from thread/pthread3 Richard, I'm attaching a really minimal testcase, which fails very quickly for me with mainline or 4_1-branch and doesn't with 4_0-branch. It's just two threads and many constructions and destructions of a trivial class which increases (descreases, respectively) atomically a static counter. Of course the abort should never trigger. Can you debug further with this? Otherwise, just tell me, ok?
The only thing in the pthread3_reduced.cc testcase that could make it going wrong is wrong alignment of aw. And I remember seeing a lot of unaligned traps in dmesg during libstdc++ testing on ia64.
Thanks Richard G. In fact, from time to time I saw warnings on the shell, which really puzzled me, because I was sure that no alignment issues were present anymore in the higher lever library code proper. I'll try to add an alignment attribute to the _Atomic_word typedef, as cris is already doing for instance, and see whether something changes for the better. Interestingly no such issue existed with 4_0-branch... Any idea why?
(In reply to comment #28) > I'll try to add an alignment attribute to the _Atomic_word typedef, as cris > is already doing for instance, and see whether something changes for the > better. Nope. Maybe the alignment is for some reason broken to the point that not even an explicit __attribute__ ((aligned (X))) can fix it... I don't know. Anyway, the problem is very easy to reproduce, now. I'm also attaching a further reduced, pure C testcase.
Created attachment 10298 [details] Further reduced, pure C
Testing this patch: Index: ia64.c =================================================================== --- ia64.c (revision 107220) +++ ia64.c (working copy) @@ -2113,7 +2113,7 @@ emit_insn (GEN_FCN (icode) (cmp_reg, mem, ar_ccv, new_reg)); - emit_cmp_and_jump_insns (cmp_reg, old_reg, EQ, NULL, DImode, true, label); + emit_cmp_and_jump_insns (cmp_reg, old_reg, NE, NULL, DImode, true, label); } /* Begin the assembly file. */
Oh good grief. I can't see the forest for the trees. Andreas, I expect that patch will work. If it tests ok, please commit it.
Subject: Bug 24757 Author: schwab Date: Sun Nov 20 10:43:43 2005 New Revision: 107246 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=107246 Log: PR target/24757 * config/ia64/ia64.c (ia64_expand_atomic_op): Fix condition of cmp insn. Modified: trunk/gcc/ChangeLog trunk/gcc/config/ia64/ia64.c
Subject: Bug 24757 Author: schwab Date: Sun Nov 20 10:44:27 2005 New Revision: 107247 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=107247 Log: PR target/24757 * config/ia64/ia64.c (ia64_expand_atomic_op): Fix condition of cmp insn. Modified: branches/gcc-4_1-branch/gcc/ChangeLog branches/gcc-4_1-branch/gcc/config/ia64/ia64.c
Fixed.