Bug 24757 - [4.1 Regression] __sync_fetch_and_add on ia64
Summary: [4.1 Regression] __sync_fetch_and_add on ia64
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.1.0
: P1 major
Target Milestone: 4.1.0
Assignee: Andreas Schwab
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-11-09 16:45 UTC by Paolo Carlini
Modified: 2005-11-20 10:45 UTC (History)
3 users (show)

See Also:
Host:
Target: ia64-linux
Build:
Known to work:
Known to fail:
Last reconfirmed: 2005-11-10 09:59:06


Attachments
Slightly hacked, self contained test (for use outside the testsuite) (997 bytes, text/plain)
2005-11-19 01:05 UTC, Paolo Carlini
Details
Slightly hacked, self contained test (for use outside the testsuite) (939 bytes, text/plain)
2005-11-19 01:06 UTC, Paolo Carlini
Details
mf hack (436 bytes, patch)
2005-11-19 02:09 UTC, Richard Henderson
Details | Diff
Reduced from thread/pthread3 (298 bytes, text/plain)
2005-11-19 22:12 UTC, Paolo Carlini
Details
Further reduced, pure C (388 bytes, text/plain)
2005-11-19 23:06 UTC, Paolo Carlini
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Paolo Carlini 2005-11-09 16:45:20 UTC
Since end of May, quite a few libstdc++-v3 testcases, stressing atomicity.h
are failing on multi-processor ia64 machines. See, for instance, in:

  http://gcc.gnu.org/ml/gcc-testresults/2005-11/msg00411.html

  FAIL: 22_locale/locale/cons/12658_thread-1.cc execution test
  FAIL: 22_locale/locale/cons/12658_thread-2.cc execution test

Those tests *never* fail in 4_0-branch, which doesn't use the builtins, and
never did in mainline before the below of mine (and a simultaneous one to
the compiler, which emptied ia64intrin.h)

2005-05-25  Paolo Carlini  <pcarlini@suse.de>

        * config/cpu/ia64/atomicity.h: Do not include ia64intrin.h.

My analysis is that __sync_fetch_and_add is somehow misbehaving on ia64.
I will try to investigate further the actual assembly, but I don't make
promises, sorry.
Comment 1 Paolo Carlini 2005-11-09 17:00:05 UTC
(In reply to comment #0)
> Those tests *never* fail in 4_0-branch, which doesn't use the builtins, and
> never did in mainline before the below of mine (and a simultaneous one to
> the compiler, which emptied ia64intrin.h)

Whereas it *is* true that those tests never fail in 4_0-branch, the second part
is not strictly true: sometimes, in mainline the tests failed also before that
date. I would say, much less consistently. This is all I know, for now.
Comment 2 Andrew Pinski 2005-11-09 17:35:49 UTC
Hmm you said in:
http://gcc.gnu.org/ml/libstdc++/2005-11/msg00149.html

That was really a glibc bug.

And actually 4.0 and before uses the builtins for ia64, this is where the builtins came from in the first place.
Comment 3 Paolo Carlini 2005-11-09 17:45:13 UTC
(In reply to comment #2)
> Hmm you said in:
> http://gcc.gnu.org/ml/libstdc++/2005-11/msg00149.html
> 
> That was really a glibc bug.

Exactly *was*. Ehi, do you think I'm stupid? Of course in the meanwhile
I have checked that the problem is present in machines equipped with
glibc2.3.5 and is not, on the same machines, for 4_0-branch.

> And actually 4.0 and before uses the builtins for ia64, this is where the
> builtins came from in the first place.

Exactly. But, given the glibc bug above, masking the gcc behavior well into
2005 for some of our machines, at least, isn't possible that when we switched
to builtins everywhere something got broken? Alternately, the ia64 builtins
themselves can be defective, but that seems much less likely to me, because
we are talking about a very consistent behavior for relatively simple usages
of one single mutex and one single atomic counter.
Comment 4 Paolo Carlini 2005-11-09 17:51:23 UTC
(In reply to comment #3)
> .............................................Alternately, the ia64 builtins
> themselves can be defective, but that seems much less likely to me, because
> we are talking about a very consistent behavior for relatively simple usages
> of one single mutex and one single atomic counter.

And this is also inconsistent with 4_0-branch, where we have no problems, never
during the last 6 months, at least.
Comment 5 Paolo Carlini 2005-11-09 19:30:17 UTC
Some additional details from testresults: the bulk of the builtin atomics changes
went in around mid of April, the ia64 specific bits, on April, 14. All the
results that Andreas sent at the beginning of the month (for instance april, 7,
8, 9) are fine. Then, at the end of the month (for instance, april, 22, 23, 24)
the results are not clean anymore.

Lately, never less than 2 failures, often 4, in threading tests.
Comment 6 Paolo Carlini 2005-11-10 01:41:07 UTC
Actually, I'm pretty much worried by this issue: whatever is at fault, as a
matter of fact, for this target the library testsuite shows a severe misbehavior
in a MT environment. Raising the priority/severity seems an appropriate action.
Comment 7 Paolo Carlini 2005-11-10 09:59:06 UTC
I'm marking this as a 4.1 Regression: certainly it is. If people can figure a
way to workaround it at the library level, I'm ok with that.
Comment 8 Paolo Carlini 2005-11-17 13:50:01 UTC
I have got additional evidence that __sync_fetch_and_add does not work 
correctly.  If, for example, in this code, stressed in the testcase
12658_thread-1.cc, and using atomic operations:

  const locale&
  locale::operator=(const locale& __other) throw()
  {
    __other._M_impl->_M_add_reference();
    _M_impl->_M_remove_reference();
    _M_impl = __other._M_impl;
    return *this;
  }

I wrap everything in the locale_cons mutex then the testcase doesn't fail
anymore.
Comment 9 Andrew Pinski 2005-11-18 15:20:05 UTC
I should note that 
ld.acq+cmpxchg.rel is a full barrier.
as noted in ia64.c so a mf is not required.

See also http://www.ussg.iu.edu/hypermail/linux/kernel/0205.1/0226.html
Comment 10 Paolo Carlini 2005-11-18 15:23:20 UTC
(In reply to comment #9)
> I should note that 
> ld.acq+cmpxchg.rel is a full barrier.
> as noted in ia64.c so a mf is not required.

Did you read the last message from Alexander on the libstdc++ mailing list, in
particular the last paragraph?
Comment 11 Andrew Pinski 2005-11-18 15:24:33 UTC
(In reply to comment #9)
> I should note that 
> ld.acq+cmpxchg.rel is a full barrier.
> as noted in ia64.c so a mf is not required.

If that is really the issue, then 
http://gcc.gnu.org/ml/gcc-patches/2005-05/msg00872.html casued it but I really doubt it from reading that linux kernel message.
Comment 12 Andrew Pinski 2005-11-18 15:27:04 UTC
(In reply to comment #10)
> Did you read the last message from Alexander on the libstdc++ mailing list, in
> particular the last paragraph?
Yes but I trust RTH better than Alexander on this issue.
Comment 13 Andrew Pinski 2005-11-18 15:29:30 UTC
(In reply to comment #12)
> Yes but I trust RTH better than Alexander on this issue.

And if you read the message which I pointed to:
The semaphore instructions also have implicit ordering.

So if I am reading this wrong, then there is something else wrong.
Comment 14 Paolo Carlini 2005-11-18 15:32:11 UTC
I'm not sure that specific patch is the culprit, because (see Comment #5) the
threaded testcases started failing before that date. I would like to see the
assembly before and after that date. In any case, it would be nice if Richard
Henderson could analyze a bit this PR. Also relevant:

  http://gcc.gnu.org/ml/libstdc++/2005-11/msg00206.html
  http://gcc.gnu.org/ml/libstdc++/2005-11/msg00207.html
Comment 15 Andrew Pinski 2005-11-18 15:35:32 UTC
See also:
http://www.linuxbase.org/spec/refspecs/IA64-softdevman-vol1.pdf

Page73.

Which is where the comment in comment #13.  So If I read table 4-20 currently  then the memory access will always be ordered in that loop.

And if it started before that date, then there is something wrong as that is when the major difference between 4.0 and 4.1 started.
Comment 16 Richard Henderson 2005-11-18 23:32:00 UTC
Alex's explanation can't be all of it.  I hacked the compiler to emit two
"mf" instructions before and after the sequence here, and it made no
difference to the two tests failing.  I really have no idea what's going on.
Comment 17 Paolo Carlini 2005-11-18 23:41:00 UTC
(In reply to comment #16)
> Alex's explanation can't be all of it.  I hacked the compiler to emit two
> "mf" instructions before and after the sequence here, and it made no
> difference to the two tests failing.  I really have no idea what's going on.

Thanks for your experiments. I'll try to help further, come up with a reduced piece of code actually failing, investigate whether something else important
went in mid of April, maybe. I don't have much more, right now :(
Comment 18 Richard Henderson 2005-11-19 00:56:30 UTC
Dammit, lemme try again; I patched a different tree than I tested.
Comment 19 Paolo Carlini 2005-11-19 01:05:24 UTC
Created attachment 10287 [details]
Slightly hacked, self contained test (for use outside the testsuite)
Comment 20 Paolo Carlini 2005-11-19 01:06:12 UTC
Created attachment 10288 [details]
Slightly hacked, self contained test (for use outside the testsuite)
Comment 21 Paolo Carlini 2005-11-19 01:08:40 UTC
Just in case, those are ready to use outside the testsuite... For me, especially
the second one, fails very quickly and consistently on 4-way and 16-way.
Comment 22 Mark Mitchell 2005-11-19 02:08:14 UTC
I think we need to understand this problem before we release.  It might be that we downgrade the priority if it turns out to be a problem outside our control.
Comment 23 Richard Henderson 2005-11-19 02:09:23 UTC
Created attachment 10289 [details]
mf hack

For the record, here's the aforementioned hack.

And it does appear to change the behaviour of one of the testsuite tests,
but it doesn't fix both of them.  I'll claim, then, that it didn't fix either
of them, but merely changed the timing.  Feel free to tell me what's still
not right, but it seems like, with a hammer this big, that it isn't the
synchronization that's the problem.
Comment 24 Paolo Carlini 2005-11-19 02:15:18 UTC
(In reply to comment #23)
> Created an attachment (id=10289) [edit]
> mf hack
> 
> For the record, here's the aforementioned hack.

Ok, thanks. Over the WE I'll play a bit with it. Among other things, I want
to check its effects on other tests, like the thread/pthread*.cc, and also
for a different version of the library, which uses more refcounting in the
string class and for me exposed many more troubles during the last days.
Comment 25 Paolo Carlini 2005-11-19 02:33:33 UTC
(In reply to comment #23)
> And it does appear to change the behaviour of one of the testsuite tests,
> but it doesn't fix both of them.  I'll claim, then, that it didn't fix either
> of them, but merely changed the timing.

Confirmed here. Actually, for me, the "improvement" is barely noticeable. It's
something else...
Comment 26 Paolo Carlini 2005-11-19 22:12:15 UTC
Created attachment 10297 [details]
Reduced from thread/pthread3

Richard, I'm attaching a really minimal testcase, which fails very quickly
for me with mainline or 4_1-branch and doesn't with 4_0-branch. It's just
two threads and many constructions and destructions of a trivial class
which increases (descreases, respectively) atomically a static counter.

Of course the abort should never trigger.

Can you debug further with this? Otherwise, just tell me, ok?
Comment 27 Richard Biener 2005-11-19 22:38:31 UTC
The only thing in the pthread3_reduced.cc testcase that could make it going wrong is wrong alignment of aw.  And I remember seeing a lot of unaligned traps in dmesg during libstdc++ testing on ia64.
Comment 28 Paolo Carlini 2005-11-19 22:45:25 UTC
Thanks Richard G. In fact, from time to time I saw warnings on the shell,
which really puzzled me, because I was sure that no alignment issues were
present anymore in the higher lever library code proper.

I'll try to add an alignment attribute to the _Atomic_word typedef, as cris
is already doing for instance, and see whether something changes for the
better. Interestingly no such issue existed with 4_0-branch... Any idea why?
Comment 29 Paolo Carlini 2005-11-19 23:02:39 UTC
(In reply to comment #28)
> I'll try to add an alignment attribute to the _Atomic_word typedef, as cris
> is already doing for instance, and see whether something changes for the
> better.

Nope. Maybe the alignment is for some reason broken to the point that not
even an explicit __attribute__ ((aligned (X))) can fix it... I don't know.

Anyway, the problem is very easy to reproduce, now. I'm also attaching a
further reduced, pure C testcase.
Comment 30 Paolo Carlini 2005-11-19 23:06:46 UTC
Created attachment 10298 [details]
Further reduced, pure C
Comment 31 Andreas Schwab 2005-11-20 00:22:17 UTC
Testing this patch:

Index: ia64.c
===================================================================
--- ia64.c	(revision 107220)
+++ ia64.c	(working copy)
@@ -2113,7 +2113,7 @@
 
   emit_insn (GEN_FCN (icode) (cmp_reg, mem, ar_ccv, new_reg));
 
-  emit_cmp_and_jump_insns (cmp_reg, old_reg, EQ, NULL, DImode, true, label);
+  emit_cmp_and_jump_insns (cmp_reg, old_reg, NE, NULL, DImode, true, label);
 }
  
 /* Begin the assembly file.  */
Comment 32 Richard Henderson 2005-11-20 05:43:47 UTC
Oh good grief.  I can't see the forest for the trees.  Andreas, I expect 
that patch will work.  If it tests ok, please commit it.
Comment 33 Andreas Schwab 2005-11-20 10:43:47 UTC
Subject: Bug 24757

Author: schwab
Date: Sun Nov 20 10:43:43 2005
New Revision: 107246

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=107246
Log:
	PR target/24757
	* config/ia64/ia64.c (ia64_expand_atomic_op): Fix condition of cmp
	insn.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/ia64/ia64.c

Comment 34 Andreas Schwab 2005-11-20 10:44:29 UTC
Subject: Bug 24757

Author: schwab
Date: Sun Nov 20 10:44:27 2005
New Revision: 107247

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=107247
Log:
	PR target/24757
	* config/ia64/ia64.c (ia64_expand_atomic_op): Fix condition of cmp
	insn.

Modified:
    branches/gcc-4_1-branch/gcc/ChangeLog
    branches/gcc-4_1-branch/gcc/config/ia64/ia64.c

Comment 35 Andreas Schwab 2005-11-20 10:45:24 UTC
Fixed.