Bug 89199 - libgo regression in implementation of CompareAndSwap functions resulting in intermittent testcase failures on ppc64le power9 after r268458
Summary: libgo regression in implementation of CompareAndSwap functions resulting in i...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: go (show other bugs)
Version: 9.0
: P2 major
Target Milestone: ---
Assignee: Ian Lance Taylor
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-02-04 20:36 UTC by boger
Modified: 2019-02-06 20:51 UTC (History)
1 user (show)

See Also:
Host:
Target: ppc64le power9
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description boger 2019-02-04 20:36:19 UTC
Some new intermittent testcase failures started appearing in gcc-testresults for libgo starting with commit 268458 on power9 systems. I have verified this is the commit where the failures begin. We have not seen these failures on Ubuntu 18.04 power8 with the same commit.

In some test runs, there are failures in cmd/go/internal/mvs, net/http, net/http/httputil, sync and they have panic output with the same message about sync.Cond as shown below in the panic trace.

I have run the tests on our 2 power9 systems which are both Ubuntu 18.04 and when built with a specific commit the binary will consistently fail.

panic: sync.Cond is copied

goroutine 348 [running]:
sync.copyChecker.check
	/home/boger/gccgo.work/bld/powerpc64le-linux/libgo/gotest163361/test/cond.go:85
sync.copyChecker.check
	/home/boger/gccgo.work/bld/powerpc64le-linux/libgo/gotest163361/test/cond.go:81
sync.Cond.Wait
	/home/boger/gccgo.work/bld/powerpc64le-linux/libgo/gotest163361/test/cond.go:53
sync_test.TestCondSignalStealing..func1
	/home/boger/gccgo.work/bld/powerpc64le-linux/libgo/gotest163361/test/cond_test.go:197
created by sync_test.TestCondSignalStealing
	/home/boger/gccgo.work/bld/powerpc64le-linux/libgo/gotest163361/test/cond_test.go:194 +472
Keeping gotest163361
FAIL: sync

To run an individual test I do:

cd bld/powerpc64le-linux/libgo
make sync/check
Comment 1 boger 2019-02-06 15:29:25 UTC
I have raised this priority because this looks like a bug in the Go code due to the use of __atomic_compare_exchange_n builtin and how it is implementing the CompareAndSwapUintptr.

In atomic.c:

_Bool
CompareAndSwapUintptr (uintptr_t *val, uintptr_t old, uintptr_t new)
{
  return __atomic_compare_exchange_n (val, &old, new, true, __ATOMIC_SEQ_CST,
	
				      __ATOMIC_RELAXED);
}

The true argument is indicating 'weak' which results in the following code. Note that there is no guarantee in this code that the CompareAndSwap will do the store and return true, if it can't get the reservation it will return false and the store won't happen.

00000000010cc280 <sync..z2fatomic.CompareAndSwapUintptr>:
 10cc280:       ac 04 00 7c     hwsync
 10cc284:       a8 18 20 7d     ldarx   r9,0,r3
 10cc288:       00 20 29 7c     cmpd    r9,r4
 10cc28c:       0c 00 82 40     bne     10cc298 <sync..z2fatomic.CompareAndSwapUintptr+0x18>
 10cc290:       ad 19 a0 7c     stdcx.  r5,0,r3
 10cc294:       2c 01 00 4c     isync
 10cc298:       26 00 78 7c     mfocrf  r3,128
 10cc29c:       fe 1f 63 54     rlwinm  r3,r3,3,31,31
 10cc2a0:       20 00 80 4e     blr


If I look in sync/cond.go check() function:

func (c *copyChecker) check() {
        if uintptr(*c) != uintptr(unsafe.Pointer(c)) &&
                !atomic.CompareAndSwapUintptr((*uintptr)(c), 0, uintptr(unsafe.Pointer(c))) &&
                uintptr(*c) != uintptr(unsafe.Pointer(c)) {
                panic("sync.Cond is copied")
        }
}

This if check assumes that the CompareAndSwapUinptr will do the store if needed which is not true with the latest implementation of CompareAndSwapUintptr. It seems that the CompareAndSwapUinptr should be passing the argument to indicate 'strong' and not 'weak'. I made that change and that fixes this problem. I have not checked the other CompareAndSwaps.
Comment 2 ian@gcc.gnu.org 2019-02-06 20:46:33 UTC
Author: ian
Date: Wed Feb  6 20:46:00 2019
New Revision: 268591

URL: https://gcc.gnu.org/viewcvs?rev=268591&root=gcc&view=rev
Log:
	PR go/89199
    sync/atomic: use strong form of atomic_compare_exchange_n
    
    In the recent change to use atomic_compare_exchange_n I thought we
    could use the weak form, which can spuriously fail. But that is not
    how it is implemented in the gc library, and it is not what the rest
    of the library expects.
    
    Thanks to Lynn Boger for identifying the problem.
    
    Fixes https://gcc.gnu.org/PR89199
    
    Reviewed-on: https://go-review.googlesource.com/c/161359

Modified:
    trunk/gcc/go/gofrontend/MERGE
    trunk/libgo/go/sync/atomic/atomic.c
Comment 3 Ian Lance Taylor 2019-02-06 20:51:30 UTC
Should be fixed, thanks for identifying the problem.