Bug 86005 - [RISCV] Invalid intermixing of __atomic_* libcalls and inline atomic instruction sequences
Summary: [RISCV] Invalid intermixing of __atomic_* libcalls and inline atomic instruction sequences
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 9.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-05-30 21:18 UTC by Alex Bradbury
Modified: 2023-07-07 16:22 UTC
CC List: 3 users

See Also:
Host:
Target: riscv
Build:
Known to work:
Known to fail:
Last reconfirmed: 2018-05-31 00:00:00


Description Alex Bradbury 2018-05-30 21:18:39 UTC
When targeting -march=rv32i or -march=rv64i (i.e. a RISC-V target without the A extension), GCC will produce a fence-based mapping for atomic load and store while calling the __atomic_* libcalls for other operations. As documented here <https://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary>, this is not legal. As stated there and in the equivalent LLVM documentation <https://llvm.org/docs/Atomics.html#atomics-and-codegen>, lock-based library calls and lock-free instructions should not be intermixed for the same object.

Below is a representative example, where I believe a call to __atomic_load_4 should be generated.

$ cat foo.c
int atomic(int *i) {
  int j = __atomic_add_fetch(i, 1, __ATOMIC_SEQ_CST);
  int k;
  __atomic_load(i, &k, __ATOMIC_SEQ_CST);
  return j+k;
}

$ ./riscv32-unknown-elf-gcc -march=rv32i foo.c -O1 -S -o -
	.file	"foo.c"
	.option nopic
	.text
	.align	2
	.globl	atomic
	.type	atomic, @function
atomic:
	addi	sp,sp,-16
	sw	ra,12(sp)
	sw	s0,8(sp)
	mv	s0,a0
	li	a2,5
	li	a1,1
	call	__atomic_fetch_add_4
	addi	a0,a0,1
	fence	iorw,iorw
	lw	a5,0(s0)
	fence	iorw,iorw
	add	a0,a5,a0
	lw	ra,12(sp)
	lw	s0,8(sp)
	addi	sp,sp,16
	jr	ra
	.size	atomic, .-atomic
	.ident	"GCC: (GNU) 9.0.0 20180530 (experimental)"
Comment 1 Alex Bradbury 2018-05-31 05:40:38 UTC
Actually I think this bug is wider in scope than I first thought. GCC will also intermix __atomic libcalls and inline instruction sequences with -march=rv32ia when values less than XLEN in size are accessed.

$ cat foo.c
char atomic(char *i) {
  char j = __atomic_add_fetch(i, 1, __ATOMIC_SEQ_CST);
  char k;
  __atomic_load(i, &k, __ATOMIC_SEQ_CST);
  return j+k;
}

$ ./riscv32-unknown-elf-gcc -march=rv32ia foo.c -O1 -S -o -
	.file	"foo.c"
	.option nopic
	.text
	.align	2
	.globl	atomic
	.type	atomic, @function
atomic:
	addi	sp,sp,-16
	sw	ra,12(sp)
	sw	s0,8(sp)
	mv	s0,a0
	li	a2,5
	li	a1,1
	call	__atomic_fetch_add_1
	addi	a0,a0,1
	andi	a0,a0,0xff
	fence	iorw,iorw
	lbu	a5,0(s0)
	fence	iorw,iorw
	add	a0,a0,a5
	andi	a0,a0,0xff
	lw	ra,12(sp)
	lw	s0,8(sp)
	addi	sp,sp,16
	jr	ra
	.size	atomic, .-atomic
	.ident	"GCC: (GNU) 9.0.0 20180530 (experimental)"
Comment 2 Andrew Waterman 2018-05-31 05:53:18 UTC
I realize the documentation doesn't concur with me, but as long as gcc
and libgcc agree on the lock-freeness of the routines, I don't see the
harm. (wrt. rv32ia, at least.)

Comment 3 Alex Bradbury 2018-05-31 06:07:44 UTC
(In reply to Andrew Waterman from comment #2)
> I realize the documentation doesn't concur with me, but as long as gcc
> and libgcc agree on the lock-freeness of the routines, I don't see the
> harm. (wrt. rv32ia, at least.)

Yes, for RV32IA it might be allowable, assuming the compiler can rely on knowing that property of the __atomic_* implementation, and that the assumption won't/can't be invalidated by linking in another implementation which isn't lock-free.
Comment 4 James Y Knight 2018-05-31 15:07:46 UTC
I think if RISCV wants to use out-of-line lock-free atomic helpers, it should give those another function name (for example, __sync_fetch_and_##OP##_##WIDTH), and provide them in libgcc.a, the same as the other architectures which use such out-of-line helpers.

(But also, why doesn't it implement __atomic_add_fetch inline?)

Since lock-free helper functions can be linked into a process as many times as you like without correctness issues, it's safe, and preferable, to provide them in the static runtime lib.

That is quite unlike a potentially-locking __atomic_* function, which must share the same lock across the whole process, and thus must have only one implementation in a process, which is why they're provided separately by libatomic.so.

It's not a good idea to conflate the two different things.
Comment 5 Jim Wilson 2018-05-31 15:17:14 UTC
On Thu, 2018-05-31 at 05:40 +0000, asb at lowrisc dot org wrote:
> Actually I think this bug is wider in scope than I first thought. GCC will
> also intermix __atomic libcalls and inline instruction sequences with
> -march=rv32ia when values less than XLEN in size are accessed.

FYI I have a prototype patch to fix this.  I copied the subword atomic
support from the ppc port, and hacked it to work.  I haven't done the
fences yet, so it probably won't work if you have more than one thread,
but it works well enough to handle the gcc/g++ testsuites.  It also
needs some cleanup to remove more ppc specific stuff.  I don't know
when I will have time to finish it though.
Comment 6 Jim Wilson 2018-05-31 20:10:06 UTC
On Thu, 2018-05-31 at 15:07 +0000, foom at fuhm dot net wrote:
> (But also, why doesn't it implement __atomic_add_fetch inline?)

If you don't have atomic instructions, then we call an out-of-line
function that uses locks.

If you have atomic instructions, and it is a 32/64-bit operation, then
we inline it.

If you have atomic instructions, and it is an 8/16-bit operation, then
we call a lock free out-of-line function that uses a sequence of
instructions to implement it using 32/64-bit atomic instructions.  And
as mentioned previously, I have an unfinished prototype patch to fix
this by emitting the sequence of instructions inline.
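
For illustration, here is a minimal C sketch of the kind of lock-free out-of-line subword helper described above: an 8-bit fetch-and-add built from a word-sized compare-and-swap on the aligned word containing the byte. This is only a sketch under little-endian, 32-bit-word assumptions; the function name is hypothetical and it is not the actual libgcc/libatomic code.

#include <stdint.h>

unsigned char
sketch_fetch_add_1 (unsigned char *p, unsigned char val)
{
  uintptr_t addr = (uintptr_t) p;
  /* The naturally aligned 32-bit word containing *p.  */
  uint32_t *word = (uint32_t *) (addr & ~(uintptr_t) 3);
  /* Bit offset of the byte within that word (little-endian).  */
  unsigned shift = (addr & 3) * 8;
  uint32_t mask = (uint32_t) 0xff << shift;

  uint32_t old = __atomic_load_n (word, __ATOMIC_RELAXED);
  uint32_t updated;
  do
    {
      uint32_t byte = (old & mask) >> shift;
      /* Splice the incremented byte back into the surrounding word.  */
      updated = (old & ~mask) | (((byte + val) & 0xffu) << shift);
    }
  while (!__atomic_compare_exchange_n (word, &old, updated, 1 /* weak */,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST));
  return (old & mask) >> shift;   /* previous value of the byte */
}

On RV32IA/RV64IA the compare-and-swap loop lowers to an lr.w/sc.w sequence, so the same technique can also be emitted inline.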
Comment 7 James Y Knight 2018-06-01 15:57:34 UTC
(In reply to Jim Wilson from comment #6)
> On Thu, 2018-05-31 at 15:07 +0000, foom at fuhm dot net wrote:
> > (But also, why doesn't it implement __atomic_add_fetch inline?)
> 
> If you don't have atomic instructions, then we call an out-of-line
> function that uses locks.

Except, as mentioned in the initial message, it doesn't emit calls for the atomic loads and stores too, but it needs to, so that they also participate in the locking.

> If you have atomic instructions, and it is a 32/64-bit operation, then
> we inline it.
> 
> If you have atomic instructions, and it is an 8/16-bit operation, then
> we call a lock free out-of-line function that uses a sequence of
> instructions to implement it using 32/64-bit atomic instructions.  

If you're building for rv32ia, yes, you certainly know that libatomic must implement __atomic_fetch_add_1 lock-free. So, sure, you shouldn't have a correctness issue by mixing the calls.

Yet even so, if you actually needed to emit a call to a known-lock-free out-of-line function, it's better to give it a different name (e.g. __sync_*) and include that in libgcc.a.

Emitting a call to the __atomic_* function means you require users to link against libatomic.so, which shouldn't be required here.

> And
> as mentioned previously, I have an unfinished prototype patch to fix
> this by emitting the sequence of instructions inline.

But doing that is fine, too, and likely preferable.
Comment 8 Jim Wilson 2018-06-04 20:58:43 UTC
This looks like a generic GCC problem, not a RISC-V specific problem.

For instance, if I build an armv6t2 compiler I get
	bl	__atomic_fetch_add_4
Comment 9 Jim Wilson 2018-06-04 21:50:12 UTC
Oops, hitting tab doesn't work as expected.  Trying again...

This looks like a generic GCC problem, not a RISC-V specific problem.  Or perhaps, not a gcc bug at all.

For instance, if I build an armv6t2 compiler I get
	bl	__atomic_fetch_add_4
        ...
	mcr	p15, 0, r0, c7, c10, 5
	ldr	r3, [r3]
	mcr	p15, 0, r0, c7, c10, 5
where the mcr is equivalent to the RISC-V fence.  It looks like MIPS16 and a number of other targets have the same problem.

GCC has no support for calling __atomic_load_4 for this testcase.  GCC assumes that loads smaller than or equal to the word size are always atomic, and will not call a library routine for them.  It will emit memory barriers.

If what gcc is doing is wrong, then it is broken for all targets that don't inline expand every atomic function call, and/or don't have atomic instructions.

I can fix the rv32ia support by inline expanding every atomic function call.  I can't fix the rv32i support without target independent optimizer changes.
Comment 10 James Y Knight 2018-06-13 04:47:38 UTC
I suppose since it doesn't affect most common platforms, nobody's noticed the brokenness yet?

ARM is probably the most common architecture which is missing atomics on common CPU models, but when targeting common OSes (Linux, FreeBSD), you get to use the kernel atomic cmpxchg helpers, so the issue doesn't arise there.

I haven't tested, but I don't think MIPS16 would be broken, since MIPS has native atomics, just not in the MIPS16 ISA. So, it calls out-of-line libgcc __sync_* routines, implemented in MIPS32 code using LL/SC instructions. There's no problem mixing those with native load/store instructions.


Anyways, for why mixing native load/store with __atomic_* functions is wrong, consider,
int value = 4;

Thread1:
int oldval = __atomic_fetch_add(&value, 1, __ATOMIC_SEQ_CST);

Thread2:
__atomic_store_n(&value, 0, __ATOMIC_SEQ_CST);

The only allowable results after executing both threads should be:
oldval = 0, value = 1 -- if atomic_store executes first.
oldval = 4, value = 0 -- if atomic_fetch_add executes first.

But, if you implement __atomic_fetch_add with a lock, and implement __atomic_store with a native store instruction, you may get the "impossible" result,
oldval = 4, value = 5 -- if atomic_store executes *during* atomic_fetch_add, between the load and store instructions. Oops.
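
To make that interleaving concrete, here is a hedged sketch of what a lock-based libatomic-style implementation looks like internally (the lock and function names are illustrative, not libatomic's actual code):

#include <pthread.h>

/* One process-wide lock guarding every "atomic" access to the object.  */
static pthread_mutex_t atomic_lock = PTHREAD_MUTEX_INITIALIZER;

int
locked_fetch_add_4 (int *p, int val)
{
  pthread_mutex_lock (&atomic_lock);
  int old = *p;      /* an unlocked store can land right here...           */
  *p = old + val;    /* ...and is then overwritten, so its update is lost  */
  pthread_mutex_unlock (&atomic_lock);
  return old;
}

void
locked_store_4 (int *p, int val)
{
  /* A correct lock-based lowering must route atomic stores (and loads)
     through the same lock.  The fence-plus-ordinary-store sequence GCC
     emits for rv32i never takes the lock, which is the bug.  */
  pthread_mutex_lock (&atomic_lock);
  *p = val;
  pthread_mutex_unlock (&atomic_lock);
}

Because Thread2's __atomic_store_n is instead lowered to a fence-wrapped ordinary store, it never acquires atomic_lock, and nothing prevents it from landing between the load and the store inside locked_fetch_add_4, which yields oldval = 4, value = 5.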
Comment 11 Patrick O'Neill 2023-07-07 16:22:36 UTC
The char example from this issue is resolved on trunk since we added inline subword atomics. https://inbox.sourceware.org/gcc-patches/20230418214124.2446642-1-patrick@rivosinc.com/

Char example: https://godbolt.org/z/h94PT7hfo

char atomic(char *i) {
  char j = __atomic_add_fetch(i, 1, __ATOMIC_SEQ_CST);
  char k;
  __atomic_load(i, &k, __ATOMIC_SEQ_CST);
  return j+k;
} 

options: -O3 -march=rv64iad

atomic(char*):
        andi    a4,a0,3
        slliw   a4,a4,3
        li      a3,255
        sllw    a3,a3,a4
        li      a1,1
        andi    a2,a0,-4
        not     a6,a3
        sllw    a1,a1,a4
1:
        lr.w.aqrl       a5, 0(a2)
        add     a7, a5, a1
        and     a7, a7, a3
        and     t1, a5, a6
        or      t1, t1, a7
        sc.w.rl a7, t1, 0(a2)
        bnez    a7, 1b
        sraw    a5,a5,a4
        addiw   a5,a5,1
        andi    a5,a5,0xff
        fence   rw,rw
        lbu     a0,0(a0)
        fence   rw,rw
        add     a0,a0,a5
        andi    a0,a0,0xff
        ret

The initial example's behavior is still present.