50583 – Many __sync_XXX builtin functions are incorrect

Bug 50583 - Many __sync_XXX builtin functions are incorrect

Summary: Many __sync_XXX builtin functions are incorrect

Status:	RESOLVED INVALID

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	4.7.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2011-09-30 16:33 UTC by H.J. Lu
Modified:	2011-09-30 23:35 UTC (History)
CC List:	5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description H.J. Lu 2011-09-30 16:33:05 UTC

We have

`TYPE __sync_fetch_and_add (TYPE *ptr, TYPE value, ...)'
`TYPE __sync_fetch_and_sub (TYPE *ptr, TYPE value, ...)'
`TYPE __sync_fetch_and_or (TYPE *ptr, TYPE value, ...)'
`TYPE __sync_fetch_and_and (TYPE *ptr, TYPE value, ...)'
`TYPE __sync_fetch_and_xor (TYPE *ptr, TYPE value, ...)'
`TYPE __sync_fetch_and_nand (TYPE *ptr, TYPE value, ...)'
     These builtins perform the operation suggested by the name, and
     returns the value that had previously been in memory.  That is,

          { tmp = *ptr; *ptr OP= value; return tmp; }
          { tmp = *ptr; *ptr = ~(tmp & value); return tmp; }   // nand

On x86, they may be implemented as

[hjl@gnu-33 gcc]$ cat /tmp/x.c
int
foo (int *p)
{
  return __sync_fetch_and_and(p, 7);
}
[hjl@gnu-33 gcc]$ gcc -S -O2 /tmp/x.c
[hjl@gnu-33 gcc]$ cat x.s
	.file	"x.c"
	.text
	.p2align 4,,15
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	movl	(%rdi), %eax
.L2:
	movl	%eax, %edx
	movl	%eax, %ecx
	andl	$7, %edx
	lock cmpxchgl	%edx, (%rdi)
	jne	.L2
	movl	%ecx, %eax
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (GNU) 4.6.0 20110603 (Red Hat 4.6.0-10)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-33 gcc]$ 

This isn't equivalent to 

 { tmp = *ptr; *ptr OP= value; return tmp; }

It is

 {
   tmp = *ptr;
   tmp1 = tmp OP value;
   return __sync_val_compare_and_swap (ptr, tmp, tmp1);
}

The only thing supported on x86 are


[hjl@gnu-33 gcc]$ cat /tmp/y.c 
int
foo1 (int *p)
{
  return __sync_fetch_and_add (p, 7);
}

int
foo2 (int *p)
{
  return __sync_fetch_and_sub (p, 7);
}
[hjl@gnu-33 gcc]$ gcc -S -O2 /tmp/y.c
[hjl@gnu-33 gcc]$ cat y.s
	.file	"y.c"
	.text
	.p2align 4,,15
	.globl	foo1
	.type	foo1, @function
foo1:
.LFB0:
	.cfi_startproc
	movl	$7, %eax
	lock xaddl	%eax, (%rdi)
	ret
	.cfi_endproc
.LFE0:
	.size	foo1, .-foo1
	.p2align 4,,15
	.globl	foo2
	.type	foo2, @function
foo2:
.LFB1:
	.cfi_startproc
	movl	$-7, %eax
	lock xaddl	%eax, (%rdi)
	ret
	.cfi_endproc
.LFE1:
	.size	foo2, .-foo2
	.ident	"GCC: (GNU) 4.6.0 20110603 (Red Hat 4.6.0-10)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-33 gcc]$

Comment 1 H.J. Lu 2011-09-30 16:57:14 UTC

We have 2 choices:

1. Update document of

`TYPE __sync_fetch_and_add (TYPE *ptr, TYPE value, ...)'
`TYPE __sync_fetch_and_sub (TYPE *ptr, TYPE value, ...)'
`TYPE __sync_fetch_and_or (TYPE *ptr, TYPE value, ...)'
`TYPE __sync_fetch_and_and (TYPE *ptr, TYPE value, ...)'
`TYPE __sync_fetch_and_xor (TYPE *ptr, TYPE value, ...)'
`TYPE __sync_fetch_and_nand (TYPE *ptr, TYPE value, ...)'

to

 {
   tmp = *ptr;
   tmp1 = tmp OP value;
   return __sync_val_compare_and_swap (ptr, tmp, tmp1);
 }

2. Remove those __sync_fetch_and_XXX which aren't

 { tmp = *ptr; *ptr OP= value; return tmp; }

Since only

 { tmp = *ptr; *ptr OP= value; return tmp; }

can be used to implement locks, I think we should do 2.

Comment 2 Andrew Macleod 2011-09-30 18:25:51 UTC

Technically it is closer to 

do { 
  tmp = *ptr;
  tmp1 = tmp OP value;
} while (__sync_val_compare_and_swap (ptr, tmp, tmp1) != tmp);
return tmp;

except it uses the new value of tmp from the CAS rather than reloading it again.

It loops performing OP on the new value of tmp until the CAS is successful. Functionally it is identical, what is your case where it can't be used?

The only difference I see is that another very hostile process which constantly changes *ptr may prevent forward progress from ever happening, resulting in an infinite loop.  

At the moment we aren't making any guarantees of forward progress, although I think at some point we may want to add a flag to the compiler as it becomes more of an issue.   With a flag like that enabled, a CAS loop would not be a valid implementation.

Comment 3 H.J. Lu 2011-09-30 18:37:42 UTC

The same problem with

`TYPE __sync_add_and_fetch (TYPE *ptr, TYPE value, ...)'
`TYPE __sync_sub_and_fetch (TYPE *ptr, TYPE value, ...)'
`TYPE __sync_or_and_fetch (TYPE *ptr, TYPE value, ...)'
`TYPE __sync_and_and_fetch (TYPE *ptr, TYPE value, ...)'
`TYPE __sync_xor_and_fetch (TYPE *ptr, TYPE value, ...)'
`TYPE __sync_nand_and_fetch (TYPE *ptr, TYPE value, ...)'
     These builtins perform the operation suggested by the name, and
     return the new value.  That is,

          { *ptr OP= value; return *ptr; }
          { *ptr = ~(*ptr & value); return *ptr; }   // nand

[hjl@gnu-33 pr50583]$ cat z.i
int
foo (int *p)
{
  return __sync_and_and_fetch (p, 7);
}
[hjl@gnu-33 pr50583]$ gcc -O2 -S z.i
[hjl@gnu-33 pr50583]$ cat z.s
	.file	"z.i"
	.text
	.p2align 4,,15
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	movl	(%rdi), %eax
.L2:
	movl	%eax, %edx
	andl	$7, %edx
	lock cmpxchgl	%edx, (%rdi)
	jne	.L2
	movl	%edx, %eax
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (GNU) 4.6.0 20110603 (Red Hat 4.6.0-10)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-33 pr50583]$ 

It is

 {
   tmp = *ptr;
   do
     {
       tmp1 = tmp OP value;
     }
   while (__sync_val_compare_and_swap (ptr, tmp, tmp1) != tmp);
   return tmp1;
 }

If *ptr is changed between load and cmpxchg, we get an infinite loop.

Comment 4 H.J. Lu 2011-09-30 18:47:21 UTC

I guess it is OK.

Comment 5 Jakub Jelinek 2011-09-30 18:49:29 UTC

(In reply to comment #3)

>     movl    (%rdi), %eax
> .L2:
>     movl    %eax, %edx
>     andl    $7, %edx
>     lock cmpxchgl    %edx, (%rdi)
>     jne    .L2
>     movl    %edx, %eax

> It is
> 
>  {
>    tmp = *ptr;
>    do
>      {
>        tmp1 = tmp OP value;
>      }
>    while (__sync_val_compare_and_swap (ptr, tmp, tmp1) != tmp);
>    return tmp1;
>  }
> 
> If *ptr is changed between load and cmpxchg, we get an infinite loop.

No, you aren't translating the asm correctly back into C.
It is
  tmp = *ptr;
  do
    tmp1 = tmp OP value; tmp2 = tmp;
  while ((tmp = __sync_val_compare_and_swap (ptr, tmp2, tmp1) != tmp2);
  return tmp1;
because cmpxchgl instruction loads the *ptr value from memory into %eax if the instruction has been unsuccessful.  So, tmp is loaded with the new value for the next iteration.

Comment 6 Andi Kleen 2011-09-30 23:35:29 UTC

Can't say I'm a fan of adding such a heavy weight sequence into
an intrinsic.  Maybe better to simply leave out the intrinsics that
cannot be implemented with loops? If someone wants a loop they
better open code it.

It would be nice if you implemented the ors, nands and ands
with bts, btr etc if the second argument is a constant and only has one
bit set.