Bug 67281 - HTM builtins aren't treated as compiler barriers on powerpc
Summary: HTM builtins aren't treated as compiler barriers on powerpc
Status: CLOSED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.9.0
: P3 normal
Target Milestone: 6.0
Assignee: Peter Bergner
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-08-19 17:22 UTC by Tulio Magno Quites Machado Filho
Modified: 2016-01-27 19:38 UTC (History)
2 users (show)

See Also:
Host:
Target: powerpc*-linux
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Tulio Magno Quites Machado Filho 2015-08-19 17:22:03 UTC
Depending on the level of optimization, GCC moves a memory access outside a transaction which breaks the atomicity of the transaction.

This is a small testcase that reproduces this behavior:

$ cat tbegin-barrier.c
long
foo (long dest, long src0, long src1, long tries)
{
  long i;
  for (i = 0; i < tries; i++)
    {
      __builtin_tbegin (0);
      dest = src0 + src1;
      __builtin_tend (0);
    }
  return dest;
}

compiling using: -O2 -S -mcpu=power8 tbegin-barrier.c
gives
foo:
        cmpdi 0,6,0
        blelr 0
        mtctr 6
        add 3,4,5
        .p2align 4,,15
.L3:
        tbegin. 0
        tend. 0
        bdnz .L3
        blr
Comment 1 Andrew Pinski 2015-08-19 17:52:14 UTC
There is no memory access in your testcase at all. Since the variables are all local variables and nothing takes the address, it can be moved out of the loop as no other thread can cause that to fail.
Comment 2 Tulio Magno Quites Machado Filho 2015-08-19 18:43:33 UTC
Oooops.  My bad.
What about this one?

$ cat tbegin-barrier.c
long
foo (long dest, long *src0, long src1, long tries)
{
  long i;
  for (i = 0; i < tries; i++)
    {
      __builtin_tbegin (0);
      dest = *src0 + src1;
      __builtin_tend (0);
    }
  return dest;
}

If we compile it the same way:

foo:
        cmpdi 0,6,0
        blelr 0
        mtctr 6
        ld 3,0(4)
        .p2align 4,,15
.L3:
        tbegin. 0
        tend. 0
        bdnz .L3
        add 3,3,5
        blr
Comment 3 Andrew Pinski 2015-08-19 18:51:17 UTC
Since there are no stores, the load seems like it can be pulled out of the loop too.
Comment 4 Tulio Magno Quites Machado Filho 2015-08-19 19:07:29 UTC
(In reply to Andrew Pinski from comment #3)
> Since there are no stores, the load seems like it can be pulled out of the
> loop too.

I disagree with you.
If I use the value of dest to take a decision inside the transaction, I need the memory barrier before the access to *src0.

Here's an example:

long
foo (long dest, long *src0, long src1, long tries)
{
  long i;
  for (i = 0; i < tries; i++)
    {
      __builtin_tbegin (0);
      dest = *src0 + src1;
      if (dest == 13)
	__builtin_tabort(0);
      __builtin_tend (0);
    }
  return dest;
}

In other words, if you access *src0 before the memory barrier, its value may change when the memory barrier is created.  This is particularly useful if dest says if a lock has been acquired by another thread or not.

For the reference, this has been found in glibc source code: https://sourceware.org/ml/libc-alpha/2015-07/msg00986.html
Comment 5 Peter Bergner 2015-08-19 20:25:55 UTC
(In reply to Tulio Magno Quites Machado Filho from comment #4)
> I disagree with you.

I agree with Tulio.


> Here's an example:
> 
> long
> foo (long dest, long *src0, long src1, long tries)
> {
>   long i;
>   for (i = 0; i < tries; i++)
>     {
>       __builtin_tbegin (0);
>       dest = *src0 + src1;
>       if (dest == 13)
> 	__builtin_tabort(0);
>       __builtin_tend (0);
>     }
>   return dest;
> }

Even better is to change src1 to a pointer too and dereference that within the transaction.  If *src0 and *src1 are two memory locations that should be updated together and atomically, then allowing them to move past the tbegin. could break that requirement.
Comment 6 torvald 2015-08-23 12:27:14 UTC
I think tbegin needs to have same semantics as a lock acquisition and the compiler must not assume to know anything about tbegin's return value; tend must have same semantics as a lock release.

See the libc-alpha discussion for why I think this is the case:
https://sourceware.org/ml/libc-alpha/2015-08/msg00963.html

Thus, I don't think a full compiler barrier is necessary, but if we don't have something finer-grained to capture the semantics of a lock acquisition, then we need the compiler barrier (GCC currently assumes atomics to be compiler barriers AFAIK).  We should in any case agree on a semantics and document it in the GCC sources.  Documenting that we need a full compiler barrier is not correct in that it's not a necessary condition (even though it should be a sufficient condition).
Comment 7 Peter Bergner 2015-10-15 16:39:18 UTC
Author: bergner
Date: Thu Oct 15 16:38:47 2015
New Revision: 228846

URL: https://gcc.gnu.org/viewcvs?rev=228846&root=gcc&view=rev
Log:
	Backport from mainline
	2015-10-14  Peter Bergner  <bergner@vnet.ibm.com>
		    Torvald Riegel  <triegel@redhat.com>

	PR target/67281
	* config/rs6000/htm.md (UNSPEC_HTM_FENCE): New.
	(tabort, tabort<wd>c, tabort<wd>ci, tbegin, tcheck, tend,
	trechkpt, treclaim, tsr, ttest): Rename define_insns from this...
	(*tabort, *tabort<wd>c, *tabort<wd>ci, *tbegin, *tcheck, *tend,
	*trechkpt, *treclaim, *tsr, *ttest): ...to this.  Add memory barrier.
	(tabort, tabort<wd>c, tabort<wd>ci, tbegin, tcheck, tend,
	trechkpt, treclaim, tsr, ttest): New define_expands.
	* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
	__TM_FENCE__ for htm.
	* doc/extend.texi: Update documentation for htm builtins.

Modified:
    branches/gcc-5-branch/gcc/ChangeLog
    branches/gcc-5-branch/gcc/config/rs6000/htm.md
    branches/gcc-5-branch/gcc/config/rs6000/rs6000-c.c
    branches/gcc-5-branch/gcc/doc/extend.texi
Comment 8 Peter Bergner 2015-10-15 16:40:46 UTC
Author: bergner
Date: Thu Oct 15 16:40:14 2015
New Revision: 228847

URL: https://gcc.gnu.org/viewcvs?rev=228847&root=gcc&view=rev
Log:
	Backport from mainline
	2015-10-14  Peter Bergner  <bergner@vnet.ibm.com>
		    Torvald Riegel  <triegel@redhat.com>

	PR target/67281
	* config/rs6000/htm.md (UNSPEC_HTM_FENCE): New.
	(tabort, tabort<wd>c, tabort<wd>ci, tbegin, tcheck, tend,
	trechkpt, treclaim, tsr, ttest): Rename define_insns from this...
	(*tabort, *tabort<wd>c, *tabort<wd>ci, *tbegin, *tcheck, *tend,
	*trechkpt, *treclaim, *tsr, *ttest): ...to this.  Add memory barrier.
	(tabort, tabort<wd>c, tabort<wd>ci, tbegin, tcheck, tend,
	trechkpt, treclaim, tsr, ttest): New define_expands.
	* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
	__TM_FENCE__ for htm.
	* doc/extend.texi: Update documentation for htm builtins.

Modified:
    branches/gcc-4_9-branch/gcc/ChangeLog
    branches/gcc-4_9-branch/gcc/config/rs6000/htm.md
    branches/gcc-4_9-branch/gcc/config/rs6000/rs6000-c.c
    branches/gcc-4_9-branch/gcc/doc/extend.texi
Comment 9 Peter Bergner 2015-10-15 16:44:08 UTC
Fixed in trunk and the FSF 5 and 4.9 branches.
Comment 10 Peter Bergner 2016-01-27 19:38:55 UTC
Closing as fixed.