Bug 80004 - [6 Regression] non-atomic load moved to before atomic load with std::memory_order_acquire
Summary: [6 Regression] non-atomic load moved to before atomic load with std::memory_o...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 6.3.0
: P2 normal
Target Milestone: 6.4
Assignee: Richard Biener
URL:
Keywords: wrong-code
Depends on: 49244
Blocks:
  Show dependency treegraph
 
Reported: 2017-03-11 12:23 UTC by Frantisek Boranek
Modified: 2017-03-16 08:08 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work: 5.4.1, 6.3.1, 7.0.1
Known to fail: 6.3.0
Last reconfirmed: 2017-03-13 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Frantisek Boranek 2017-03-11 12:23:32 UTC
Folowing code (https://gist.github.com/fboranek/5c99f96a99f902f5cf9dc5d597ff8fe3) for 4 thread doesn't compute the correct result if it is compiled by gcc (Debian 6.3.0-6) 6.3.0 20170205. The code is corect compiled by gcc 4.9.2 and 5.4.1. It is also correct compiled if the variable counter is not static bat global.

This issue can be related to bug 78778, but bug 78778 is appear in 5.1.0 while this issue is since version 6.


static int counter {0};
static std::atomic<bool> flag {false};

void increment1(int cycles)
{
    for (int i=0; i < cycles; ++i)
    {
        while (flag.exchange(true, std::memory_order_acquire));
        
        ++counter;

        flag.store(false, std::memory_order_release);
    }
}

// asm for gcc 5.4.1
00000000004010f0 <_Z10increment1i>:
  4010f0:	31 c9                	xor    %ecx,%ecx
  4010f2:	85 ff                	test   %edi,%edi
  4010f4:	ba 01 00 00 00       	mov    $0x1,%edx
  4010f9:	7e 26                	jle    401121 <_Z10increment1i+0x31>
  4010fb:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  401100:	89 d0                	mov    %edx,%eax
  401102:	86 05 ec 20 20 00    	xchg   %al,0x2020ec(%rip)        # 6031f4 <_ZL4flag>
  401108:	84 c0                	test   %al,%al
  40110a:	75 f4                	jne    401100 <_Z10increment1i+0x10>
  40110c:	83 c1 01             	add    $0x1,%ecx
  40110f:	83 05 e2 20 20 00 01 	addl   $0x1,0x2020e2(%rip)        # 6031f8 <_ZL7counter>
  401116:	39 cf                	cmp    %ecx,%edi
  401118:	c6 05 d5 20 20 00 00 	movb   $0x0,0x2020d5(%rip)        # 6031f4 <_ZL4flag>
  40111f:	75 df                	jne    401100 <_Z10increment1i+0x10>
  401121:	f3 c3                	repz retq 
  401123:	0f 1f 00             	nopl   (%rax)
  401126:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  40112d:	00 00 00

// asm for gcc 6.3.0
0000000000001420 <_Z10increment1i>:
    1420:	8b 0d d2 1d 20 00    	mov    0x201dd2(%rip),%ecx        # 2031f8 <_ZL7counter>
    1426:	85 ff                	test   %edi,%edi
    1428:	ba 01 00 00 00       	mov    $0x1,%edx
    142d:	8d 34 0f             	lea    (%rdi,%rcx,1),%esi
    1430:	7e 26                	jle    1458 <_Z10increment1i+0x38>
    1432:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
    1438:	89 d0                	mov    %edx,%eax
    143a:	86 05 b4 1d 20 00    	xchg   %al,0x201db4(%rip)        # 2031f4 <_ZL4flag>
    1440:	84 c0                	test   %al,%al
    1442:	75 f4                	jne    1438 <_Z10increment1i+0x18>
    1444:	83 c1 01             	add    $0x1,%ecx
    1447:	39 ce                	cmp    %ecx,%esi
    1449:	89 0d a9 1d 20 00    	mov    %ecx,0x201da9(%rip)        # 2031f8 <_ZL7counter>
    144f:	c6 05 9e 1d 20 00 00 	movb   $0x0,0x201d9e(%rip)        # 2031f4 <_ZL4flag>
    1456:	75 e0                	jne    1438 <_Z10increment1i+0x18>
    1458:	f3 c3                	repz retq 
    145a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
Comment 1 Richard Biener 2017-03-13 09:59:04 UTC
Confirmed.  Somehow PRE messes up here, investigating.
Comment 2 Richard Biener 2017-03-13 10:09:20 UTC
The conservative handling in call_may_clobber_ref_p_1 doesn't work:

  /* Handle those builtin functions explicitly that do not act as
     escape points.  See tree-ssa-structalias.c:find_func_aliases
     for the list of builtins we might need to handle here.  */
  if (callee != NULL_TREE
      && gimple_call_builtin_p (call, BUILT_IN_NORMAL))
    switch (DECL_FUNCTION_CODE (callee))
      {
...
        /* __sync_* builtins and some OpenMP builtins act as threading
           barriers.  */
#undef DEF_SYNC_BUILTIN
#define DEF_SYNC_BUILTIN(ENUM, NAME, TYPE, ATTRS) case ENUM:
#include "sync-builtins.def"
#undef DEF_SYNC_BUILTIN

because gimple_call_builtin_p returns false.

_10 = __atomic_exchange_1 (&MEM[(struct __atomic_base *)&flag]._M_i, 1, 2);

not sure what fixed it on trunk...  The '1' is of type int while the
prototype seeks for a 'unsigned char'.

Somebody should bisect what fixed this.
Comment 3 Richard Biener 2017-03-13 10:52:43 UTC
r235792
Comment 4 Richard Biener 2017-03-14 10:00:35 UTC
Testing backport.
Comment 5 Richard Biener 2017-03-14 12:57:40 UTC
Author: rguenth
Date: Tue Mar 14 12:57:08 2017
New Revision: 246122

URL: https://gcc.gnu.org/viewcvs?rev=246122&root=gcc&view=rev
Log:
2017-03-14  Richard Biener  <rguenther@suse.de>

	Backport from mainline
	2016-05-02  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/80004
	PR target/49244
	* gimple.c (gimple_builtin_call_types_compatible_p): Allow
	char/short arguments promoted to int because of promote_prototypes.

	2017-03-09  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/79977
	* graphite-scop-detection.c (scop_detection::merge_sese):
	Handle the case of extra exits to blocks dominating the entry.

	* gcc.dg/graphite/pr79977.c: New testcase.

	2017-03-09  Richard Biener  <rguenther@suse.de>

	PR middle-end/79971
	* gimple-expr.c (useless_type_conversion_p): Preserve
	TYPE_SATURATING for fixed-point types.

	* gcc.dg/fixed-point/pr79971.c: New testcase.

	2017-03-06  Richard Biener  <rguenther@suse.de>

	PR fortran/79894
	* trans.c (gfc_add_modify_loc): Weaken assert.

	2017-03-02  Richard Biener  <rguenther@suse.de>

	PR c/79756
	* c-common.c (c_common_mark_addressable_vec): Look through
	C_MAYBE_CONST_EXPR.

	* gcc.dg/vector-1.c: New testcase.

	2017-02-22  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/79666
	* tree-vrp.c (extract_range_from_binary_expr_1): Make sure
	to not symbolically negate if that may introduce undefined
	overflow.

	* gcc.dg/torture/pr79666.c: New testcase.

	2017-02-17  Richard Biener  <rguenther@suse.de>

	PR middle-end/79576
	* params.def (max-ssa-name-query-depth): Limit to 10.

Added:
    branches/gcc-6-branch/gcc/testsuite/gcc.dg/fixed-point/pr79971.c
    branches/gcc-6-branch/gcc/testsuite/gcc.dg/graphite/pr79977.c
    branches/gcc-6-branch/gcc/testsuite/gcc.dg/torture/pr79666.c
    branches/gcc-6-branch/gcc/testsuite/gcc.dg/vector-1.c
Modified:
    branches/gcc-6-branch/gcc/ChangeLog
    branches/gcc-6-branch/gcc/c-family/ChangeLog
    branches/gcc-6-branch/gcc/c-family/c-common.c
    branches/gcc-6-branch/gcc/fortran/ChangeLog
    branches/gcc-6-branch/gcc/fortran/trans.c
    branches/gcc-6-branch/gcc/gimple-expr.c
    branches/gcc-6-branch/gcc/gimple.c
    branches/gcc-6-branch/gcc/graphite-scop-detection.c
    branches/gcc-6-branch/gcc/params.def
    branches/gcc-6-branch/gcc/testsuite/ChangeLog
    branches/gcc-6-branch/gcc/tree-vrp.c
Comment 6 Richard Biener 2017-03-16 08:08:16 UTC
Fixed.