Bug 52080

Summary: Stores to bitfields introduce a store-data-race on adjacent data
Product: gcc
Component: target
Status: RESOLVED FIXED
Severity: normal
Priority: P3
Version: 4.7.0
Target Milestone: 4.8.0
Reporter: Richard Biener <rguenth>
Assignee: Richard Biener <rguenth>
CC: bergner, ebotcazou, gcc-bugs, hp, kirill, matz, toolchain, vr5
Keywords: wrong-code
Host:
Target: ia64-*-linux, sparc64-*-linux, powerpc64-*-linux
Build:
Known to work: 4.8.0
Known to fail: 4.3.3, 4.6.2, 4.7.0
Last reconfirmed: 2012-02-01 00:00:00
Bug Depends on: 48124
Bug Blocks:
Attachments: candidate patch

Description Richard Biener 2012-02-01 10:06:55 UTC
For the following testcase we generate an 8-byte read-modify-write cycle on
IA64, which causes locking problems in the Linux kernel btrfs filesystem.

struct x {
    long a;
    unsigned int lock;
    unsigned int full : 1;
};

void
wrong(struct x *ptr)
{
  ptr->full = 1;
}
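To make the race concrete: the wide RMW is roughly equivalent to the following hand-written C, where any store to `lock` that lands between the wide load and the wide store is silently undone. This is a sketch, not GCC output; the offsets follow the layout above, and `lost_update_demo` simulates the concurrent writer inline between the two halves of the RMW.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct x {
    long a;
    unsigned int lock;
    unsigned int full : 1;
};

/* Hand-written equivalent of the 8-byte read-modify-write the compiler
   emits for 'ptr->full = 1'.  The OR that sets the bit is omitted (its
   position is endian-dependent); storing the stale word back is enough
   to exhibit the problem. */
static void lost_update_demo(struct x *p)
{
    uint64_t word;
    memcpy(&word, (char *)p + 8, 8);   /* wide load: reads lock AND full */
    p->lock = 2;                       /* a concurrent update to lock,
                                          landing between load and store */
    memcpy((char *)p + 8, &word, 8);   /* wide store: writes back the
                                          STALE value of lock */
}
```

After the call, `lock` has reverted to its pre-update value even though nothing in the source ever assigns to it.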
Comment 1 Richard Biener 2012-02-01 10:22:51 UTC
SPARC64 is also affected.

;; ptr_1(D)->full = 1;

(insn 6 5 7 (set (reg:DI 110)
        (mem/j:DI (plus:DI (reg/v/f:DI 109 [ ptr ])
                (const_int 8 [0x8])) [0+8 S8 A64])) t.c:10 -1
     (nil))

(insn 7 6 8 (set (reg:DI 112)
        (const_int 2147483648 [0x80000000])) t.c:10 -1
     (nil))

(insn 8 7 9 (set (reg:DI 111)
        (ior:DI (reg:DI 110)
            (reg:DI 112))) t.c:10 -1
     (nil))

(insn 9 8 0 (set (mem/j:DI (plus:DI (reg/v/f:DI 109 [ ptr ])
                (const_int 8 [0x8])) [0+8 S8 A64])
        (reg:DI 111)) t.c:10 -1
     (nil))


wrong:
        ldx     [%o0+8], %g2
        sethi   %hi(2147483648), %g1
        or      %g2, %g1, %g1
        jmp     %o7+8
         stx    %g1, [%o0+8]


At least IA64 can also do 4-byte loads/stores (though I'm not sure the
hardware wouldn't re-introduce the data race).
Comment 2 Richard Biener 2012-02-01 10:25:21 UTC
SPARC64 can also do 32-bit loads/stores, as the following testcase shows:

struct x {
    long a;
    unsigned int lock;
    unsigned int full;
};

void
wrong(struct x *ptr)
{
  ptr->full = 1;
}

Here we simply use a 32-bit store.
Comment 3 Richard Biener 2012-02-01 10:38:58 UTC
For SPARC64, optimize_bitfield_assignment_op fails, so we fall into the
store_field path with mode1 == VOIDmode (which is what get_inner_reference
returns).
to_rtx is

(mem/j:BLK (reg/v/f:DI 109 [ ptr ]) [2 *ptr_1(D)+0 S1 A64])

Related bug: PR48124.
Comment 4 Richard Biener 2012-02-01 11:12:41 UTC
Btw,

  offset = bitnum / unit;
  bitpos = bitnum % unit;
  byte_offset = (bitnum % BITS_PER_WORD) / BITS_PER_UNIT
                + (offset * UNITS_PER_WORD);

byte_offset is bollocks (or has a really poor name).  On SPARC64 I see

(gdb) p unit
$11 = 8
(gdb) p offset
$12 = 12
(gdb) p bitpos
$13 = 0
(gdb) p byte_offset
$14 = 100
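Plugging in the values seen in gdb makes the bogus 100 reproducible: the bitfield sits at byte offset 12, i.e. bitnum = 96, and the formula mixes two different units, since offset counts 8-bit chunks but is then scaled by bytes per word. A minimal sketch of the computation (not GCC's actual code), with the usual 64-bit target parameters BITS_PER_WORD = 64, BITS_PER_UNIT = 8, UNITS_PER_WORD = 8 substituted in:

```c
#include <assert.h>

/* The byte_offset formula quoted above, with the 64-bit target
   parameters substituted in: BITS_PER_WORD = 64, BITS_PER_UNIT = 8,
   UNITS_PER_WORD = 8. */
static unsigned byte_offset_formula(unsigned bitnum, unsigned unit)
{
    unsigned offset = bitnum / unit;   /* counts chunks of 'unit' bits */
    return (bitnum % 64) / 8           /* byte within the word ...     */
           + offset * 8;               /* ... plus 'offset' scaled as
                                          if it counted whole words    */
}
```

With unit = 8 and bitnum = 96 this yields (96 % 64) / 8 + 12 * 8 = 4 + 96 = 100, matching the debugger output, while the sensible byte offset would simply be 12.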

Other than that, we fall into the generic store_fixed_bit_field
routine, which at the top does

  unsigned int total_bits = BITS_PER_WORD;

and

      mode = GET_MODE (op0);
      if (GET_MODE_BITSIZE (mode) == 0
          || GET_MODE_BITSIZE (mode) > GET_MODE_BITSIZE (word_mode))
        mode = word_mode;

(now, why we use a BLKmode mem here and not a QImode mem may be
surprising)

get_best_mode still returns DImode because that's the largest mode
the given alignment supports.  After that we're lost.  So it seems
that to fix this case we'd need to figure out some other largest
mode we can pass to get_best_mode.  The only hint would be from
providing a different mode for the initial MEM we create, like with

Index: gcc/expr.c
===================================================================
--- gcc/expr.c  (revision 183791)
+++ gcc/expr.c  (working copy)
@@ -4705,6 +4705,12 @@ expand_assignment (tree to, tree from, b
            to_rtx = adjust_address (to_rtx, mode1, 0);
          else if (GET_MODE (to_rtx) == VOIDmode)
            to_rtx = adjust_address (to_rtx, BLKmode, 0);
+         else if (TREE_CODE (to) == COMPONENT_REF
+                  && DECL_BIT_FIELD (TREE_OPERAND (to, 1))
+                  && DECL_MODE (TREE_OPERAND (to, 1)) != BLKmode)
+           to_rtx = adjust_address (to_rtx,
+                                    TYPE_MODE (DECL_BIT_FIELD_TYPE
+                                       (TREE_OPERAND (to, 1))), 0);
        }
  
       if (offset != 0)

That avoids the use of QImode we have on the field-decl but also
adjusts MEM_SIZE ...
Comment 5 Eric Botcazou 2012-02-01 17:11:10 UTC
You probably need to add a 'volatile' somewhere to really have wrong code.
Comment 6 Eric Botcazou 2012-02-01 17:45:08 UTC
get_best_mode returns DImode because of:

/* Nonzero if access to memory by bytes is slow and undesirable.
   For RISC chips, it means that access to memory by bytes is no
   better than access by words when possible, so grab a whole word
   and maybe make use of that.  */
#define SLOW_BYTE_ACCESS 1
Comment 7 Michael Matz 2012-02-01 17:49:45 UTC
Yeah.  Which is okay for reading, but doing the same when writing is
problematic.
Comment 8 Peter Bergner 2012-02-01 18:52:06 UTC
This fails on powerpc64-linux as well (-m64), where we generate:

        ld 9,8(3)
        li 10,1
        rldimi 9,10,31,32
        std 9,8(3)
        blr
Comment 9 Petr Tesarik 2012-02-02 11:00:50 UTC
OK, my minimal test case removed the "volatile" keyword by mistake.

The real code in BTRFS has the volatile on the lock value, which precedes the bitfield, so the corresponding structure would be:

struct x {
    long a;
    volatile unsigned int lock;   /* <- note the "volatile" here */
    unsigned int full : 1;
};

Now, GCC should honour that the value of "lock" can change any time, so it must not assume that writing back the same value a few cycles later is safe.
Comment 10 Richard Biener 2012-02-02 11:08:56 UTC
(In reply to comment #9)
> OK, my minimal test case removed the "volatile" keyword by mistake.
> 
> The real code in BTRFS has the volatile for the lock value which precedes the
> bitfield, so the corresponding structure would be:
> 
> struct x {
>     long a;
>     volatile unsigned int lock;   /* <- note the "volatile" here */
>     unsigned int full : 1;
> };
> 
> Now, GCC should honour that the value of "lock" can change any time, so it must
> not assume that writing back the same value a few cycles later is safe.

volatile on individual structure members is of course under- (or even un-)specified.  Consider

struct x {
  int i : 1;
  volatile int j : 1;
};

Here we clearly cannot write i without also reading and rewriting j
(but it's still valid C).  So I don't think that a volatile member inside
a non-volatile struct guarantees anything.
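As a quick illustration of why the two fields cannot be accessed independently: on common ABIs (assuming the usual System V-style bitfield packing) both one-bit fields are packed into a single int-sized storage unit, so any write to i is necessarily a read-modify-write of the unit that also holds j.

```c
/* Both one-bit fields share one int-sized storage unit on common ABIs,
   so a store to 'i' must read and rewrite the bits of 'j' as well. */
struct y {
    int i : 1;
    volatile int j : 1;
};
```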

Now, with

struct x {
    long a;
    volatile unsigned int lock;
    unsigned int full : 1;
};

void
wrong(volatile struct x *ptr)
{
  ptr->full = 1;
}
        
IA64 uses

        .mmi
        ld8.acq r14 = [r32]
        ;;
        nop 0
        dep r14 = r15, r14, 32, 1
        ;;
        .mib
        st8.rel [r32] = r14

which seems to be an attempt to work around this issue (albeit a
possibly very slow one).
Comment 11 Petr Tesarik 2012-02-02 12:39:25 UTC
(In reply to comment #10)
> IA64 uses
> 
>         .mmi
>         ld8.acq r14 = [r32]
>         ;;
>         nop 0
>         dep r14 = r15, r14, 32, 1
>         ;;
>         .mib
>         st8.rel [r32] = r14
> 
> which seems to be an attempt to work around this issue (albeit a
> possibly very slow one).

Are you referring to the ".acq" and ".rel" forms? That doesn't change the situation at all. All it does is ensure correct memory ordering with respect to external visibility, but it does nothing to avoid the race condition.
Comment 12 Richard Biener 2012-02-21 11:58:00 UTC
Created attachment 26712 [details]
candidate patch

Here is a candidate patch (it ICEs one Ada testcase, see PR52134).  It enforces
the C++11 memory model (as far as bitfields are concerned) on everyone and
builds on the machinery that was implemented for it (changing the
implementation so that we compute "an underlying object" once and store it
permanently, making it possible to use this information for optimization
purposes).
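In the C++11 model the bitfield and the adjacent lock are distinct memory locations, so the store may only touch the bitfield's own representative (the 4-byte unit at offset 12 in the original testcase). A hedged sketch of the access pattern the fix permits, mirroring the earlier lost-update scenario but with the RMW confined to the representative:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct x {
    long a;
    unsigned int lock;
    unsigned int full : 1;
};

/* Narrow RMW confined to the bitfield's 4-byte representative at
   offset 12.  'lock' at offset 8 is neither read nor written, so a
   concurrent update to it can no longer be lost. */
static void narrow_rmw_demo(struct x *p)
{
    uint32_t word;
    memcpy(&word, (char *)p + 12, 4);  /* load only the bitfield unit */
    p->lock = 2;                       /* concurrent update to lock:
                                          untouched by the store below */
    memcpy((char *)p + 12, &word, 4);  /* store only the bitfield unit */
}
```

Unlike the wide 8-byte cycle, the concurrent store to `lock` survives the bitfield update.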

It fixes the related PR48124 as well.

It's an invasive change that will change code generation on STRICT_ALIGNMENT
platforms quite severely for bitfield accesses.  So it is not an appropriate
fix for GCC 4.7 at this point of the development cycle, nor would it be
appropriate to backport it.

Queued for GCC 4.8.
Comment 13 Richard Biener 2012-03-14 10:55:16 UTC
Author: rguenth
Date: Wed Mar 14 10:55:09 2012
New Revision: 185379

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=185379
Log:
2012-03-14  Richard Guenther  <rguenther@suse.de>

	* tree.h (DECL_BIT_FIELD_REPRESENTATIVE): New define.
	* stor-layout.c (start_bitfield_representative): New function.
	(finish_bitfield_representative): Likewise.
	(finish_bitfield_layout): Likewise.
	(finish_record_layout): Call finish_bitfield_layout.
	* tree.c (free_lang_data_in_decl): Only free DECL_QUALIFIER
	for QUAL_UNION_TYPE fields.
	* tree-streamer-in.c (lto_input_ts_field_decl_tree_pointers):
	Stream DECL_BIT_FIELD_REPRESENTATIVE.
	* tree-streamer-out.c (write_ts_field_decl_tree_pointers): Likewise.

	PR middle-end/52080
	PR middle-end/52097
	PR middle-end/48124
	* expr.c (get_bit_range): Unconditionally extract bitrange
	from DECL_BIT_FIELD_REPRESENTATIVE.
	(expand_assignment): Adjust call to get_bit_range.

	* gcc.dg/torture/pr48124-1.c: New testcase.
	* gcc.dg/torture/pr48124-2.c: Likewise.
	* gcc.dg/torture/pr48124-3.c: Likewise.
	* gcc.dg/torture/pr48124-4.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/torture/pr48124-1.c
    trunk/gcc/testsuite/gcc.dg/torture/pr48124-2.c
    trunk/gcc/testsuite/gcc.dg/torture/pr48124-3.c
    trunk/gcc/testsuite/gcc.dg/torture/pr48124-4.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/expr.c
    trunk/gcc/stor-layout.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-streamer-in.c
    trunk/gcc/tree-streamer-out.c
    trunk/gcc/tree.c
    trunk/gcc/tree.h
Comment 14 Richard Biener 2012-03-14 10:57:32 UTC
Fixed for 4.8.
Comment 15 Richard Biener 2012-06-04 08:43:31 UTC
Author: rguenth
Date: Mon Jun  4 08:43:23 2012
New Revision: 188167

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188167
Log:
2012-06-04  Richard Guenther  <rguenther@suse.de>
	Eric Botcazou  <ebotcazou@adacore.com>

	Backport from mainline
	2012-04-03  Eric Botcazou  <ebotcazou@adacore.com>

        * expr.c (get_bit_range): Add OFFSET parameter and adjust BITPOS.
        Change type of BITOFFSET to signed.  Make sure the lower bound of
        the computed range is non-negative by adjusting OFFSET and BITPOS.
        (expand_assignment): Adjust call to get_bit_range.

	2012-03-27  Eric Botcazou  <ebotcazou@adacore.com>

        * expr.c (get_bit_range): Return the null range if the enclosing record
        is part of a larger bit field.

	2012-03-20  Richard Guenther  <rguenther@suse.de>

        * stor-layout.c (finish_bitfield_representative): Fallback
        to conservative maximum size if the padding up to the next
        field cannot be computed as a constant.
        (finish_bitfield_layout): If we cannot compute the distance
        between the start of the bitfield representative and the
        bitfield member start a new representative.
        * expr.c (get_bit_range): The distance between the start of
        the bitfield representative and the bitfield member is zero
        if the field offsets are not constants.

	2012-03-16  Richard Guenther  <rguenther@suse.de>

        * stor-layout.c (finish_bitfield_representative): Fall back
        to the conservative maximum size if we cannot compute the
        size of the tail padding.

	2012-03-14  Richard Guenther  <rguenther@suse.de>

	* tree.h (DECL_BIT_FIELD_REPRESENTATIVE): New define.
	* stor-layout.c (start_bitfield_representative): New function.
	(finish_bitfield_representative): Likewise.
	(finish_bitfield_layout): Likewise.
	(finish_record_layout): Call finish_bitfield_layout.
	* tree.c (free_lang_data_in_decl): Only free DECL_QUALIFIER
	for QUAL_UNION_TYPE fields.
	* tree-streamer-in.c (lto_input_ts_field_decl_tree_pointers):
	Stream DECL_BIT_FIELD_REPRESENTATIVE.
	* tree-streamer-out.c (write_ts_field_decl_tree_pointers): Likewise.

	PR middle-end/52080
	PR middle-end/52097
	PR middle-end/48124
	* expr.c (get_bit_range): Unconditionally extract bitrange
	from DECL_BIT_FIELD_REPRESENTATIVE.
	(expand_assignment): Adjust call to get_bit_range.

	* gcc.dg/torture/pr48124-1.c: New testcase.
	* gcc.dg/torture/pr48124-2.c: Likewise.
	* gcc.dg/torture/pr48124-3.c: Likewise.
	* gcc.dg/torture/pr48124-4.c: Likewise.
	* gnat.dg/pack16.adb: Likewise.
	* gnat.dg/pack16_pkg.ads: Likewise.
	* gnat.dg/pack17.adb: Likewise.
	* gnat.dg/specs/pack7.ads: Likewise.
	* gnat.dg/specs/pack8.ads: Likewise.
	* gnat.dg/specs/pack8_pkg.ads: Likewise.

Added:
    branches/gcc-4_7-branch/gcc/testsuite/gcc.dg/torture/pr48124-1.c
    branches/gcc-4_7-branch/gcc/testsuite/gcc.dg/torture/pr48124-2.c
    branches/gcc-4_7-branch/gcc/testsuite/gcc.dg/torture/pr48124-3.c
    branches/gcc-4_7-branch/gcc/testsuite/gcc.dg/torture/pr48124-4.c
    branches/gcc-4_7-branch/gcc/testsuite/gnat.dg/pack16.adb
    branches/gcc-4_7-branch/gcc/testsuite/gnat.dg/pack16_pkg.ads
    branches/gcc-4_7-branch/gcc/testsuite/gnat.dg/pack17.adb
    branches/gcc-4_7-branch/gcc/testsuite/gnat.dg/specs/pack7.ads
    branches/gcc-4_7-branch/gcc/testsuite/gnat.dg/specs/pack8.ads
    branches/gcc-4_7-branch/gcc/testsuite/gnat.dg/specs/pack8_pkg.ads
Modified:
    branches/gcc-4_7-branch/gcc/ChangeLog
    branches/gcc-4_7-branch/gcc/expr.c
    branches/gcc-4_7-branch/gcc/stor-layout.c
    branches/gcc-4_7-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_7-branch/gcc/tree-streamer-in.c
    branches/gcc-4_7-branch/gcc/tree-streamer-out.c
    branches/gcc-4_7-branch/gcc/tree.c
    branches/gcc-4_7-branch/gcc/tree.h