This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
RE: [PATCH] Fix another wrong-code bug with -fstrict-volatile-bitfields
- From: Bernd Edlinger <bernd dot edlinger at hotmail dot de>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Eric Botcazou <ebotcazou at adacore dot com>
- Date: Thu, 5 Mar 2015 16:05:26 +0100
- Subject: RE: [PATCH] Fix another wrong-code bug with -fstrict-volatile-bitfields
- Authentication-results: sourceware.org; auth=none
- References: <DUB118-W288F9097DAAF6E3E544713E41E0 at phx dot gbl>,<DUB118-W1B6CB327B0F0BBDB67F1BE41E0 at phx dot gbl>,<CAFiYyc19LMd7=dLFWu+5UpvmxdqfBL-A=VeqHx_xFS3gYMMCSg at mail dot gmail dot com>,<DUB118-W252167753CCC32AFAEE0C6E41F0 at phx dot gbl>,<CAFiYyc06G1oPUo3YiEzC+ssY7DFhk5hpP+DKHAMtuRVDjyBk3A at mail dot gmail dot com>
Hi,
On Thu, 5 Mar 2015 12:24:56, Richard Biener wrote:
>
> On Thu, Mar 5, 2015 at 12:00 PM, Bernd Edlinger
> <bernd.edlinger@hotmail.de> wrote:
>> Hi,
>>
>> On Thu, 5 Mar 2015 10:00:26, Richard Biener wrote:
>>>
>>> On Wed, Mar 4, 2015 at 3:13 PM, Bernd Edlinger
>> Maybe one thing is missing from strict_volatile_bitfield_p, I am not sure.
>>
>> Maybe it should check that MEM_ALIGN (op0)>= GET_MODE_ALIGNMENT (fieldmode)
>> Because the strict volatile bitfields handling will inevitably try to use
>> the fieldmode to access the memory.
>>
>> Or would it be better to say MEM_ALIGN (op0)>= GET_MODE_BITSIZE (fieldmode),
>> because it is easier to explain when some one asks, when we guarantee the semantics
>> of strict volatile bitfield?
>
> But on non-strict-align targets you can even for 1-byte aligned MEMs
> access an SImode field directly. So the old code looks correct to me
> here and the fix needs to be done somewhere else.
>
But this SImode access is split up in several QImode or HImode accesses,
in the processor's execution pipeline, finally on an external bus like AXI all memory
transactions are aligned. The difference is just that some processors can split up
the unaligned accesses and some need help from the compiler, but even on an
x86 we have a different semantics for unaligned acceses, that is they are no longer atomic,
while an aligned access is always executed as an atomic transaction on an x86 processor.
>> Probably there is already something in the logic in expr.c that prevents these cases,
>> because otherwise it would be way to easy to find an example for unaligned accesses
>> to unaligned memory on STRICT_ALIGNMENT targets.
>>
>>
>> Ok, what would you think about this variant?
>>
>> --- expmed.c.jj 2015-01-16 11:20:40.000000000 +0100
>> +++ expmed.c 2015-03-05 11:50:09.400444416 +0100
>> @@ -472,9 +472,11 @@ strict_volatile_bitfield_p (rtx op0, uns
>> return false;
>>
>> /* Check for cases of unaligned fields that must be split. */
>> - if (bitnum % BITS_PER_UNIT + bitsize> modesize
>> - || (STRICT_ALIGNMENT
>> - && bitnum % GET_MODE_ALIGNMENT (fieldmode) + bitsize> modesize))
>> + if (bitnum % modesize + bitsize> modesize)
>> + return false;
>> +
>> + /* Check that the memory is sufficiently aligned. */
>> + if (MEM_ALIGN (op0) < modesize)
>
> I think that only applies to strict-align targets and then only for
> GET_MODE_ALIGNMENT (modesize). And of course what matters
> is the alignment at bitnum - even though op0 may be not sufficiently
> aligned it may have known misalignment so that op0 + bitnum is
> sufficiently aligned.
>
> Testcases would need to annotate structs with packed/aligned attributes
> to get at these cases.
>
> For the testcase included in the patch, what does the patch end up doing?
> Not going the strict-volatile bitfield expansion path? That looks unnecessary
> on !strict-alignment targets but resonable on strict-align targets where the
> access would need to be splitted. So, why does it end up being splitted
> on !strict-align targets?
>
gcc -fstrict-volatile-bitfields -S 20150304-1.c
without patch we find this in 20150304-1.s:
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movzbl global+1(%rip), %eax
orl $-1, %eax
movb %al, global+1(%rip)
movzbl global+2(%rip), %eax
orl $-1, %eax
movb %al, global+2(%rip)
movzbl global+3(%rip), %eax
orl $-1, %eax
movb %al, global+3(%rip)
movzbl global+4(%rip), %eax
orl $127, %eax
movb %al, global+4(%rip)
movl global(%rip), %eax
shrl $8, %eax
andl $2147483647, %eax
cmpl $2147483647, %eax
je .L2
call abort
.L2:
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
so the write path is OK, because strict_volatile_bitfield_p returns false,
because
/* Check for cases where the C++ memory model applies. */
if (bitregion_end != 0
&& (bitnum - bitnum % modesize < bitregion_start
|| bitnum - bitnum % modesize + modesize - 1> bitregion_end))
return false;
bitregion_start=8, bitregion_end=39
bitnum - bitnum % modesize = 0
bitnum - bitnum % modesize + modesize - 1 = 31
this does not happen in the read path, and the problem is the access
here does not work:
str_rtx = narrow_bit_field_mem (str_rtx, fieldmode, bitsize, bitnum,
&bitnum);
/* Explicitly override the C/C++ memory model; ignore the
bit range so that we can do the access in the mode mandated
by -fstrict-volatile-bitfields instead. */
store_fixed_bit_field_1 (str_rtx, bitsize, bitnum, value);
str_rtx = unchanged, bitnum = 8, bitsize= 31, but store_fixed_bit_fileld_1
can not handle that.
BTW: I can make the write code path also fail if I change this bit
in the test case add ":8" to char x:
struct s
{
char x:8;
unsigned int y:31;
} __attribute__((packed));
now the bit region is from 0..39, and we get this code:
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl global(%rip), %eax
orl $-256, %eax
movl %eax, global(%rip)
movl global(%rip), %eax
shrl $8, %eax
andl $2147483647, %eax
cmpl $2147483647, %eax
je .L2
call abort
.L2:
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
The patch (I think, I would still prefer my initial proposal) fixes
this by skipping the strict alingment code path. That looks OK for me,
because the structure is always packed if that happens, and it can't
be used to access "device registers", which are by definition always
naturally aligned, at least to the machine word size.
with the patch we get:
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movzbl global+1(%rip), %eax
orl $-1, %eax
movb %al, global+1(%rip)
movzbl global+2(%rip), %eax
orl $-1, %eax
movb %al, global+2(%rip)
movzbl global+3(%rip), %eax
orl $-1, %eax
movb %al, global+3(%rip)
movzbl global+4(%rip), %eax
orl $127, %eax
movb %al, global+4(%rip)
movzbl global+1(%rip), %eax
movzbl %al, %eax
movzbl global+2(%rip), %edx
movzbl %dl, %edx
salq $8, %rdx
orq %rax, %rdx
movzbl global+3(%rip), %eax
movzbl %al, %eax
salq $16, %rax
orq %rax, %rdx
movzbl global+4(%rip), %eax
movzbl %al, %eax
andl $127, %eax
salq $24, %rax
orq %rdx, %rax
cmpl $2147483647, %eax
je .L2
call abort
.L2:
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
every access is split in 4 QImode accesses. but that is as
expected, because the structure is byte aligned.
Thanks
Bernd.