Description:
The code includes two structures: a 6-byte struct of the following form:

  struct 6bytes { char [2] ; char [4] } ;

and another that embeds this at the very end, in the following form:

  struct { bool ; bool ; 6bytes } ;

The compiler is generating code for assignment to the trailing 6-byte struct using an 8-byte read from that address, then merging in the value to be assigned, and finally writing back 8 bytes. If this structure is allocated from memory where this 8-byte sequence is not accessible (i.e., at the end of a shmat memory region, for example), this results in a trap.

When the memory is accessible, the compiler-generated code does happen to retain the value that follows the structure being accessed, but even if the 2 bytes of memory trailing the structure are accessible, this generated code that modifies memory outside of the structure is not thread safe.

This has been observed with GCC 4.2 prerelease compilers, but not GCC 3.3.3. I don't know whether the issue applies to the 4.1 or 4.0 compilers too. Host/target/build triplets provided above.

Other info:

>Release:
gcc (GCC) 4.2.0 20070317 (prerelease)

>Environment:
System: Linux hotel09 2.6.5-7.191-smp #1 SMP Tue Jun 28 14:58:56 UTC 2005 x86_64 x86_64 x86_64 GNU/Linux
Architecture: x86_64

configured with: ../gcc-4.2.0-20070316/configure --prefix=/vbs/bldsupp.linux1/linuxamd64/gcc-4.2.0-20070316 --enable-threads=posix --enable-languages=c,c++ --enable-shared --enable-__cxa_atexit

>How-To-Repeat:
Sample code will be attached. It forces the memory following the structure to be unreadable by calling mprotect. Build command of the form:

  /view/peeterj_gcc-4.2.0-20070316-decfloat/vbs/bldsupp/linuxamd64/gcc-4.2.0/bin/g++ -O0 x.C -Wl,-rpath,/view/peeterj_gcc-4.2.0-20070316-decfloat/vbs/bldsupp/linuxamd64/gcc-4.2.0/lib64 -o ./xx

(The -rpath gunk is because I don't have the runtime locally installed.)

The following gdb session demonstrates the issue (but will make more sense when looking at the code).
hotel09:/vbs/engn/squ> gdb x
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".

(gdb) run
Starting program: /vbs/engn/squ/x
[Thread debugging using libthread_db enabled]
[New Thread 183048206464 (LWP 18902)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 183048206464 (LWP 18902)]
0x0000000000400edf in XXX::Initialize (this=0x7fbfff8ff8, iIndex=0) at x.C:51
51          pDictRidderInfo->dRID = INIT_6_BYTES_ZERO();
(gdb) disassemble $rip $rip+30
Dump of assembler code from 0x400edf to 0x400efd:
0x0000000000400edf <_ZN3XXX10InitializeEi+71>:  mov    0xa(%rbx),%rcx
0x0000000000400ee3 <_ZN3XXX10InitializeEi+75>:  mov    %rcx,0xffffffffffffffd0(%rbp)
0x0000000000400ee7 <_ZN3XXX10InitializeEi+79>:  movzbq %dl,%rcx
0x0000000000400eeb <_ZN3XXX10InitializeEi+83>:  mov    0xffffffffffffffd0(%rbp),%rdx
0x0000000000400eef <_ZN3XXX10InitializeEi+87>:  mov    $0x0,%dl
0x0000000000400ef1 <_ZN3XXX10InitializeEi+89>:  or     %rcx,%rdx
0x0000000000400ef4 <_ZN3XXX10InitializeEi+92>:  mov    %rdx,0xffffffffffffffd0(%rbp)
0x0000000000400ef8 <_ZN3XXX10InitializeEi+96>:  mov    0xffffffffffffffd0(%rbp),%rdx
0x0000000000400efc <_ZN3XXX10InitializeEi+100>: mov    %rdx,0xa(%rbx)
End of assembler dump.
(gdb) p $rbx
$1 = 548682059760
(gdb) p (void*)$rbx+0xa+6
$2 = (void *) 0x7fbfffd000

>Fix:
Enabling optimization (-O2) works around the issue, but the builds that we want to do with gcc-4.2 are development (-g) builds, so -O isn't a natural fit.
Created attachment 13251 [details] source code that demonstrates the code generation issue described.
Confirmed.  The program segfaults also with -fno-inline -O[12], and we create very interesting code for XXX::Initialize:

_ZN3XXX10InitializeEi:
.LFB4:
	movq	(%rdi), %rax
	movslq	%esi,%rsi
	pushq	%rbx
.LCFI0:
	movq	(%rax,%rsi,8), %rbx
	movb	$0, 8(%rbx)
	movb	$0, 9(%rbx)
	call	_Z17INIT_6_BYTES_ZEROv
	movq	10(%rbx), %rdx
	movzbl	%al, %esi
	movzbl	%ah, %ecx
	xorb	%dl, %dl
	orq	%rsi, %rdx
	movq	%rax, %rsi
	movb	%cl, %dh
	movq	%rax, %rcx
	andl	$4278190080, %esi
	andl	$16711680, %ecx
	andq	$-16711681, %rdx
	orq	%rcx, %rdx
	movabsq	$-4278190081, %rcx
	andq	%rcx, %rdx
	movabsq	$1095216660480, %rcx
	orq	%rsi, %rdx
	movq	%rax, %rsi
	andq	%rcx, %rsi
	movabsq	$-1095216660481, %rcx
	andq	%rcx, %rdx
	movabsq	$280375465082880, %rcx
	andq	%rcx, %rax
	orq	%rsi, %rdx
	movabsq	$-280375465082881, %rcx
	andq	%rcx, %rdx
	orq	%rax, %rdx
	movq	%rdx, 10(%rbx)
	popq	%rbx
	ret
Thanks for figuring out how to reproduce this standalone, Peeter!
Removing Taavi from the CC list. Any update on getting this resolved?
The tree IL looks sane.
The following testcase exposes the problem in Initialize():

struct STRUCT_6_BYTES
{
  unsigned char slot[sizeof(unsigned short)];
  unsigned char page[sizeof(unsigned int)];
};

struct SQLU_DICT_INFO_0
{
  void *pBlah;
  char bSomeFlag1;
  char bSomeFlag2;
  STRUCT_6_BYTES dRID;
};

STRUCT_6_BYTES INIT_6_BYTES_ZERO(void)
{
  STRUCT_6_BYTES ridOut = { {0,0}, {0,0,0,0} };
  return ridOut;
}

void Initialize(SQLU_DICT_INFO_0 *pDictRidderInfo)
{
  pDictRidderInfo->bSomeFlag1 = 0;
  pDictRidderInfo->bSomeFlag2 = 0;
  pDictRidderInfo->dRID = INIT_6_BYTES_ZERO();
}
We use store_field() with a bitpos of 80 and a bitsize of 48 to a mem:BLK target, which in turn calls store_expr on an adjusted mem, which calls

  temp = expand_expr_real (exp, tmp_target, GET_MODE (target),
			   (call_param_p
			    ? EXPAND_STACK_PARM : EXPAND_NORMAL),
			   &alt_rtl);

which in turn creates this crapload of rtl via

  else if (TYPE_MODE (TREE_TYPE (exp)) == BLKmode)
    {
      target = copy_blkmode_from_reg (target, valreg, TREE_TYPE (exp));

and

  /* Copy the structure BITSIZE bites at a time.

     We could probably emit more efficient code for machines which do
     not use strict alignment, but it doesn't seem worth the effort at the
     current time.  */

not worth the effort!?  uh :/

And the problem seems to be that we do a store_bit_field of 1 byte at a time but are using word_mode for it.  At

  (mem/s/j:DI (plus:DI (reg/f:DI 60)
	  (const_int 10 [0xa])) [0 <variable>.dRID+0 S8 A16])

storing with word_mode will write beyond the structure bounds, though.  (store_bit_field will in turn use extraction and write-back to store each of the bytes, so we would be better off using it directly for quantities less than word_mode.)

To summarize: copy_blkmode_from_reg () doesn't deal with a target size that is less than the size of word_mode (or even a size that is not a multiple of the size of word_mode).  The fix is either to make copy_blkmode_from_reg () handle this case, or to not use it and call store_bit_field in the caller instead.

Of course maybe the bug is also that the type of the call expression has BLKmode but the return value is passed in (reg:DI ax).  And of course I'm lost here ;)
The hammer branch is bad as well.  The code works with -m32.
This helps (the loop already tries to copy the content byte-wise, but uses the wrong modes for that), but someone knowledgeable in that bitfield business should look at it.  In particular it might happen that extract_bit_field() doesn't return a value in byte_mode, as that mode is only a suggestion, in which case we might ICE later.  In that case we probably need to fiddle with explicitly building subregs.

Index: expr.c
===================================================================
--- expr.c	(revision 126382)
+++ expr.c	(working copy)
@@ -2131,10 +2131,10 @@ copy_blkmode_from_reg (rtx tgtblk, rtx s
       /* Use xbitpos for the source extraction (right justified) and
 	 xbitpos for the destination store (left justified).  */
-      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, word_mode,
+      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, byte_mode,
 		       extract_bit_field (src, bitsize,
 					  xbitpos % BITS_PER_WORD, 1,
-					  NULL_RTX, word_mode, word_mode));
+					  NULL_RTX, word_mode, byte_mode));
     }
 
   return tgtblk;
This is also a missed optimization ;)
I see that this bug fix is targeted for 4.1.3, but also see that such a release isn't planned: http://gcc.gnu.org/ml/gcc/2007-05/msg00669.html

Should the target release for this bug be updated to 4.2 or 4.3?
It's not marked as regression, so it shouldn't have a release target at all.
It's a regression relative to gcc version 3.3.3.
With optimization we now avoid the error, but unoptimized code is still wrong:

_Z10InitializeP16SQLU_DICT_INFO_0:
.LFB3:
	pushq	%rbp
.LCFI2:
	movq	%rsp, %rbp
.LCFI3:
	pushq	%rbx
.LCFI4:
	subq	$8, %rsp
.LCFI5:
	movq	%rdi, -16(%rbp)
	movq	-16(%rbp), %rax
	movb	$0, 8(%rax)
	movq	-16(%rbp), %rax
	movb	$0, 9(%rax)
	movq	-16(%rbp), %rbx
	call	_Z17INIT_6_BYTES_ZEROv
	movzbl	%al, %ecx
	movzbl	10(%rbx), %edx
	andl	$0, %edx
	orl	%ecx, %edx
	movb	%dl, 10(%rbx)
	movzbl	%ah, %edx
	mov	%edx, %ecx
	movq	10(%rbx), %rdx
	^^^^^^^^^^^^^ here
	movb	%cl, %dh
	movq	%rdx, 10(%rbx)
	^^^^^^^^^^^^^ and here
	movq	%rax, %rdx
	shrq	$16, %rdx
	movzbq	%dl,%rcx
	movzbl	12(%rbx), %edx
	andl	$0, %edx
	orl	%ecx, %edx
	movb	%dl, 12(%rbx)
	movq	%rax, %rdx
	shrq	$24, %rdx
	movzbq	%dl,%rcx
	movzbl	13(%rbx), %edx
	andl	$0, %edx
	orl	%ecx, %edx
	movb	%dl, 13(%rbx)
	movq	%rax, %rdx
	shrq	$32, %rdx
	movzbq	%dl,%rcx
	movzbl	14(%rbx), %edx
	andl	$0, %edx
	orl	%ecx, %edx
	movb	%dl, 14(%rbx)
	shrq	$40, %rax
	movzbq	%al,%rdx
	movzbl	15(%rbx), %eax
	andl	$0, %eax
	orl	%edx, %eax
	movb	%al, 15(%rbx)
	addq	$8, %rsp
	popq	%rbx
	leave
	ret

Without inlining we do optimize this to

_Z10InitializeP16SQLU_DICT_INFO_0:
.LFB3:
	pushq	%rbx
.LCFI0:
	movq	%rdi, %rbx
	movb	$0, 8(%rdi)
	movb	$0, 9(%rdi)
	call	_Z17INIT_6_BYTES_ZEROv
	movb	%al, 10(%rbx)
	movzbl	%ah, %edx
	movb	%dl, 11(%rbx)
	movq	%rax, %rdx
	shrq	$16, %rdx
	movb	%dl, 12(%rbx)
	movq	%rax, %rdx
	shrq	$24, %rdx
	movb	%dl, 13(%rbx)
	movq	%rax, %rdx
	shrq	$32, %rdx
	movb	%dl, 14(%rbx)
	shrq	$40, %rax
	movb	%al, 15(%rbx)
	popq	%rbx
	ret

which doesn't have the 8-byte move any more.  4.2, even when optimizing, retains the 8-byte move.
Ian, Eric? Any advice on the bitfield stuff as of comments 7/9?
Looking into it.
Michael, any reason for being so scrupulous and not forcing the mode?

Index: expr.c
===================================================================
--- expr.c	(revision 131326)
+++ expr.c	(working copy)
@@ -2061,6 +2061,7 @@ copy_blkmode_from_reg (rtx tgtblk, rtx s
   rtx src = NULL, dst = NULL;
   unsigned HOST_WIDE_INT bitsize = MIN (TYPE_ALIGN (type), BITS_PER_WORD);
   unsigned HOST_WIDE_INT bitpos, xbitpos, padding_correction = 0;
+  enum machine_mode copy_mode;
 
   if (tgtblk == 0)
     {
@@ -2094,11 +2095,19 @@ copy_blkmode_from_reg (rtx tgtblk, rtx s
     padding_correction
       = (BITS_PER_WORD - ((bytes % UNITS_PER_WORD) * BITS_PER_UNIT));
 
-  /* Copy the structure BITSIZE bites at a time.
+  /* Copy the structure BITSIZE bits at a time.  If the target lives in
+     memory, take care of not reading/writing past its end by selecting
+     a copy mode suited to BITSIZE.  This should always be possible given
+     how it is computed.
 
      We could probably emit more efficient code for machines which do
      not use strict alignment, but it doesn't seem worth the effort at the
      current time.  */
+
+  if (!MEM_P (tgtblk)
+      || ((copy_mode = mode_for_size (bitsize, MODE_INT, 1)) == BLKmode))
+    copy_mode = word_mode;
+
   for (bitpos = 0, xbitpos = padding_correction;
       bitpos < bytes * BITS_PER_UNIT;
       bitpos += bitsize, xbitpos += bitsize)
@@ -2117,11 +2126,11 @@ copy_blkmode_from_reg (rtx tgtblk, rtx s
	dst = operand_subword (tgtblk, bitpos / BITS_PER_WORD, 1, BLKmode);
 
       /* Use xbitpos for the source extraction (right justified) and
-	 xbitpos for the destination store (left justified).  */
-      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, word_mode,
+	 bitpos for the destination store (left justified).  */
+      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, copy_mode,
		       extract_bit_field (src, bitsize,
					  xbitpos % BITS_PER_WORD, 1,
-					  NULL_RTX, word_mode, word_mode));
+					  NULL_RTX, copy_mode, copy_mode));
     }
 
   return tgtblk;
It seems that this should work too: bitsize is 8 in this example IIRC, so copy_mode would be byte_mode, which is what we need here.  Additionally, using word_mode when we can seems worthwhile too.

One thing would bother me, though: if TYPE_ALIGN (type) is very funny, then bitsize is too, so that mode_for_size could return BLKmode, in which case word_mode would be chosen to copy, which might then trip over the same problem.  But for that, TYPE_ALIGN would have to be something as funny as 3 bits, i.e. something for which no INT mode exists.

I believe it's impossible that TYPE_ALIGN is neither a multiple of word_mode nor a number for which mode_for_size ever returns BLKmode (i.e. either it returns BLKmode, but then only for very large alignments, which are multiples of word_mode, or it will return something != BLKmode, which we then can use).  IOW: it's impossible that TYPE_ALIGN is ever something like 3*BITS_PER_UNIT.

So, with that "argumentation" I think this would work, except for my probably theoretical fear from comment #9 that the mode argument for extract_bit_field is only a suggestion.
> It seems that this should work too: bitsize is 8 in this example IIRC, so
> copy_mode would be byte_mode, which is what we need here.  Additionally
> using word_mode when we can seems worthwhile too.

OK, thanks.

> I believe it's impossible that TYPE_ALIGN is neither a multiple of word_mode
> nor a number for which mode_for_size ever returns BLKmode (i.e. either it
> returns BLKmode, but then only for very large aligns, which are multiples
> of word_mode, or it will return something != BLKmode, which we then
> can use).

Yes, TYPE_ALIGN is supposed to always be a power of 2 if < BITS_PER_WORD.

> So, with that "argumentation" I think this would work, except for my
> probably theoretical fear from comment #9, that the mode argument for
> extract_bit_field is only a suggestion.

In your patch, yes; in mine, no.  That's what I call "forcing" the mode.
Subject: Bug 31309

Author: ebotcazou
Date: Fri Jan 11 19:44:40 2008
New Revision: 131472

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131472

Log:
	PR middle-end/31309
	* expr.c (copy_blkmode_from_reg): Use a mode suited to the size
	when copying to memory.

Added:
    trunk/gcc/testsuite/gcc.dg/struct-ret-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/expr.c
    trunk/gcc/testsuite/ChangeLog
Fixed on the mainline. I'm not sure we want to put this on the other branches.
re: Fixed on the mainline. I'm not sure we want to put this on the other branches.

Not knowing the gcc code in question, I can't comment on how much potential regression risk this fix will introduce.  Assuming it's safe enough to apply without worry to other releases, I'd hope it would also be applied to gcc-4.2.  The 4.2 version is what we were attempting to use when the bug was encountered, and switching to 4.3 at this point would likely be more destabilizing than the original bug (as well as potentially being a porting exercise, as has traditionally been the case between major compiler revisions).
> Not knowing the gcc code in question, I can't comment on how much potential
> regression risk this fix will introduce.  Assuming it's safe enough to apply
> without worry to other releases, I'd hope this would also be applied also to
> gcc-4.2 .

OK, I can backport it to the 4.2 branch if no maintainers disagree.
Subject: Bug 31309

Author: ebotcazou
Date: Thu Jan 17 13:22:21 2008
New Revision: 131599

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131599

Log:
	Backport from mainline:
	2008-01-11  Eric Botcazou  <ebotcazou@adacore.com>

	PR middle-end/31309
	* expr.c (copy_blkmode_from_reg): Use a mode suited to the size
	when copying to memory.

Added:
    branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/struct-ret-3.c
Modified:
    branches/gcc-4_2-branch/gcc/ChangeLog
    branches/gcc-4_2-branch/gcc/expr.c
    branches/gcc-4_2-branch/gcc/testsuite/ChangeLog
I don't plan to backport this to the 4.1 branch.
(In reply to comment #25)
> I don't plan to backport this to the 4.1 branch.

Thanks for both fixing this and backporting to 4.2!