Description:
The code includes two structures: a 6-byte struct of the following form:

  struct 6bytes { char [2] ; char [4] } ;

and another that embeds this at the very end, in the following form:

  struct { bool ; bool ; 6bytes } ;

The compiler is generating code for assignment to the trailing 6-byte struct using an 8-byte read from that address, then merging in the value to be assigned, and finally writing back 8 bytes. If this structure is allocated from memory where this 8-byte sequence is not accessible (i.e., at the end of a shmat memory region, for example), this results in a trap.

When the memory is accessible, the compiler-generated code does happen to retain the value that follows the structure being accessed, but even if the 2 bytes of memory trailing the structure are accessible, this generated code that modifies memory outside of the structure is not thread safe.

This has been observed with GCC 4.2 prerelease compilers, but not GCC 3.3.3. I don't know whether the issue applies to the 4.1 or 4.0 compilers too. Host/target/build triplets provided above.

Other info:

>Release:
gcc (GCC) 4.2.0 20070317 (prerelease)

>Environment:
System: Linux hotel09 2.6.5-7.191-smp #1 SMP Tue Jun 28 14:58:56 UTC 2005 x86_64 x86_64 x86_64 GNU/Linux
Architecture: x86_64

configured with: ../gcc-4.2.0-20070316/configure --prefix=/vbs/bldsupp.linux1/linuxamd64/gcc-4.2.0-20070316 --enable-threads=posix --enable-languages=c,c++ --enable-shared --enable-__cxa_atexit

>How-To-Repeat:
Sample code will be attached. It forces the memory following the structure to be unreadable by calling mprotect. Build command of the form:

  /view/peeterj_gcc-4.2.0-20070316-decfloat/vbs/bldsupp/linuxamd64/gcc-4.2.0/bin/g++ -O0 x.C -Wl,-rpath,/view/peeterj_gcc-4.2.0-20070316-decfloat/vbs/bldsupp/linuxamd64/gcc-4.2.0/lib64 -o ./xx

(The -rpath gunk is because I don't have the runtime locally installed.)

The following gdb session demonstrates the issue (but will make more sense when looking at the code).
hotel09:/vbs/engn/squ> gdb x
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".

(gdb) run
Starting program: /vbs/engn/squ/x
[Thread debugging using libthread_db enabled]
[New Thread 183048206464 (LWP 18902)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 183048206464 (LWP 18902)]
0x0000000000400edf in XXX::Initialize (this=0x7fbfff8ff8, iIndex=0) at x.C:51
51          pDictRidderInfo->dRID = INIT_6_BYTES_ZERO();
(gdb) disassemble $rip $rip+30
Dump of assembler code from 0x400edf to 0x400efd:
0x0000000000400edf <_ZN3XXX10InitializeEi+71>:  mov    0xa(%rbx),%rcx
0x0000000000400ee3 <_ZN3XXX10InitializeEi+75>:  mov    %rcx,0xffffffffffffffd0(%rbp)
0x0000000000400ee7 <_ZN3XXX10InitializeEi+79>:  movzbq %dl,%rcx
0x0000000000400eeb <_ZN3XXX10InitializeEi+83>:  mov    0xffffffffffffffd0(%rbp),%rdx
0x0000000000400eef <_ZN3XXX10InitializeEi+87>:  mov    $0x0,%dl
0x0000000000400ef1 <_ZN3XXX10InitializeEi+89>:  or     %rcx,%rdx
0x0000000000400ef4 <_ZN3XXX10InitializeEi+92>:  mov    %rdx,0xffffffffffffffd0(%rbp)
0x0000000000400ef8 <_ZN3XXX10InitializeEi+96>:  mov    0xffffffffffffffd0(%rbp),%rdx
0x0000000000400efc <_ZN3XXX10InitializeEi+100>: mov    %rdx,0xa(%rbx)
End of assembler dump.
(gdb) p $rbx
$1 = 548682059760
(gdb) p (void*)$rbx+0xa+6
$2 = (void *) 0x7fbfffd000

>Fix:
Enabling optimization (-O2) works around the issue, but the builds that we want to do with gcc-4.2 are development (-g) builds, so -O isn't a natural fit.
Created attachment 13251 [details] source code that demonstrates the code generation issue described.
Confirmed.  The program segfaults also with -fno-inline -O[12], and we create very interesting code for XXX::Initialize:

_ZN3XXX10InitializeEi:
.LFB4:
	movq	(%rdi), %rax
	movslq	%esi,%rsi
	pushq	%rbx
.LCFI0:
	movq	(%rax,%rsi,8), %rbx
	movb	$0, 8(%rbx)
	movb	$0, 9(%rbx)
	call	_Z17INIT_6_BYTES_ZEROv
	movq	10(%rbx), %rdx
	movzbl	%al, %esi
	movzbl	%ah, %ecx
	xorb	%dl, %dl
	orq	%rsi, %rdx
	movq	%rax, %rsi
	movb	%cl, %dh
	movq	%rax, %rcx
	andl	$4278190080, %esi
	andl	$16711680, %ecx
	andq	$-16711681, %rdx
	orq	%rcx, %rdx
	movabsq	$-4278190081, %rcx
	andq	%rcx, %rdx
	movabsq	$1095216660480, %rcx
	orq	%rsi, %rdx
	movq	%rax, %rsi
	andq	%rcx, %rsi
	movabsq	$-1095216660481, %rcx
	andq	%rcx, %rdx
	movabsq	$280375465082880, %rcx
	andq	%rcx, %rax
	orq	%rsi, %rdx
	movabsq	$-280375465082881, %rcx
	andq	%rcx, %rdx
	orq	%rax, %rdx
	movq	%rdx, 10(%rbx)
	popq	%rbx
	ret
Thanks for figuring out how to reproduce this standalone, Peeter!
Removing Taavi from the CC list. Any update on getting this resolved?
The tree IL looks sane.
The following testcase exposes the problem in Initialize():

struct STRUCT_6_BYTES
{
  unsigned char slot[sizeof(unsigned short)];
  unsigned char page[sizeof(unsigned int)];
};

struct SQLU_DICT_INFO_0
{
  void *pBlah;
  char bSomeFlag1;
  char bSomeFlag2;
  STRUCT_6_BYTES dRID;
};

STRUCT_6_BYTES INIT_6_BYTES_ZERO(void)
{
  STRUCT_6_BYTES ridOut = { {0,0}, {0,0,0,0} };
  return ridOut;
}

void Initialize(SQLU_DICT_INFO_0 *pDictRidderInfo)
{
  pDictRidderInfo->bSomeFlag1 = 0;
  pDictRidderInfo->bSomeFlag2 = 0;
  pDictRidderInfo->dRID = INIT_6_BYTES_ZERO();
}
We use store_field() with a bitpos of 80 and a bitsize of 48 to a mem:BLK target, which in turn calls store_expr on an adjusted mem, which calls

  temp = expand_expr_real (exp, tmp_target, GET_MODE (target),
			   (call_param_p
			    ? EXPAND_STACK_PARM : EXPAND_NORMAL),
			   &alt_rtl);

which in turn creates this crapload of rtl via

  else if (TYPE_MODE (TREE_TYPE (exp)) == BLKmode)
    {
      target = copy_blkmode_from_reg (target, valreg, TREE_TYPE (exp));

and

  /* Copy the structure BITSIZE bites at a time.

     We could probably emit more efficient code for machines which do
     not use strict alignment, but it doesn't seem worth the effort at the
     current time.  */

not worth the effort!?  uh :/

And the problem seems to be that we do a store_bit_field of 1 byte at a time but are using word_mode for it.  At

  (mem/s/j:DI (plus:DI (reg/f:DI 60)
	  (const_int 10 [0xa])) [0 <variable>.dRID+0 S8 A16])

storing with word_mode will write beyond the structure bounds, though.  (store_bit_field will in turn use extraction and write-back to store each of the bytes, so we would be better off using it directly for quantities less than word_mode.)

To summarize: copy_blkmode_from_reg () doesn't deal with a target size that is less than the size of word_mode (or even a size that is not a multiple of the size of word_mode).  The fix is either to make copy_blkmode_from_reg () handle this case, or to not use it and call store_bit_field in the caller instead.

Of course maybe the bug is also that the type of the call expression has BLKmode but the return value is passed in (reg:DI ax).  And of course I'm lost here ;)
The hammer branch is bad as well.  The code works with -m32.
This helps (the loop already tries to copy the content byte-wise, but uses the wrong modes for that), but someone knowledgeable in that bitfield business should look at it.  In particular it might happen that extract_bit_field() doesn't return a value in byte_mode, as that mode is only a suggestion, in which case we might ICE later.  In that case we probably need to fiddle with explicitly building subregs.

Index: expr.c
===================================================================
--- expr.c	(revision 126382)
+++ expr.c	(working copy)
@@ -2131,10 +2131,10 @@ copy_blkmode_from_reg (rtx tgtblk, rtx s
       /* Use xbitpos for the source extraction (right justified) and
 	 xbitpos for the destination store (left justified).  */
-      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, word_mode,
+      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, byte_mode,
 		       extract_bit_field (src, bitsize,
 					  xbitpos % BITS_PER_WORD, 1,
-					  NULL_RTX, word_mode, word_mode));
+					  NULL_RTX, word_mode, byte_mode));
     }
 
   return tgtblk;
This is also a missed optimization ;)
I see that this bug fix is targeted for 4.1.3, but also see that such a release isn't planned: http://gcc.gnu.org/ml/gcc/2007-05/msg00669.html

Should the target release for this bug be updated to 4.2 or 4.3?
It's not marked as regression, so it shouldn't have a release target at all.
It's a regression relative to gcc version 3.3.3.
With optimization we now avoid the error, but unoptimized code is still wrong:

_Z10InitializeP16SQLU_DICT_INFO_0:
.LFB3:
	pushq	%rbp
.LCFI2:
	movq	%rsp, %rbp
.LCFI3:
	pushq	%rbx
.LCFI4:
	subq	$8, %rsp
.LCFI5:
	movq	%rdi, -16(%rbp)
	movq	-16(%rbp), %rax
	movb	$0, 8(%rax)
	movq	-16(%rbp), %rax
	movb	$0, 9(%rax)
	movq	-16(%rbp), %rbx
	call	_Z17INIT_6_BYTES_ZEROv
	movzbl	%al, %ecx
	movzbl	10(%rbx), %edx
	andl	$0, %edx
	orl	%ecx, %edx
	movb	%dl, 10(%rbx)
	movzbl	%ah, %edx
	mov	%edx, %ecx
	movq	10(%rbx), %rdx
	^^^^^^^^^^^^^ here
	movb	%cl, %dh
	movq	%rdx, 10(%rbx)
	^^^^^^^^^^^^^ and here
	movq	%rax, %rdx
	shrq	$16, %rdx
	movzbq	%dl,%rcx
	movzbl	12(%rbx), %edx
	andl	$0, %edx
	orl	%ecx, %edx
	movb	%dl, 12(%rbx)
	movq	%rax, %rdx
	shrq	$24, %rdx
	movzbq	%dl,%rcx
	movzbl	13(%rbx), %edx
	andl	$0, %edx
	orl	%ecx, %edx
	movb	%dl, 13(%rbx)
	movq	%rax, %rdx
	shrq	$32, %rdx
	movzbq	%dl,%rcx
	movzbl	14(%rbx), %edx
	andl	$0, %edx
	orl	%ecx, %edx
	movb	%dl, 14(%rbx)
	shrq	$40, %rax
	movzbq	%al,%rdx
	movzbl	15(%rbx), %eax
	andl	$0, %eax
	orl	%edx, %eax
	movb	%al, 15(%rbx)
	addq	$8, %rsp
	popq	%rbx
	leave
	ret

Without inlining we do optimize this to

_Z10InitializeP16SQLU_DICT_INFO_0:
.LFB3:
	pushq	%rbx
.LCFI0:
	movq	%rdi, %rbx
	movb	$0, 8(%rdi)
	movb	$0, 9(%rdi)
	call	_Z17INIT_6_BYTES_ZEROv
	movb	%al, 10(%rbx)
	movzbl	%ah, %edx
	movb	%dl, 11(%rbx)
	movq	%rax, %rdx
	shrq	$16, %rdx
	movb	%dl, 12(%rbx)
	movq	%rax, %rdx
	shrq	$24, %rdx
	movb	%dl, 13(%rbx)
	movq	%rax, %rdx
	shrq	$32, %rdx
	movb	%dl, 14(%rbx)
	shrq	$40, %rax
	movb	%al, 15(%rbx)
	popq	%rbx
	ret

which doesn't have the 8-byte move any more.  4.2, even when optimizing, retains the 8-byte move.
Ian, Eric? Any advice on the bitfield stuff as of comments 7/9?
Looking into it.
Michael, any reason for being so scrupulous and not forcing the mode?

Index: expr.c
===================================================================
--- expr.c	(revision 131326)
+++ expr.c	(working copy)
@@ -2061,6 +2061,7 @@ copy_blkmode_from_reg (rtx tgtblk, rtx s
   rtx src = NULL, dst = NULL;
   unsigned HOST_WIDE_INT bitsize = MIN (TYPE_ALIGN (type), BITS_PER_WORD);
   unsigned HOST_WIDE_INT bitpos, xbitpos, padding_correction = 0;
+  enum machine_mode copy_mode;
 
   if (tgtblk == 0)
     {
@@ -2094,11 +2095,19 @@ copy_blkmode_from_reg (rtx tgtblk, rtx s
     padding_correction
       = (BITS_PER_WORD - ((bytes % UNITS_PER_WORD) * BITS_PER_UNIT));
 
-  /* Copy the structure BITSIZE bites at a time.
+  /* Copy the structure BITSIZE bits at a time.  If the target lives in
+     memory, take care of not reading/writing past its end by selecting
+     a copy mode suited to BITSIZE.  This should always be possible given
+     how it is computed.
 
      We could probably emit more efficient code for machines which do
      not use strict alignment, but it doesn't seem worth the effort at the
      current time.  */
+
+  if (!MEM_P (tgtblk)
+      || ((copy_mode = mode_for_size (bitsize, MODE_INT, 1)) == BLKmode))
+    copy_mode = word_mode;
+
   for (bitpos = 0, xbitpos = padding_correction;
       bitpos < bytes * BITS_PER_UNIT;
       bitpos += bitsize, xbitpos += bitsize)
@@ -2117,11 +2126,11 @@ copy_blkmode_from_reg (rtx tgtblk, rtx s
	dst = operand_subword (tgtblk, bitpos / BITS_PER_WORD, 1, BLKmode);
 
       /* Use xbitpos for the source extraction (right justified) and
-	 xbitpos for the destination store (left justified).  */
-      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, word_mode,
+	 bitpos for the destination store (left justified).  */
+      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, copy_mode,
		       extract_bit_field (src, bitsize,
					  xbitpos % BITS_PER_WORD, 1,
-					  NULL_RTX, word_mode, word_mode));
+					  NULL_RTX, copy_mode, copy_mode));
     }
 
   return tgtblk;
It seems that this should work too: bitsize is 8 in this example IIRC, so copy_mode would be byte_mode, which is what we need here.  Additionally, using word_mode when we can seems worthwhile too.

One thing would bother me, though: if TYPE_ALIGN (type) is very funny, then bitsize is too, so that mode_for_size could return BLKmode, in which case word_mode would be chosen to copy, which might then trip over the same problem.  But for that, TYPE_ALIGN would have to be something as funny as 3 bits, i.e. something for which no INT mode exists.

I believe it's impossible that TYPE_ALIGN is neither a multiple of word_mode nor a number for which mode_for_size ever returns BLKmode (i.e. either it returns BLKmode, but then only for very large alignments, which are multiples of word_mode, or it will return something != BLKmode, which we then can use).  IOW: it's impossible that TYPE_ALIGN is ever something like 3*BITS_PER_UNIT.

So, with that "argumentation" I think this would work, except for my probably theoretical fear from comment #9 that the mode argument for extract_bit_field is only a suggestion.
> It seems that this should work too: bitsize is 8 in this example IIRC, so
> copy_mode would be byte_mode, which is what we need here.  Additionally
> using word_mode when we can seems worthwhile too.

OK, thanks.

> I believe it's impossible that TYPE_ALIGN is neither a multiple of word_mode
> nor a number for which mode_for_size ever returns BLKmode (i.e. either it
> returns BLKmode, but then only for very large aligns, which are multiples
> of word_mode, or it will return something != BLKmode, which we then
> can use).

Yes, TYPE_ALIGN is supposed to always be a power of 2 if < BITS_PER_WORD.

> So, with that "argumentation" I think this would work, except for my
> probably theoretical fear from comment #9, that the mode argument for
> extract_bit_field is only a suggestion.

In your patch, yes; in mine, no.  That's what I call "forcing" the mode.
Subject: Bug 31309

Author: ebotcazou
Date: Fri Jan 11 19:44:40 2008
New Revision: 131472

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131472

Log:
	PR middle-end/31309
	* expr.c (copy_blkmode_from_reg): Use a mode suited to the size
	when copying to memory.

Added:
    trunk/gcc/testsuite/gcc.dg/struct-ret-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/expr.c
    trunk/gcc/testsuite/ChangeLog
Fixed on the mainline. I'm not sure we want to put this on the other branches.
re: Fixed on the mainline. I'm not sure we want to put this on the other branches.

Not knowing the gcc code in question, I can't comment on how much potential regression risk this fix will introduce.  Assuming it's safe enough to apply without worry to other releases, I'd hope it would also be applied to gcc-4.2.  The 4.2 version is what we were attempting to use when the bug was encountered, and switching to 4.3 at this point would likely be more destabilizing than the original bug (as well as potentially being a porting exercise, as has traditionally been the case between major compiler revisions).
> Not knowing the gcc code in question, I can't comment on how much potential
> regression risk this fix will introduce.  Assuming it's safe enough to apply
> without worry to other releases, I'd hope this would also be applied also to
> gcc-4.2 .

OK, I can backport it to the 4.2 branch if no maintainers disagree.
Subject: Bug 31309

Author: ebotcazou
Date: Thu Jan 17 13:22:21 2008
New Revision: 131599

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131599

Log:
	Backport from mainline:
	2008-01-11  Eric Botcazou  <ebotcazou@adacore.com>

	PR middle-end/31309
	* expr.c (copy_blkmode_from_reg): Use a mode suited to the size
	when copying to memory.

Added:
    branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/struct-ret-3.c
Modified:
    branches/gcc-4_2-branch/gcc/ChangeLog
    branches/gcc-4_2-branch/gcc/expr.c
    branches/gcc-4_2-branch/gcc/testsuite/ChangeLog
I don't plan to backport this to the 4.1 branch.
(In reply to comment #25)
> I don't plan to backport this to the 4.1 branch.

Thanks for both fixing this and backporting to 4.2!