Bug 53220

Summary: [4.7/4.8 Regression] g++ mis-compiles compound literals
Product: gcc Reporter: Paul Pluzhnikov <ppluzhnikov>
Component: c++Assignee: Jason Merrill <jason>
Status: RESOLVED FIXED    
Severity: normal CC: jason, pawel_sikora, xinliangli
Priority: P3 Keywords: wrong-code
Version: 4.7.0   
Target Milestone: 4.7.1   
Host: Target:
Build: Known to work: 4.6.3
Known to fail: 4.7.0, 4.8.0 Last reconfirmed: 2012-05-03 00:00:00
Bug Depends on:    
Bug Blocks: 94264    

Description Paul Pluzhnikov 2012-05-03 19:50:28 UTC
This appears to be a gcc-4.7 regression. Confirmed in:
g++ (GCC) 4.8.0 20120331 (experimental)
g++ (GCC) 4.8.0 20120503 (experimental)


#include <stdio.h>

int main()
{
  for (int *p = (int[]){ 1, 2, 3, 0}; *p; ++p) {
    printf("%d\n", *p);
  }
  return 0;
}

gcc -std=c99 t2.c && ./a.out
1
2
3

gcc -std=c99 -O2 t2.c && ./a.out
1
2
3

g++ t2.c && ./a.out
1
2
3

g++ -O2 -g t2.c && ./a.out
944127552
32767

AFAICT, g++ completely removes the initializer and reads random garbage off stack:

(gdb) disas main
Dump of assembler code for function main():
   0x0000000000400600 <+0>:     push   %rbx
   0x0000000000400601 <+1>:     sub    $0x10,%rsp
   0x0000000000400605 <+5>:     mov    (%rsp),%esi
   0x0000000000400608 <+8>:     mov    %rsp,%rbx
   0x000000000040060b <+11>:    test   %esi,%esi
   0x000000000040060d <+13>:    je     0x400626 <main()+38>
   0x000000000040060f <+15>:    nop
   0x0000000000400610 <+16>:    xor    %eax,%eax
   0x0000000000400612 <+18>:    add    $0x4,%rbx
   0x0000000000400616 <+22>:    mov    $0x40071c,%edi
   0x000000000040061b <+27>:    callq  0x400478 <printf@plt>
   0x0000000000400620 <+32>:    mov    (%rbx),%esi
   0x0000000000400622 <+34>:    test   %esi,%esi
   0x0000000000400624 <+36>:    jne    0x400610 <main()+16>
   0x0000000000400626 <+38>:    add    $0x10,%rsp
   0x000000000040062a <+42>:    xor    %eax,%eax
   0x000000000040062c <+44>:    pop    %rbx
   0x000000000040062d <+45>:    retq   
End of assembler dump.

valgrind ./a.out
...
==13572== Conditional jump or move depends on uninitialised value(s)
==13572==    at 0x40060D: main (/tmp/t2.c:5)
==13572== 
==13572== Use of uninitialised value of size 8
==13572==    at 0x5625E4B: _itoa_word (/build/buildd/eglibc-2.11.1/stdio-common/_itoa.c:195)
==13572==    by 0x5628A87: vfprintf (/build/buildd/eglibc-2.11.1/stdio-common/vfprintf.c:1616)
==13572==    by 0x5631659: printf (/build/buildd/eglibc-2.11.1/stdio-common/printf.c:35)
==13572==    by 0x40061F: main (/tmp/t2.c:6)
==13572== 
==13572== Conditional jump or move depends on uninitialised value(s)
==13572==    at 0x5625E55: _itoa_word (/build/buildd/eglibc-2.11.1/stdio-common/_itoa.c:195)
==13572==    by 0x5628A87: vfprintf (/build/buildd/eglibc-2.11.1/stdio-common/vfprintf.c:1616)
==13572==    by 0x5631659: printf (/build/buildd/eglibc-2.11.1/stdio-common/printf.c:35)
==13572==    by 0x40061F: main (/tmp/t2.c:6)
==13572== 
==13572== Conditional jump or move depends on uninitialised value(s)
==13572==    at 0x5627ED2: vfprintf (/build/buildd/eglibc-2.11.1/stdio-common/vfprintf.c:1616)
==13572==    by 0x5631659: printf (/build/buildd/eglibc-2.11.1/stdio-common/printf.c:35)
==13572==    by 0x40061F: main (/tmp/t2.c:6)
==13572== 
==13572== Conditional jump or move depends on uninitialised value(s)
==13572==    at 0x5627EF0: vfprintf (/build/buildd/eglibc-2.11.1/stdio-common/vfprintf.c:1616)
==13572==    by 0x5631659: printf (/build/buildd/eglibc-2.11.1/stdio-common/printf.c:35)
==13572==    by 0x40061F: main (/tmp/t2.c:6)
==13572== 
-16780368
==13572== Conditional jump or move depends on uninitialised value(s)
==13572==    at 0x400624: main (/tmp/t2.c:5)
==13572== 
127
...

Google ref: b/6439133
Comment 1 Paul Pluzhnikov 2012-05-03 19:53:07 UTC
gcc-compiled code for reference:

(gdb) disas main
Dump of assembler code for function main:
   0x0000000000400540 <+0>:     push   %rbp
   0x0000000000400541 <+1>:     mov    $0x1,%esi
   0x0000000000400546 <+6>:     mov    $0x2,%ebp
   0x000000000040054b <+11>:    push   %rbx
   0x000000000040054c <+12>:    sub    $0x18,%rsp
   0x0000000000400550 <+16>:    movl   $0x1,(%rsp)           <<===
   0x0000000000400557 <+23>:    movl   $0x2,0x4(%rsp)        <<===
   0x000000000040055f <+31>:    mov    %rsp,%rbx
   0x0000000000400562 <+34>:    movl   $0x3,0x8(%rsp)        <<===
   0x000000000040056a <+42>:    movl   $0x0,0xc(%rsp)        <<===
   0x0000000000400572 <+50>:    jmp    0x40057d <main+61>
   0x0000000000400574 <+52>:    nopl   0x0(%rax)
   0x0000000000400578 <+56>:    mov    %ebp,%esi
   0x000000000040057a <+58>:    mov    0x4(%rbx),%ebp
   0x000000000040057d <+61>:    xor    %eax,%eax
   0x000000000040057f <+63>:    mov    $0x40068c,%edi
   0x0000000000400584 <+68>:    add    $0x4,%rbx
   0x0000000000400588 <+72>:    callq  0x4003b8 <printf@plt>
   0x000000000040058d <+77>:    test   %ebp,%ebp
   0x000000000040058f <+79>:    jne    0x400578 <main+56>
   0x0000000000400591 <+81>:    add    $0x18,%rsp
   0x0000000000400595 <+85>:    xor    %eax,%eax
   0x0000000000400597 <+87>:    pop    %rbx
   0x0000000000400598 <+88>:    pop    %rbp
   0x0000000000400599 <+89>:    retq   
End of assembler dump.
Comment 2 H.J. Lu 2012-05-03 22:38:26 UTC
It is caused by revision 181332:

http://gcc.gnu.org/ml/gcc-cvs/2011-11/msg00623.html
Comment 3 davidxl 2012-05-04 18:52:21 UTC
What is the right scope for the compound literals? Is the clobber correctly added or the program is ill formed?

David
Comment 4 Jason Merrill 2012-05-07 14:29:52 UTC
Compound literals are not part of C++, so correctness is a matter of debate.  In C, a compound literal designates an object with automatic storage duration.  In G++, a compound literal designates a temporary object, just like a normal cast or function-like cast.

This is a significant difference in semantics, which leads to the problem encountered here; the temporary object goes out of scope immediately after the initialization of p, so the loop has undefined behavior.

It would be possible for G++ to model the C semantics more closely.
Comment 5 davidxl 2012-05-07 16:18:13 UTC
So it is possible either 
1) to keep the current G++ semantics of compound literals, but change its behavior due to the implementation change (with clobber marker); 
or
2) to change hte G++ semantics to match C semantic, but keep the compiler behavior the same

Which way to go? If we go for 1), we probably just need to document this behavior better in GCC, and let user change their code.

David
Comment 6 Paul Pluzhnikov 2012-05-07 16:28:56 UTC
(In reply to comment #5)

> 1) to keep the current G++ semantics of compound literals, but change its
> behavior due to the implementation change (with clobber marker); 

I would argue that 1 is completely useless for "you can also construct an
array" use case from http://gcc.gnu.org/onlinedocs/gcc/Compound-Literals.html

It always initializes the pointer with dangling storage, and is always a bug.

If "keep the current g++ semantics", then the code should be rejected at
compile time, and should *not* work when built without optimization.

IMO, having this code working in C and not working in C++ is a lousy choice.
Comment 7 davidxl 2012-05-07 17:03:51 UTC
Yes, the array case should be warned or disallowed if 1 is the way to go.

I won't call it a lousy choice -- the C++ semantics of the compound literals allow more agressive optimization and smaller stack usage.

David



(In reply to comment #6)
> (In reply to comment #5)
> 
> > 1) to keep the current G++ semantics of compound literals, but change its
> > behavior due to the implementation change (with clobber marker); 
> 
> I would argue that 1 is completely useless for "you can also construct an
> array" use case from http://gcc.gnu.org/onlinedocs/gcc/Compound-Literals.html
> 
> It always initializes the pointer with dangling storage, and is always a bug.
> 
> If "keep the current g++ semantics", then the code should be rejected at
> compile time, and should *not* work when built without optimization.
> 
> IMO, having this code working in C and not working in C++ is a lousy choice.
Comment 8 Jason Merrill 2012-05-07 17:44:52 UTC
The thing is, C++11 introduces list-initialized temporaries; I could rewrite the testcase in C++11 as

extern "C" int printf (const char *, ...);

int main()
{
  typedef int AR[4];
  for (int *p = AR{1,2,3,0}; *p; ++p)
    {
      printf ("%d\n", *p);
    }
  return 0;
}

so it made sense to me for compound literals to have the same semantics; otherwise you have a difference in lifetime based on whether or not the type is wrapped in parentheses.

I definitely agree that we need to give a diagnostic about taking the address of a temporary here.
Comment 9 davidxl 2012-05-08 00:16:30 UTC
c++11 defines the lifetime of a temporary -- does it match C or g++'s semantics of compound literals or neither?

Note that without your change, the original program may also be subject to runtime failures due to escaped life ranges of the scoped variables leading to bad stack layout -- though such bugs are more subtle and less likely to be triggered.

David


(In reply to comment #8)
> The thing is, C++11 introduces list-initialized temporaries; I could rewrite
> the testcase in C++11 as
> 
> extern "C" int printf (const char *, ...);
> 
> int main()
> {
>   typedef int AR[4];
>   for (int *p = AR{1,2,3,0}; *p; ++p)
>     {
>       printf ("%d\n", *p);
>     }
>   return 0;
> }
> 
> so it made sense to me for compound literals to have the same semantics;
> otherwise you have a difference in lifetime based on whether or not the type is
> wrapped in parentheses.
> 
> I definitely agree that we need to give a diagnostic about taking the address
> of a temporary here.
Comment 10 Jason Merrill 2012-05-08 02:13:35 UTC
(In reply to comment #9)
> c++11 defines the lifetime of a temporary -- does it match C or g++'s semantics
> of compound literals or neither?

C++98 and C++11 define the lifetime of a temporary as lasting until the end of the full-expression, unless its lifetime is extended by binding it to a reference.  G++ treats compound literals as temporaries like any other.
Comment 11 Paul Pluzhnikov 2012-05-17 00:02:33 UTC
(In reply to comment #10)

> C++98 and C++11 define the lifetime of a temporary as lasting until the end of
> the full-expression, unless its lifetime is extended by binding it to a
> reference.  G++ treats compound literals as temporaries like any other.

So, as far as I understand:

1. the example with literal arrays in the documentation is invalid in C++,
2. I should fix C++ code that uses them
3. all that remains in this bug is to issue diagnostics for them
Comment 12 Jason Merrill 2012-05-22 17:41:45 UTC
I checked the behavior of clang and icc with the following testcase:

extern "C" int printf (const char *,...);

struct A
{
  int i;
  ~A() { printf ("~A()\n"); }
};
   
int main()
{
  for (A *p = (A[]){ 1, 2, 3, 0}; p->i; ++p) {
    printf("%d\n", p->i);
  }
  return 0;
}

Both compilers match G++ behavior and destroy all the elements of the array before the loop begins; the only difference is that recent G++ is more aggressive about optimizing based on the knowledge that the array is no longer alive.

So yes, what remains for this bug is to complain about undefined behavior due to accessing the value of an object after its lifetime has ended.
Comment 13 Paul Pluzhnikov 2012-05-22 17:49:24 UTC
(In reply to comment #12)

> So yes, what remains for this bug is to complain about undefined behavior due
> to accessing the value of an object after its lifetime has ended.

The documentation here:
  http://gcc.gnu.org/onlinedocs/gcc/Compound-Literals.html
sould probably be updated to mention that the "You can also construct an
array" part only works in C, and leads to undefined behavior in C++.
Comment 14 Paul Pluzhnikov 2012-05-22 17:50:07 UTC
(In reply to comment #12)

> So yes, what remains for this bug is to complain about undefined behavior due
> to accessing the value of an object after its lifetime has ended.

The documentation here:
  http://gcc.gnu.org/onlinedocs/gcc/Compound-Literals.html
should probably be updated to mention that the "You can also construct an
array" part only works in C, and leads to undefined behavior in C++.
Comment 15 Jason Merrill 2012-05-26 21:13:27 UTC
Author: jason
Date: Sat May 26 21:13:23 2012
New Revision: 187916

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=187916
Log:
	PR c++/53220
gcc/
	* c-typeck.c (array_to_pointer_conversion): Give -Wc++-compat warning
	about array compound literals.
gcc/cp/
	* call.c (convert_like_real) [ck_list]: Take array address directly.
	* typeck.c (decay_conversion): Reject decay of an array compound
	literal.

Added:
    trunk/gcc/testsuite/c-c++-common/array-lit.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/c-typeck.c
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/call.c
    trunk/gcc/cp/typeck.c
    trunk/gcc/doc/extend.texi
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/g++.dg/ext/complit12.C
Comment 16 Jason Merrill 2012-05-30 14:51:58 UTC
Author: jason
Date: Wed May 30 14:51:54 2012
New Revision: 188020

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188020
Log:
	PR c++/53220
gcc/
	* c-typeck.c (array_to_pointer_conversion): Give -Wc++-compat warning
	about array compound literals.
gcc/cp/
	* call.c (convert_like_real) [ck_list]: Take array address directly.
	* typeck.c (decay_conversion): Reject decay of an array compound
	literal.

Added:
    branches/gcc-4_7-branch/gcc/testsuite/c-c++-common/array-lit.c
Modified:
    branches/gcc-4_7-branch/gcc/ChangeLog
    branches/gcc-4_7-branch/gcc/c-typeck.c
    branches/gcc-4_7-branch/gcc/cp/ChangeLog
    branches/gcc-4_7-branch/gcc/cp/call.c
    branches/gcc-4_7-branch/gcc/cp/typeck.c
    branches/gcc-4_7-branch/gcc/doc/extend.texi
    branches/gcc-4_7-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_7-branch/gcc/testsuite/g++.dg/ext/complit12.C
Comment 17 Jason Merrill 2012-06-03 04:49:45 UTC
Fixed by making the affected code ill-formed.
Comment 18 Paul Pluzhnikov 2012-06-20 01:59:01 UTC
FWIW, it appears that the new error is too strict. It rejects this source, which (AFAIU) is entirely kosher:

#include <stdio.h>

void fn(int arr[])
{
  for (int j = 0; j < 5; ++j)
    printf("%d: %d\n", j, arr[j]);
}

int main()
{
  fn((int[]) { 41, 42, 43, 44, 45 } );
  return 0;
}

g++ -c t.cc
t.cc: In function ‘int main()’:
t.cc:11:37: error: taking address of temporary array
   fn((int[]) { 41, 42, 43, 44, 45 } );
                                     ^

Still, I prefer to rewrite the case above, then to debug undiagnosed runtime problems from the original test case.
Comment 19 Jason Merrill 2012-06-20 07:19:33 UTC
I agree, but once the array decays to a pointer it's hard to tell whether it's used in a safe or unsafe way.
Comment 20 Karen 2013-05-01 09:54:43 UTC
(In reply to comment #18)

> #include <stdio.h>
> 
> void fn(int arr[])
> {
>   for (int j = 0; j < 5; ++j)
>     printf("%d: %d\n", j, arr[j]);
> }
> 
> int main()
> {
>   fn((int[]) { 41, 42, 43, 44, 45 } );
>   return 0;
> }
> 
> g++ -c t.cc
> t.cc: In function ‘int main()’:
> t.cc:11:37: error: taking address of temporary array
>    fn((int[]) { 41, 42, 43, 44, 45 } );


I wrote some code which compiled perfectly using gcc 4.7. However, now I am using gcc 4.8 and I am encountering exactly the problem mentioned by Paul. Will this be solved or should I look for another solution to overcome this problem? Any suggestions?
Comment 21 Jason Merrill 2013-05-01 19:13:19 UTC
(In reply to comment #20)
> I wrote some code which compiled perfectly using gcc 4.7. However, now I am
> using gcc 4.8 and I am encountering exactly the problem mentioned by Paul. Will
> this be solved or should I look for another solution to overcome this problem?
> Any suggestions?

You should probably adjust your code to avoid using array compound literals in C++.
Comment 22 Jonathan Wakely 2016-04-04 12:41:14 UTC
The added diagnostic rejects this, which clang and EDG accept:

extern "C" int printf(const char*, ...);
int main() { 
  using A = int[1];
  printf("%p\n", A{1} );
}

ts.c:4:18: error: taking address of temporary array
   printf("%p\n", A{1} );
                  ^~~~


This doesn't use compound literals, and seems like valid C++ according to [conv.array].
Comment 23 GCC Commits 2023-11-28 21:29:26 UTC
The trunk branch has been updated by Jason Merrill <jason@gcc.gnu.org>:

https://gcc.gnu.org/g:305a2686c99bf9b57490419f25c79f6fb3ae0feb

commit r14-5941-g305a2686c99bf9b57490419f25c79f6fb3ae0feb
Author: Jason Merrill <jason@redhat.com>
Date:   Tue Nov 28 13:54:47 2023 -0500

    c++: prvalue array decay [PR94264]
    
    My change for PR53220 made array to pointer decay for prvalue arrays
    ill-formed to catch well-defined C code that produces a dangling pointer in
    C++ due to the shorter lifetime of compound literals.  This wasn't really
    correct, but wasn't a problem until C++17 added prvalue arrays, at which
    point it started rejecting valid C++ code.
    
    I wanted to make sure that we still diagnose the problematic code;
    -Wdangling-pointer covers the array-lit.c case, but I needed to extend
    -Wreturn-local-addr to handle the return case.
    
            PR c++/94264
            PR c++/53220
    
    gcc/c/ChangeLog:
    
            * c-typeck.cc (array_to_pointer_conversion): Adjust -Wc++-compat
            diagnostic.
    
    gcc/cp/ChangeLog:
    
            * call.cc (convert_like_internal): Remove obsolete comment.
            * typeck.cc (decay_conversion): Allow array prvalue.
            (maybe_warn_about_returning_address_of_local): Check
            for returning pointer to temporary.
    
    gcc/testsuite/ChangeLog:
    
            * c-c++-common/array-lit.c: Adjust.
            * g++.dg/cpp1z/array-prvalue1.C: New test.
            * g++.dg/ext/complit17.C: New test.