Bug 42401 - wrong-code with -flto
Summary: wrong-code with -flto
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: lto (show other bugs)
Version: 4.5.0
: P3 major
Target Milestone: 4.5.0
Assignee: Not yet assigned to anyone
URL:
Keywords: lto, wrong-code
Depends on:
Blocks:
 
Reported: 2009-12-17 12:34 UTC by Wouter Vermaelen
Modified: 2009-12-19 11:41 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2009-12-17 13:40:54


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Wouter Vermaelen 2009-12-17 12:34:27 UTC
> cat main.cc
struct Foo { static void f(); };
int main() { Foo::f(); }


> cat Foo.cc
#include <string>
#include <map>

struct Foo { static void f(); };

void Foo::f()
{
        typedef std::map<int, std::string> Map;
        static Map m;

        Map::const_iterator it = m.find(0);
        if (it != m.end()) {
                std::string s = it->second;
        }
}


> g++ -flto -O3 -c Foo.cc
> g++ -flto -O3 -c main.cc
> g++ -flto -O3 main.o Foo.o
> ./a.out
Segmentation fault (core dumped)


This program works as expected when compiled without -flto. Though when using -flto it crashes. I'm using SVN revision trunk@155309 on linux x86_64.

I didn't have time yet to reduce this testcase further (expand an reduce the #includes). I'll try to do that next week.

If you change -O3 into -O0 in the compilation commands but not in the link command, this test triggers a ICE. But I'll file a different bug for this.
Comment 1 Richard Biener 2009-12-17 13:40:54 UTC
Confirmed.  It segfaults here:

13            std::string s = it->second;
(gdb) s

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7b7873b in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libstdc++.so.6

Dump of assembler code for function _ZNSsC2ERKSs:
0x00007ffff7b78730 <_ZNSsC2ERKSs+0>:    push   %rbx
0x00007ffff7b78731 <_ZNSsC2ERKSs+1>:    mov    %rdi,%rbx
0x00007ffff7b78734 <_ZNSsC2ERKSs+4>:    sub    $0x10,%rsp
0x00007ffff7b78738 <_ZNSsC2ERKSs+8>:    mov    (%rsi),%rax
0x00007ffff7b7873b <_ZNSsC2ERKSs+11>:   mov    -0x8(%rax),%edx

-fno-ipa-cp-clone fixes it.

IPA insert stage:

Cost of versioning find is 178, (size: 19, freq: 1002)
considering function find
  replacing param this with const &m._M_t
versioned function find with growth 18, overall 18
_ZNSt8_Rb_treeIiSt4pairIKiSsESt10_Select1stIS2_ESt4lessIiESaIS2_EE4findERS1_.clone.0 (struct _Rb_tree * const this, const int & __k)
{

}

Also fails with -O2 -fipa-cp-clone.  ICEs with -O1 -fipa-cp in ipa_write_node_info, at ipa-prop.c:2023 (PR42366).
Comment 2 Richard Biener 2009-12-17 14:26:33 UTC
It is DOM which threads over the check in bb 12:

<bb 11>:
  __y_90 = __y_108;
  D.2866_92 = __y_108->_M_value_field.first;
  if (D.2866_92 > 0)
    goto <bb 20>;
  else
    goto <bb 12>;

<bb 12>:
  # SR.42_94 = PHI <SR.42_88(11)>
  if (SR.42_94 != &m._M_t._M_impl._M_header)
    goto <bb 13>;
  else
    goto <bb 19>;

<bb 13>:
  # SR.42_78 = PHI <SR.42_94(12), SR.42_87(20)>
  D.2308_34 = (const struct _Rb_tree_node *) SR.42_78;
  D.2315_10 = &D.2308_34->_M_value_field.second;
  __comp_ctor  (&s, D.2315_10);

changing bb 11 to

<bb 11>:
  __y_90 = __y_108;
  D.2837_92 = __y_108->_M_value_field.first;
  if (D.2837_92 > 0)
    goto <bb 19>;
  else
    goto <bb 13>;

which looks correct but one wonders why it's only done with -fipa-cp-clone.

This change is what in the end triggers the bug, so -fno-tree-dominator-opts
fixes it as well.  The DOM thing is a missed-optimization in the -flto
case.

In the LTO case we thread

<bb 10>:
  # __y_108 = PHI <__y_95(9), __y_75(5)>
  SR.42_88 = (struct _Rb_tree_node_base *) __y_108;
  if (SR.42_88 == &m._M_t._M_impl._M_header)
    goto <bb 12>;
  else
    goto <bb 11>;

<bb 11>:
  __y_90 = __y_108;
  D.2837_92 = __y_90->_M_value_field.first;
  if (D.2837_92 > 0)
    goto <bb 12>;
  else
    goto <bb 13>;

<bb 12>:

<bb 13>:
  # SR.42_94 = PHI <SR.42_88(11), &m._M_t._M_impl._M_header(12)>
  if (SR.42_94 != &m._M_t._M_impl._M_header)
    goto <bb 14>;
  else
    goto <bb 20>;

<bb 14>:
  D.2308_34 = (const struct _Rb_tree_node *) SR.42_94;
  D.2315_10 = &D.2308_34->_M_value_field.second;
  __comp_ctor  (&s, D.2315_10);
...

<bb 20>:
  return;

}

to

<bb 10>:
  # __y_108 = PHI <__y_95(9), __y_75(5)>
  SR.42_88 = (struct _Rb_tree_node_base *) __y_108;
  if (SR.42_88 == &m._M_t._M_impl._M_header)
    goto <bb 20>;
  else
    goto <bb 11>;

<bb 11>:
  __y_90 = __y_108;
  D.2866_92 = __y_108->_M_value_field.first;
  if (D.2866_92 > 0)
    goto <bb 20>;
  else
    goto <bb 12>;

<bb 12>:
  # SR.42_94 = PHI <SR.42_88(11)>
  if (SR.42_94 != &m._M_t._M_impl._M_header)
    goto <bb 13>;
  else
    goto <bb 19>;

<bb 13>:
  # SR.42_78 = PHI <SR.42_94(12), SR.42_87(20)>
  D.2308_34 = (const struct _Rb_tree_node *) SR.42_78;
  D.2315_10 = &D.2308_34->_M_value_field.second;
  __comp_ctor  (&s, D.2315_10);

which is 1) incomplete as noted above and 2) wrong, as the target BB 20
no longer is BB 20 but BB 19 ...

...

<bb 19>:
  return;

<bb 20>:
  # SR.42_87 = PHI <&m._M_t._M_impl._M_header(11), &m._M_t._M_impl._M_header(10)> 
  goto <bb 13>;

}

thus somehow jump threading is messed up.

Good threading:

Jump threading proved probability of edge 13->20 too small (it is 3237, should be 10000).
  Threaded jump 11 --> 13 to 13
  Threaded jump 12 --> 13 to 22
Removing basic block 12
Merging blocks 13 and 14
Removing basic block 21

Bad threading:

 Threaded jump 12 --> 13 to 22
Removing basic block 12
Removing basic block 21

I can't see how this is not a latent problem with jump-threading independent
of LTO.
Comment 3 Richard Biener 2009-12-17 15:17:27 UTC
Btw, the single-file testcase

#include <string>
#include <map>

int main ()
{
  typedef std::map<int, std::string> Map;
  static Map m;

  Map::const_iterator it = m.find(0);
  if (it != m.end()) {
      std::string s = it->second;
  }
}

fails the same way (with -flto).
Comment 4 Richard Biener 2009-12-19 00:41:08 UTC
The following seems to fix it for some reason.

Index: lto-streamer-out.c
===================================================================
--- lto-streamer-out.c  (revision 155347)
+++ lto-streamer-out.c  (working copy)
@@ -638,7 +638,8 @@ tree_is_indexable (tree t)
 {
   if (TREE_CODE (t) == PARM_DECL)
     return false;
-  else if (TREE_CODE (t) == VAR_DECL && decl_function_context (t))
+  else if (TREE_CODE (t) == VAR_DECL && decl_function_context (t)
+          && !TREE_STATIC (t))
     return false;
   else
     return (TYPE_P (t) || DECL_P (t) || TREE_CODE (t) == SSA_NAME);
@@ -694,7 +695,8 @@ lto_output_tree_ref (struct output_block
 
     case VAR_DECL:
     case DEBUG_EXPR_DECL:
-      gcc_assert (decl_function_context (expr) == NULL);
+      gcc_assert (decl_function_context (expr) == NULL
+                 || TREE_STATIC (expr));
       output_record_start (ob, LTO_global_decl_ref);
       lto_output_var_decl_index (ob->decl_state, ob->main_stream, expr);
       break;
Comment 5 Michael Matz 2009-12-19 01:19:19 UTC
To explain: the miscompilation is a result of the cloner (when cloning for the
find() call) creating a new tree for a local static variable (the 'm' in
main).  After inlining we have:

if (  &mD.2293._M_tD.2062._M_implD.2039._M_headerD.2043
   != &mD.2303._M_tD.2062._M_implD.2039._M_headerD.2043)

note the different UIDs for 'm'.  fold will say true to the unequality,
as references to static vars are unequal if the base VAR_DECL is different.

The reason for all this is because the LTO reader/writer of jump_functions
create the new tree already.  The input for the LTO writer contains the same
tree node for the decl in main and the jump_function (val.constant),
but as it uses a different output_block (hence unshared cache) it can't find
the tree and emits a new copy.

For whatever reason richis patch makes this work, but I can't see why it
does ... ;-/
Comment 6 Michael Matz 2009-12-19 02:33:34 UTC
Ah, because the references to trees are not handled via the write_cache
hashtable, but via the per-kind stream (in output_block.decl_state), which
is not realloced on create_output_block, but only on
lto_push/pop_out_decl_state.  Unintuitive :)
Comment 7 Richard Biener 2009-12-19 11:41:25 UTC
Subject: Bug 42401

Author: rguenth
Date: Sat Dec 19 11:41:14 2009
New Revision: 155361

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155361
Log:
2009-12-19  Richard Guenther  <rguenther@suse.de>

	PR lto/42401
	* lto-streamer-out.c (tree_is_indexable): Local statics
	are indexable.
	(lto_output_tree_ref): Adjust assert.

	* g++.dg/lto/20091219_0.C: New testcase.

Added:
    trunk/gcc/testsuite/g++.dg/lto/20091219_0.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lto-streamer-out.c
    trunk/gcc/testsuite/ChangeLog

Comment 8 Richard Biener 2009-12-19 11:41:42 UTC
Fixed.