Bug 58723 - [4.9 Regression] ICE in lto_output_edge, at lto-cgraph.c:300 for OpenMP's simd reduction
Summary: [4.9 Regression] ICE in lto_output_edge, at lto-cgraph.c:300 for OpenMP's sim...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.9.0
: P1 normal
Target Milestone: 4.9.0
Assignee: Not yet assigned to anyone
URL:
Keywords: ice-on-valid-code, lto, openmp
Depends on:
Blocks:
 
Reported: 2013-10-14 15:06 UTC by Tobias Burnus
Modified: 2013-11-27 15:34 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.5.0, 4.6.0, 4.7.0, 4.8.0
Known to fail: 4.9.0
Last reconfirmed: 2013-11-10 00:00:00


Attachments
Delta reduced C++ test case (496 bytes, text/plain)
2013-10-14 15:06 UTC, Tobias Burnus
Details
patch (731 bytes, patch)
2013-11-26 14:35 UTC, Richard Biener
Details | Diff
Small test case: test.tar.gz (650 bytes, application/octet-stream)
2013-11-26 22:34 UTC, Tobias Burnus
Details
updated patch (1.09 KB, patch)
2013-11-27 10:10 UTC, Richard Biener
Details | Diff
updated patch (1.37 KB, patch)
2013-11-27 10:20 UTC, Richard Biener
Details | Diff
updated patch (1.65 KB, patch)
2013-11-27 10:30 UTC, Richard Biener
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Tobias Burnus 2013-10-14 15:06:12 UTC
Created attachment 31000 [details]
Delta reduced C++ test case

That's with gcc version 4.9.0 20131014 (experimental) [trunk revision 203511] (GCC) on an x86-64-gnu-linux system.


$ g++ -std=c++11 -O1 -flto -fopenmp testcase20.ii

testcase20.ii:40:1: internal compiler error: in lto_output_edge, at lto-cgraph.c:300
 }  // namespace std
 ^
Please submit a full bug report,



It works when compiled without -fopenmp. The only OpenMP code is:

#pragma omp simd reduction(+:area)
    for (int yi = 0; yi < count; ++yi)
      ...

(where delta has removed the "area" assignment.)
Comment 1 Tobias Burnus 2013-10-14 17:52:02 UTC
The failure is for the assert:

295           /* Flags that should not appear on indirect calls.  */
296           gcc_assert (!(flags & (ECF_LOOPING_CONST_OR_PURE
297                                  | ECF_MAY_BE_ALLOCA
298                                  | ECF_SIBCALL
299                                  | ECF_LEAF
300                                  | ECF_NOVOPS)));

Here, (flags & ECF_LEAF) != 0. The edge->caller is:

(gdb) p dump_cgraph_node(stdout, edge->caller)

_ZStL17DoFinalizeBooleanSt8XPBoolOpiPKSt5XPBoxPSt6vectorIS0_SaIS0_EE/0 (std::DoFinalizeBoolean(std::XPBoolOp, int, std::XPBox const*, std::vector<std::XPBox, std::allocator<std::XPBox> >*)) @0x2aaaac1f1e40
Type: function definition analyzed
Visibility: prevailing_def_ironly
References:
Referring:
Availability: local
Function flags: body local
Called by: _ZNSt7XPClass4finiEPSt6vectorISt5XPBoxSaIS1_EE/1 (1.00 per call)
Calls:
Has 3 outgoing edges for indirect calls.


My suspicion is that this is due to:

DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW)
Comment 2 Volker Reichelt 2013-11-10 00:04:04 UTC
Confirmed.

Even shorter testcase (compile with "-O -flto -fopenmp"):

=======================================
void foo(int n)
{
  int i;
  #pragma omp simd reduction(+:n)
  for (i = 0; i < 99; ++i) ;
}
=======================================

The regression appeared between 4.9-20131005 and 4.9.0-20131012.
Comment 3 Richard Biener 2013-11-19 10:18:24 UTC
Looks like cgraph doesn't know about internal functions and treats them as
indirect calls ... I suppose we shouldn't record any call edges for internal
fns which all should expand to non-calls (and should all be 'leaf')?
Runs into interesting fallout though ...

The other variant is to forcefully build non-indirect edges, also with
interesting fallout.

Bah.

Honza?

Internal functions are like builtins just we don't have a decl (and thus
a cgraph node) for them.  They are just special instructions, not really
calls to "functions".
Comment 4 Marek Polacek 2013-11-19 10:48:29 UTC
Yeah, I hit exactly this as well, with DEF_INTERNAL_FN (UBSAN_NULL, ECF_LEAF | ECF_NOTHROW).  The UBSAN_NULL is just like a macro, it expands to a GIMPLE_COND later on.  It has to be leaf - otherwise it wouldn't be possible to have it in the middle of a BB.  Temporarily, I disabled the tests with -flto.
Comment 5 Richard Biener 2013-11-26 14:35:28 UTC
Created attachment 31299 [details]
patch

Patch fixing the testcase (but otherwise untested - we don't have too many
internal fn testcases).
Comment 6 Tobias Burnus 2013-11-26 20:31:30 UTC
(In reply to Richard Biener from comment #5)
> Created attachment 31299 [details]
> Patch fixing the testcase (but otherwise untested - we don't have too many
> internal fn testcases).

Doesn't work for me. With the test case of comment 0 (attachment 31000 [details]) I get with
  g++ -std=c++11 -fopenmp -flto -O1 test.ii
the - new - ICE

cc1plus: internal compiler error: in expand_call_inline, at tree-inline.c:4076
0xc046a9 expand_call_inline
        ../../gcc/tree-inline.c:4076
0xc046a9 gimple_expand_calls_inline
        ../../gcc/tree-inline.c:4430
0xc046a9 optimize_inline_calls(tree_node*)
        ../../gcc/tree-inline.c:4569
0x109bbdb inline_transform(cgraph_node*)
        ../../gcc/ipa-inline-transform.c:435
Comment 7 Tobias Burnus 2013-11-26 22:06:56 UTC
(In reply to Tobias Burnus from comment #6)
> (In reply to Richard Biener from comment #5)
> > Created attachment 31299 [details]
> > Patch fixing the testcase (but otherwise untested - we don't have too many
> > internal fn testcases).
> Doesn't work for me.

If one dumps the bb, one finds a call to GOMP_SIMD_VF.

Using the following patch on top of yours fixes the issue for the reduced test case and for the compilation to the *.o files in the real-world code:

--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -4066,5 +4070,5 @@ expand_call_inline (basic_block bb, gimple stmt, copy_body_data *id)

   /* From here on, we're only interested in CALL_EXPRs.  */
-  if (gimple_code (stmt) != GIMPLE_CALL)
+  if (gimple_code (stmt) != GIMPLE_CALL || gimple_call_internal_p (stmt))
     goto egress;

 * * *

Unfortunately, the lto1 phase still fails for the big code. For that one, I get:

xpboolean.cc:3744:8: internal compiler error: in expand_GOMP_SIMD_VF, at internal-fn.c:137
0x7b48b7 expand_GOMP_SIMD_VF
        ../../gcc/internal-fn.c:137
0x5fe497 expand_call_stmt
        ../../gcc/cfgexpand.c:2164
0x5fe497 expand_gimple_stmt_1
        ../../gcc/cfgexpand.c:3143
0x5fe497 expand_gimple_stmt
        ../../gcc/cfgexpand.c:3295
Comment 8 Tobias Burnus 2013-11-26 22:34:58 UTC
Created attachment 31302 [details]
Small test case: test.tar.gz

I managed to create a test case.

The attached test case is essentially attachment 31000 [details], except of changing it to a .cc file and adding #include<vector> - and writing a simple main() function for it.

$  g++ -std=c++11 -flto -fopenmp-simd -Ofast test.cc test-main.cc
test.cc: In member function 'fini':
test.cc:31:6: internal compiler error: in expand_GOMP_SIMD_LANE, at internal-fn.c:129
 void XPClass::fini(std::vector<XPBox> *__restrict out) noexcept {
      ^
0x7b48d7 expand_GOMP_SIMD_LANE
        ../../gcc/internal-fn.c:129
Comment 9 Richard Biener 2013-11-27 10:06:02 UTC
We fail to stream cfun->has_force_vect_loops.
Comment 10 Richard Biener 2013-11-27 10:10:53 UTC
Created attachment 31305 [details]
updated patch

lto1 also does not get -fopenmp[-simd] passed, not sure if that is needed though.

The simple testcase from comment #2 already fails with

> ./xgcc -B. -B ../x86_64-unknown-linux-gnu/libgomp/ -B ../x86_64-unknown-linux-gnu/libgomp/.libs -O -flto -fopenmp t.i -r -nostdlib
t.i: In function 'foo':
t.i:1:6: internal compiler error: in expand_GOMP_SIMD_VF, at internal-fn.c:137
 void foo(int n)
      ^
0x8c4d19 expand_GOMP_SIMD_VF


updated patch attached that fixes the streaming and thus runs the vectorizer
at -O1 during LTRANs (but it does nothing).
Comment 11 Richard Biener 2013-11-27 10:20:15 UTC
Created attachment 31306 [details]
updated patch

And there is also has_simduid_loops and loop->safelen and loop->force_vect.
Also loop->simduid but 1) I don't know how to stream it, 2) it seems to work
without.  But I see the vectorizer looks at DECL_UID (loop->simduid)), so
"work without" may mean it just doesn't vectorize the thing but cleans up.
Comment 12 Richard Biener 2013-11-27 10:30:54 UTC
Created attachment 31307 [details]
updated patch

Indeed.

t.i:1:6: note: === vect_analyze_data_refs ===
t.i:1:6: note: not vectorized: loop contains function calls or data references that cannot be analyzed
t.i:1:6: note: bad data references.
t.i:1:6: note: vectorized 0 loops in function.

Ok, with simply streaming simduid as tree it "works".  Not sure if we want to
stream trees into the CFG section though ... (or if it even really works).

What's the reason that simduid is a tree?
Comment 13 Richard Biener 2013-11-27 10:37:51 UTC
Eh.

make check-gcc RUNTESTFLAGS="--target_board=unix/-flto/-ffat-lto-objects gomp.exp"

FAIL: gcc.dg/gomp/declare-simd-1.c (internal compiler error)
FAIL: gcc.dg/gomp/declare-simd-1.c (test for excess errors)

But otherwise clean.

/space/rguenther/src/svn/trunk/gcc/testsuite/gcc.dg/gomp/declare-simd-1.c:100:1: internal compiler error: tree code 'omp_clause' is not supported in LTO streams^M
0xa6dbe4 lto_write_tree^M
        /space/rguenther/src/svn/trunk/gcc/lto-streamer-out.c:376^M

seems like FUNCTION_DECLs now refer to OMP_CLAUSE?  Ah... via attributes :/

Jakub - please add tree streaming support for the relevant required stuff.
Comment 14 Jakub Jelinek 2013-11-27 10:40:31 UTC
(In reply to Richard Biener from comment #9)
> We fail to stream cfun->has_force_vect_loops.

Note we don't necessarily have to stream that (and has_simduid_loops), the alternative is to compute it during streaming in the IL - if it sees any force_vect loop, it can set fun->has_force_vect_loops for the loop, similarly for non-zero simdlen and fun->has_simduid_loops.  Those two flags are just (pessimistically correct) flags to avoid scanning the whole IL for it, but LTO scans the whole IL anyway...
Comment 15 Jakub Jelinek 2013-11-27 10:41:31 UTC
(In reply to Richard Biener from comment #13)
> Eh.
> 
> make check-gcc RUNTESTFLAGS="--target_board=unix/-flto/-ffat-lto-objects
> gomp.exp"
> 
> FAIL: gcc.dg/gomp/declare-simd-1.c (internal compiler error)
> FAIL: gcc.dg/gomp/declare-simd-1.c (test for excess errors)
> 
> But otherwise clean.
> 
> /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.dg/gomp/declare-simd-1.c:
> 100:1: internal compiler error: tree code 'omp_clause' is not supported in
> LTO streams^M
> 0xa6dbe4 lto_write_tree^M
>         /space/rguenther/src/svn/trunk/gcc/lto-streamer-out.c:376^M
> 
> seems like FUNCTION_DECLs now refer to OMP_CLAUSE?  Ah... via attributes :/
> 
> Jakub - please add tree streaming support for the relevant required stuff.

Yeah, that is the current reason why the new elementals tests to be committed also fail with LTO.  I'll hack on that.
Comment 16 Richard Biener 2013-11-27 15:18:25 UTC
Author: rguenth
Date: Wed Nov 27 15:18:23 2013
New Revision: 205447

URL: http://gcc.gnu.org/viewcvs?rev=205447&root=gcc&view=rev
Log:
2013-11-27  Richard Biener  <rguenther@suse.de>

	PR middle-end/58723
	* cgraphbuild.c (build_cgraph_edges): Do not build edges
	for internal calls.
	(rebuild_cgraph_edges): Likewise.
	* ipa-inline-analysis.c (estimate_function_body_sizes):
	Skip internal calls.
	* tree-inline.c (estimate_num_insns): Estimate size of internal
	calls as 0.
	(gimple_expand_calls_inline): Do not try inline-expanding
	internal calls.
	* lto-streamer-in.c (input_cfg): Stream loop safelen,
	force_vect and simduid.
	(input_struct_function_base): Stream has_force_vect_loops
	and has_simduid_loops.
	(input_function): Adjust.
	* lto-streamer-out.c (output_cfg): Stream loop safelen,
	force_vect and simduid.
	(output_struct_function_base): Stream has_force_vect_loops
	and has_simduid_loops.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cgraphbuild.c
    trunk/gcc/ipa-inline-analysis.c
    trunk/gcc/lto-streamer-in.c
    trunk/gcc/lto-streamer-out.c
    trunk/gcc/tree-inline.c
Comment 17 Richard Biener 2013-11-27 15:34:53 UTC
Fixed.  The OMP_CLAUSE issue remains but is really a separate thing.