Bug 59659 - large zero-initialized std::array compile time excessive
Summary: large zero-initialized std::array compile time excessive
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++
Version: 4.8.2
Importance: P3 normal
Target Milestone: 4.9.0
Assignee: Jason Merrill
URL:
Keywords: compile-time-hog, memory-hog, missed-optimization
Depends on:
Blocks:
 
Reported: 2014-01-02 15:25 UTC by Mark Devaney
Modified: 2023-04-05 17:21 UTC
3 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2014-01-07 00:00:00


Attachments
Minimal test case. (140 bytes, text/plain)
2014-01-02 16:04 UTC, Casey Carter
patch for the array initialization (1.04 KB, patch)
2014-01-09 16:25 UTC, Jakub Jelinek

Description Mark Devaney 2014-01-02 15:25:35 UTC
Compiling the code below with any of the zero-initialization forms of the std::atomic array takes at least several hours (3 GHz Intel).  The default-initialized and non-atomic forms compile immediately.

g++ -std=c++11 -O0 (no other compile flags)

reproducible under 4.8.2 and 4.7.2


#include <iostream>
#include <array>
#include <algorithm>
#include <atomic>
#include <numeric>   // needed for std::accumulate

int main(int argc, char* argv[]) {

    // std::array<std::atomic<int>, 1000000>  arr;         // default initialization (i.e., random data) = FAST

    std::array<std::atomic<int>, 1000000>  arr{{}};   // zero initialization = FOREVER

    //std::array<std::atomic<int>, 1000000>  arr{};   // zero initialization = FOREVER

    //std::array<std::atomic<int>, 1000000>  arr={{}};     // zero init via assignment = FOREVER

    //std::array<int, 1000000>  arr={{}};     // zero init non-atomic = FAST

    std::cerr << "sum = " << std::accumulate(arr.begin(), arr.end(), 0) << std::endl;
}
Comment 1 Casey Carter 2014-01-02 16:04:12 UTC
Created attachment 31563 [details]
Minimal test case.

Attached minimal test case. std::atomic<int> arr[1000000]; compiles quickly, so the slowdown seems to be an interaction between std::atomic<int> and std::array.
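
[Editor's note: the attachment contents are not reproduced in this report.  A plausible reconstruction of such a minimal test case (the initializer form and names are guesses, not the actual 140-byte attachment) would be:]

// Hypothetical reconstruction of the minimal test case; not the actual
// attachment.  Value-initializing the std::array of std::atomic<int>
// triggers the pathological compile time.
#include <array>
#include <atomic>

int main()
{
    std::array<std::atomic<int>, 1000000> arr{};
    (void) arr;
}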
Comment 2 Richard Biener 2014-01-07 11:47:57 UTC
The frontend generates

  struct array arr;

    struct array arr;
  <<cleanup_point <<< Unknown tree: expr_stmt
  arr._M_elems[0] = TARGET_EXPR <D.25804, {.D.23130={._M_i=0}}> >>>>>;
  <<cleanup_point <<< Unknown tree: expr_stmt
  arr._M_elems[1] = TARGET_EXPR <D.25805, {.D.23130={._M_i=0}}> >>>>>;
  <<cleanup_point <<< Unknown tree: expr_stmt
  arr._M_elems[2] = TARGET_EXPR <D.25806, {.D.23130={._M_i=0}}> >>>>>;
  <<cleanup_point <<< Unknown tree: expr_stmt
  arr._M_elems[3] = TARGET_EXPR <D.25807, {.D.23130={._M_i=0}}> >>>>>;
...

instead of generating a loop.  The cause is probably some subtle property
of std::atomic (so it should eventually be reproducible with any class that
has that very same property).

The same issue occurs with the following, so it's std::array's fault (or rather
that of its initializer-list support).

#include <array>

class Foo { public: Foo() {} int i; };

int main() {
    std::array<Foo, 1000000> arr = {{}}; // Halting problem.
}
Comment 3 Jakub Jelinek 2014-01-09 16:18:13 UTC
I think this is somewhat related to e.g.:
struct A { A (); A (int); ~A (); };
void bar (A *);

#define T10(N) N, N, N, N, N, N, N, N, N, N
#define T100(N) T10(N), T10(N), T10(N), T10(N), T10(N), \
T10(N), T10(N), T10(N), T10(N), T10(N)
#define T1000(N) T100(N), T100(N), T100(N), T100(N), T100(N), \
 T100(N), T100(N), T100(N), T100(N), T100(N)
#define T10000(N) T1000(N), T1000(N), T1000(N), T1000(N), T1000(N), \
  T1000(N), T1000(N), T1000(N), T1000(N), T1000(N)
#define T100000(N) T10000(N), T10000(N), T10000(N), T10000(N), T10000(N), \
   T10000(N), T10000(N), T10000(N), T10000(N), T10000(N)

void
foo ()
{
  A a[] = { 1, 2, T1000 (3), T10000 (4), T1000 (3), 2, 1 };
  bar (a);
}

which also takes a long time to compile and generates an enormous amount of code (and when T10000 (4) is replaced with T100000 (4) it is much worse).
Comment 4 Jakub Jelinek 2014-01-09 16:25:16 UTC
Created attachment 31785 [details]
patch for the array initialization

Untested patch for the array initialization: if there are several (the patch uses 10; this could be a param) consecutive identical elements in the ctor, we can just emit a loop instead and leave it to the compiler to decide whether to unroll it.  I guess initializer_list would need similar treatment.
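
[Editor's note: as an illustration only (the type and function names below are invented, not taken from the patch), collapsing a run of identical element initializers into a loop corresponds at the source level to roughly this:]

// Hand-written sketch of the transformation described above; 'Elt' and the
// functions are hypothetical, not part of the patch.
struct Elt { Elt () : i (0) {} Elt (int v) : i (v) {} int i; };

void
init_unrolled (Elt *p)
{
  // What the front end effectively emits today: one store per element,
  // so the IL grows linearly with the length of the run.
  p[0] = Elt (3); p[1] = Elt (3); p[2] = Elt (3); p[3] = Elt (3);
  // ... and so on for every element of the run ...
}

void
init_loop (Elt *p, int n)
{
  // The proposed form for a long run of identical initializers: a single
  // loop, which the optimizers may later unroll or vectorize.
  for (int i = 0; i < n; ++i)
    p[i] = Elt (3);
}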
Comment 5 Markus Trippelsdorf 2014-01-09 16:58:59 UTC
(In reply to Jakub Jelinek from comment #3)
> I think this is somewhat related to e.g.:
> struct A { A (); A (int); ~A (); };
> void bar (A *);
> 
> #define T10(N) N, N, N, N, N, N, N, N, N, N
> #define T100(N) T10(N), T10(N), T10(N), T10(N), T10(N), \
> T10(N), T10(N), T10(N), T10(N), T10(N)
> #define T1000(N) T100(N), T100(N), T100(N), T100(N), T100(N), \
>  T100(N), T100(N), T100(N), T100(N), T100(N)
> #define T10000(N) T1000(N), T1000(N), T1000(N), T1000(N), T1000(N), \
>   T1000(N), T1000(N), T1000(N), T1000(N), T1000(N)
> #define T100000(N) T10000(N), T10000(N), T10000(N), T10000(N), T10000(N), \
>    T10000(N), T10000(N), T10000(N), T10000(N), T10000(N)
> 
> void
> foo ()
> {
>   A a[] = { 1, 2, T1000 (3), T10000 (4), T1000 (3), 2, 1 };
>   bar (a);
> }
> 
> also taking long time to compile and generating enormous amount of code (and
> when replacing T10000 (4) with T100000 (4) it is even much worse).

I don't think they are directly related.
For your testcase clang is even slower than gcc (1:51 min vs. 21.8 sec).
perf report shows:
  6.92%  cc1plus  cc1plus   [.] get_value_for_expr(tree_node*, bool)
  4.98%  cc1plus  cc1plus   [.] canonicalize_value(prop_value_d*)
  3.56%  cc1plus  cc1plus   [.] mark_used_flags(rtx_def*, int)
  3.01%  cc1plus  cc1plus   [.] ccp_visit_phi_node(gimple_statement_base*)
  2.97%  cc1plus  cc1plus   [.] redirect_eh_edge_1(edge_def*, basic_block_def*, bool)
  2.87%  cc1plus  cc1plus   [.] remove_eh_landing_pad(eh_landing_pad_d*)
  2.86%  cc1plus  cc1plus   [.] vrp_meet(value_range_d*, value_range_d*)


On the testcase from Mark (with array size 10000) clang is much
faster (0.7 sec vs. 15.7 sec), and the perf report shows that the multiplication in
get_ref_base_and_extent is mostly responsible:
  9.95%  cc1plus  cc1plus  [.] get_ref_base_and_extent(tree_node*, long*, long*, long*)
  7.94%  cc1plus  cc1plus  [.] mul_double_wide_with_sign(unsigned long, long, unsigned long, long, unsigned long*, long*, unsigned long*, long*, bool)
  7.40%  cc1plus  cc1plus  [.] hash_table<vn_reference_hasher, xcallocator>::find_slot_with_hash(vn_reference_s const*, unsigned int, insert_option)
  4.22%  cc1plus  cc1plus  [.] component_ref_field_offset(tree_node*)
  4.18%  cc1plus  cc1plus  [.] record_store(rtx_def*, bb_info*)
  3.81%  cc1plus  cc1plus  [.] hash_table_mod2(unsigned int, unsigned int)
  3.59%  cc1plus  cc1plus  [.] array_ref_low_bound(tree_node*)
Comment 6 Jakub Jelinek 2014-01-09 17:03:37 UTC
For the #c0 testcase, I think a reduced testcase is:
struct S { S (); S (int); ~S (); int i; };
struct A { S s[1000000]; };

void
foo ()
{
  A a = {{}};
}
and in that case we don't go through build_vec_init, but check_init_r/process_init_constructor etc.
Comment 7 Markus Trippelsdorf 2014-01-09 17:24:30 UTC
(In reply to Jakub Jelinek from comment #6)
> For the #c0 testcase, I think reduced testcase is:
> struct S { S (); S (int); ~S (); int i; };
> struct A { S s[1000000]; };
> 
> void
> foo ()
> {
>   A a = {{}};
> }
> and in that case we don't go through build_vec_init, but
> check_init_r/process_init_constructor etc.

Using "struct A { S s[100000]; };" (one hundred thousand elements) crashes gcc-4.7, 4.8 and trunk.
Comment 8 Jason Merrill 2014-01-10 17:16:45 UTC
(In reply to Jakub Jelinek from comment #6)
> For the #c0 testcase, I think reduced testcase is:
> struct S { S (); S (int); ~S (); int i; };
> struct A { S s[1000000]; };

That's a more general case of the problem, but atomic has constant zero-initialization, which ought to simplify things.  This patch causes the initialization to properly (and more quickly) become a memset:

diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index d9c3647..04f0bf7 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -1182,6 +1182,7 @@ process_init_constructor_array (tree type, tree init,
              initialization explicit by list-initializing from {}.  */
            next = build_constructor (init_list_type_node, NULL);
            next = digest_init (TREE_TYPE (type), next, complain);
+           next = maybe_constant_init (next);
          }
        else if (!zero_init_p (TREE_TYPE (type)))
          next = build_zero_init (TREE_TYPE (type),

I'll poke at this some more.
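
[Editor's note: to spell out what "become a memset" means here: once the value-initialized element folds to a constant all-zero value, the whole aggregate can be cleared in a single block operation instead of a million element stores.  A rough source-level analogue, using stand-in types rather than the compiler's actual output:]

// Rough source-level analogue of the lowering the patch enables; 'Atomic'
// and 'Array' are stand-ins, and this is not the compiler's generated code.
#include <cstring>

struct Atomic { int i; };                  // stand-in for std::atomic<int>
struct Array  { Atomic elems[1000000]; };  // stand-in for std::array

void
zero_init (Array &a)
{
  std::memset (&a, 0, sizeof a);           // one block clear, not 1000000 stores
}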
Comment 9 Jason Merrill 2014-01-15 19:10:40 UTC
Author: jason
Date: Wed Jan 15 19:10:09 2014
New Revision: 206639

URL: http://gcc.gnu.org/viewcvs?rev=206639&root=gcc&view=rev
Log:
	PR c++/59659
	* typeck2.c (massage_init_elt): New.
	(process_init_constructor_record)
	(process_init_constructor_union): Use it.
	(process_init_constructor_array): Use it.  Use RANGE_EXPR.
	(split_nonconstant_init_1): Handle it.
	* semantics.c (cxx_eval_vec_init_1): Use force_rvalue.

Added:
    trunk/gcc/testsuite/g++.dg/opt/value-init1.C
    trunk/gcc/testsuite/g++.dg/opt/value-init2.C
Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/semantics.c
    trunk/gcc/cp/typeck2.c
Comment 10 Jason Merrill 2014-01-15 20:37:46 UTC
Fixed for 4.9.
Comment 11 Jason Merrill 2014-01-24 16:48:26 UTC
Author: jason
Date: Fri Jan 24 16:47:54 2014
New Revision: 207051

URL: http://gcc.gnu.org/viewcvs?rev=207051&root=gcc&view=rev
Log:
	PR c++/59886
	PR c++/59659
	* typeck2.c (process_init_constructor_array): Don't create
	RANGE_EXPR yet.

Added:
    trunk/gcc/testsuite/g++.dg/init/aggr10.C
Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/typeck2.c
Comment 12 Jason Merrill 2014-01-24 17:09:40 UTC
Author: jason
Date: Fri Jan 24 17:09:07 2014
New Revision: 207052

URL: http://gcc.gnu.org/viewcvs?rev=207052&root=gcc&view=rev
Log:
	PR c++/59886
	PR c++/59659
	* g++.dg/opt/value-init2.C: Remove.

Removed:
    trunk/gcc/testsuite/g++.dg/opt/value-init2.C