Compiling the below with any of the zero-initialization forms of std::atomic takes at least several hours (3GhZ Intel). The default-initialized and non-atomic forms compile immediately. g++ --std=c++11 -O0 : no other compile flags reproducible under 4.8.2 and 4.7.2 #include <iostream> #include <array> #include <algorithm> #include <atomic> int main(int argc, char* argv[]) { // std::array<std::atomic<int>, 1000000> arr; // default initialization (i.e., random data) = FAST std::array<std::atomic<int>, 1000000> arr{{}}; // zero initialization = FOREVER //std::array<std::atomic<int>, 1000000> arr{}; // zero initialization = FOREVER //std::array<std::atomic<int>, 1000000> arr={{}}; // zero init via assignment = FOREVER //std::array<int, 1000000> arr={{}}; // zero init non-atomic = FAST std::cerr << "sum = " << std::accumulate(arr.begin(), arr.end(), 0) << std::endl; }
Created attachment 31563 [details] Minimal test case. Attached minimal test case. std::atomic<int> arr[1000000]; compiles correctly, so it seems to be an interaction between std::atomic<int> and std::array.
The frontend generates struct array arr; struct array arr; <<cleanup_point <<< Unknown tree: expr_stmt <<cleanup_point <<< Unknown tree: expr_stmt arr._M_elems[0] = TARGET_EXPR <D.25804, {.D.23130={._M_i=0}}> >>>>>; <<cleanup_point <<< Unknown tree: expr_stmt arr._M_elems[1] = TARGET_EXPR <D.25805, {.D.23130={._M_i=0}}> >>>>>; <<cleanup_point <<< Unknown tree: expr_stmt arr._M_elems[2] = TARGET_EXPR <D.25806, {.D.23130={._M_i=0}}> >>>>>; <<cleanup_point <<< Unknown tree: expr_stmt arr._M_elems[3] = TARGET_EXPR <D.25807, {.D.23130={._M_i=0}}> >>>>>; ... instead of generating a loop. Which may be the cause of std::atomic having some subtle property (thus reproducible with a random class with that very same property eventually). Same issue with the following, so it's std::array's fault (or rather initializer list support). #include <array> class Foo { public: Foo() {} int i; }; int main() { std::array<Foo, 1000000> arr = {{}}; // Halting problem. }
I think this is somewhat related to e.g.: struct A { A (); A (int); ~A (); }; void bar (A *); #define T10(N) N, N, N, N, N, N, N, N, N, N #define T100(N) T10(N), T10(N), T10(N), T10(N), T10(N), \ T10(N), T10(N), T10(N), T10(N), T10(N) #define T1000(N) T100(N), T100(N), T100(N), T100(N), T100(N), \ T100(N), T100(N), T100(N), T100(N), T100(N) #define T10000(N) T1000(N), T1000(N), T1000(N), T1000(N), T1000(N), \ T1000(N), T1000(N), T1000(N), T1000(N), T1000(N) #define T100000(N) T10000(N), T10000(N), T10000(N), T10000(N), T10000(N), \ T10000(N), T10000(N), T10000(N), T10000(N), T10000(N) void foo () { A a[] = { 1, 2, T1000 (3), T10000 (4), T1000 (3), 2, 1 }; bar (a); } also taking long time to compile and generating enormous amount of code (and when replacing T10000 (4) with T100000 (4) it is even much worse).
Created attachment 31785 [details] patch for the array initialization Untested patch for the array initialization - if there are several (patch uses 10, could be say a param) consecutive same elements in the ctor, we can just use a loop instead, and let it to the compiler to decide if it will want to unroll it or not. I guess the initializer_list would need similar treatment.
(In reply to Jakub Jelinek from comment #3) > I think this is somewhat related to e.g.: > struct A { A (); A (int); ~A (); }; > void bar (A *); > > #define T10(N) N, N, N, N, N, N, N, N, N, N > #define T100(N) T10(N), T10(N), T10(N), T10(N), T10(N), \ > T10(N), T10(N), T10(N), T10(N), T10(N) > #define T1000(N) T100(N), T100(N), T100(N), T100(N), T100(N), \ > T100(N), T100(N), T100(N), T100(N), T100(N) > #define T10000(N) T1000(N), T1000(N), T1000(N), T1000(N), T1000(N), \ > T1000(N), T1000(N), T1000(N), T1000(N), T1000(N) > #define T100000(N) T10000(N), T10000(N), T10000(N), T10000(N), T10000(N), \ > T10000(N), T10000(N), T10000(N), T10000(N), T10000(N) > > void > foo () > { > A a[] = { 1, 2, T1000 (3), T10000 (4), T1000 (3), 2, 1 }; > bar (a); > } > > also taking long time to compile and generating enormous amount of code (and > when replacing T10000 (4) with T100000 (4) it is even much worse). I don't think they are directly related. For your testcase clang is even slower than gcc (1:51min vs. 21.8sec). perf report shows: 6.92% cc1plus cc1plus [.] get_value_for_expr(tree_node*, bool) 4.98% cc1plus cc1plus [.] canonicalize_value(prop_value_d*) 3.56% cc1plus cc1plus [.] mark_used_flags(rtx_def*, int) 3.01% cc1plus cc1plus [.] ccp_visit_phi_node(gimple_statement_base*) 2.97% cc1plus cc1plus [.] redirect_eh_edge_1(edge_def*, basic_block_def*, bool) 2.87% cc1plus cc1plus [.] remove_eh_landing_pad(eh_landing_pad_d*) 2.86% cc1plus cc1plus [.] vrp_meet(value_range_d*, value_range_d*) On the testcase from Mark (with array size 10000) clang is much faster (0.7sec vs. 15.7sec) and perf report shows that the multiplication in get_ref_base_and_extent is mostly responsible: 9.95% cc1plus cc1plus [.] get_ref_base_and_extent(tree_node*, long*, long*, long*) 7.94% cc1plus cc1plus [.] mul_double_wide_with_sign(unsigned long, long, unsigned long, long, unsigned long*, long*, unsigned long*, long*, bool) 7.40% cc1plus cc1plus [.] hash_table<vn_reference_hasher, xcallocator>::find_slot_with_hash(vn_reference_s const*, unsigned int, insert_option) 4.22% cc1plus cc1plus [.] component_ref_field_offset(tree_node*) 4.18% cc1plus cc1plus [.] record_store(rtx_def*, bb_info*) 3.81% cc1plus cc1plus [.] hash_table_mod2(unsigned int, unsigned int) 3.59% cc1plus cc1plus [.] array_ref_low_bound(tree_node*)
For the #c0 testcase, I think reduced testcase is: struct S { S (); S (int); ~S (); int i; }; struct A { S s[1000000]; }; void foo () { A a = {{}}; } and in that case we don't go through build_vec_init, but check_init_r/process_init_constructor etc.
(In reply to Jakub Jelinek from comment #6) > For the #c0 testcase, I think reduced testcase is: > struct S { S (); S (int); ~S (); int i; }; > struct A { S s[1000000]; }; > > void > foo () > { > A a = {{}}; > } > and in that case we don't go through build_vec_init, but > check_init_r/process_init_constructor etc. Using (one hundred thousand) "struct A { S s[100000]; };" crashes gcc-4.7, 4.8 and trunk.
(In reply to Jakub Jelinek from comment #6) > For the #c0 testcase, I think reduced testcase is: > struct S { S (); S (int); ~S (); int i; }; > struct A { S s[1000000]; }; That's a more general case of the problem, but atomic has constant zero-initialization, which ought to simplify things. This patch causes the initialization to properly (and more quickly) become a memset: diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c index d9c3647..04f0bf7 100644 --- a/gcc/cp/typeck2.c +++ b/gcc/cp/typeck2.c @@ -1182,6 +1182,7 @@ process_init_constructor_array (tree type, tree init, initialization explicit by list-initializing from {}. */ next = build_constructor (init_list_type_node, NULL); next = digest_init (TREE_TYPE (type), next, complain); + next = maybe_constant_init (next); } else if (!zero_init_p (TREE_TYPE (type))) next = build_zero_init (TREE_TYPE (type), I'll poke at this some more.
Author: jason Date: Wed Jan 15 19:10:09 2014 New Revision: 206639 URL: http://gcc.gnu.org/viewcvs?rev=206639&root=gcc&view=rev Log: PR c++/59659 * typeck2.c (massage_init_elt): New. (process_init_constructor_record) (process_init_constructor_union): Use it. (process_init_constructor_array): Use it. Use RANGE_EXPR. (split_nonconstant_init_1): Handle it. * semantics.c (cxx_eval_vec_init_1): Use force_rvalue. Added: trunk/gcc/testsuite/g++.dg/opt/value-init1.C trunk/gcc/testsuite/g++.dg/opt/value-init2.C Modified: trunk/gcc/cp/ChangeLog trunk/gcc/cp/semantics.c trunk/gcc/cp/typeck2.c
Fixed for 4.9.
Author: jason Date: Fri Jan 24 16:47:54 2014 New Revision: 207051 URL: http://gcc.gnu.org/viewcvs?rev=207051&root=gcc&view=rev Log: PR c++/59886 PR c++/59659 * typeck2.c (process_init_constructor_array): Don't create RANGE_EXPR yet. Added: trunk/gcc/testsuite/g++.dg/init/aggr10.C Modified: trunk/gcc/cp/ChangeLog trunk/gcc/cp/typeck2.c
Author: jason Date: Fri Jan 24 17:09:07 2014 New Revision: 207052 URL: http://gcc.gnu.org/viewcvs?rev=207052&root=gcc&view=rev Log: PR c++/59886 PR c++/59659 * g++.dg/opt/value-init2.C: Remove. Removed: trunk/gcc/testsuite/g++.dg/opt/value-init2.C