Bug 59659 - large zero-initialized std::array compile time excessive
Summary: large zero-initialized std::array compile time excessive
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++
Version: 4.8.2
Importance: P3 normal
Target Milestone: 4.9.0
Assignee: Jason Merrill
URL:
Keywords: compile-time-hog, memory-hog, missed-optimization
Depends on:
Blocks:
 
Reported: 2014-01-02 15:25 UTC by Mark Devaney
Modified: 2023-04-05 17:21 UTC
3 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2014-01-07 00:00:00


Attachments
Minimal test case. (140 bytes, text/plain)
2014-01-02 16:04 UTC, Casey Carter
patch for the array initialization (1.04 KB, patch)
2014-01-09 16:25 UTC, Jakub Jelinek

Description Mark Devaney 2014-01-02 15:25:35 UTC
Compiling the code below with any of the zero-initialization forms of the std::atomic array takes at least several hours (3 GHz Intel).  The default-initialized and non-atomic forms compile immediately.

g++ -std=c++11 -O0 (no other compile flags)

reproducible under 4.8.2 and 4.7.2


#include <iostream>
#include <array>
#include <algorithm>
#include <atomic>
#include <numeric>   // needed for std::accumulate

int main(int argc, char* argv[]) {

    // std::array<std::atomic<int>, 1000000>  arr;         // default initialization (i.e., random data) = FAST

    std::array<std::atomic<int>, 1000000>  arr{{}};   // zero initialization = FOREVER

    //std::array<std::atomic<int>, 1000000>  arr{};   // zero initialization = FOREVER

    //std::array<std::atomic<int>, 1000000>  arr={{}};     // zero init via assignment = FOREVER

    //std::array<int, 1000000>  arr={{}};     // zero init non-atomic = FAST

    std::cerr << "sum = " << std::accumulate(arr.begin(), arr.end(), 0) << std::endl;
}
Comment 1 Casey Carter 2014-01-02 16:04:12 UTC
Created attachment 31563 [details]
Minimal test case.

Attached minimal test case. std::atomic<int> arr[1000000]; compiles quickly, so the slowdown seems to be an interaction between std::atomic<int> and std::array.
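
[Editor's note: the attachment contents are not reproduced in this report.  A plausible reconstruction of such a minimal test case (the initializer form and names are guesses, not the actual 140-byte attachment) would be:]

// Hypothetical reconstruction of the minimal test case; not the actual
// attachment.  Value-initializing the std::array of std::atomic<int>
// triggers the pathological compile time.
#include <array>
#include <atomic>

int main()
{
    std::array<std::atomic<int>, 1000000> arr{};
    (void) arr;
}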
Comment 2 Richard Biener 2014-01-07 11:47:57 UTC
The frontend generates

  struct array arr;

    struct array arr;
  <<cleanup_point <<< Unknown tree: expr_stmt
  arr._M_elems[0] = TARGET_EXPR <D.25804, {.D.23130={._M_i=0}}> >>>>>;
  <<cleanup_point <<< Unknown tree: expr_stmt
  arr._M_elems[1] = TARGET_EXPR <D.25805, {.D.23130={._M_i=0}}> >>>>>;
  <<cleanup_point <<< Unknown tree: expr_stmt
  arr._M_elems[2] = TARGET_EXPR <D.25806, {.D.23130={._M_i=0}}> >>>>>;
  <<cleanup_point <<< Unknown tree: expr_stmt
  arr._M_elems[3] = TARGET_EXPR <D.25807, {.D.23130={._M_i=0}}> >>>>>;
...

instead of generating a loop.  The cause is probably some subtle property
of std::atomic (so it should eventually be reproducible with any class that
has that very same property).

The same issue occurs with the following, so it's std::array's fault (or rather
that of its initializer-list support).

#include <array>

class Foo { public: Foo() {} int i; };

int main() {
    std::array<Foo, 1000000> arr = {{}}; // Halting problem.
}
Comment 3 Jakub Jelinek 2014-01-09 16:18:13 UTC
I think this is somewhat related to e.g.:
struct A { A (); A (int); ~A (); };
void bar (A *);

#define T10(N) N, N, N, N, N, N, N, N, N, N
#define T100(N) T10(N), T10(N), T10(N), T10(N), T10(N), \
T10(N), T10(N), T10(N), T10(N), T10(N)
#define T1000(N) T100(N), T100(N), T100(N), T100(N), T100(N), \
 T100(N), T100(N), T100(N), T100(N), T100(N)
#define T10000(N) T1000(N), T1000(N), T1000(N), T1000(N), T1000(N), \
  T1000(N), T1000(N), T1000(N), T1000(N), T1000(N)
#define T100000(N) T10000(N), T10000(N), T10000(N), T10000(N), T10000(N), \
   T10000(N), T10000(N), T10000(N), T10000(N), T10000(N)

void
foo ()
{
  A a[] = { 1, 2, T1000 (3), T10000 (4), T1000 (3), 2, 1 };
  bar (a);
}

which also takes a long time to compile and generates an enormous amount of code (and when T10000 (4) is replaced with T100000 (4) it is much worse).
Comment 4 Jakub Jelinek 2014-01-09 16:25:16 UTC
Created attachment 31785 [details]
patch for the array initialization

Untested patch for the array initialization: if there are several (the patch uses 10; this could be a param) consecutive identical elements in the ctor, we can just emit a loop instead and leave it to the compiler to decide whether to unroll it.  I guess initializer_list would need similar treatment.
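
[Editor's note: as an illustration only (the type and function names below are invented, not taken from the patch), collapsing a run of identical element initializers into a loop corresponds at the source level to roughly this:]

// Hand-written sketch of the transformation described above; 'Elt' and the
// functions are hypothetical, not part of the patch.
struct Elt { Elt () : i (0) {} Elt (int v) : i (v) {} int i; };

void
init_unrolled (Elt *p)
{
  // What the front end effectively emits today: one store per element,
  // so the IL grows linearly with the length of the run.
  p[0] = Elt (3); p[1] = Elt (3); p[2] = Elt (3); p[3] = Elt (3);
  // ... and so on for every element of the run ...
}

void
init_loop (Elt *p, int n)
{
  // The proposed form for a long run of identical initializers: a single
  // loop, which the optimizers may later unroll or vectorize.
  for (int i = 0; i < n; ++i)
    p[i] = Elt (3);
}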
Comment 5 Markus Trippelsdorf 2014-01-09 16:58:59 UTC
(In reply to Jakub Jelinek from comment #3)
> I think this is somewhat related to e.g.:
> struct A { A (); A (int); ~A (); };
> void bar (A *);
> 
> #define T10(N) N, N, N, N, N, N, N, N, N, N
> #define T100(N) T10(N), T10(N), T10(N), T10(N), T10(N), \
> T10(N), T10(N), T10(N), T10(N), T10(N)
> #define T1000(N) T100(N), T100(N), T100(N), T100(N), T100(N), \
>  T100(N), T100(N), T100(N), T100(N), T100(N)
> #define T10000(N) T1000(N), T1000(N), T1000(N), T1000(N), T1000(N), \
>   T1000(N), T1000(N), T1000(N), T1000(N), T1000(N)
> #define T100000(N) T10000(N), T10000(N), T10000(N), T10000(N), T10000(N), \
>    T10000(N), T10000(N), T10000(N), T10000(N), T10000(N)
> 
> void
> foo ()
> {
>   A a[] = { 1, 2, T1000 (3), T10000 (4), T1000 (3), 2, 1 };
>   bar (a);
> }
> 
> also taking long time to compile and generating enormous amount of code (and
> when replacing T10000 (4) with T100000 (4) it is even much worse).

I don't think they are directly related.
For your testcase clang is even slower than gcc (1:51 min vs. 21.8 sec).
perf report shows:
  6.92%  cc1plus  cc1plus   [.] get_value_for_expr(tree_node*, bool)
  4.98%  cc1plus  cc1plus   [.] canonicalize_value(prop_value_d*)
  3.56%  cc1plus  cc1plus   [.] mark_used_flags(rtx_def*, int)
  3.01%  cc1plus  cc1plus   [.] ccp_visit_phi_node(gimple_statement_base*)
  2.97%  cc1plus  cc1plus   [.] redirect_eh_edge_1(edge_def*, basic_block_def*, bool)
  2.87%  cc1plus  cc1plus   [.] remove_eh_landing_pad(eh_landing_pad_d*)
  2.86%  cc1plus  cc1plus   [.] vrp_meet(value_range_d*, value_range_d*)


On the testcase from Mark (with array size 10000) clang is much
faster (0.7 sec vs. 15.7 sec), and the perf report shows that the multiplication in
get_ref_base_and_extent is mostly responsible:
  9.95%  cc1plus  cc1plus  [.] get_ref_base_and_extent(tree_node*, long*, long*, long*)
  7.94%  cc1plus  cc1plus  [.] mul_double_wide_with_sign(unsigned long, long, unsigned long, long, unsigned long*, long*, unsigned long*, long*, bool)
  7.40%  cc1plus  cc1plus  [.] hash_table<vn_reference_hasher, xcallocator>::find_slot_with_hash(vn_reference_s const*, unsigned int, insert_option)
  4.22%  cc1plus  cc1plus  [.] component_ref_field_offset(tree_node*)
  4.18%  cc1plus  cc1plus  [.] record_store(rtx_def*, bb_info*)
  3.81%  cc1plus  cc1plus  [.] hash_table_mod2(unsigned int, unsigned int)
  3.59%  cc1plus  cc1plus  [.] array_ref_low_bound(tree_node*)
Comment 6 Jakub Jelinek 2014-01-09 17:03:37 UTC
For the #c0 testcase, I think a reduced testcase is:
struct S { S (); S (int); ~S (); int i; };
struct A { S s[1000000]; };

void
foo ()
{
  A a = {{}};
}
and in that case we don't go through build_vec_init, but check_init_r/process_init_constructor etc.
Comment 7 Markus Trippelsdorf 2014-01-09 17:24:30 UTC
(In reply to Jakub Jelinek from comment #6)
> For the #c0 testcase, I think reduced testcase is:
> struct S { S (); S (int); ~S (); int i; };
> struct A { S s[1000000]; };
> 
> void
> foo ()
> {
>   A a = {{}};
> }
> and in that case we don't go through build_vec_init, but
> check_init_r/process_init_constructor etc.

Using "struct A { S s[100000]; };" (one hundred thousand elements) crashes gcc-4.7, 4.8 and trunk.
Comment 8 Jason Merrill 2014-01-10 17:16:45 UTC
(In reply to Jakub Jelinek from comment #6)
> For the #c0 testcase, I think reduced testcase is:
> struct S { S (); S (int); ~S (); int i; };
> struct A { S s[1000000]; };

That's a more general case of the problem, but atomic has constant zero-initialization, which ought to simplify things.  This patch causes the initialization to properly (and more quickly) become a memset:

diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index d9c3647..04f0bf7 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -1182,6 +1182,7 @@ process_init_constructor_array (tree type, tree init,
              initialization explicit by list-initializing from {}.  */
            next = build_constructor (init_list_type_node, NULL);
            next = digest_init (TREE_TYPE (type), next, complain);
+           next = maybe_constant_init (next);
          }
        else if (!zero_init_p (TREE_TYPE (type)))
          next = build_zero_init (TREE_TYPE (type),

I'll poke at this some more.
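
[Editor's note: to spell out what "become a memset" means here: once the value-initialized element folds to a constant all-zero value, the whole aggregate can be cleared in a single block operation instead of a million element stores.  A rough source-level analogue, using stand-in types rather than the compiler's actual output:]

// Rough source-level analogue of the lowering the patch enables; 'Atomic'
// and 'Array' are stand-ins, and this is not the compiler's generated code.
#include <cstring>

struct Atomic { int i; };                  // stand-in for std::atomic<int>
struct Array  { Atomic elems[1000000]; };  // stand-in for std::array

void
zero_init (Array &a)
{
  std::memset (&a, 0, sizeof a);           // one block clear, not 1000000 stores
}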
Comment 9 Jason Merrill 2014-01-15 19:10:40 UTC
Author: jason
Date: Wed Jan 15 19:10:09 2014
New Revision: 206639

URL: http://gcc.gnu.org/viewcvs?rev=206639&root=gcc&view=rev
Log:
	PR c++/59659
	* typeck2.c (massage_init_elt): New.
	(process_init_constructor_record)
	(process_init_constructor_union): Use it.
	(process_init_constructor_array): Use it.  Use RANGE_EXPR.
	(split_nonconstant_init_1): Handle it.
	* semantics.c (cxx_eval_vec_init_1): Use force_rvalue.

Added:
    trunk/gcc/testsuite/g++.dg/opt/value-init1.C
    trunk/gcc/testsuite/g++.dg/opt/value-init2.C
Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/semantics.c
    trunk/gcc/cp/typeck2.c
Comment 10 Jason Merrill 2014-01-15 20:37:46 UTC
Fixed for 4.9.
Comment 11 Jason Merrill 2014-01-24 16:48:26 UTC
Author: jason
Date: Fri Jan 24 16:47:54 2014
New Revision: 207051

URL: http://gcc.gnu.org/viewcvs?rev=207051&root=gcc&view=rev
Log:
	PR c++/59886
	PR c++/59659
	* typeck2.c (process_init_constructor_array): Don't create
	RANGE_EXPR yet.

Added:
    trunk/gcc/testsuite/g++.dg/init/aggr10.C
Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/typeck2.c
Comment 12 Jason Merrill 2014-01-24 17:09:40 UTC
Author: jason
Date: Fri Jan 24 17:09:07 2014
New Revision: 207052

URL: http://gcc.gnu.org/viewcvs?rev=207052&root=gcc&view=rev
Log:
	PR c++/59886
	PR c++/59659
	* g++.dg/opt/value-init2.C: Remove.

Removed:
    trunk/gcc/testsuite/g++.dg/opt/value-init2.C