We are trying to move from gcc 2.95 to 3.3.1. We are using Cygwin, and the code that will not compile is in a library generated by another group; we must use it as-is. The other group also uses the library, but they build it with a Green Hills compiler. The code compiles with gcc 2.95, but gcc 3.3.1 gives us this error:

cc1plus: out of memory allocating 65536 bytes after a total of 402620416 bytes

gcc -v produces this information:

Reading specs from /bin/../lib/gcc-lib/i686-pc-cygwin/3.3.1/specs
Configured with: /netrel/src/gcc-3.3.1-2/configure --enable-languages=c,c++,f77,java --enable-libgcj --enable-threads=posix --with-system-zlib --enable-nls --without-included-gettext --enable-interpreter --enable-sjlj-exceptions --disable-version-specific-runtime-libs --enable-shared --build=i686-pc-linux --host=i686-pc-cygwin --target=i686-pc-cygwin --prefix=/usr --exec-prefix=/usr --sysconfdir=/etc --libdir=/usr/lib --includedir=/nonexistent/include --libexecdir=/usr/sbin
Thread model: posix
gcc version 3.3.1 (cygming special)
Created attachment 5760 [details]
File created with -save-temps

The command line that was run to produce this error was:

g++ -I/usr/include/GL -c -I/RhapsodyCustomizations/Share/LangCpp -fno-schedule-insns -fno-schedule-insns2 -I/RhapsodyCustomizations/Share/LangCpp/osconfig/Cygwin -I/RhapsodyCustomizations/Share -I/RhapsodyCustomizations/Share/LangCpp/oxf -I/uaenfs_ASE/base/uae_dev/b2v6_1_0/uae//models/src/cds/include/ -I/uaenfs_ASE/base/uae_dev/b2v6_1_0/uae//models/src/cds/include/mfdSym/ -I/uaenfs_ASE/base/uae_dev/b2v6_1_0/uae//include -I/uaenfs_ASE/base/uae_dev/b2v6_1_0/universal/include -I/uaenfs_ASE/base/uae_dev/b2v6_1_0/universal/include/i386-pc-cygwin32 -I/uaenfs_ASE/base/uae_dev/b2v6_1_0universal/include/i386-pc-cygwin32 -I/uaenfs_ASE/base/uae_dev/b2v6_1_0//universal/lib/src/libsdn/include -I/usr/local/mysql/include/mysql -I/Exceed/xdk/motif21/include -I/Exceed/xdk/include -I/usr/include/GL -I/Exceed/xdk/motif21/include -DARCH_X86 -DOS_CYGWIN -DOS_VERSION=1322 -DWIN32 -DNT -DMOTIFAPP -DBO_LITTLE_ENDIAN -DOM_USE_STL -DDBTYPE_MYSQL -DUSE_IOSTREAM -DARCH_X86 -DCYGWIN -DOS_VERSION=1322 -DWIN32 -DNT -DXMSTATIC -c ../SymSymbolGwb.cpp -o SymSymbolGwb.o
Created attachment 5763 [details]
File created with -save-temps. The file was too large, so it was compressed with bzip2.
Debora, your problem is that G++ needs more RAM to compile that file. Lately there has been a lot of work on making the C++ compiler consume less and less memory, and on making it faster, but this work cannot be backported to the 3.3 stable release series. Currently G++ 3.4.0 compiles faster than 2.95 on most tests, and consumes around the same amount of memory. I suggest you try upgrading your compiler to 3.4.0. You will have to pull it from CVS and recompile it. Also, you may want to take a look at http://gcc.gnu.org/gcc-3.4/changes.html, since the new compiler is much stricter about the C++ standard, so old code might need to be modified. If this is feasible for you, I will look forward to a new report from you on how 3.4.0 behaves with your code. Alas, it's probably too late to improve the 3.3 series that much. If you really need to use it, adding more RAM to your computer should allow the compilation to finish.
Confirmed. The problem is the large array: for some reason GCC now takes a huge amount of memory for the array mega_TextureSymbolData.
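For reference, a minimal sketch of the kind of input that triggers this; the element values here are made up, and the real array (from the attached preprocessed source) has on the order of 8 million entries:

  /* Hypothetical reduced reproducer: a huge brace-initialized array
     of integer constants at file scope.  */
  static const unsigned char mega_TextureSymbolData[] = {
    0x3f, 0x12, 0x00, 0xff, 0x08, 0x41, /* ... millions more values ... */
  };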
Related to bug 12245.
I've investigated this problem. To some extent, this problem is inevitable. In the old days, we used to output assembly code for a global array element-by-element as we saw it. That's not what we want to do: we want to store up the array so that we can optimize loads from fixed indices, etc.

However, we do waste a ton of memory. We allocate a separate INTEGER_CST for each occurrence of the same integer constant, even though there are only a few distinct values. We allocate an entirely new list of constants when we refactor the brace-enclosed form (essentially adding braces that C++ says you can omit). We allocate an integer constant for every possible index into the array, which has 8 million entries.

We should be representing the initializer with a structure like this:

  struct init_group {
    struct init_group *next;
    tree designator;  /* The designator for the first element in the array.  */
    tree elts[];
  };

rather than a linked-list node per element. That would be far more efficient for large arrays, and a win even for small arrays. None of this is going to get fixed until 3.5, however.
I will take a look at this next week, as part of the work on improving compile time.
Mark, one question: are you suggesting the special structure for the constructor only before reshape_init, or also after it? Because we need to build a CONSTRUCTOR sooner or later for the gimplifier. Anyway, would such a change be acceptable for Stage 3, being a bugfix towards better memory allocation (and fixing this regression)?
Subject: Re: [3.3/3.4/4.0 Regression] out of memory

giovannibajo at libero dot it wrote:

> Mark, one question: are you suggesting the special structure for the
> constructor only before reshape_init, or also after it? Because we need to
> build a CONSTRUCTOR sooner or later for the gimplifier.

Both. After reshape_init is even more critical, because that memory cannot be collected.

> Anyway, would such a change be acceptable for Stage 3, being a bugfix
> towards better memory allocation (and fixing this regression)?

Perhaps, but it seems unlikely. I've thought about it, but I'm scared of how much impact there would be throughout the compiler. There are some smaller things that we could do more locally, though, like reusing the TREE_LISTs from before reshape_init after reshape_init. Also, I suspect that Nathan's integer-sharing work has already reduced this problem somewhat; part of the problem was that we made multiple copies of every INTEGER_CST from zero up to the upper bound of the array. Now we should have only one, at least.
It looks like we do not destroy and recreate initializers in reshape_init; elements are moved from the old CONSTRUCTOR to the new one. Instead, while investigating the code, I noticed this in reshape_init:

  /* Loop through the array elements, gathering initializers.  */
  for (index = size_zero_node;
       *initp && (!max_index || !tree_int_cst_lt (max_index, index));
       index = size_binop (PLUS_EXPR, index, size_one_node))
    {

We are constructing a *different* INTEGER_CST for each index, and we never use it. This generates a lot of garbage. I do not know if it is enough to switch to HOST_WIDE_INT only; we may want to handle arrays larger than a HOST_WIDE_INT (e.g. cross-compiling from 16-bit to 32-bit). My solution for mainline is to use a HOST_WIDE_INT whenever possible, falling back to trees when the indices get too high. Mark, does this make sense? I don't know if this will be acceptable for 3.3 and 3.4 too, but let's have this fixed in mainline, as a start.
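To make the direction concrete, here is a hypothetical sketch (not the actual patch): it ignores the fall-back-to-trees path for indices wider than a HOST_WIDE_INT, and max_index_hwi is an assumed local variable.

  /* Hypothetical sketch: count with a plain HOST_WIDE_INT instead of
     allocating a fresh INTEGER_CST tree on every iteration.  */
  HOST_WIDE_INT index;
  HOST_WIDE_INT max_index_hwi = max_index ? tree_low_cst (max_index, 1) : -1;

  for (index = 0;
       *initp && (max_index_hwi < 0 || index <= max_index_hwi);
       ++index)
    {
      /* ... gather the initializer for element INDEX, building a tree
         index (e.g. with size_int) only where one is actually needed ... */
    }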
Two patches posted, waiting for review: http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01839.html http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01840.html
Subject: Bug 14179

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	giovannibajo@gcc.gnu.org	2004-09-20 23:05:43

Modified files:
	gcc/cp: ChangeLog decl.c

Log message:
	PR c++/14179
	* decl.c (reshape_init): Extract array handling into...
	(reshape_init_array): New function. Use integers instead of trees
	for indices. Handle out-of-range designated initializers.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/ChangeLog.diff?cvsroot=gcc&r1=1.4366&r2=1.4367
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/decl.c.diff?cvsroot=gcc&r1=1.1296&r2=1.1297
Subject: Bug 14179

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch:		gcc-3_3-branch
Changes by:	giovannibajo@gcc.gnu.org	2004-09-21 21:12:51

Modified files:
	gcc/cp: ChangeLog decl.c

Log message:
	PR c++/14179
	* decl.c (reshape_init): Extract array handling into...
	(reshape_init_array): New function. Use integers instead of trees
	for indices.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_3-branch&r1=1.3076.2.274&r2=1.3076.2.275
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/decl.c.diff?cvsroot=gcc&only_with_tag=gcc-3_3-branch&r1=1.965.2.84&r2=1.965.2.85
Subject: Bug 14179

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch:		gcc-3_4-branch
Changes by:	giovannibajo@gcc.gnu.org	2004-09-21 22:49:56

Modified files:
	gcc/cp: ChangeLog decl.c

Log message:
	PR c++/14179
	* decl.c (reshape_init): Extract array handling into...
	(reshape_init_array): New function. Use integers instead of trees
	for indices. Handle out-of-range designated initializers.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.3892.2.158&r2=1.3892.2.159
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/decl.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.1174.2.23&r2=1.1174.2.24
Subject: Bug 14179

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch:		gcc-3_4-branch
Changes by:	giovannibajo@gcc.gnu.org	2004-09-21 23:46:08

Modified files:
	gcc/cp: ChangeLog parser.c

Log message:
	PR c++/14179
	* parser.c (cp_parser_initializer): Speed up parsing of simple
	literals as initializers.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.3892.2.159&r2=1.3892.2.160
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/parser.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.157.2.40&r2=1.157.2.41
OK, let me update the situation. I'm using a testcase which has an array of 4 million initializers (about a quarter of the original testcase). These are the three patches I consider important for this PR so far:

(1) Nathan's INTEGER_CST sharing
(2) my patch to make reshape_init_array use HOST_WIDE_INT
(3) my patch to speed up initializer-list parsing

(1) and (2) affect memory use; (3) affects compilation time. This is the situation for our supported compilers:

GCC 3.3.5
---------
(1) not present
(2) applied
(3) not required (old parser)
Testcase: killed after 0m31s (420MB is the kill threshold).

GCC 3.4.3
---------
(1) not present
(2) applied
(3) applied
Testcase: killed after 0m23s (420MB is the kill threshold).

GCC 4.0.0
---------
(1) applied
(2) applied
(3) will be applied (or rewritten in a better form)
Testcase: 0m26s, 220MB (this is with (3) applied, and checking disabled)

For comparison, GCC 2.95: 0m5s, 175MB.
So, the situation is getting better, but we are not there yet. My next target is process_init_constructor, where we are being very dumb.
Subject: Re: [3.3/3.4/4.0 Regression] out of memory

giovannibajo at libero dot it wrote:

> Testcase: 0m26s, 220MB
> (this is with (3) applied, and checking disabled)
>
> For comparison, GCC 2.95: 0m5s, 175MB.

For the record, I don't expect we will get all the way back there. I consider it a design feature that we are keeping the initializer around in the compiler: that allows us to (in theory) pull elements out of it if they are referenced. With that capability comes a cost. That is not to say that we should not keep going with your work, of course! FYI, Jeffrey Oldham is working on operator-precedence parsing; hopefully he'll have a patch soon.
Created attachment 7196 [details] patch for precedence parsing Here is a patch for precedence parsing, being regtested currently.
Actually, precedence parsing is not a big job. Incrementally, what I did for my patch was just a matter of:

1) making a single map of all the productions instead of the per-function maps we have currently
2) making cp_parser_binary_expression use it, still keeping the recursive call and passing around the precedence we are interested in
3) turning the recursion into an explicit stack
4) optimizing to eliminate unneeded (simulated) recursion

It would be nice if the wiki were used to track in-house CodeSourcery projects. It looks like Jeffrey and I duplicated work, and we almost did the same on the tree class codes project (and I'm not mocking CodeSourcery's work, I'm just reading the wiki and using common sense about where to optimize the compiler).

Paolo
Subject: Re: [3.3/3.4/4.0 Regression] out of memory

bonzini at gnu dot org wrote:

> Actually, precedence parsing is not a big job. Incrementally, what I did
> for my patch was just a matter of:
> 1) making a single map of all the productions instead of the per-function
> maps we have currently
> 2) making cp_parser_binary_expression use it, still keeping the recursive
> call and passing around the precedence we are interested in
> 3) turning the recursion into an explicit stack
> 4) optimizing to eliminate unneeded (simulated) recursion
>
> It would be nice if the wiki were used to track in-house CodeSourcery
> projects. It looks like Jeffrey and I duplicated work, and we almost did
> the same on the tree class codes project (and I'm not mocking
> CodeSourcery's work, I'm just reading the wiki and using common sense
> about where to optimize the compiler).

I'm sorry that happened, but we did mention the operator-precedence parsing and tree-code conversion work publicly as we started on it. We'll try to be louder. I am certainly no happier than you that we are duplicating each other's work!
Paolo -- This patch is great. My only comment is that I would like the grammar entries that used to be above the various functions (pm-expression: ..., etc.) to be preserved above the rewritten binary_expression function. Please check in! Thanks, -- Mark
Subject: Re: [3.3/3.4/4.0 Regression] out of memory

> I'm sorry that happened, but we did mention the operator-precedence
> parsing and tree-code conversion work publicly as we started on it.

Though I only knew of the tree-code conversion from private mail by Zack (and you said you were about to finish it when I said I'd do that on the faster-compiler-branch); and the operator-precedence work was placed on the wiki a week ago as a generic "speedup area", not as a project that is actively being worked on, unlike Matt's lexer overhaul:

http://www.dberlin.org/gccwiki/index.php/Speedup%20areas

Since that page also mentions Nathan's "Add contains-repeated-base and is-diamond-shaped flags to classes" project, maybe it is *that* page that needs to be louder.

> We'll try to be louder.

No problem. You're doing good work.

Paolo
Subject: Re: [3.3/3.4/4.0 Regression] out of memory

paolo dot bonzini at polimi dot it wrote:

> and the operator-precedence work was placed on the wiki a week ago as a
> generic "speedup area", not as a project that is actively being worked on,
> unlike Matt's lexer overhaul

Honestly, we didn't know we'd be working on it until the early part of this week. We didn't decide until Monday around noon in California. But the fundamental point is that it's in nobody's best interest to duplicate work, so we will try to make as much noise as possible about projects.

> http://www.dberlin.org/gccwiki/index.php/Speedup%20areas
>
> Since that page also mentions Nathan's "Add contains-repeated-base and
> is-diamond-shaped flags to classes" project, maybe it is *that* page that
> needs to be louder.

Just for the record, I do not know if Nathan is presently working on that or not. I think he may have concluded there is not enough win there.

> > We'll try to be louder.
>
> No problem. You're doing good work.

Thanks for your understanding -- and you, likewise, are doing good work! After you get timing numbers for 14179, it would be interesting to consider whether or not we should try to extend the operator-precedence parser, or do other short-circuiting tricks, when getting down to the bottom of binary expressions. For example, if a unary expression is an integer literal or identifier followed by ")" or "," or ";", we know that it's just a primary expression. In other words, we could use two tokens of lookahead to zip straight to cp_parser_primary_expression. Would you like to take a look at that as well?
Subject: Re: [3.3/3.4/4.0 Regression] out of memory

> In other words, we could use two tokens of lookahead to zip straight to
> cp_parser_primary_expression. Would you like to take a look at that as
> well?

I had mailed a prototype of the patch to Giovanni as well, hoping that he'd get to it before me. We'll see after the regtest has concluded, though.

Paolo
Mark: process_init_constructor builds new TREE_LISTs for each new initializer. This is pretty easy to get rid of, at least for arrays, and will be taken care of with a patch I will be testing soon. The mainline version will likely extract and clean up the array handling into a separate function.

But process_init_constructor also calls digest_init for each and every initializer, which makes each initializer go through the conversion machinery. For the example in this PR, we build an IDENTITY_CONV and a couple of STANDARD_CONVs for each initializer. For mainline, I could try to modify the conversion engine to not use trees to keep track of conversions. I think a specific struct kept in a local VEC would be good enough.

For 3.4/3.3, is there a way to avoid calling digest_init if we detect that we can just fold_convert (or similar) the initializer? Or maybe put such a speedup check within digest_init directly? I am thinking of simple default promotions, for which building 3-4 trees and throwing them away doesn't look too smart. I am not expert in this kind of type-conversion stuff, so I can't devise a correct check for this without making it too specific to the case in this PR. Can you suggest something to get me started?
Subject: Re: [3.3/3.4/4.0 Regression] out of memory

giovannibajo at libero dot it wrote:

> But process_init_constructor also calls digest_init for each and every
> initializer, which makes each initializer go through the conversion
> machinery. For the example in this PR, we build an IDENTITY_CONV and a
> couple of STANDARD_CONVs for each initializer.

On the mainline, this should be much cheaper, because we do not build any trees for conversions. We build "struct conversion" instead, and those are allocated on an obstack. So you should confirm that this is still a bottleneck on the mainline.

> For 3.4/3.3, is there a way to avoid calling digest_init if we detect that
> we can just fold_convert (or similar) the initializer? Or maybe put such a
> speedup check within digest_init directly? I am thinking of simple default
> promotions, for which building 3-4 trees and throwing them away doesn't
> look too smart. [...] Can you suggest something to get me started?

I think it would be better to try to do this in a way that could be used on the mainline too. If conversions are still a bottleneck, then we could try to optimize. The most common case is probably that the "from" and "to" types are the same. So, you could try having implicit_conversion do "if same_type_p (to, from) && !class_type return identity conversion". (It might even be better just to check pointer equality of "to" and "from", so as to avoid the cost of same_type_p when they are *not* the same.) That would short-circuit a lot of the work, and might win for other test cases as well, because you save not only on digest_init, but also with function calls like:

  void f(int);
  void g() { f(3); }
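For concreteness, a hypothetical sketch of the short-circuit Mark describes, assuming the 3.x-era tree representation of conversions (where an identity conversion was an IDENTITY_CONV node built with build1); treat the placement and exact guards as assumptions, not the actual patch:

  /* Hypothetical sketch near the top of implicit_conversion: cheap
     pointer test first, full same_type_p only if that fails, and
     bypass the rest of the conversion machinery for the identity
     case on non-class types.  */
  if (to == from
      || (!CLASS_TYPE_P (to) && same_type_p (to, from)))
    return build1 (IDENTITY_CONV, from, expr);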
(In reply to comment #27)

> On the mainline, this should be much cheaper, because we do not build any
> trees for conversions. We build "struct conversion" instead, and those are
> allocated on an obstack. So you should confirm that this is still a
> bottleneck on the mainline.

Ah, right. I forgot that this cleanup was already done. I can confirm this is not a bottleneck on the mainline anymore. BTW, preliminary testing of my patch to process_init_constructor is *very* promising: on the mainline, compared to comment #16, we now save an additional 100MB of RAM. We can compile a quarter of the testcase with 120MB of RAM (and GCC 2.95 uses 175MB)!

> > For 3.4/3.3, is there a way to avoid calling digest_init if we detect
> > that we can just fold_convert (or similar) the initializer?
> I think it would be better to try to do this in a way that could be used
> on the mainline too. If conversions are still a bottleneck, then we could
> try to optimize.

It turned out I was wrong, and we don't need to do this on mainline.

> The most common case is probably that the "from" and "to" types are the
> same. So, you could try having implicit_conversion do "if same_type_p
> (to, from) && !class_type return identity conversion". [...] That would
> short-circuit a lot of the work, and might win for other test cases as
> well, because you save not only on digest_init, but also with function
> calls like:
>   void f(int);
>   void g() { f(3); }

Yes, but the problem is that default promotions are also very common:

  void f(char);
  void g() { f(3); }

and this is what we need to short-circuit for the testcase to start saving memory. I tried something like:

  if (INTEGRAL_TYPE_P (to) && INTEGRAL_TYPE_P (from)
      && same_type_p (type_promotes_to (to), type_promotes_to (from)))
    return ocp_convert (to, expr, CONV_IMPLICIT, flags);

but I'm not sure about those type_promotes_to calls, plus it segfaults for some reason I'm investigating...
Subject: Re: [3.3/3.4/4.0 Regression] out of memory

giovannibajo at libero dot it wrote:

> Yes, but the problem is that default promotions are also very common:
>
>   void f(char);
>   void g() { f(3); }
>
> and this is what we need to short-circuit for the testcase to start saving
> memory. I tried something like:
>
>   if (INTEGRAL_TYPE_P (to) && INTEGRAL_TYPE_P (from)
>       && same_type_p (type_promotes_to (to), type_promotes_to (from)))
>     return ocp_convert (to, expr, CONV_IMPLICIT, flags);
>
> but I'm not sure about those type_promotes_to calls, plus it segfaults for
> some reason I'm investigating...

I don't know about the segfault, but I'd worry that you might not win much once the tests get that complex, at least for code other than this one test case. Giant arrays with huge initializers are not the typical case, thankfully.
Great news. Thanks to fixing PR 17596, we now outperform 3.3.4 by 25% for a reduced testcase (with an array of 240000 elements). Every additional element costs us a mere 44 instructions in cp_parser_binary_expression, according to cachegrind.

I'm not closing this because Giovanni's patch to look ahead for a comma may still make some difference, but I'm downgrading its priority.

Paolo
Subject: Re: [3.3/3.4/4.0 Regression] out of memory

bonzini at gcc dot gnu dot org wrote:

> Great news. Thanks to fixing PR 17596, we now outperform 3.3.4 by 25% for
> a reduced testcase (with an array of 240000 elements). Every additional
> element costs us a mere 44 instructions in cp_parser_binary_expression,
> according to cachegrind.

That is fabulous news!

> I'm not closing this because Giovanni's patch to look ahead for a comma
> may still make some difference, but I'm downgrading its priority.

Please also remove any regression tags, and remove any release target markings. (This is now an opportunity for improvement, not a regression.)
Subject: Re: [3.3/3.4/4.0 Regression] out of memory

> > Every additional element costs us a mere 44 instructions in
> > cp_parser_binary_expression, according to cachegrind.
>
> That is fabulous news!

(Of course, it does not count instructions elsewhere.)

> Please also remove any regression tags, and remove any release target
> markings. (This is now an opportunity for improvement, not a regression.)

Done.
No, sorry, this is wrong. This bug still shows a big memory regression. As I explained in comment #26 and comment #28, I am working on a patch to process_init_constructor to fix it, but we are not there yet. (When I said "I can confirm this is not a bottleneck on mainline anymore", I meant that the standard conversion code was not eating too much memory -- the testcase still cannot be compiled on my computer because of the useless TREE_LISTs we build in process_init_constructor.)
Now that this is an enhancement, is there any chance of getting it fixed?
Yes, because this is still a regression. Note that mainline is already better than 3.3.3 in terms of memory usage.
Debora, I'm working on a patch which should definitely fix this bug. I hope to be able to finish it before 4.0 gets out and/or 3.3 is definitely closed.
Giovanni, any news?
Subject: Re: [3.3/3.4/4.0 Regression] out of memory while parsing array with many initializers steven at gcc dot gnu dot org <gcc-bugzilla@gcc.gnu.org> wrote: > Giovanni, any news? I have a patch around for a long time already, but I cannot find the time to do much GCC work right now (as you may have noticed). I would still like to tackle this bug unless somebody is in a hurry, so I guess you'll have to wait a little bit more. Giovanni Bajo
Everyone is in a hurry, this is a regression ;-) Can you attach the patch so someone can have a look and maybe finish it for you?
any word on this?
Created attachment 8814 [details]
Preliminary patch

Sorry about this. This is a preliminary patch which I wrote months ago and never got around to testing and posting. It greatly reduces memory use for the testcase in this PR, and basically fixes the only remaining memory hog.
Using the testcase from PR 12245 (columns are Garbage, Freed, Leak, Overhead, Times):

cp/parser.c:285 (cp_lexer_new_main)                    0: 0.0%  372302336:88.5%         0: 0.0%  104391168:79.7%        9
cp/parser.c:270 (cp_lexer_new_main)                    0: 0.0%     364288: 0.1%         0: 0.0%     102144: 0.1%        1
cp/parser.c:12407 (cp_parser_initializer_list)  22770552:24.8%   23955160: 5.7%         0: 0.0%   13171408:10.1%       19
cp/decl.c:4182 (reshape_init_array_1)                  0: 0.0%   23955160: 5.7%  22770552:24.6%   13171408:10.1%       19
ggc-common.c:193 (ggc_calloc)                   16770368:18.3%          0: 0.0%  16805272:18.1%        612: 0.0%       51
tree.c:828 (build_int_cst_wide)                      480: 0.0%          0: 0.0%  52180672:56.3%          0: 0.0%  1630661
convert.c:671 (convert_to_integer)              52187584:56.8%          0: 0.0%         0: 0.0%          0: 0.0%  1630862

This is worse than the C front-end.
I see that Giovanni checked in a significant patch here:

2005-07-20  Giovanni Bajo  <giovannibajo@libero.it>

	Make CONSTRUCTOR use VEC to store initializers.

Is this PR still a significant regression from an earlier release?
We don't have clear evidence that this is worse, let alone substantially worse, than previous releases. Until and unless we do, I've downgraded this to P4. However, if this is fixed, let's just mark it so, and move on. In fact, if it's even in the ballpark, let's mark it fixed and move on. It's usually not very useful to chase the last few percent on a compile-time/memory testcase, as there are other places where we know we can have bigger impact.
Comment on attachment 8814 [details]
Preliminary patch

The patch is no longer relevant after the commit that Ian pointed out.
Won't fix for 4.0.x
*** Bug 36516 has been marked as a duplicate of this bug. ***
Closing 4.1 branch.
Similarly as in PR c/12245, we build tons of unnecessary CONVERT_EXPRs. Avoiding this with the same patch as attached to PR c/12245 brings garbage down by 54%, from:

source location                                  Garbage            Freed              Leak               Overhead          Times
cp/lex.c:511 (build_lang_decl)                   94176: 0.0%        116432: 0.0%       826264: 0.1%       98952: 0.0%       4247
toplev.c:1538 (realloc_for_line_map)             0: 0.0%            1310720: 0.1%      1316864: 0.1%      555008: 0.1%      7
ggc-common.c:187 (ggc_calloc)                    134478488:12.0%    188112: 0.0%       134356240:14.6%    18504: 0.0%       2913
cp/decl.c:4683 (reshape_init_array_1)            0: 0.0%            374786120:20.5%    373090856:40.6%    211006352:43.0%   22
cp/parser.c:14709 (cp_parser_initializer_list)   373090856:33.3%    374786120:20.5%    88: 0.0%           211006360:43.0%   23
tree.c:1004 (build_int_cst_wide)                 8640: 0.0%         0: 0.0%            402626592:43.8%    0: 0.0%           8388234
convert.c:752 (convert_to_integer)               603950328:54.0%    0: 0.0%            0: 0.0%            67105592:13.7%    8388199
Total                                            1118959027         1826757514         919671455          491189724         16977745

to:

source location                                  Garbage            Freed              Leak               Overhead          Times
cp/lex.c:511 (build_lang_decl)                   94176: 0.0%        116432: 0.0%       826264: 0.1%       98952: 0.0%       4247
toplev.c:1538 (realloc_for_line_map)             0: 0.0%            1310720: 0.1%      1316864: 0.1%      555008: 0.1%      7
ggc-common.c:187 (ggc_calloc)                    134478488:26.1%    188112: 0.0%       134356240:14.6%    18504: 0.0%       2913
cp/decl.c:4683 (reshape_init_array_1)            0: 0.0%            374786120:20.5%    373090856:40.6%    211006352:49.8%   22
cp/parser.c:14709 (cp_parser_initializer_list)   373090856:72.4%    374786120:20.5%    88: 0.0%           211006360:49.8%   23
tree.c:1004 (build_int_cst_wide)                 8640: 0.0%         0: 0.0%            402626592:43.8%    0: 0.0%           8388234
Total                                            515008771          1826757514         919671455          424084140         8589547

so saving about 0.5GB of RAM and speeding up correspondingly. We can still improve, but this seems like low-hanging fruit.

Honza
Honza, you realize that the numbers are completely unreadable in bugzilla, right?
Subject: Re: [4.2/4.3/4.4 Regression] out of memory while parsing array with many initializers

> Honza, you realize that the numbers are completely unreadable in bugzilla,
> right?

They need some care to read; the columns are still intact, just interleaved... I wonder why bugzilla insists on the line breaks?

Honza
With the patches proposed for c/12245, we now need 377MB of garbage (down from over 1GB originally) and produce 920MB of IL. Pretty much all of the garbage is coming from the temporary list constructed here:

  /* Add it to the vector.  */
  CONSTRUCTOR_APPEND_ELT (v, identifier, initializer);

in cp_parser_initializer_list. Perhaps explicitly freeing it would be a good idea?

Honza
Subject: Re: [4.2/4.3/4.4 Regression] out of memory while parsing array with many initializers hubicka at gcc dot gnu dot org wrote: > Perhaps explicitly freeing would be good idea? I certainly have no objection to explicitly freeing storage if we know we don't need it anymore.
Subject: Re: [4.2/4.3/4.4 Regression] out of memory while parsing array with many initializers

> > Perhaps explicitly freeing it would be a good idea?
>
> I certainly have no objection to explicitly freeing storage if we know we
> don't need it anymore.

The problem is that I don't know the C++ parser well enough to be sure where we can safely free this vector.

Honza
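To make the suggestion concrete, a hypothetical sketch, assuming the safe point is right after reshape_init has copied the elements out of the parser's temporary vector (finding that safe point is exactly the open question here):

  /* Hypothetical sketch: once nothing can reach V anymore, hand the
     GC-allocated vector back to the collector instead of leaving it
     as garbage for the next collection.  */
  VEC_free (constructor_elt, gc, v);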
Closing 4.2 branch.
GCC 4.3.4 is being released, adjusting target milestone.
*** Bug 44066 has been marked as a duplicate of this bug. ***
GCC 4.3.5 is being released, adjusting target milestone.
4.3 branch is being closed, moving to 4.4.7 target.
Giovanni hasn't touched this bug since 2004, so I'm unassigning him. It seems to me that the best way to avoid the garbage from cp_parser_initializer_list would be to rewrite reshape_init to avoid copying initializer lists that need no reshaping. But that isn't a project for stage 4.
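A sketch of that idea, with a hypothetical predicate name (years later, r10-6388 implements essentially this by reusing a single CONSTRUCTOR for arrays of non-aggregate type):

  /* Hypothetical sketch in reshape_init: if no brace elision or other
     reshaping can occur for this initializer, return it unchanged
     instead of building a deep copy.  */
  if (!reshaping_needed_p (init))  /* hypothetical helper */
    return init;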
Created attachment 26317 [details]
Testcase with just the character array

I couldn't compile the original testcase with 2.95, so I've stripped out the important part, which is just a massive char array. For reference, compiling this on my Core i7 laptop (time and VM usage):

2.95  6s   717M
3.0   16s  1764M
3.2   17s  1813M
3.3   20s  2028M
3.4   15s  1803M
4.0   18s  1900M
4.1   9s   1635M
4.2   10s  1636M
4.3   9s   1158M
4.4   xxx  1161M  (no time; non-optimized build)
4.5   11s  1097M
4.6   xxx  1258M  (ditto)
4.7   14s  1704M  (r183161, optimized, --enable-checking=release)

So there was certainly a big jump from 2.95 to 3.0. 4.3 improved memory use quite a bit, but now it's gone up again.
(In reply to comment #61)
> 4.7   14s  1704M  (r183161, optimized, --enable-checking=release)

Making the change to convert_to_integer mentioned in 12245 reduces VM size to 1509M; there's another 190M of garbage from cp_parser_initializer_list, but that still doesn't account for all the increase in VM size.
> Making the change to convert_to_integer mentioned in 12245 reduces VM size
> to 1509M; there's another 190M of garbage from cp_parser_initializer_list,
> but that still doesn't account for all the increase in VM size.

An --enable-gather-detailed-mem-stats dump should pinpoint this quite easily...

Honza
Yep, it turned out that there was a lot of allocation overhead from vector allocation in the token buffer. After fixing that as well with the patch at http://gcc.gnu.org/ml/gcc-patches/2012-01/msg00732.html this testcase is down to 967MB VM size. The only obvious area of improvement left is the 67MB of garbage from unnecessary reshape_init copying, which seems like more work than it's worth for this testcase, and definitely not something for 4.7.
Author: jason
Date: Mon Jan 16 16:40:26 2012
New Revision: 183213

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=183213
Log:
	PR c++/14179
	* vec.c (vec_gc_o_reserve_1): Use ggc_round_alloc_size.

Modified:
	trunk/gcc/ChangeLog
	trunk/gcc/vec.c
Author: jason
Date: Mon Jan 16 16:40:38 2012
New Revision: 183214

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=183214
Log:
	PR c/12245
	PR c++/14179
	* convert.c (convert_to_integer): Use fold_convert for
	converting an INTEGER_CST to integer type.

Modified:
	trunk/gcc/ChangeLog
	trunk/gcc/convert.c
4.4 branch is being closed, moving to 4.5.4 target.
GCC 4.6.4 has been released and the branch has been closed.
The 4.7 branch is being closed, moving target milestone to 4.8.4.
GCC 4.8.4 has been released.
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.
GCC 4.9.3 has been released.
GCC 4.9 branch is being closed
Author: rguenth
Date: Thu Feb 2 08:55:44 2017
New Revision: 245118

URL: https://gcc.gnu.org/viewcvs?rev=245118&root=gcc&view=rev
Log:
2017-02-02  Richard Biener  <rguenther@suse.de>

	PR cp/14179
	* cp-gimplify.c (cp_fold): When folding a CONSTRUCTOR copy it
	lazily on the first changed element only and copy it fully
	upfront, only storing changed elements.

Modified:
	trunk/gcc/cp/ChangeLog
	trunk/gcc/cp/cp-gimplify.c
For C++, another inefficiency is that we call cxx_eval_outermost_constant_expr 1630776 times (for the testcase from PR 12245), and it always allocates a hash map. All but one of the calls are with t == INTEGER_CST. It is called via maybe_constant_init, which has the same issue as maybe_constant_value (see the thread starting at https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00046.html).
Top VM usage update:

4.7.2  14s  1660M  (-O0)
7.0.1  20s  1100M  (-O0 -fno-checking, but with checking enabled)
So the "low hanging fruit" remaining is reshape_init_array copying the whole array even if not necessary. INTEGER_CSTs still account for most of the memory use (200MB) apart from C++ preprocessor tokens (530MB) and the actual array of tree pointers for the constructors (2x 130MB at peak).
GCC 6 branch is being closed
The GCC 7 branch is being closed, re-targeting to GCC 8.4.
(In reply to Richard Biener from comment #77)
> So the "low hanging fruit" remaining is reshape_init_array copying the
> whole array even if not necessary.
>
> INTEGER_CSTs still account for most of the memory use (200MB) apart from
> C++ preprocessor tokens (530MB) and the actual array of tree pointers for
> the constructors (2x 130MB at peak).

Still true, plus location wrappers. I think I'll look at turning off location wrappers in the initializer for a large array. Significant lines from -fmem-report:

GGC memory                                          Leak          Garbage         Freed          Overhead      Times
--------------------------------------------------------------------------------------------------------------------
cp/parser.c:23438 (cp_parser_initializer_list)      0 : 0.0%      128M: 16.7%     128M:  9.1%    160 : 0.1%    22
cp/parser.c:657 (cp_lexer_new_main)                 0 : 0.0%      0 : 0.0%        511M: 36.3%    72 : 0.1%     9
tree.c:14891 (maybe_wrap_with_location)             0 : 0.0%      511M: 66.7%     0 : 0.0%       0 : 0.0%      15M
hash-table.h:802 (expand)                           0 : 0.0%      0 : 0.0%        511M: 36.4%    1616 : 1.4%   17
cp/decl.c:6021 (reshape_init_array_1)               0 : 0.0%      128M: 16.7%     128M:  9.1%    160 : 0.1%    22
tree-inline.c:5499 (copy_tree_r)                    128M: 28.4%   0 : 0.0%        0 : 0.0%       8 : 0.0%      1
hash-table.h:802 (expand)                           128M: 28.4%   0 : 0.0%        127M:  9.1%    648 : 0.6%    13
tree.c:1615 (wide_int_to_tree_1)                    191M: 42.7%   240 : 0.0%      0 : 0.0%       0 : 0.0%      8191k
The master branch has been updated by Jason Merrill <jason@gcc.gnu.org>:

https://gcc.gnu.org/g:d2b9548f38c77edc29ab0e24e516f1fb341ecea7

commit r10-6387-gd2b9548f38c77edc29ab0e24e516f1fb341ecea7
Author: Jason Merrill <jason@redhat.com>
Date:   Thu Jan 30 18:49:29 2020 -0500

    c++: Reduce memory consumption for large static arrays.

    PR14179 and the C counterpart PR12245 are about memory consumption of
    very large file-scope arrays. Recently, location wrappers increased
    memory consumption significantly: in an array of integer constants,
    each one will have a location wrapper, which added up to over 500MB
    in the 14179 testcase. For this kind of testcase tracking these
    locations isn't worth the cost, so this patch turns the wrappers off
    after 256 elements; any array that size or larger isn't likely to be
    interested in the location of individual integer constants.

    	PR c++/14179
    	* parser.c (cp_parser_initializer_list): Suppress location
    	wrappers after 256 elements.
The master branch has been updated by Jason Merrill <jason@gcc.gnu.org>:

https://gcc.gnu.org/g:e98ebda074bf8fc5f630a93085af81f52437d851

commit r10-6388-ge98ebda074bf8fc5f630a93085af81f52437d851
Author: Jason Merrill <jason@redhat.com>
Date:   Fri Jan 31 00:21:44 2020 -0500

    c++: Reduce memory consumption for arrays of non-aggregate type.

    The remaining low-hanging fruit for improvement on memory consumption
    in the 14179 testcase was the duplication of the CONSTRUCTOR for the
    array by reshape_init. This patch changes reshape_init to reuse a
    single constructor for an array of non-aggregate type such as the one
    in the testcase.

    	PR c++/14179
    	* decl.c (reshape_init_array_1): Reuse a single CONSTRUCTOR with
    	non-aggregate elements.
    	(reshape_init_array): Add first_initializer_p parm.
    	(reshape_init_r): Change first_initializer_p from bool to tree.
    	(reshape_init): Pass init to it.
Those two patches remove 640MB from both the Garbage and Freed totals in -fmem-report. At this point I don't think there's anything obvious to do to reduce memory consumption for this testcase and we can consider the regression fixed. It might still make sense to leave the bug open at P4 so that I periodically check that we haven't made things worse.
GCC 8.4.0 has been released, adjusting target milestone.
GCC 8 branch is being closed.
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
GCC 9 branch is being closed
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
GCC 10 branch is being closed.
GCC 11 branch is being closed.