Bug 56746 - [4.8 regression] increased memory usage when compiling C++
Summary: [4.8 regression] increased memory usage when compiling C++
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++
Version: 4.8.0
Importance: P3 normal
Target Milestone: 4.9.0
Assignee: Not yet assigned to anyone
URL:
Keywords: memory-hog
Depends on:
Blocks:
 
Reported: 2013-03-26 18:06 UTC by Mathias Gaunard
Modified: 2018-03-24 19:09 UTC
CC List: 4 users

See Also:
Host:
Target:
Build:
Known to work: 4.7.2
Known to fail: 4.8.0
Last reconfirmed: 2013-03-27 00:00:00


Description Mathias Gaunard 2013-03-26 18:06:53 UTC
On my C++ project, I have observed significantly higher memory usage with GCC 4.8 than with GCC 4.6/4.7.

Unfortunately I do not have a testcase, but compiling the test "core.utility.functions.whereij.unit" of my template-heavy project NT2 (https://github.com/MetaScale/nt2) gives the following results:

g++-4.7
  User time (seconds): 155.76
  System time (seconds): 4.62
  Percent of CPU this job got: 99%
  Elapsed (wall clock) time (h:mm:ss or m:ss): 2:40.58
  Average shared text size (kbytes): 0
  Average unshared data size (kbytes): 0
  Average stack size (kbytes): 0
  Average total size (kbytes): 0
  Maximum resident set size (kbytes): 2781396
  Average resident set size (kbytes): 0
  Major (requiring I/O) page faults: 121
  Minor (reclaiming a frame) page faults: 726987
  Voluntary context switches: 288
  Involuntary context switches: 547
  Swaps: 0
  File system inputs: 31104
  File system outputs: 15320
  Socket messages sent: 0
  Socket messages received: 0
  Signals delivered: 0
  Page size (bytes): 4096
  Exit status: 0

g++-4.8
  User time (seconds): 155.13
  System time (seconds): 6.50
  Percent of CPU this job got: 99%
  Elapsed (wall clock) time (h:mm:ss or m:ss): 2:41.68
  Average shared text size (kbytes): 0
  Average unshared data size (kbytes): 0
  Average stack size (kbytes): 0
  Average total size (kbytes): 0
  Maximum resident set size (kbytes): 3972292
  Average resident set size (kbytes): 0
  Major (requiring I/O) page faults: 0
  Minor (reclaiming a frame) page faults: 1048923
  Voluntary context switches: 11
  Involuntary context switches: 576
  Swaps: 0
  File system inputs: 0
  File system outputs: 12368
  Socket messages sent: 0
  Socket messages received: 0
  Signals delivered: 0
  Page size (bytes): 4096
  Exit status: 0

So peak memory usage (maximum resident set size) goes from 2.65 GiB to 3.79 GiB.
Details of the compiler versions used are below.
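The 2.65 and 3.79 figures follow from the "Maximum resident set size (kbytes)" lines of the two GNU `time -v` reports above, divided by 1024², so they are strictly GiB. A minimal Python sketch of that arithmetic, assuming the reported kbytes are KiB; the helper names `peak_rss_kib` and `to_gib` are illustrative only and not part of any tool:

```python
import re


def peak_rss_kib(report: str) -> int:
    """Return the 'Maximum resident set size (kbytes)' value from GNU time -v output."""
    match = re.search(r"Maximum resident set size \(kbytes\):\s*(\d+)", report)
    if match is None:
        raise ValueError("no peak RSS line found in report")
    return int(match.group(1))


def to_gib(kib: int) -> float:
    """Convert KiB to GiB (1 GiB = 1024 * 1024 KiB)."""
    return kib / (1024 * 1024)


# The relevant lines of the g++-4.7 and g++-4.8 reports above.
gcc47 = "  Maximum resident set size (kbytes): 2781396\n"
gcc48 = "  Maximum resident set size (kbytes): 3972292\n"

old, new = peak_rss_kib(gcc47), peak_rss_kib(gcc48)
print(f"4.7: {to_gib(old):.2f} GiB, 4.8: {to_gib(new):.2f} GiB, "
      f"increase: {100 * (new - old) / old:.1f}%")
```

It prints `4.7: 2.65 GiB, 4.8: 3.79 GiB, increase: 42.8%`, i.e. peak memory for this test rises by a little over 40%.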

$ g++-4.7 -v
Using built-in specs.
COLLECT_GCC=g++-4.7
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.7/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.7.2-5' --with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs --enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.7 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --with-arch-32=i586 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.7.2 (Debian 4.7.2-5)

$ g++-4.8 -v
Using built-in specs.
COLLECT_GCC=g++-4.8
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.8.0-1' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --with-system-zlib --enable-objc-gc --enable-multiarch --with-arch-32=i586 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.8.0 (Debian 4.8.0-1)
Comment 1 Richard Biener 2013-03-27 09:50:41 UTC
Please provide preprocessed source of the file that shows this change in behavior.
Also provide the options you used for compiling.
Comment 2 Mathias Gaunard 2013-03-27 11:04:08 UTC
The preprocessed file is 7 megabytes, which exceeds what I can attach here.
I do not think it is practical to reduce it with automatic tools.
Would it be ok to provide it as-is?

The flags used are -fno-strict-aliasing -Wall -Wno-unused -Wno-delete-non-virtual-dtor -mxop -fabi-version=4 -O2.
Comment 3 Paolo Carlini 2013-03-27 11:09:28 UTC
You can reduce it at least somewhat and then compress it with bzip2.
Comment 4 Jonathan Wakely 2013-03-27 11:12:45 UTC
You should be able to attach it if you compress it.
Comment 5 Mathias Gaunard 2013-03-27 16:41:16 UTC
While trying to isolate the problem, I have observed that the problem does not occur if -save-temps is used.
While using -save-temps does not change anything with GCC 4.7, using it does reduce memory usage significantly with GCC 4.8.

Did something change with regards to the way temporary files are handled?
Comment 6 Richard Biener 2013-03-28 09:51:35 UTC
(In reply to comment #5)
> While trying to isolate the problem, I have observed that the problem does not
> occur if -save-temps is used.
> While using -save-temps does not change anything with GCC 4.7, using it does
> reduce memory usage significantly with GCC 4.8.
> 
> Did something change with regards to the way temporary files are handled?

No, but using pre-processed source results in less pressure on line-tables
as no macro recording is taking place.  You could try -ftrack-macro-expansion=0
(which is undocumented - bah - Dodji, please fix that, invoke.texi).

Unreduced preprocessed source is ok, you can also upload it somewhere accessible
if it's rejected here as attachment (compress it before attaching).
Comment 7 Mathias Gaunard 2013-03-28 10:39:53 UTC
Using either -save-temps or -ftrack-macro-expansion=0 removes the memory hog.
Compiling the preprocessed source does not cause increased memory usage.

So it seems the macro expansion tracking is what's causing a lot of extra memory usage here.
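Comments 5 to 7 boil down to comparing the compiler's peak memory with and without `-ftrack-macro-expansion=0`. A minimal Python sketch of that comparison on Linux, assuming Python 3.8+, `g++` on `PATH`, and the flags from comment 2 (`-mxop` assumes an x86 target). `build_cmd`, `child_peak_rss_kib`, and passing the source path on the command line are illustrative assumptions, not part of GCC:

```python
import os
import sys

# Flags taken from comment 2; -mxop assumes an x86 target.
FLAGS = ["-fno-strict-aliasing", "-Wall", "-Wno-unused",
         "-Wno-delete-non-virtual-dtor", "-mxop", "-fabi-version=4", "-O2"]


def build_cmd(source, track_macros=True, compiler="g++"):
    """Build a command line that compiles `source` and discards the object file."""
    cmd = [compiler, *FLAGS]
    if not track_macros:
        # Turn off macro expansion tracking (enabled by default since GCC 4.8).
        cmd.append("-ftrack-macro-expansion=0")
    return cmd + ["-c", source, "-o", os.devnull]


def child_peak_rss_kib(cmd):
    """Run `cmd` as a child process and return its peak RSS (KiB on Linux)."""
    pid = os.posix_spawnp(cmd[0], cmd, os.environ)
    _, status, usage = os.wait4(pid, 0)  # rusage of this child only
    if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
        raise RuntimeError(f"{cmd[0]} failed (wait status {status})")
    return usage.ru_maxrss


if __name__ == "__main__" and len(sys.argv) > 1:
    source = sys.argv[1]  # the translation unit under test
    for track in (True, False):
        kib = child_peak_rss_kib(build_cmd(source, track_macros=track))
        print(f"track_macros={track}: {kib} KiB")
```

Run with the translation unit as its only argument, it prints one peak-RSS figure per setting; on an affected GCC 4.8 the `track_macros=False` run would be expected to come out noticeably lower, matching comment 7. Each compile is measured with `os.wait4` rather than `resource.getrusage(resource.RUSAGE_CHILDREN)` because the latter reports the maximum over all waited-for children, which would blur the two figures together.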
Comment 8 Jason Merrill 2013-03-29 18:43:55 UTC
(In reply to comment #7)
> So it seems the macro expansion tracking is what's causing a lot of extra
> memory usage here.

OK, that makes sense, as the compiler is keeping more information around in order to give better diagnostic context with macros.
Comment 9 Jakub Jelinek 2013-04-26 18:09:58 UTC
So, NOTABUG?
Comment 10 Jakub Jelinek 2013-05-31 10:59:02 UTC
GCC 4.8.1 has been released.
Comment 11 Mathias Gaunard 2013-06-13 15:48:52 UTC
4.8.1 is still affected by this.

I wouldn't say it's NOTABUG if a new diagnostic feature enabled by default increases memory consumption by 50%, even when no diagnostic is emitted.

I cannot easily give a test case; since the problem is preprocessor-related, it disappears once the source is preprocessed.

The code in question includes hundreds of files (if not more), split across about 20 different include directories. Those files contain templates instantiated hundreds of times each, and their bodies are generated by macros that may end up creating lines thousands of characters long.
Comment 12 Mathias Gaunard 2013-06-13 15:54:24 UTC
This may be considered a duplicate of #53525, though that bug is more focused on performance than memory usage.
Comment 13 Óscar Fuentes 2013-10-05 09:29:52 UTC
My case is similar to the one described by Mathias Gaunard, but memory usage is 3x higher when -ftrack-macro-expansion=0 is not added to the command line.

I use Boost Preprocessor plus a number of macros to define and instantiate lots of templates. That case requires 3x more memory (a low estimate), with some TUs needing well over 1 GB to compile. This is on a 32-bit machine, so parallel builds usually end in massive swapping, with compile jobs killed due to memory starvation.

I have a version of the same code base that uses variadic templates instead of Boost Preprocessor, although the macros for instantiating the templates are still there. That requires about 1.5x more memory.
Comment 14 Jakub Jelinek 2013-10-16 09:50:37 UTC
GCC 4.8.2 has been released.
Comment 15 Richard Biener 2014-05-22 09:05:56 UTC
GCC 4.8.3 is being released, adjusting target milestone.
Comment 16 Jakub Jelinek 2014-12-19 13:31:06 UTC
GCC 4.8.4 has been released.
Comment 17 Richard Biener 2015-06-23 08:34:10 UTC
Assuming fixed in 4.9.0 (no testcase).