Bug 81443 - [8 regression] build/genrecog.o: virtual memory exhausted: Cannot allocate memory
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 7.2.0
Importance: P2 normal
Target Milestone: 7.3
Assignee: Eric Botcazou
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-07-14 11:30 UTC by Joshua Kinard
Modified: 2018-02-16 08:22 UTC

See Also:
Host: mips64-unknown-linux-gnu
Target: mips64-unknown-linux-gnu
Build: mips64-unknown-linux-gnu
Known to work: 6.4.0, 7.3.0
Known to fail: 7.1.0, 7.2.0, 8.0
Last reconfirmed: 2017-07-17 00:00:00


Attachments
genrecog.ii temp data from failing build command (169.73 KB, application/gzip)
2018-01-18 06:28 UTC, Joshua Kinard

Description Joshua Kinard 2017-07-14 11:30:41 UTC
I am attempting to compile gcc-7.1.0 on a 64-bit MIPS platform, targeting the MIPS-III ISA, glibc-userland, N32 ABI, big-endian architecture, and am unable to complete the compile due to exhausting all available virtual memory.  The machine in question has 2GB of physical RAM installed and ~7GB of swap space (3GB in partitions, 4GB as a temp swap file).  At first I thought it was the parallelization (-j3 to make), since the machine is dual CPU, but it also fails at -j2 and -j1.  I watched the swap usage with 'watch -n2 swapon --show', and when the compile terminates, swap is barely at 50%, so I don't think it's related to a lack of available swap space.

I was able to successfully compile gcc-7.1 on the same machine, different chroot, under the O32 ABI for glibc-2.24, and O32 ABI for uclibc-ng-1.0.25 in another chroot.  So I suspect this is a regression for the N32 ABI case.

I am not sure what files or data will be of use to run this one down.  I am preserving the last builddir for now, so if specific files are needed for analysis, let me know.

# /var/tmp/portage/sys-devel/gcc-7.1.0-r1/work/build/./prev-gcc/xg++ -v
Using built-in specs.
COLLECT_GCC=/var/tmp/portage/sys-devel/gcc-7.1.0-r1/work/build/./prev-gcc/xg++
Target: mips64-unknown-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-7.1.0-r1/work/gcc-7.1.0/configure --host=mips64-unknown-linux-gnu --build=mips64-unknown-linux-gnu --prefix=/usr --bindir=/usr/mips64-unknown-linux-gnu/gcc-bin/7.1.0 --includedir=/usr/lib/gcc/mips64-unknown-linux-gnu/7.1.0/include --datadir=/usr/share/gcc-data/mips64-unknown-linux-gnu/7.1.0 --mandir=/usr/share/gcc-data/mips64-unknown-linux-gnu/7.1.0/man --infodir=/usr/share/gcc-data/mips64-unknown-linux-gnu/7.1.0/info --with-gxx-include-dir=/usr/lib/gcc/mips64-unknown-linux-gnu/7.1.0/include/g++-v7 --with-python-dir=/share/gcc-data/mips64-unknown-linux-gnu/7.1.0/python --enable-languages=c,c++ --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --enable-checking=release --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 7.1.0-r1 p1.1' --disable-esp --enable-libstdcxx-time --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --disable-multilib --disable-altivec --disable-fixed-point --with-abi=n32 --disable-libgcj --disable-libgomp --disable-libmudflap --disable-libssp --disable-libcilkrts --disable-libmpx --disable-vtable-verify --disable-libvtv --disable-libquadmath --enable-lto --without-isl --disable-libsanitizer --disable-default-pie --enable-default-ssp
Thread model: posix
gcc version 7.1.0 (Gentoo 7.1.0-r1 p1.1)
Comment 1 Richard Biener 2017-07-17 11:28:14 UTC
What file is it compiling?
Comment 2 Joshua Kinard 2017-07-17 12:37:28 UTC
(In reply to Richard Biener from comment #1)
> What file is it compiling?

As far as I can tell, it looks somewhat random.  I initially thought that 'build/genrecog.o' was a single file, but after several re-runs, I noticed there are several files getting compiled that must get merged into genrecog.o.  I was running a parallel build, too, though (only -j3, 2x SMP machine), so I could have been seeing the other parallel thread(s) running.  I've restarted the build where it stopped using -j1.  It takes about an hour to get back to where it will stop.  I'll report what that file is, and then launch it again and see if it stays consistent.
Comment 3 Joshua Kinard 2017-07-17 14:09:41 UTC
It's just build/genrecog.c.  I had a stale build environment file that was still sending "-j3" to 'make'.  I fixed that and restarted from where it last left off, and it gets to genrecog.c and spends about 20 minutes doing something to that file before exhausting all virtual memory (so it claims).

Here's the last 'ps' output I got right before it stopped:
# ps uw 7054 | grep [g]enrecog
portage   7054 99.5 47.0 2056256 979712 pts/1  R+   09:36  20:57 /var/tmp/portage/sys-devel/gcc-7.1.0-r1/work/build/./prev-gcc/cc1plus -quiet -nostdinc++ -I . -I build -I /var/tmp/portage/sys-devel/gcc-7.1.0-r1/work/gcc-7.1.0/gcc -I /var/tmp/portage/sys-devel/gcc-7.1.0-r1/work/gcc-7.1.0/gcc/build -I /var/tmp/portage/sys-devel/gcc-7.1.0-r1/work/gcc-7.1.0/gcc/../include -I /var/tmp/portage/sys-devel/gcc-7.1.0-r1/work/gcc-7.1.0/gcc/../libcpp/include -iprefix /var/tmp/portage/sys-devel/gcc-7.1.0-r1/work/build/prev-gcc/../../../lib/gcc/mips64-unknown-linux-gnu/7.1.0/ -isystem /var/tmp/portage/sys-devel/gcc-7.1.0-r1/work/build/./prev-gcc/include -isystem /var/tmp/portage/sys-devel/gcc-7.1.0-r1/work/build/./prev-gcc/include-fixed -D_GNU_SOURCE -D IN_GCC -D HAVE_CONFIG_H -D GENERATOR_FILE -isystem /var/tmp/portage/sys-devel/gcc-7.1.0-r1/work/build/prev-mips64-unknown-linux-gnu/libstdc++-v3/include/mips64-unknown-linux-gnu -isystem /var/tmp/portage/sys-devel/gcc-7.1.0-r1/work/build/prev-mips64-unknown-linux-gnu/libstdc++-v3/include -isystem /var/tmp/portage/sys-devel/gcc-7.1.0-r1/work/gcc-7.1.0/libstdc++-v3/libsupc++ /var/tmp/portage/sys-devel/gcc-7.1.0-r1/work/gcc-7.1.0/gcc/genrecog.c -meb -quiet -dumpbase genrecog.c -march=mips3 -mtune=mips3 -mplt -mabi=n32 -mllsc -mips3 -mno-shared -auxbase-strip build/genrecog.o -gtoggle -O2 -Wextra -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wsuggest-attribute=format -Woverloaded-virtual -Wpedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -fno-PIE -o -

This particular MIPS machine, an old Silicon Graphics platform, has memory issues in the kernel (limited to 2GB of main memory because of what I think is a hardware quirk I haven't figured out how to work around yet), but this is the first time I've ever seen it run out of virtual memory compiling something in the last 10 years.  It just recently completed gcc-6.3.0 across multiple ISAs and ABIs about a week ago (it's a build machine), and also did gcc-7.1.0 in O32 under glibc and uclibc-ng.  So this is why I suspect there is something wrong with the MIPS-III/N32 case.  I have not tried MIPS-IV yet (the machine only supports the older four MIPS ISAs).  I do not have a bootstrapped N32 userland under a different libc other than glibc.  I also lack a pure N64 userland to test with as well.

The CPU is ~600MHz with 2MB of L2 cache, so I find it rather abnormal that it spends 20+ minutes on one C file, and I think there's a runaway....something....going on (linked list allocation or such?).

If there's something else I can grab with gdb or forcing a coredump, let me know what commands to run.
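
The 'ps' line above reports VSZ and RSS in KiB.  The sketch below converts them to GiB, assuming the standard 'ps u' column order (USER PID %CPU %MEM VSZ RSS ...); the helper name ps_memory_gib is hypothetical and the command field is shortened to "cc1plus" for brevity.  One possible reading, which nothing in this report confirms: if an N32 process, having 32-bit pointers, is limited to roughly 2 GiB of user address space, then a VSZ this close to 2 GiB would be consistent with "virtual memory exhausted" while swap was only half used.

```python
def ps_memory_gib(ps_line):
    """Return (VSZ, RSS) in GiB from one line of `ps u` output.

    Assumes the usual column order: USER PID %CPU %MEM VSZ RSS ...
    with VSZ and RSS given in KiB.
    """
    fields = ps_line.split()
    vsz_kib = int(fields[4])
    rss_kib = int(fields[5])
    kib_per_gib = 2 ** 20
    return vsz_kib / kib_per_gib, rss_kib / kib_per_gib


# Values copied from the ps output in this comment; command shortened.
line = "portage   7054 99.5 47.0 2056256 979712 pts/1  R+   09:36  20:57 cc1plus"
vsz, rss = ps_memory_gib(line)
print(f"VSZ {vsz:.2f} GiB, RSS {rss:.2f} GiB")  # VSZ 1.96 GiB, RSS 0.93 GiB
```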
Comment 4 Joshua Kinard 2018-01-16 05:07:50 UTC
After a week of bisecting, it looks like PR59461 is what causes this regression.  Indeed, looking at the comments on #59461, Matthew Fortune thought that N64 could have been broken, and I am assuming that was fixed in PR78660?  This issue I am seeing under N32 might be something different.

Here is the git bisect history I followed:

start:
bad da8dff89fa93 (HEAD)
good 45dd06cef49f (gcc-6_4_0-release)

 1. a050099a416f (good)
 2. 873a9b6435c7 (bad)
 3. eedf6f96c360 (good)
 4. 8139561f6fe6 (bad)
 5. c02417adbaf1 (good)
 6. 63c8aefc8cbf (bad)
 7. 48baf518aeb5 (skip)
 7. 36bb9d71a876 (good)
 8. 44618e466be5 (bad)
 9. 682d2b7ee96c (bad)
10. 4699a580bd1f (bad)
11. 15bd70ad1a73 (good)
12. 9dbb7881f36e (bad)
13. 454decdf75fc (bad)
14. 1998c023a3ed (bad)

Which then yields:

1998c023a3ed6c59d8f1eea3a34528a9d6a93fe1 is the first bad commit
commit 1998c023a3ed6c59d8f1eea3a34528a9d6a93fe1
Author: ebotcazou <ebotcazou@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri Nov 11 22:38:33 2016 +0000

            PR rtl-optimization/59461
            * doc/rtl.texi (paradoxical subregs): Add missing word.
            * combine.c (reg_nonzero_bits_for_combine): Do not discard results
            in modes with precision larger than that of last_set_mode.
            * rtlanal.c (nonzero_bits1) <SUBREG>: If WORD_REGISTER_OPERATIONS is
            set and LOAD_EXTEND_OP is appropriate, propagate results from inner
            REGs to paradoxical SUBREGs.
            (num_sign_bit_copies1) <SUBREG>: Likewise.  Check that the mode is not
            larger than a word before invoking LOAD_EXTEND_OP on it.


    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@242326 138bc75d-0d04-0410-961f-82ee72b054a4

:040000 040000 dd706ac97469731dec045095572859552211457b 59c5b8ea5e019c4e25f246a1295b244ba2a50c9a M      gcc

--------

Note that part of the way through the bisect (~step 8), I had to manually apply the patch from PR78338 to fix a build error.  The first attempt at step 7 also had to be skipped one time due to another build error that I couldn't find much on.

I'll try reverting the patch for PR59461 on gcc-7 HEAD and see if that actually completes or not and report back.

Also confirmed this happens on both MIPS-III and MIPS-IV (r12000) ISA.
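
For readers unfamiliar with bisection, the search above is a binary search over the commit history.  The following is a minimal sketch of the idea only, not git's implementation: it assumes a linear list of commits ordered oldest to newest, with the first known good and the last known bad, and it does not model 'skip'.  The names first_bad and is_bad are hypothetical.

```python
def first_bad(commits, is_bad):
    """Binary-search for the first bad commit.

    commits: list ordered oldest -> newest; commits[0] is known good
             and commits[-1] is known bad.
    is_bad:  callable that builds/tests one commit and returns True on failure.
    Returns (first_bad_commit, number_of_commits_tested).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    steps = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        steps += 1
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi], steps


# Toy history: integers stand in for commits, the regression lands at 12000.
history = list(range(2 ** 14 + 1))
culprit, tested = first_bad(history, lambda c: c >= 12000)
print(culprit, tested)
```

Each test halves the remaining range, so the number of builds grows only with the logarithm of the range size.  With multi-hour builds on this machine, even that many steps adds up to about a week.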
Comment 6 Eric Botcazou 2018-01-17 08:30:07 UTC
Please retry with the current 7 branch, it contains additional fixes.
Comment 7 Joshua Kinard 2018-01-17 19:11:50 UTC
(In reply to Eric Botcazou from comment #6)
> Please retry with the current 7 branch, it contains additional fixes.

I did about a week ago.  Tried building gcc-7-branch HEAD with both gcc-6.4.0 and gcc-5.3.0, and both failed in the same spot.  If something was added in the last week, I can re-test.  I also tried gcc-HEAD with gcc-6.4.0 and that also failed.

FWIW, reversing PR59461 (some manual edits required) on gcc-7_1_0-release compiles cleanly, which is the first time that's happened on this machine under N32.  Successful builds on this machine take about 8 hours.  Fail builds....~3-4 hours.  It's during stage2-bubble when compiling genrecog.c will bail out, claiming no more virtual memory (2GB RAM + 3GB of swap).  So PR59461 clearly has some kind of issue on N32, with -march=mips3 or -march=r12000 (build machine is an SGI Octane w/ an R14000 CPU, so fairly old school).

The only other working MIPS machine I have is an SGI O2 with a rare RM7000 CPU.  But gcc on that thing takes about 2-3 days for a successful build.  However, it is running O32 ABI under uclibc-ng, so odds are likely gcc-7.x would compile successfully.
Comment 8 Eric Botcazou 2018-01-17 21:54:17 UTC
> FWIW, reversing PR59461 (some manual edits required) on gcc-7_1_0-release
> compiles cleanly, which is the first time that's happened on this machine
> under N32.  Successful builds on this machine take about 8 hours.  Fail
> builds....~3-4 hours.  It's during stage2-bubble when compiling genrecog.c
> will bail out, claiming no more virtual memory (2GB RAM + 3GB of swap).

Can you invoke the problematic command manually and add -save-temps to it?  This will give you a .i file, then gzip it and attach it to the PR.
Comment 9 Joshua Kinard 2018-01-18 06:25:58 UTC
(In reply to Eric Botcazou from comment #8)
> > FWIW, reversing PR59461 (some manual edits required) on gcc-7_1_0-release
> > compiles cleanly, which is the first time that's happened on this machine
> > under N32.  Successful builds on this machine take about 8 hours.  Fail
> > builds....~3-4 hours.  It's during stage2-bubble when compiling genrecog.c
> > will bail out, claiming no more virtual memory (2GB RAM + 3GB of swap).
> 
> Can you invoke the problematic command manually and add -save-temps to it? 
> This will give you a .i file, then gzip it and attach it to the PR.

Yup, I'll attach that in a moment.  I also have the 'genrecog.s' file, if needed.  I'll also add that it takes the command about 20-25mins to fail, which is very abnormal.  This machine might be old, but the CPUs are 600MHz, and they can still chew through some of the largest C/C++ source files in under a minute in most cases.  So it seems like something in the stage2-bubble xgcc/xg++ gets stuck in a loop and consumes additional memory with each iteration until it hits some kind of boundary and bails.
Comment 10 Joshua Kinard 2018-01-18 06:28:05 UTC
Created attachment 43166 [details]
genrecog.ii temp data from failing build command
Comment 11 Joshua Kinard 2018-01-18 06:29:44 UTC
(In reply to Joshua Kinard from comment #10)
> Created attachment 43166 [details]
> genrecog.ii temp data from failing build command

Forgot to add, this is generated from a checkout of commit id 1998c023a3ed, which was the last "bad" build, per git bisect.
Comment 12 Eric Botcazou 2018-01-18 16:06:56 UTC
I can reproduce with a cross-compiler on x86-64/Linux.
Comment 13 Eric Botcazou 2018-01-18 16:17:01 UTC
It's an unbounded recursion during combining between cached_num_sign_bit_copies and num_sign_bit_copies1.
Comment 14 Eric Botcazou 2018-01-22 09:57:39 UTC
It's a combinatorial explosion rather than an unbounded recursion.
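
The distinction can be illustrated with a toy model (this is not GCC's code, and the function names are hypothetical).  Assume a chain of expressions in which each level has two operands that both refer to the same sub-expression.  A walk that analyses each operand independently always terminates, so the recursion is bounded, but the number of calls doubles with every level.  A walk that remembers what it has already seen grows linearly.

```python
def count_calls_naive(depth):
    """Calls made when both operands of each level are walked independently."""
    if depth == 0:
        return 1
    # The shared sub-expression is analysed once per operand, i.e. twice.
    return 1 + count_calls_naive(depth - 1) + count_calls_naive(depth - 1)


def count_calls_cached(depth):
    """Calls made when already-visited levels are not expanded again."""
    visited = set()
    calls = 0

    def visit(level):
        nonlocal calls
        calls += 1
        if level == 0 or level in visited:
            return  # leaf, or a repeat visit that is not re-expanded
        visited.add(level)
        visit(level - 1)
        visit(level - 1)

    visit(depth)
    return calls


for depth in (10, 20):
    print(depth, count_calls_naive(depth), count_calls_cached(depth))
```

In this model the naive walk makes 2**(depth + 1) - 1 calls and the cached walk makes 2 * depth + 1.  A few dozen levels are enough to make the naive version run for a very long time without ever recursing deeper than depth.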
Comment 15 Eric Botcazou 2018-01-23 12:15:45 UTC
Recategorizing.
Comment 16 Eric Botcazou 2018-01-23 12:21:54 UTC
Richard, would you be OK to apply a stopgap fix for the 7.3 release?

Index: rtlanal.c
===================================================================
--- rtlanal.c	(revision 256841)
+++ rtlanal.c	(working copy)
@@ -4976,7 +4976,7 @@ num_sign_bit_copies1 (const_rtx x, machi
       if (WORD_REGISTER_OPERATIONS
 	  && load_extend_op (inner_mode) == SIGN_EXTEND
 	  && paradoxical_subreg_p (x)
-	  && (MEM_P (SUBREG_REG (x)) || REG_P (SUBREG_REG (x))))
+	  && MEM_P (SUBREG_REG (x)))
 	return cached_num_sign_bit_copies (SUBREG_REG (x), mode,
 					   known_x, known_mode, known_ret);
       break;

It's a partial reversion of my change for PR rtl-optimization/59461, hence only a small pessimization for 64-bit WORD_REGISTER_OPERATIONS SIGN_EXTEND targets.
Comment 17 rguenther@suse.de 2018-01-23 13:19:53 UTC
On Tue, 23 Jan 2018, ebotcazou at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81443
> 
> Eric Botcazou <ebotcazou at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |rguenth at gcc dot gnu.org
> 
> --- Comment #16 from Eric Botcazou <ebotcazou at gcc dot gnu.org> ---
> Richard, would you be OK to apply a stopgap fix for the 7.3 release?
> 
> Index: rtlanal.c
> ===================================================================
> --- rtlanal.c   (revision 256841)
> +++ rtlanal.c   (working copy)
> @@ -4976,7 +4976,7 @@ num_sign_bit_copies1 (const_rtx x, machi
>        if (WORD_REGISTER_OPERATIONS
>           && load_extend_op (inner_mode) == SIGN_EXTEND
>           && paradoxical_subreg_p (x)
> -         && (MEM_P (SUBREG_REG (x)) || REG_P (SUBREG_REG (x))))
> +         && MEM_P (SUBREG_REG (x)))
>         return cached_num_sign_bit_copies (SUBREG_REG (x), mode,
>                                            known_x, known_mode, known_ret);
>        break;
> 
> It's a partial reversion of my change for PR rtl-optimization/59461, hence only
> a small pessimization for 64-bit WORD_REGISTER_OPERATIONS SIGN_EXTEND targets.

Yes, that works for me.  It looks like after the above change this part
of the code is exactly the same as on the GCC 6 branch (after hookizing
LOAD_EXTEND_OP)
Comment 18 Eric Botcazou 2018-01-23 20:55:04 UTC
Author: ebotcazou
Date: Tue Jan 23 20:54:32 2018
New Revision: 256998

URL: https://gcc.gnu.org/viewcvs?rev=256998&root=gcc&view=rev
Log:
	PR rtl-optimization/81443
	* rtlanal.c (num_sign_bit_copies1) <SUBREG>: Do not propagate results
	from inner REGs to paradoxical SUBREGs.

Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/rtlanal.c
Comment 19 Eric Botcazou 2018-01-23 20:57:00 UTC
This will be fixed in the upcoming 7.3 release.
Comment 20 Richard Biener 2018-01-25 08:26:59 UTC
GCC 7.3 is being released, adjusting target milestone.
Comment 21 Wilco 2018-01-27 17:44:56 UTC
See also PR84071 which has bad codegen from r242326.
Comment 22 Richard Biener 2018-01-29 10:16:15 UTC
Fixed on the GCC 7 branch (but not trunk?).  Adjusting target milestone/known-to-work for the moment.
Comment 23 Eric Botcazou 2018-02-16 08:21:04 UTC
Author: ebotcazou
Date: Fri Feb 16 08:20:32 2018
New Revision: 257724

URL: https://gcc.gnu.org/viewcvs?rev=257724&root=gcc&view=rev
Log:
	PR rtl-optimization/81443
	* rtlanal.c (num_sign_bit_copies1) <SUBREG>: Do not propagate results
	from inner REGs to paradoxical SUBREGs.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/rtlanal.c
Comment 24 Eric Botcazou 2018-02-16 08:22:17 UTC
Fixed on mainline too.