Buildroot hangs while building the package "bullet" when compiling with g++ at optimization level -O2 or higher. The problem does not occur with -O1, which serves as a workaround. Building with or without debug symbols makes no difference.
Here is the build hang log:
The same behaviour is observed when building lmbench.
Here is the log:
Sorry for the noise. This is true when compiling C files too. :-)
Can you read https://gcc.gnu.org/bugs/ and provide the needed information?
Created attachment 47769
.ii of the file where gcc hangs while building
This is the .ii of the file where gcc hangs while building.
Here is the specific command line that compiles the .cpp file:
/home/giuliobenetti/br_reproduce/9a405ec6fabfa306c14a671a5e09359ac623c25b/output/host/bin/riscv32-linux-g++ --sysroot=/home/giuliobenetti/br_reproduce/9a405ec6fabfa306c14a671a5e09359ac623c25b/output/host/riscv32-buildroot-linux-gnu/sysroot -DBT_USE_EGL -DBulletCollision_EXPORTS -DNO_OPENGL3 -DUSE_GRAPHICAL_BENCHMARK -I/home/giuliobenetti/br_reproduce/9a405ec6fabfa306c14a671a5e09359ac623c25b/output/build/bullet-2.89/src -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -save-temps -DNDEBUG -fPIC -o CMakeFiles/BulletCollision.dir/NarrowPhaseCollision/btPolyhedralContactClipping.o -c /home/giuliobenetti/br_reproduce/9a405ec6fabfa306c14a671a5e09359ac623c25b/output/build/bullet-2.89/src/BulletCollision/NarrowPhaseCollision/btPolyhedralContactClipping.cpp
then it sits there forever.
And this is how riscv32 gcc has been configured:
Using built-in specs.
Configured with: ./configure --prefix=/opt/br-riscv32-glibc-2019.11 --sysconfdir=/opt/br-riscv32-glibc-2019.11/etc --enable-static --target=riscv32-buildroot-linux-gnu --with-sysroot=/opt/br-riscv32-glibc-2019.11/riscv32-buildroot-linux-gnu/sysroot --enable-__cxa_atexit --with-gnu-ld --disable-libssp --disable-multilib --disable-decimal-float --with-gmp=/opt/br-riscv32-glibc-2019.11 --with-mpc=/opt/br-riscv32-glibc-2019.11 --with-mpfr=/opt/br-riscv32-glibc-2019.11 --with-pkgversion='Buildroot 2019.11' --with-bugurl=http://bugs.buildroot.net/ --disable-libquadmath --enable-tls --enable-threads --without-isl --without-cloog --with-arch=rv32imafd --with-abi=ilp32d --enable-languages=c,c++ --with-build-time-tools=/opt/br-riscv32-glibc-2019.11/riscv32-buildroot-linux-gnu/bin --enable-shared --disable-libgomp
Thread model: posix
gcc version 8.3.0 (Buildroot 2019.11)
I can't reproduce this with either 8.3.1 20200111 or current trunk; it compiles pretty much instantly (cross-compiler from x86_64-linux to riscv32-linux).
Would you mind using the official Buildroot script to reproduce?
Here is the procedure:
# git clone git://git.busybox.net/buildroot
# wget https://git.busybox.net/buildroot-test/tree/utils/br-reproduce-build
- set BASE_GIT=... to your buildroot path in br-reproduce-build, then:
# chmod a+x br-reproduce-build
# ./br-reproduce-build 9a405ec6fabfa306c14a671a5e09359ac623c25b
and wait until it hangs. Otherwise I think it will be difficult to reproduce.
Is that OK for you?
I tried the buildroot instructions. They didn't work on an Ubuntu 16.04 server machine: there is a 'python3 pip3 -q docwriter' command that hangs. I also discovered that the script isn't restartable; it runs rm -rf on the build directory and exits with an error. I did get it to work on my Ubuntu 18.04 laptop. It does hang, but for me it isn't the btPolyhedralContactClipping.cpp file that hangs, it is the btBoxBoxDetector.cpp file. I was able to reproduce this with a gcc-8.3.0 build using the -O2 -fPIC -fstack-protector-strong options to compile the file. It does not reproduce using the top of the gcc-8-branch svn tree, suggesting that either it is already fixed, or it is a memory corruption problem that is hard to reproduce.
Using gdb to attach to the gcc-8.3.0 compiler, I see that it is looping in lra, but I haven't tried to debug that yet.
#0 0x0000000000705e7b in bitmap_find_bit (bit=42321, bit@entry=330,
head=0x376ae88) at ../../gcc-8.3.0/gcc/bitmap.c:539
#1 bitmap_set_bit (head=0x376ae88, bit=bit@entry=42321)
#2 0x000000000099b95f in mark_regno_dead (regno=42321, mode=<optimized out>,
point=<optimized out>) at ../../gcc-8.3.0/gcc/lra-lives.c:362
#3 0x000000000099c9c4 in process_bb_lives (dead_insn_p=false,
curr_point=@0x7ffc9a90cccc: 181876, bb=<basic_block 0x7f8e439c50d0 (38)>)
#4 lra_create_live_ranges_1 (all_p=all_p@entry=true,
#5 0x000000000099e7c0 in lra_create_live_ranges (all_p=all_p@entry=true,
#6 0x0000000000982d0c in lra (f=<optimized out>)
#7 0x000000000093fa32 in do_reload () at ../../gcc-8.3.0/gcc/ira.c:5465
#8 (anonymous namespace)::pass_reload::execute (this=<optimized out>)
Created attachment 47774
testcase that reproduces for me
compile with -O2 -fPIC -fstack-protector-strong
I'm able to reproduce with the gcc-8-branch now. Maybe I made a mistake with my earlier build. Anyway, it looks like it is going wrong here in the reload dump:
Creating newreg=1856, assigning class NO_REGS to save r1856
434: fa0:SF=call [`sqrtf'] argc:0
Add reg<-save after:
432: NOTE_INSN_BASIC_BLOCK 24
Add save<-reg after:
then later we appear to end up in a loop generating secondary reloads that need secondary reloads themselves, and so forth. The instruction above looks funny, trying to use a subreg to convert DFmode to SFmode. I don't think we should be generating that.
So it looks like a caller save problem. If I add -fno-caller-saves the compile finishes. It appears that we need a definition for HARD_REGNO_CALLER_SAVE_MODE because the default definition can't work here. The comment in sparc.h for HARD_REGNO_CALLER_SAVE_MODE looks relevant. The same definition may work for RISC-V. Looks like the MIPS port does it the same way too.
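To make the difference between the two policies concrete, here is a minimal standalone C model. It is a sketch, not GCC source: every `mock_` name is a hypothetical stand-in (for GCC's machine modes and for `choose_hard_reg_mode`), and it assumes that the default definition ignores the mode of the live value and always picks the widest mode the register can hold, which for an FP register on rv32imafd/ilp32d would be DFmode.

```c
#include <assert.h>

/* Hypothetical stand-ins for GCC machine modes; not the real enum.  */
enum mock_mode { MOCK_VOIDmode, MOCK_SFmode, MOCK_DFmode };

/* Stand-in for choose_hard_reg_mode: the widest mode one FP register can
   hold, assumed here to be DFmode (rv32imafd with the ilp32d ABI).  */
static enum mock_mode
mock_widest_fp_mode (void)
{
  return MOCK_DFmode;
}

/* Model of the default policy: the mode of the live value is ignored.  */
static enum mock_mode
mock_default_caller_save_mode (enum mock_mode mode)
{
  (void) mode;
  return mock_widest_fp_mode ();
}

/* Model of the sparc.h-style policy: keep the value's own mode, and fall
   back to the widest mode only when the mode is unknown (VOIDmode).  */
static enum mock_mode
mock_value_caller_save_mode (enum mock_mode mode)
{
  return mode == MOCK_VOIDmode ? mock_widest_fp_mode () : mode;
}

/* Usage example: an SFmode value live across a call.  */
static void
mock_demo (void)
{
  /* Default: saved as DFmode, so the SFmode value has to be accessed
     through a wider mode than it actually has.  */
  assert (mock_default_caller_save_mode (MOCK_SFmode) == MOCK_DFmode);
  /* sparc.h-style: saved in its own mode, no mode mismatch.  */
  assert (mock_value_caller_save_mode (MOCK_SFmode) == MOCK_SFmode);
}
```

In this model only the second policy saves an SFmode value as SFmode, which is the property the sparc.h comment appears to be after.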
A bisection on mainline between the gcc-8 and gcc-9 releases shows that this testcase was fixed by a combine patch for PR87600 that stops combining hard regs with pseudos to reduce register pressure. The commentary refers to ira and lra problems. A combine patch won't be as safe as a RISC-V backend patch though.
I tried testing the riscv HARD_REGNO_CALLER_SAVE_MODE patch with buildroot but it turns out that it is downloading a pre-built compiler instead of building one. So dropping in the patch doesn't do anything. I will have to figure out what is going on there.
Trying the riscv patch with mainline on the testcase, I see that I get better rematerialization without the confusing subregs, and I also get smaller stack frames since we are now saving SFmode to the stack instead of DFmode. Otherwise, I don't see any significant changes to the code.
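As a rough illustration of the stack frame effect, the following C sketch estimates the caller-save area for a number of single-precision values live across a call. It is a back-of-the-envelope model, not what the compiler computes: the `mock_` names are hypothetical, the sizes assume 4-byte SFmode and 8-byte DFmode saves, and it assumes the save area is the only thing in the frame and is rounded up to the 16-byte stack alignment used by the RISC-V calling convention.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed save sizes in bytes for the two modes discussed above.  */
enum { MOCK_SF_BYTES = 4, MOCK_DF_BYTES = 8 };

/* Estimate the stack bytes needed to save NREGS FP registers across a
   call when each save takes BYTES_PER_SAVE bytes.  The result is rounded
   up to a multiple of 16 (assumed stack alignment).  */
static size_t
mock_save_area_bytes (size_t nregs, size_t bytes_per_save)
{
  assert (bytes_per_save == MOCK_SF_BYTES || bytes_per_save == MOCK_DF_BYTES);
  size_t raw = nregs * bytes_per_save;
  return (raw + 15) / 16 * 16;
}
```

For example, with six live values this model gives `mock_save_area_bytes (6, MOCK_DF_BYTES)` = 48 bytes versus `mock_save_area_bytes (6, MOCK_SF_BYTES)` = 32 bytes. The actual savings depend on what else is in the frame.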
I tried a make check with the riscv patch on mainline, and got an unexpected g++ testsuite failure, so I will have to look into that.
Created attachment 47794
untested patch to fix the problem
Thanks for providing this patch; it fixes the problem.
I mark this bug as resolved by:
https://gcc.gnu.org/bugzilla/attachment.cgi?id=47794
(In reply to Giulio Benetti from comment #15)
> I mark this bug as resolved by:
> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47794
The patch has not been applied to the sources yet.
(In reply to Andrew Pinski from comment #16)
> (In reply to Giulio Benetti from comment #15)
> > I mark this bug as resolved by:
> > https://gcc.gnu.org/bugzilla/attachment.cgi?id=47794
> The patch has not been applied to the sources yet.
Oops, sorry, I'm not very used to bugzilla/gcc.
Thanks again for providing that patch.
The master branch has been updated by Jim Wilson <firstname.lastname@example.org>:
Author: Jim Wilson <email@example.com>
Date: Sat Feb 8 13:57:36 2020 -0800
RISC-V: Improve caller-save code generation.
Avoid paradoxical subregs when caller save. This reduces stack frame size
due to smaller loads and stores, and more frequent rematerialization.
* config/riscv/riscv.h (HARD_REGNO_CALLER_SAVE_MODE): Define.
Patch applied to mainline. This is just a minor optimization for gcc-10, as a combiner patch between gcc-8 and gcc-9 reduces register pressure enough to prevent the hang. Hence there is no real need for the patch in gcc-9. The patch might be useful in gcc-8, but the problem is hard to reproduce, buildroot is the only one that ran into it, and they can always add the patch to their tree, so it is not clear whether we really need it on the gcc-8 branch.
Thanks for confirming that it solves the buildroot build problem.
My gcc mainline g++ test failure turned out to be a thread-related issue with qemu cross testing. The testcase always works on hardware, but fails maybe 10-20% of the time when run under qemu. RISC-V qemu is known to still have a few bugs in this area, though they might already be fixed in qemu versions newer than the one I have.