Hello,

When building a rootfs for sparcv8 using the Buildroot defconfig qemu_sparc_ss10_defconfig, the resulting system produces illegal instruction messages.

gcc 8.3 and 9.2 are the latest working gcc versions. A git bisect between gcc 8.3 and 8.4 identified the commit that introduced the regression:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=0a83f1a441d7aaadecb368c237b6ee70bd7b91d6

That commit was introduced to fix the following bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92095

It has been backported to gcc 8.4 and 9.3. Reverting the patch produces a working rootfs.

The issue can be reproduced using the following steps:

$ git clone git://git.busybox.net/buildroot
$ cd buildroot/
$ git checkout 2020.11.1
$ make qemu_sparc_ss10_defconfig
$ make
$ ./output/images/start-qemu.sh

The kernel boots correctly, but the login program (busybox) crashes while trying to log in:

[...]
Starting syslogd:
Welcome to Buildroot
buildroot login: root

Welcome to Buildroot
buildroot login: root

For now, this is just a basic test that reproduces the issue. We can use a shell instead of the init program, but even a simple command such as 'ls' crashes the system:

sh-5.0# ls
CPU: 0 PID: 1 Comm: sh Not tainted 4.19.16 #1
[f0022fbc : do_exit+0x948/0x968 ]
[f000afec : do_signal+0x5f8/0x79c ]
[f000b4b4 : do_notify_resume+0x48/0x58 ]
[f0008c88 : signal_p+0x14/0x24 ]
[0004b874 : 0x4b874 ]
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004

Best regards,
Romain
Thanks for reporting the problem, although it's rather elusive at this point. What is miscompiled exactly? The kernel or the userland? The change is supposed to make a difference only in PIC/PIE mode, why does this mode matter here?
Hello,

The kernel and userland are built with the same toolchain, but it is the userspace programs (such as busybox) that crash.

Busybox is built with the following flags:

Toolchain wrapper executing:
'qemu_sparc_ss10_defconfig/host/bin/sparc-buildroot-linux-uclibc-gcc.br_real' '--sysroot' 'qemu_sparc_ss10_defconfig/host/sparc-buildroot-linux-uclibc/sysroot' '-Wp,-MD,libbb/.vfork_daemon_rexec.o.d' '-std=gnu99' '-Iinclude' '-Ilibbb' '-include' 'include/autoconf.h' '-D_GNU_SOURCE' '-DNDEBUG' '-D_LARGEFILE_SOURCE' '-D_LARGEFILE64_SOURCE' '-D_FILE_OFFSET_BITS=64' '-DBB_VER="1.33.0"' '-D_LARGEFILE_SOURCE' '-D_LARGEFILE64_SOURCE' '-D_FILE_OFFSET_BITS=64' '-Os' '-Wall' '-Wshadow' '-Wwrite-strings' '-Wundef' '-Wstrict-prototypes' '-Wunused' '-Wunused-parameter' '-Wunused-function' '-Wunused-value' '-Wmissing-prototypes' '-Wmissing-declarations' '-Wno-format-security' '-Wdeclaration-after-statement' '-Wold-style-definition' '-finline-limit=0' '-fno-builtin-strlen' '-fomit-frame-pointer' '-ffunction-sections' '-fdata-sections' '-fno-guess-branch-probability' '-funsigned-char' '-static-libgcc' '-falign-functions=1' '-falign-jumps=1' '-falign-labels=1' '-falign-loops=1' '-fno-unwind-tables' '-fno-asynchronous-unwind-tables' '-fno-builtin-printf' '-Os' '-DKBUILD_BASENAME="vfork_daemon_rexec"' '-DKBUILD_MODNAME="vfork_daemon_rexec"' '-c' '-o' 'libbb/vfork_daemon_rexec.o' 'libbb/vfork_daemon_rexec.c'

So -fPIE is not used here, but the patch still has a side effect when it is applied.

Note: this is an initial report; I don't have any clue about the real issue yet.

Best regards,
Romain
> So -fPIE is not used here but there is a side effect when the patch is applied. You need to look at the output of 'gcc -v' to be sure of that.
(In reply to Eric Botcazou from comment #3)
> > So -fPIE is not used here but there is a side effect when the patch is applied.
>
> You need to look at the output of 'gcc -v' to be sure of that.

output/host/bin/sparc-buildroot-linux-uclibc-gcc -v
Using built-in specs.
COLLECT_GCC=/output/host/bin/sparc-buildroot-linux-uclibc-gcc.br_real
COLLECT_LTO_WRAPPER=/output/host/libexec/gcc/sparc-buildroot-linux-uclibc/10.2.0/lto-wrapper
Target: sparc-buildroot-linux-uclibc
Configured with: ./configure --prefix=/output/host --sysconfdir=/output/host/etc --enable-static -q --target=sparc-buildroot-linux-uclibc --with-sysroot=/output/host/sparc-buildroot-linux-uclibc/sysroot --enable-__cxa_atexit --with-gnu-ld --disable-libssp --disable-multilib --disable-decimal-float --with-gmp=output/host --with-mpc=output/host --with-mpfr=output/host --with-pkgversion='Buildroot 2020.11-999-g57d61a3986' --with-bugurl=http://bugs.buildroot.net/ --without-zstd --disable-libitm --disable-libquadmath --disable-libquadmath-support --disable-libsanitizer --disable-libsanitizer --enable-tls --enable-threads --without-isl --without-cloog --with-cpu=v8 --enable-languages=c,c++ --with-build-time-tools=/output/host/sparc-buildroot-linux-uclibc/bin --enable-shared --disable-libgomp --silent
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.2.0 (Buildroot 2020.11-999-g57d61a3986)

I can't give more debug info since all programs segfault, even gdb or gdbserver.
> Thread model: posix
> Supported LTO compression algorithms: zlib
> gcc version 10.2.0 (Buildroot 2020.11-999-g57d61a3986)
>
> I can't give more debug info since all program segfault, even gdb or
> gdbserver.

Thanks, definitely puzzling. I don't see how the patch can change anything in a compilation not involving PIC/PIE or TLS. If it does, then something really weird might be going on but, at the same time, nobody has been testing --with-cpu=v8 for a decade or two I think, so this is plausible. I'm going to give it a try.
Hello,

Thanks for the help.

The previous gcc command line was from the busybox build (without -fPIC), but it is not busybox that crashes... it is the libc.

See how the libc (uClibc) was built:

output/host/bin/sparc-buildroot-linux-uclibc-gcc -c libc/stdlib/atoll.c -o libc/stdlib/atoll.os -Wall -Wstrict-prototypes -Wstrict-aliasing -Wno-nonnull-compare -funsigned-char -fno-builtin -fcommon -fno-asm -fmerge-all-constants -std=gnu99 -mcpu=v8 -fno-stack-protector -nostdinc -I./include -I./include -include libc-symbols.h -I./libc/sysdeps/linux/sparc -I./libc/sysdeps/linux -I./ldso/ldso/sparc -I./ldso/include -I. -Os -fstrict-aliasing -D__USE_STDIO_FUTEXES__ -DHAVE_FORCED_UNWIND -D_LIBC_REENTRANT -I./libpthread/nptl -I./libpthread/nptl -I./libpthread/nptl/sysdeps/unix/sysv/linux/sparc -I./libpthread/nptl/sysdeps/sparc -I./libpthread/nptl/sysdeps/sparc -I./libpthread/nptl/sysdeps/unix/sysv/linux -I./libpthread/nptl/sysdeps/unix/sysv/linux -I./libpthread/nptl/sysdeps/pthread -I./libpthread/nptl/sysdeps/pthread/bits -I./libpthread/nptl/sysdeps/generic -I./libc/sysdeps/linux/common -isystem output/host/lib/gcc/sparc-buildroot-linux-uclibc/10.2.0/include-fixed -isystem output/host/lib/gcc/sparc-buildroot-linux-uclibc/10.2.0/include -Ioutput/build/linux-headers-5.4.88/usr/include/ -DNDEBUG -DIN_LIB=libc -fPIC -MT libc/stdlib/atoll.os -MD -MP -MF libc/stdlib/.atoll.os.dep

Indeed, we have "-fPIC" here.

The system boots correctly if I replace the libc library with a working one.

I'm not familiar with gcc internals, but I tried removing "!optimize" from the if clause [1]:

"if (!flag_pic || !crtl->uses_pic_offset_table)"

It seems to work (though this is probably not the correct fix). Is the issue related to the optimization level (Os vs O1)?

[1] https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/sparc/sparc.c;h=aefced85fe142885b1b31fa878a0ff0dfd4e921a;hb=ee5c3db6c5b2c3332912fb4c9cfa2864569ebd9a#l13097
> The previous gcc command line was from the busybox build (without -fPIC) but > this is not busybox that crash... this is the libc. > > See how the libc (uClibc) was built: > > output/host/bin/sparc-buildroot-linux-uclibc-gcc -c libc/stdlib/atoll.c -o > libc/stdlib/atoll.os -Wall -Wstrict-prototypes -Wstrict-aliasing > -Wno-nonnull-compare -funsigned-char -fno-builtin -fcommon -fno-asm > -fmerge-all-constants -std=gnu99 -mcpu=v8 -fno-stack-protector -nostdinc > -I./include -I./include -include libc-symbols.h -I./libc/sysdeps/linux/sparc > -I./libc/sysdeps/linux -I./ldso/ldso/sparc -I./ldso/include -I. -Os > -fstrict-aliasing -D__USE_STDIO_FUTEXES__ -DHAVE_FORCED_UNWIND > -D_LIBC_REENTRANT -I./libpthread/nptl -I./libpthread/nptl > -I./libpthread/nptl/sysdeps/unix/sysv/linux/sparc > -I./libpthread/nptl/sysdeps/sparc -I./libpthread/nptl/sysdeps/sparc > -I./libpthread/nptl/sysdeps/unix/sysv/linux > -I./libpthread/nptl/sysdeps/unix/sysv/linux > -I./libpthread/nptl/sysdeps/pthread -I./libpthread/nptl/sysdeps/pthread/bits > -I./libpthread/nptl/sysdeps/generic -I./libc/sysdeps/linux/common -isystem > output/host/lib/gcc/sparc-buildroot-linux-uclibc/10.2.0/include-fixed > -isystem output/host/lib/gcc/sparc-buildroot-linux-uclibc/10.2.0/include > -Ioutput/build/linux-headers-5.4.88/usr/include/ -DNDEBUG -DIN_LIB=libc > -fPIC -MT libc/stdlib/atoll.os -MD -MP -MF libc/stdlib/.atoll.os.dep > > Indeed we have "-fPIC" OK, this makes sense now and this looks like a bootstrap problem, e.g. the code setting up _GLOBAL_OFFSET_TABLE_ in the libc might be trying to access it or something along this line. Can you find out which module of uClibc sets up _GLOBAL_OFFSET_TABLE_ and confirm that it is compiled with -fPIC as well? If so, would it be possible *not* to compile with -fPIC?
> OK, this makes sense now and this looks like a bootstrap problem, e.g. the > code setting up _GLOBAL_OFFSET_TABLE_ in the libc might be trying to access > it or something along this line. I misremembered: the code loading the GOT register is eliminated if not used in the end, but it can block the leaf register optimization, i.e. a register window is allocated although it is not needed. So does uClibc depend on the fact that a register window is not allocated in some specific spot?
(In reply to Eric Botcazou from comment #7) > > The previous gcc command line was from the busybox build (without -fPIC) but > > this is not busybox that crash... this is the libc. > > > > See how the libc (uClibc) was built: > > > > output/host/bin/sparc-buildroot-linux-uclibc-gcc -c libc/stdlib/atoll.c -o > > libc/stdlib/atoll.os -Wall -Wstrict-prototypes -Wstrict-aliasing > > -Wno-nonnull-compare -funsigned-char -fno-builtin -fcommon -fno-asm > > -fmerge-all-constants -std=gnu99 -mcpu=v8 -fno-stack-protector -nostdinc > > -I./include -I./include -include libc-symbols.h -I./libc/sysdeps/linux/sparc > > -I./libc/sysdeps/linux -I./ldso/ldso/sparc -I./ldso/include -I. -Os > > -fstrict-aliasing -D__USE_STDIO_FUTEXES__ -DHAVE_FORCED_UNWIND > > -D_LIBC_REENTRANT -I./libpthread/nptl -I./libpthread/nptl > > -I./libpthread/nptl/sysdeps/unix/sysv/linux/sparc > > -I./libpthread/nptl/sysdeps/sparc -I./libpthread/nptl/sysdeps/sparc > > -I./libpthread/nptl/sysdeps/unix/sysv/linux > > -I./libpthread/nptl/sysdeps/unix/sysv/linux > > -I./libpthread/nptl/sysdeps/pthread -I./libpthread/nptl/sysdeps/pthread/bits > > -I./libpthread/nptl/sysdeps/generic -I./libc/sysdeps/linux/common -isystem > > output/host/lib/gcc/sparc-buildroot-linux-uclibc/10.2.0/include-fixed > > -isystem output/host/lib/gcc/sparc-buildroot-linux-uclibc/10.2.0/include > > -Ioutput/build/linux-headers-5.4.88/usr/include/ -DNDEBUG -DIN_LIB=libc > > -fPIC -MT libc/stdlib/atoll.os -MD -MP -MF libc/stdlib/.atoll.os.dep > > > > Indeed we have "-fPIC" > > OK, this makes sense now and this looks like a bootstrap problem, e.g. the > code setting up _GLOBAL_OFFSET_TABLE_ in the libc might be trying to access > it or something along this line. > > Can you find out which module of uClibc sets up _GLOBAL_OFFSET_TABLE_ and > confirm that it is compiled with -fPIC as well? If so, would it be possible > *not* to compile with -fPIC? 
There is an option [1] to build all of uClibc as PIC objects, but some other parts are built unconditionally as PIC objects. This option is always set in Buildroot's uClibc configuration. Disabling it doesn't make any difference. Removing -fPIC from the Makefiles produces a non-working libc.

[1] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/extra/Configs/Config.in?id=ab1dd83bec59c9e65c31efd6e887182948f627be#n296
(In reply to Eric Botcazou from comment #8)
> > OK, this makes sense now and this looks like a bootstrap problem, e.g. the
> > code setting up _GLOBAL_OFFSET_TABLE_ in the libc might be trying to access
> > it or something along this line.
>
> I misremembered: the code loading the GOT register is eliminated if not used
> in the end, but it can block the leaf register optimization, i.e. a register
> window is allocated although it is not needed. So does uClibc depend on the
> fact that a register window is not allocated in some specific spot?

Since some parts of the uClibc code come from glibc, I'm trying to compare with glibc 2.30... but there are some differences.

For example, there is no SETUP_PIC_REG_LEAF definition for sparc32 in uClibc (SETUP_PIC_REG_LEAF uses _GLOBAL_OFFSET_TABLE_ internally):

https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/libc/sysdeps/linux/sparc/sysdep.h?id=ab1dd83bec59c9e65c31efd6e887182948f627be
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/sparc/sysdep.h;h=31a8addebcbeec2f60ece377677bc2be137b3664;hb=d811d240c06a8191db88ad4f1e60e1b672e4cc66

The uClibc code doesn't seem to be up to date with the glibc version... But I can't try to reproduce the issue with glibc, since support for sparc was removed from Buildroot a long time ago and from glibc, for sparcv8, in 2.31:
https://lwn.net/Articles/811275/

Resyncing the sparc port of uClibc with glibc would require a lot of work.

Best regards,
Romain
GCC 8 branch is being closed.
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
Created attachment 51073 [details] working uClibc build This is a working libc build, that allows to boot Linux, run init, daemons and get to usual shell prompt.
Created attachment 51074 [details] non-working uClibc-ng build

This is a build of uClibc-ng that does not allow normal Linux booting. You can boot to a shell by passing init=/bin/sh, but any command you type (like ls) runs and then crashes the system, because init (in fact the shell, the parent of the forked command) gets killed. This might be related to an issue in signal handling, since the parent is supposed to receive a SIGCHLD signal when its child exits.

The only difference from the previous attachment is that the following Buildroot patch is not applied to gcc: https://git.buildroot.net/buildroot/commit/?id=4d16e6f5324f0285f51bfbb5a3503584f3b3ad12
GCC 9 branch is being closed
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
GCC 10 branch is being closed.
Hi,

this still happens with gcc 13.2.0. You can boot to a shell, and strace then shows a segfault:

[pid 28] fstat64(3, {st_mode=S_IFDIR|S_ISVTX|0777, st_size=400, ...}) = 0
[pid 28] brk(0x154000) = 0x154000
[pid 28] getdents64(3, 0xefb11b80 /* 20 entries */, 4096) = 496
[pid 28] brk(0x155000) = 0x155000
[pid 28] lstat64("./init", {st_mode=S_IFLNK|0777, st_size=10, ...}) = 0
[pid 28] lstat64("./var", {st_mode=S_IFDIR|0755, st_size=40, ...}) = 0
[pid 28] lstat64("./usr", {st_mode=S_IFDIR|0755, st_size=120, ...}) = 0
[pid 28] lstat64("./tmp", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=40, ...}) = 0
[pid 28] lstat64("./sys", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
[pid 28] lstat64("./sbin", {st_mode=S_IFDIR|0755, st_size=1420, ...}) = 0
[pid 28] lstat64("./run", {st_mode=S_IFDIR|0777, st_size=40, ...}) = 0
[pid 28] lstat64("./root", {st_mode=S_IFDIR|0755, st_size=60, ...}) = 0
[pid 28] lstat64("./proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
[pid 28] lstat64("./mnt", {st_mode=S_IFDIR|0755, st_size=40, ...}) = 0
[pid 28] lstat64("./media", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=40, ...}) = 0
[pid 28] lstat64("./linuxrc", {st_mode=S_IFLNK|0777, st_size=11, ...}) = 0
[pid 28] lstat64("./lib", {st_mode=S_IFDIR|0755, st_size=260, ...}) = 0
[pid 28] lstat64("./etc", {st_mode=S_IFDIR|0755, st_size=640, ...}) = 0
[pid 28] lstat64("./dev", {st_mode=S_IFDIR|0755, st_size=640, ...}) = 0
[pid 28] lstat64("./boot", {st_mode=S_IFDIR|0755, st_size=40, ...}) = 0
[pid 28] lstat64("./bin", {st_mode=S_IFDIR|0755, st_size=1920, ...}) = 0
[pid 28] getdents64(3, 0xefb11b80 /* 0 entries */, 4096) = 0
[pid 28] close(3) = 0
[pid 28] write(1, "\33[1;34mbin\33[m \33[1;34metc\33[m"..., 109bin etc linuxrc proc sbin usr ) = 109
[pid 28] write(1, "\33[1;34mboot\33[m \33[1;36minit\33["..., 109boot init media root sys var ) = 109
[pid 28] write(1, "\33[1;34mdev\33[m \33[1;34mlib\33[m"..., 90dev lib mnt run tmp ) = 90
[pid 28] exit_group(0) = ?
[pid 28] +++ exited with 0 +++
<... rt_sigsuspend resumed>) = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=28, si_uid=0, si_status=0, si_utime=0, si_stime=3 /* 0.03 s */} ---
getrusage(RUSAGE_CHILDREN, {ru_utime={tv_sec=0, tv_usec=0}, ru_stime={tv_sec=0, tv_usec=0}, ...}) = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED|WCONTINUED, NULL) = 28
getrusage(RUSAGE_CHILDREN, {ru_utime={tv_sec=0, tv_usec=8000}, ru_stime={tv_sec=0, tv_usec=32000}, ...}) = 0
wait4(-1, 0xefbe678c, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child processes)
sigreturn({mask=[INT RT_1 RT_8 RT_15 RT_21 RT_23 RT_31]}) = -1 (errno 629)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=NULL} ---
+++ killed by SIGSEGV +++

Any tips on how to debug this?

Best regards,
Waldemar