Bug 106624 - [13 Regression] LTO plugin fails to build in parallel builds: xgcc: fatal error: cannot execute '/build/build/./prev-gcc/collect2': execv: Bad address since r13-2011-g53e3b2bf16a486
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: driver
Version: 13.0
Importance: P3 normal
Target Milestone: 13.0
Assignee: Martin Liška
Reported: 2022-08-15 12:14 UTC by Sergei Trofimovich
Modified: 2023-01-24 10:55 UTC

Last reconfirmed: 2022-08-15 00:00:00


Attachments
bad.log.xz (78.29 KB, application/x-xz)
2022-08-15 12:14 UTC, Sergei Trofimovich
good.log.xz (196.93 KB, application/x-xz)
2022-08-15 12:14 UTC, Sergei Trofimovich

Description Sergei Trofimovich 2022-08-15 12:14:10 UTC
Created attachment 53460
bad.log.xz

This week's gcc snapshot fails bootstrap builds in parallel mode surprisingly frequently. The symptom is always the same: the build fails while .libs/liblto_plugin.so is being linked.

I attached logs of a successful build (good.log.xz, -j1) and a failing one (bad.log.xz, -j16). Can you help me understand why it fails?

The failure snippet is:

checking for fgets_unlocked... /nix/store/i3ibpx67yncp4w4mpkf5pwvjjsd0aqln-bootstrap-tools/bin/bash ./libtool --tag=CC --tag=disable-static  --mode=link /build/build/./prev-gcc/xgcc -B/build/build/./prev-gcc/ -B/nix/store/v06bn3lc2s0yjci9px8l829mbks695fm-gfortran-13.0.0/x86_64-unknown-linux-gnu/bin/ -O2 -I/nix/store/q7l8qdpbvm594q4ayf4xr8wfqknc0nmg-glibc-2.35-163-dev/include -B/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib/ -idirafter /nix/store/q7l8qdpbvm594q4ayf4xr8wfqknc0nmg-glibc-2.35-163-dev/include -idirafter /nix/store/i3ibpx67yncp4w4mpkf5pwvjjsd0aqln-bootstrap-tools/lib/gcc/x86_64-unknown-linux-gnu/8.3.0/include-fixed -Wl,-rpath,/nix/store/m3wi1gn0309l15zrha95yv9mw39972db-gfortran-13.0.0-lib/lib -Wl,-L/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib -Wl,-rpath -Wl,/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib -Wl,-dynamic-linker=/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib/ld-linux-x86-64.so.2 --sysroot=/  -fno-checking -Wall -fcf-protection -DBASE_VERSION='"13.0.0"' -O2 -I/nix/store/q7l8qdpbvm594q4ayf4xr8wfqknc0nmg-glibc-2.35-163-dev/include -B/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib/ -idirafter /nix/store/q7l8qdpbvm594q4ayf4xr8wfqknc0nmg-glibc-2.35-163-dev/include -idirafter /nix/store/i3ibpx67yncp4w4mpkf5pwvjjsd0aqln-bootstrap-tools/lib/gcc/x86_64-unknown-linux-gnu/8.3.0/include-fixed -Wl,-rpath,/nix/store/m3wi1gn0309l15zrha95yv9mw39972db-gfortran-13.0.0-lib/lib -Wl,-L/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib -Wl,-rpath -Wl,/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib -Wl,-dynamic-linker=/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib/ld-linux-x86-64.so.2 -fno-checking -gtoggle -Wc,-static-libgcc -pthread  -module -avoid-version -bindir /nix/store/v06bn3lc2s0yjci9px8l829mbks695fm-gfortran-13.0.0/libexec/gcc/x86_64-unknown-linux-gnu/13.0.0 -Wl,--version-script=../../gcc-13-20220814/lto-plugin/lto-plugin.map    
-Xcompiler '-static-libstdc++' -Xcompiler '-static-libgcc' '-O2' '-I/nix/store/q7l8qdpbvm594q4ayf4xr8wfqknc0nmg-glibc-2.35-163-dev/include' -Xcompiler '-B/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib/' '-idirafter' '/nix/store/q7l8qdpbvm594q4ayf4xr8wfqknc0nmg-glibc-2.35-163-dev/include' '-idirafter' '/nix/store/i3ibpx67yncp4w4mpkf5pwvjjsd0aqln-bootstrap-tools/lib/gcc/x86_64-unknown-linux-gnu/8.3.0/include-fixed' '-Wl,-rpath,/nix/store/m3wi1gn0309l15zrha95yv9mw39972db-gfortran-13.0.0-lib/lib' '-Wl,-L/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib' '-Wl,-rpath' '-Wl,/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib' '-Wl,-dynamic-linker=/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib/ld-linux-x86-64.so.2' -o liblto_plugin.la -rpath /nix/store/v06bn3lc2s0yjci9px8l829mbks695fm-gfortran-13.0.0/libexec/gcc/x86_64-unknown-linux-gnu/13.0.0 lto-plugin.lo  -Wc,../libiberty/pic/libiberty.a
yes
checking for fileno_unlocked... libtool: link:  /build/build/./prev-gcc/xgcc -B/build/build/./prev-gcc/ -B/nix/store/v06bn3lc2s0yjci9px8l829mbks695fm-gfortran-13.0.0/x86_64-unknown-linux-gnu/bin/ -O2 -I/nix/store/q7l8qdpbvm594q4ayf4xr8wfqknc0nmg-glibc-2.35-163-dev/include -B/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib/ -idirafter /nix/store/q7l8qdpbvm594q4ayf4xr8wfqknc0nmg-glibc-2.35-163-dev/include -idirafter /nix/store/i3ibpx67yncp4w4mpkf5pwvjjsd0aqln-bootstrap-tools/lib/gcc/x86_64-unknown-linux-gnu/8.3.0/include-fixed -Wl,-rpath,/nix/store/m3wi1gn0309l15zrha95yv9mw39972db-gfortran-13.0.0-lib/lib -Wl,-L/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib -Wl,-rpath -Wl,/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib -Wl,-dynamic-linker=/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib/ld-linux-x86-64.so.2 --sysroot=/  -fno-checking -shared  -fPIC -DPIC  .libs/lto-plugin.o    -Wl,-rpath -Wl,/nix/store/m3wi1gn0309l15zrha95yv9mw39972db-gfortran-13.0.0-lib/lib -Wl,-L/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib -Wl,-rpath -Wl,/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib -Wl,-dynamic-linker=/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib/ld-linux-x86-64.so.2 -Wl,-rpath -Wl,/nix/store/m3wi1gn0309l15zrha95yv9mw39972db-gfortran-13.0.0-lib/lib -Wl,-L/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib -Wl,-rpath -Wl,/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib -Wl,-dynamic-linker=/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib/ld-linux-x86-64.so.2 -static-libgcc -pthread -Wl,--version-script=../../gcc-13-20220814/lto-plugin/lto-plugin.map -static-libstdc++ -static-libgcc -B/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib/ -Wl,-rpath -Wl,/nix/store/m3wi1gn0309l15zrha95yv9mw39972db-gfortran-13.0.0-lib/lib -Wl,-L/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib -Wl,-rpath 
-Wl,/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib -Wl,-dynamic-linker=/nix/store/z1as323dfsk12agzlp9ia35p5801isd7-glibc-2.35-163/lib/ld-linux-x86-64.so.2 ../libiberty/pic/libiberty.a   -pthread -Wl,-soname -Wl,liblto_plugin.so -o .libs/liblto_plugin.so
xgcc: fatal error: cannot execute '/build/build/./prev-gcc/collect2': execv: Bad address
compilation terminated.
make[4]: *** [Makefile:472: liblto_plugin.la] Error 1
make[4]: Leaving directory '/build/build/lto-plugin'
make[3]: *** [Makefile:383: all] Error 2
make[3]: Leaving directory '/build/build/lto-plugin'
make[2]: *** [Makefile:15579: all-stage2-lto-plugin] Error 2
make[2]: *** Waiting for unfinished jobs....
Comment 1 Sergei Trofimovich 2022-08-15 12:14:38 UTC
Created attachment 53461
good.log.xz
Comment 2 Sergei Trofimovich 2022-08-15 13:26:25 UTC
Used configure options:

configure flags: --prefix=/nix/store/fx45rjgwi61c5xx6xyxz9lih1bkyv374-gfortran-13.0.0 --with-gmp-include=/nix/store/gyr707p3ac6ss8pcmf14g0hx041vj9xf-gmp-with-cxx-stage3-6.2.1-dev/include --with-gmp-lib=/nix/store/lcnnbhzzvknkfnlm5qh89xn4in9jm035-gmp-with-cxx-stage3-6.2.1/lib --with-mpfr-include=/nix/store/nfxamp6dnv1jhydhjndnln3maixsw22d-mpfr-stage3-4.1.0-dev/include --with-mpfr-lib=/nix/store/gwrfldp0x95sgsd6kqi2ms52kp68qrk7-mpfr-stage3-4.1.0/lib --with-mpc=/nix/store/3m2bgmj266d857m3x4sfzcbx0rpsqyfd-libmpc-stage3-1.2.1 --with-libelf=/nix/store/7gv2c6bfr8gzzikkp04l5py3yd6w5w13-libelf-0.8.13 --with-native-system-header-dir=/nix/store/q7l8qdpbvm594q4ayf4xr8wfqknc0nmg-glibc-2.35-163-dev/include --with-build-sysroot=/ --program-prefix= --enable-lto --disable-libstdcxx-pch --without-included-gettext --with-system-zlib --enable-checking=release --enable-static --enable-languages=fortran --disable-multilib --enable-plugin --with-isl=/nix/store/vcik6gi61dpw72ygd7lqv8g074m5p4cw-isl-stage3-0.20 --build=x86_64-unknown-linux-gnu --host=x86_64-unknown-linux-gnu --target=x86_64-unknown-linux-gnu
Comment 3 Martin Liška 2022-08-15 13:47:18 UTC
Works for me (with system dependencies), tried with g:e236d671d460dd47262accdea2e9d1d80820ae88.
Comment 4 Sergei Trofimovich 2022-08-15 19:50:06 UTC
Bisected locally down to:

53e3b2bf16a486c15c20991c6095f7be09012b55 is the first bad commit
commit 53e3b2bf16a486c15c20991c6095f7be09012b55
Author: Martin Liska <mliska@suse.cz>
Date:   Tue Aug 9 13:59:36 2022 +0200

    lto: support --jobserver-style=fifo for recent GNU make

    gcc/ChangeLog:

            * opts-jobserver.h: Add one member.
            * opts-common.cc (jobserver_info::jobserver_info): Parse FIFO
            format of --jobserver-auth.

 gcc/opts-common.cc   | 17 +++++++++++++++--
 gcc/opts-jobserver.h |  2 ++
 2 files changed, 17 insertions(+), 2 deletions(-)

Which makes some sense as I locally run GNU make with --shuffle enabled by default: https://savannah.gnu.org/bugs/index.php?62100

It should generate an environment along the lines of 'MAKEFLAGS= -j2 --jobserver-auth=3,4 --shuffle=1660054175'.
Comment 5 Martin Liška 2022-08-15 20:26:30 UTC
(In reply to Sergei Trofimovich from comment #4)
> Bisected locally down to:
> 
> 53e3b2bf16a486c15c20991c6095f7be09012b55 is the first bad commit
> commit 53e3b2bf16a486c15c20991c6095f7be09012b55
> Author: Martin Liska <mliska@suse.cz>
> Date:   Tue Aug 9 13:59:36 2022 +0200
> 
>     lto: support --jobserver-style=fifo for recent GNU make
> 
>     gcc/ChangeLog:
> 
>             * opts-jobserver.h: Add one member.
>             * opts-common.cc (jobserver_info::jobserver_info): Parse FIFO
>             format of --jobserver-auth.
> 
>  gcc/opts-common.cc   | 17 +++++++++++++++--
>  gcc/opts-jobserver.h |  2 ++
>  2 files changed, 17 insertions(+), 2 deletions(-)

Funny.

> 
> Which makes some sense as I locally run GNU make with --shuffle enabled by
> default: https://savannah.gnu.org/bugs/index.php?62100

Well, it's more likely caused by the fact that recent GNU make uses the newly added
fifo style for jobserver. Let me try reproducing it with the current make master.

> 
> It should generate environment something like 'MAKEFLAGS= -j2
> --jobserver-auth=3,4 --shuffle=1660054175'.
Comment 6 Martin Liška 2022-08-15 21:01:45 UTC
> Which makes some sense as I locally run GNU make with --shuffle enabled by
> default: https://savannah.gnu.org/bugs/index.php?62100

Do you have a special patch on top of that? Which exact revision of make do you use?
Comment 7 Sergei Trofimovich 2022-08-16 06:02:42 UTC
I'm using GNU make from https://git.savannah.gnu.org/cgit/make.git/commit/?id=621d3196fae94e9006a7e9c5ffdaf5ec209bf832 commit (from around 22 June, before FIFO support).

On top of that I apply --shuffle=random by default:

--- a/src/main.c
+++ b/src/main.c
@@ -1513,6 +1513,10 @@ main (int argc, char **argv, char **envp)
       arg_job_slots = env_slots;
   }
 
+  /* Set less conservative default. */
+  if (! shuffle_mode)
+    shuffle_mode= xstrdup ("random");
+
   /* Handle shuffle mode argument.  */
   if (shuffle_mode)
     {

But I think I also see crashes with GNU make-4.2.1.

I don't yet see anything wrong with the `lto: support --jobserver-style=fifo for recent GNU make` patch. I'll keep digging into what's wrong with my environment.
Comment 8 Sergei Trofimovich 2022-08-16 11:05:45 UTC
I think I understand now why it's such a mysterious failure. gcc uses putenv() incorrectly!

I think the real bug was introduced in: commit 1270ccda70ca09f7d4 "Factor out jobserver_active_p.".

Its gist is the change from `xputenv (concat ("MAKEFLAGS=", dup, NULL));` to `xputenv (jinfo.skipped_makeflags.c_str ());`.

The difference is in what happens to the memory that is handed to putenv().

putenv() is an odd API in that it does not copy the data; it just interns the pointer:


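// $ cat a.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* putenv() keeps a pointer to this buffer; it does not copy it. */
char arr[1000] = "FOO=1234";

int main(void) {
    putenv(arr);
    printf("getenv(FOO)='%s'\n", getenv("FOO"));  /* getenv(FOO)='1234' */
    /* Overwrite the value in place: the environment changes with it. */
    sprintf(arr + strlen("FOO="), "!!!!");
    printf("getenv(FOO)='%s'\n", getenv("FOO"));  /* getenv(FOO)='!!!!' */
    return 0;
}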

Thus the string passed in `xputenv (jinfo.skipped_makeflags.c_str ());` gets clobbered with garbage as soon as the std::string is freed and its memory is reused. I think commit 53e3b2bf16a486c "lto: support --jobserver-style=fifo for recent GNU make" only happens to tickle that reuse because it does things with more std::strings.

As a hack it looks like the following is enough to build a gcc for me:

--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -9182,7 +9182,7 @@ driver::detect_jobserver () const
 {
   jobserver_info jinfo;
   if (!jinfo.is_active && !jinfo.skipped_makeflags.empty ())
-    xputenv (jinfo.skipped_makeflags.c_str ());
+    xputenv (xstrdup(jinfo.skipped_makeflags.c_str ()));
 }

 /* Determine what the exit code of the driver should be.  */

Not sure what should be used instead for proper memory management.
Comment 9 Martin Liška 2022-08-16 11:27:10 UTC
Thanks for finding out!

To be honest, I did check the path leading to env_manager::xput, but it copies the string only if m_can_restore is set.

The patch is fine, please send it to gcc-patches as obvious!
Comment 10 Sergei Trofimovich 2022-08-16 11:50:19 UTC
Let's declare it a driver bug.

Proposed the patch as:

    https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599799.html
Comment 11 GCC Commits 2022-08-16 12:16:09 UTC
The master branch has been updated by Sergei Trofimovich <slyfox@gcc.gnu.org>:

https://gcc.gnu.org/g:2b403297b111c990c331b5bbb6165b061ad2259b

commit r13-2075-g2b403297b111c990c331b5bbb6165b061ad2259b
Author: Sergei Trofimovich <siarheit@google.com>
Date:   Tue Aug 16 12:35:07 2022 +0100

    driver: fix environ corruption after putenv() [PR106624]
    
    The bug appeared after r13-2010-g1270ccda70ca09 "Factor out
    jobserver_active_p" slightly changed `putenv()` use from allocating
    to non-allocating:
    
        -xputenv (concat ("MAKEFLAGS=", dup, NULL));
        +xputenv (jinfo.skipped_makeflags.c_str ());
    
    `xputenv()` (and `putenv()`) don't copy strings and only store the
    pointer in the `environ` global table. As a result `environ` got
    corrupted as soon as `jinfo.skipped_makeflags` store got deallocated.
    
    This started causing bootstrap crashes in `execv()` calls:
    
        xgcc: fatal error: cannot execute '/build/build/./prev-gcc/collect2': execv: Bad address
    
    The change restores memory allocation for `xputenv()` argument.
    
    gcc/
    
            PR driver/106624
            * gcc.cc (driver::detect_jobserver): Allocate storage xputenv()
            argument using xstrdup().
Comment 12 Sergei Trofimovich 2022-08-16 12:17:10 UTC
Should be fixed now.
Comment 13 GCC Commits 2023-01-24 10:52:27 UTC
The releases/gcc-12 branch has been updated by Martin Liska <marxin@gcc.gnu.org>:

https://gcc.gnu.org/g:193f7e62815b4089dfaed4c2bd34fd4f10209e27

commit r12-9061-g193f7e62815b4089dfaed4c2bd34fd4f10209e27
Author: Sergei Trofimovich <siarheit@google.com>
Date:   Tue Aug 16 12:35:07 2022 +0100

    driver: fix environ corruption after putenv() [PR106624]
    
    The bug appeared after r13-2010-g1270ccda70ca09 "Factor out
    jobserver_active_p" slightly changed `putenv()` use from allocating
    to non-allocating:
    
        -xputenv (concat ("MAKEFLAGS=", dup, NULL));
        +xputenv (jinfo.skipped_makeflags.c_str ());
    
    `xputenv()` (and `putenv()`) don't copy strings and only store the
    pointer in the `environ` global table. As a result `environ` got
    corrupted as soon as `jinfo.skipped_makeflags` store got deallocated.
    
    This started causing bootstrap crashes in `execv()` calls:
    
        xgcc: fatal error: cannot execute '/build/build/./prev-gcc/collect2': execv: Bad address
    
    The change restores memory allocation for `xputenv()` argument.
    
    gcc/
    
            PR driver/106624
            * gcc.cc (driver::detect_jobserver): Allocate storage xputenv()
            argument using xstrdup().
    
    (cherry picked from commit 2b403297b111c990c331b5bbb6165b061ad2259b)
Comment 14 GCC Commits 2023-01-24 10:53:43 UTC
The releases/gcc-11 branch has been updated by Martin Liska <marxin@gcc.gnu.org>:

https://gcc.gnu.org/g:9d21cc4edd94f8f2b1a3241fab5cf75649003226

commit r11-10479-g9d21cc4edd94f8f2b1a3241fab5cf75649003226
Author: Sergei Trofimovich <siarheit@google.com>
Date:   Tue Aug 16 12:35:07 2022 +0100

    driver: fix environ corruption after putenv() [PR106624]
    
    The bug appeared after r13-2010-g1270ccda70ca09 "Factor out
    jobserver_active_p" slightly changed `putenv()` use from allocating
    to non-allocating:
    
        -xputenv (concat ("MAKEFLAGS=", dup, NULL));
        +xputenv (jinfo.skipped_makeflags.c_str ());
    
    `xputenv()` (and `putenv()`) don't copy strings and only store the
    pointer in the `environ` global table. As a result `environ` got
    corrupted as soon as `jinfo.skipped_makeflags` store got deallocated.
    
    This started causing bootstrap crashes in `execv()` calls:
    
        xgcc: fatal error: cannot execute '/build/build/./prev-gcc/collect2': execv: Bad address
    
    The change restores memory allocation for `xputenv()` argument.
    
    gcc/
    
            PR driver/106624
            * gcc.c (driver::detect_jobserver): Allocate storage xputenv()
            argument using xstrdup().
    
    (cherry picked from commit 2b403297b111c990c331b5bbb6165b061ad2259b)
Comment 15 GCC Commits 2023-01-24 10:55:06 UTC
The releases/gcc-10 branch has been updated by Martin Liska <marxin@gcc.gnu.org>:

https://gcc.gnu.org/g:6ced00d53d91ea429948b34e6600b4633f962030

commit r10-11172-g6ced00d53d91ea429948b34e6600b4633f962030
Author: Sergei Trofimovich <siarheit@google.com>
Date:   Tue Aug 16 12:35:07 2022 +0100

    driver: fix environ corruption after putenv() [PR106624]
    
    The bug appeared after r13-2010-g1270ccda70ca09 "Factor out
    jobserver_active_p" slightly changed `putenv()` use from allocating
    to non-allocating:
    
        -xputenv (concat ("MAKEFLAGS=", dup, NULL));
        +xputenv (jinfo.skipped_makeflags.c_str ());
    
    `xputenv()` (and `putenv()`) don't copy strings and only store the
    pointer in the `environ` global table. As a result `environ` got
    corrupted as soon as `jinfo.skipped_makeflags` store got deallocated.
    
    This started causing bootstrap crashes in `execv()` calls:
    
        xgcc: fatal error: cannot execute '/build/build/./prev-gcc/collect2': execv: Bad address
    
    The change restores memory allocation for `xputenv()` argument.
    
    gcc/
    
            PR driver/106624
            * gcc.c (driver::detect_jobserver): Allocate storage xputenv()
            argument using xstrdup().
    
    (cherry picked from commit 2b403297b111c990c331b5bbb6165b061ad2259b)