Toolchain bootstrapping advice needed

Bryan Ischo bryan@ischo.com
Tue Aug 9 08:44:00 GMT 2011


Greetings GCC users and developers!

I've recently embarked upon a possibly futile effort to create a script 
to bootstrap a GNU toolchain - binutils, gcc, and glibc - from a system 
with the most minimal of prerequisites.  My goal is to have a script 
that can, on any build host, create a toolchain that can run on a 
specific host type (which could be the same as the build system type or 
could be different), and that targets a possibly different target type.  In
other words, I'm trying to build a single script that can build a 
toolchain for any arbitrary combination of build, host, and target 
system type.
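
To make the combinations concrete, here is a rough sketch that classifies a build/host/target triple. The `Triple` type and `classify` function are names invented for this illustration, the example triplets are arbitrary, and the labels follow the terminology commonly used in the GCC documentation.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    build: str   # system the toolchain is compiled on
    host: str    # system the toolchain will run on
    target: str  # system the toolchain generates code for


def classify(t: Triple) -> str:
    """Name the kind of toolchain implied by a build/host/target triple."""
    if t.build == t.host == t.target:
        return "native"
    if t.build == t.host:
        return "cross"
    if t.host == t.target:
        return "crossed native"
    if t.build == t.target:
        return "crossback"
    return "canadian cross"


# Arbitrary example: built and run on x86_64, generating code for ARM.
print(classify(Triple("x86_64-linux-gnu", "x86_64-linux-gnu", "arm-linux-gnueabi")))
```

A single script covering every case has to handle all five branches, which is where most of the extra builds described later come from.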

I've gotten pretty far, although it's taken quite a long time to 
understand the intricate dance that must be performed to bootstrap gcc 
and glibc, and it has also required some patches to glibc, mostly to get
past what I consider to be a deficiency in its autoconf scripts: namely,
that they error out on tests that check for a working linker when in
fact glibc ought to be buildable without any linker at all (its utility
programs can't be built that way, but those aren't necessary during the
bootstrapping process).  At this point I can produce a working binutils 
and glibc, but the "final" build of gcc is giving me some problems that 
I am still working through.

But what I really want to talk about is my general approach: I'd like
validation of it, and of my assumptions about its value (or lack
thereof).

Basically, I want my build script to not assume that the system compiler 
is anything other than an ISO C90 compiler, with a standard C library 
that may not have anything to do with glibc but is a complete 
implementation of the C library standard.

What this forces me to do is to not simply compile all the tools with 
the system compiler directly; the only tools I can build with the system 
compiler are binutils and gcc.  glibc itself has a strict requirement 
that it be compiled with gcc and I don't even want to assume that any 
old version of gcc on the system is sufficient; I'd rather let the 
version of gcc being bootstrapped be the one to compile glibc.  In this 
way it's up to the toolchain builder to choose versions of binutils, 
gcc, and glibc that are known to work together, not to require that his 
or her build system has a binutils, gcc, or glibc that is compatible 
with whatever target versions are being built.  Like I said, I just want 
to assume an ISO C90 compiler and C library, and nothing more.

The set of steps that I have come up with to accomplish this 
bootstrapping is:

1. Build binutils
2. Build stage1 gcc, building just the "gcc" and "install-gcc" targets, 
not the full build (which would try to compile libraries that require 
glibc, which has not yet been built)
3. Build stage1 glibc using the stage1 gcc compiler; this uses the 
binutils from (1) and the stage1 gcc from (2).  This version of glibc is 
built with only static libraries and without any of the helper programs 
of glibc, because the stage1 gcc cannot build shared libraries or 
executables.
4. Build stage2 gcc against the stage1 glibc, with executable and shared 
library support, but without libmudflap which cannot be built against 
the purely static stage1 glibc.
5. Build final glibc with stage2 gcc; this is a complete and final glibc
with shared library support and support of all features.
6. Build final gcc against final glibc, which is a complete gcc with 
full support for all features.

(My remaining difficulty is with step 6: the stage2 gcc uses a sysroot
that prevents it from linking against the final glibc properly, but
I'll work that out.)
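
The ordering of the six steps can be sketched as a small dependency check. This is only an illustration, not the actual script: the `Step` type, the `PLAN` list, and `check_order` are hypothetical names, and the `needs` entries are a reading of the steps above (in particular, which compiler builds the final gcc is an assumption).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    produces: str   # what this step builds
    needs: tuple    # what must already exist before it can run


# One entry per numbered step above; "system cc" is the ISO C90 compiler.
PLAN = [
    Step("binutils", ("system cc",)),
    Step("stage1 gcc", ("system cc", "binutils")),
    Step("stage1 glibc", ("binutils", "stage1 gcc")),
    Step("stage2 gcc", ("system cc", "binutils", "stage1 glibc")),
    Step("final glibc", ("binutils", "stage2 gcc")),
    # Assumption: the final gcc is again compiled by the system compiler.
    Step("final gcc", ("system cc", "binutils", "final glibc")),
]


def check_order(plan, preinstalled=("system cc",)):
    """Verify every step only needs things built earlier; return what exists at the end."""
    available = set(preinstalled)
    for number, step in enumerate(plan, start=1):
        missing = [n for n in step.needs if n not in available]
        if missing:
            raise ValueError(f"step {number} ({step.produces}) needs {missing}")
        available.add(step.produces)
    return available


print(sorted(check_order(PLAN)))
```

The check passes for the order given, and it makes explicit that glibc is never compiled by the system compiler, only by a gcc produced earlier in the sequence.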

These steps are complicated by gcc's library dependencies (zlib, gmp,
mpfr, mpc), which must be built for the build system, host system, and
target system at various points during the process, and also by
multiple versions of binutils needing to be built because of binutils'
"feature" of requiring sysroot to be a compile-time option instead of a
runtime option.

What the above sequence produces is a cross-compiler built to run on the 
build system targeting a given target system, which is not the end goal 
of the bootstrapping process, but does produce cross-compilers that are 
needed to complete the process.

That sequence is run twice: once to produce a cross-compiler that runs 
on the build system and targets the host system, and once to produce a 
cross-compiler that runs on the build system and targets the target 
system (if host=target, then only one build is necessary).

Finally, once cross-compilers for both the host and target systems are
available, a final binutils version to run on the host system and target
the target system is built, along with a gcc for the host system
targeting the target system.
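
As a sketch of which toolchains this produces, the function below lists the intermediate cross-compilers and the final toolchain as (runs-on, generates-code-for) pairs. The function name `toolchains_to_build` and the example triplets are hypothetical, chosen only for illustration.

```python
def toolchains_to_build(build, host, target):
    """Return (runs_on, targets) pairs: intermediate cross-compilers, then the final toolchain."""
    # The six-step sequence always runs once for build -> host.
    cross = [(build, host)]
    # A second run for build -> target is only needed when host and target differ.
    if target != host:
        cross.append((build, target))
    # The end product runs on the host and generates code for the target.
    final = (host, target)
    return cross, final


cross, final = toolchains_to_build(
    "i686-pc-linux-gnu", "x86_64-linux-gnu", "arm-linux-gnueabi"
)
print("intermediate cross-compilers:", cross)
print("final toolchain:", final)
```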

These steps result in quite a few compiles:

- binutils is built 9 times
- gcc is built 6 times
- glibc is built 5 times

But I believe that this process is successful in not depending on the 
version of the build system compiler at all; it simply needs to be ISO
C90 compliant so that it can build gcc and binutils (like I mentioned,
the bootstrapped gcc and binutils are themselves used to create glibc).  
At each step of the bootstrapping process, each tool is only dependent 
on the other tools being built, except of course for the build system 
ISO C90 compiler and C library.

One shortcoming of my approach is that the final version of glibc is not 
built by a gcc that was built by itself; it is instead built by a gcc 
that was built by the build system compiler.  Does this matter?  If so I 
think the easiest thing for me to do would be to adapt my script to 
first build binutils, gcc, and glibc with build=host=target, and then 
use that as the "build system toolchain" for the other steps I outlined 
above.  Then the versions of the compiler and binutils that will be used 
to produce the final versions of glibc will have been built by the 
target toolchain itself instead of by the system toolchain.  This will 
add 4 more binutils builds, 3 more gcc builds, and 2 more glibc builds 
to the mix, but it will hopefully produce even more robust output.
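
For a sense of the cost, the totals can be worked out from the counts given above. The variable names are made up for this sketch; the numbers are the ones stated in this post.

```python
# Builds per package with the current approach.
base = {"binutils": 9, "gcc": 6, "glibc": 5}
# Additional builds if a build=host=target toolchain is bootstrapped first.
extra = {"binutils": 4, "gcc": 3, "glibc": 2}

total = {name: base[name] + extra[name] for name in base}

print(total)
print("all builds:", sum(base.values()), "->", sum(total.values()))
```

That works out to 13 binutils builds, 9 gcc builds, and 7 glibc builds, or 29 package builds in total instead of 20.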

I think that some of my steps could be simplified if I could convince 
myself that I don't need to use sysroots during various stages of the 
bootstrapping, and can just reference the build system includes and 
libraries instead of trying to always be sure that every step references 
only the toolchain being built.  Is it a worthwhile goal to try to make 
every build step rely only on the toolchain being built instead of the 
build system toolchain?

Finally, can someone validate my assumptions here:

1. When gcc is built, it should be built with a --with-build-sysroot 
that references the version of glibc being built rather than the build 
system's libc.

2. When glibc is built, it is OK for it to reference the build system's 
libc header files rather than its own.  I haven't figured out how to 
configure glibc's build to reference only its own headers instead of the 
system libc headers (I try to avoid CFLAGS because it wreaks havoc with 
configure).

3. --with-build-sysroot is a sufficient option to cause gcc builds to 
only reference glibc headers and libs produced during the bootstrapping 
process.
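
To show concretely what assumptions 1 and 3 refer to, here is a minimal sketch of how the option might be passed when configuring gcc. The helper `gcc_configure_args` and every path in it are hypothetical; `--target`, `--prefix`, `--with-sysroot`, and `--with-build-sysroot` are existing gcc configure options, but whether `--with-build-sysroot` alone is sufficient is exactly the open question.

```python
import shlex


def gcc_configure_args(src_dir, target, prefix, sysroot, build_sysroot):
    """Assemble a gcc configure command line (it is only built here, not run)."""
    return [
        f"{src_dir}/configure",
        f"--target={target}",
        f"--prefix={prefix}",
        # Where the installed compiler will look for headers and libraries.
        f"--with-sysroot={sysroot}",
        # Where to look instead while gcc's own target libraries are being built.
        f"--with-build-sysroot={build_sysroot}",
    ]


# Hypothetical locations, for illustration only.
args = gcc_configure_args(
    src_dir="/src/gcc",
    target="arm-linux-gnueabi",
    prefix="/opt/toolchain",
    sysroot="/opt/toolchain/sysroot",
    build_sysroot="/tmp/staging/sysroot",
)
print(shlex.join(args))
```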

Sorry for the disjointed and wordy nature of this post; I'm really tired
after many long hours of hacking on this script.

Thanks!
Bryan


