This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: If you had a month to improve gcc build parallelization, where would you begin?
- From: Jeff Law <law at redhat dot com>
- To: Joern Rennecke <joern dot rennecke at embecosm dot com>
- Cc: David Fang <fang at csl dot cornell dot edu>, Geert Bosch <bosch at adacore dot com>, Simon Baldwin <simonb at google dot com>, gcc at gcc dot gnu dot org
- Date: Wed, 03 Apr 2013 22:25:55 -0600
- Subject: Re: If you had a month to improve gcc build parallelization, where would you begin?
- References: <CAPTY64o0UBQBwnq_GMNOBRmdBV4QTc+En3Q7pLn6iR1aKXKQTA at mail dot gmail dot com> <43ABCE7D-03A4-4534-9A3C-79360A7AEC75 at adacore dot com> <Pine dot LNX dot 4 dot 64 dot 1304031751290 dot 19270 at hal-00 dot csl dot cornell dot edu> <20130403195359 dot dxngsssuo8cgsww4-nzlynne at webmail dot spamcop dot net> <515CDC53 dot 5000009 at redhat dot com> <20130403234402 dot or5qa8khmsgs8k40-nzlynne at webmail dot spamcop dot net>
On 04/03/2013 09:44 PM, Joern Rennecke wrote:
> Quoting Jeff Law <law@redhat.com>:
>
>> Using distcc and ccache is trivial; I spread my builds across ~20
>> processors around the house...
>>
>> CC=distcc
>> CXX=distcc g++
>> CC_FOR_BUILD=distcc
>> CXX_FOR_BUILD=distcc
>
> It's not quite that simple if you want bootstraps and/or Canadian crosses.

It is for bootstraps.  Been using it for years.

STAGE_CC_WRAPPER=distcc
STAGE_CXX_WRAPPER=distcc

> How does that work?
> The binaries have to get to all the machines of the cluster somehow.

NFS with wired connections.  A mix of 100M and 1000M interfaces on the
boxes.

> Does this assume you are using NFS or similar for your build directory?
> Won't the overhead of using that instead of local disk kill most of the
> parallelization benefit of a cluster over a single SMP machine?

What I've found works best is to have the machine with the disks handle
the configury, preprocessing, linking & java nonsense and farm all the
actual compilations to the rest of the cluster. I've manually
distributed testing through the years with varying degrees of success.
The net result is the gcc bootstrap itself can be parallelized well.
We're left with the configury serialization, java which isn't handled by
distcc, lameness in the multilib builds & testing as the big holdups.
We probably lose a little trying to distribute stuff like libgcc where
each file is so trivial.  Not surprisingly, those are the areas I
suggested for improvement.
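
To put a rough shape on why the serialized pieces end up as the big
holdups, here is a minimal sketch of Amdahl's law. The function name
and the serial fractions are illustrative assumptions, not measurements
from an actual gcc build; the worker count of 20 simply mirrors the
"~20 processors" mentioned earlier in the thread.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only the non-serial part scales.

    serial_fraction: share of the single-machine build time that cannot
                     be distributed (hypothetical input, e.g. configure
                     runs and linking).
    workers:         number of processors sharing the parallel part.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Made-up serial fractions, chosen only to show the trend.
    for frac in (0.05, 0.10, 0.25):
        speedup = amdahl_speedup(frac, 20)
        print(f"serial fraction {frac:.2f}: 20 workers -> {speedup:.2f}x")
```

Under this model the speedup can never exceed 1 / serial_fraction no
matter how many boxes are added, which is why shrinking the serial
parts matters more than adding machines once the compiles are already
farmed out.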
I played with pump mode, which basically moves preprocessing to the
clients (by shipping them the headers).  That would push a significant
amount of the load off the master to the rest of the cluster, but I never
got it to work with bootstraps.
Jeff