This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Improving parallel build (was: [C++/Obj-C++ PATCH] Fix Objective-C++ breakage)


[ trimming Cc: list ]

Hello Richard,

* Richard Guenther wrote on Fri, Mar 28, 2008 at 11:58:03AM CET:
> Bootstrap takes
> 
> 11566.82user 835.02system 37:41.31elapsed 548%CPU

> on a moderately old 8 core machine (with enough memory to allow
> -j10 bootstrap and -j8 test).

> As you can see we can not even fully utilize all the CPUs (the big
> generator files are likely a problem and bad parallelism in the
> libjava build is another),

FWIW, until gij, excj, classpath/tools/tools.zip are built, I see good
parallelism in libjava: 707%CPU on an 8-way.  After that it's pretty
much single cpu.  I have not looked into that any further yet, probably
some of the more expensive objects could be moved.

> If we can improve on the bottlenecks (dejagnu anyone?  splitting
> insn-* and gen*, or building the generator files optimized during
> stage1?) it would maybe scale even better, which even I would
> appreciate.

Here's a small analysis for gcc/.

Summary conclusion: better schedule hinting for GNU make is cheap and
should be done before splitting insn-attrtab.c and insn-recog.c.  You
should use the patch below and -j8.

First, a few measurements, all done on an otherwise-quiet 8-way system
with plenty of RAM:

The work of gen-* is not relevant yet (shown is the time to update each
target, excluding its dependencies):

s-attrtab
12.57user 0.14system 0:12.72elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+64437minor)pagefaults 0swaps
s-automata
1.73user 0.08system 0:01.84elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+37814minor)pagefaults 0swaps

Of insn-*, these are the most costly files:
insn-attrtab.o
107.99user 0.89system 1:48.90elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+441498minor)pagefaults 0swaps
insn-recog.o
48.07user 0.34system 0:48.42elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+116151minor)pagefaults 0swaps
insn-emit.o
11.71user 0.21system 0:11.97elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+67973minor)pagefaults 0swaps
insn-automata.o
1.42user 0.12system 0:01.55elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+42380minor)pagefaults 0swaps

FWIW, the most costly non-insn:
fold-const.o
26.98user 0.30system 0:27.41elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+97186minor)pagefaults 0swaps

insn-recog.c consists of many small static functions, so it could be
split easily (but they would have to become global then).

insn-attrtab.c's functions are already global, but at least for x86_64,
two functions make up roughly a third of the file each, so splitting
only at function borders will still leave it at the top of the list.
Which means it makes sense to look at insn-emit only after splitting
the big switch statements inside internal_dfa_insn_code and
insn_default_latency (in that order), and only after you've bought a
16-way system.

But before splitting anything, make itself could do better, and this is why:

$ make clean; \time make -j20 all
872.38user 25.53system 3:28.88elapsed 429%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6247181minor)pagefaults 0swaps

$ make clean; \time make -j20 insn-attrtab.o
131.47user 2.96system 2:05.84elapsed 106%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+822606minor)pagefaults 0swaps

$ make all; touch insn-attrtab.o; \time make -j20 all
6.60user 2.95system 0:03.79elapsed 252%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+283480minor)pagefaults 0swaps

This shows that if make knew to give insn-attrtab.o and its dependencies
appropriate priority, the build could be pushed down to approximately
2:10 (2:05.84 for insn-attrtab.o and its dependencies, plus 0:03.79 for
what remains after it).  make doesn't do that right now, because after
triggering the dependencies for insn-*, i.e., the compiles for gen-*, it
walks the complete set of objects before returning to insn-*.
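
As a rough check of the 2:10 figure, the elapsed times reported above can
be added up.  The following is only a sketch: the helper names are made
up for this illustration, and it assumes the build of insn-attrtab.o and
the short remainder after it simply run back to back.

```python
def to_seconds(elapsed: str) -> float:
    """Convert an elapsed field as printed by GNU time (m:ss.ss) to seconds."""
    seconds = 0.0
    for part in elapsed.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def to_elapsed(seconds: float) -> str:
    """Format seconds back into the m:ss.ss form used in the timings."""
    minutes, rest = divmod(seconds, 60)
    return f"{int(minutes)}:{rest:05.2f}"


# Elapsed times copied from the three measurements above.
chain = to_seconds("2:05.84")  # make -j20 insn-attrtab.o, from clean
tail = to_seconds("0:03.79")   # make -j20 all after touching insn-attrtab.o
full = to_seconds("3:28.88")   # make -j20 all, from clean

# Assumption: the critical chain and the remainder run strictly in sequence.
bound = chain + tail
print("estimated best case:", to_elapsed(bound))
print(f"gap to the measured full build: {full - bound:.2f} s")
```

Under that assumption the estimate comes out at 2:09.63, which is where
the "approximately 2:10" comes from.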

Below is a patch to use an order-only dependency of all objects on all
the generated files.  With it, we get down to

$ make clean; \time make -j20
880.34user 26.30system 3:06.76elapsed 485%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6245542minor)pagefaults 0swaps

which is noticeably better but not what we expected.  Now the rule for
insn-attrtab.o is spawned nicely early, but it has to compete with other
processes for cpu time.  So let's not let the kernel scheduler interfere
with make's:

$ make clean; \time make -j8
882.91user 24.80system 2:19.52elapsed 650%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6283548minor)pagefaults 0swaps

Note that it's really the order-only prerequisite that makes the job
number matter: without the patch below, -j8 is as fast as -j20.
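
To put the three full-build timings side by side, here is a small sketch
that recomputes the number of busy CPUs and the wall-clock saving from
the figures quoted above.  The labels and variable names are invented
for this illustration; only the numbers come from the measurements.

```python
def to_seconds(elapsed: str) -> float:
    """Convert an elapsed field as printed by GNU time (m:ss.ss) to seconds."""
    seconds = 0.0
    for part in elapsed.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# (label, user seconds, system seconds, elapsed) copied from the output above.
runs = [
    ("unpatched -j20", 872.38, 25.53, "3:28.88"),
    ("patched   -j20", 880.34, 26.30, "3:06.76"),
    ("patched   -j8", 882.91, 24.80, "2:19.52"),
]

baseline = to_seconds(runs[0][3])
results = []
for label, user, system, elapsed in runs:
    wall = to_seconds(elapsed)
    busy_cpus = (user + system) / wall        # same ratio as the %CPU column
    saved = 100.0 * (1.0 - wall / baseline)   # wall-clock saving vs. unpatched
    results.append((label, wall, busy_cpus, saved))
    print(f"{label}: {wall:7.2f} s wall, {busy_cpus:.2f} CPUs busy, "
          f"{saved:4.1f}% less wall time")
```

The total CPU time is nearly the same in all three runs, so the gain
comes from keeping more of the 8 CPUs busy at once, not from doing less
work.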

I do not know if this patch could also trigger the make bug that
prevented automatic dependency tracking, but I haven't been able
to get make to hang with some casual testing.

Otherwise, it passed bootstrap.  OK for trunk?

Thanks for reading this far,
Ralf

gcc/ChangeLog:
2008-03-30  Ralf Wildenhues  <Ralf.Wildenhues@gmx.de>

	* Makefile.in (ALL_OBJS): New macro.
	$(ALL_OBJS): Order-depend on the generated files, for parallel
	efficiency.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 407e2fe..eef6613 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1255,6 +1255,9 @@ OBJS = $(OBJS-common) $(OBJS-md) $(OBJS-archive)
 
 OBJS-onestep = libbackend.o $(OBJS-archive)
 
+ALL_OBJS = $(OBJS) $(GNAT1_OBJS) $(GNATBIND_OBJS) $(CXX_OBJS) $(F95_OBJS) \
+  $(JAVA_OBJS) $(OBJC_OBJS) $(OBJCXX_OBJS)
+
 BACKEND = main.o @TREEBROWSER@ libbackend.a $(CPPLIB) $(LIBDECNUMBER)
 
 MOSTLYCLEANFILES = insn-flags.h insn-config.h insn-codes.h \
@@ -3008,6 +3011,11 @@ $(simple_generated_c:insn-%.c=s-%): s-%: build/gen%$(build_exeext) \
 	$(SHELL) $(srcdir)/../move-if-change tmp-$*.c insn-$*.c
 	$(STAMP) s-$*
 
+# In order for parallel make to really start compiling the expensive
+# objects from $(OBJS-common) as early as possible, build all their
+# prerequisites strictly before all objects.
+$(ALL_OBJS) : | $(simple_generated_h) $(simple_generated_c)
+
 # genconstants needs to run before insn-conditions.md is available
 # (because the constants may be used in the conditions).
 insn-constants.h: s-constants; @true

