This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: The new scheduler and x86 CPUs
- To: "Vladimir N. Makarov" <vmakarov at cygnus dot com>
- Subject: Re: The new scheduler and x86 CPUs
- From: Bernd Schmidt <bernds at redhat dot com>
- Date: Wed, 29 Aug 2001 10:41:58 +0100 (BST)
- cc: "jh at suse dot cz" <jh at suse dot cz>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
> Even UltrasparcI was an OOO processor.
> Potential fine grain parallelism is described in e.g.
> Limits of Instruction Level Parallelism: David W. Wall, Proc. 4th ASPLOS.
> All potential parallelism is investigated in
> On the Limits of Program Parallelism and its Smoothability (1992) Kevin B.
> Theobald, Guang R. Gao, Laurie J. Hendren
> It can achieve 1000 for some SPEC programs.
On machine models that assume
1) an infinite number of execution units
2) an infinite issue window
3) a fixed memory latency of 1 cycle
4) perfect renaming so that output- and anti-dependencies are irrelevant
5) no control dependencies
6) all data dependencies are known exactly; alias analysis is perfect
_Please_. This may be fun for researchers, but it's of little use in the
real world, where memory latencies are occasionally greater than 1.
I'd also like to point out that all their analysis is done at the equivalent
of run-time, i.e. on a trace of a program. Their model basically does what
an OOO CPU with infinite resources would do. Nowhere do they even begin to
discuss how a compiler could come up with code that extracts this kind of
parallelism.
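To make the point concrete: under the idealized assumptions above, the study amounts to scheduling a recorded trace purely on its true data dependencies and infinite resources. A minimal sketch of that dataflow-limit computation (my own illustration, not the paper's actual methodology or tool), with a latency knob to show how the ILP estimate collapses as soon as latencies exceed 1 cycle:

```python
# Hypothetical sketch: compute the "dataflow limit" ILP of an instruction
# trace, assuming infinite execution units, an infinite issue window,
# perfect renaming, and no control dependencies -- i.e. each instruction
# issues as soon as its true dependencies have produced their results.

def dataflow_ilp(trace, latency=1):
    """trace: list of (name, [names of true dependencies]).
    Returns (instructions per cycle, critical-path length in cycles)."""
    finish = {}  # cycle at which each instruction's result is ready
    for name, deps in trace:
        start = max((finish[d] for d in deps), default=0)
        finish[name] = start + latency
    depth = max(finish.values())
    return len(trace) / depth, depth

# Toy trace: two independent two-instruction chains joined by i4.
trace = [
    ("i0", []), ("i1", []),
    ("i2", ["i0"]), ("i3", ["i1"]),
    ("i4", ["i2", "i3"]),
]

ilp, depth = dataflow_ilp(trace)                  # unit latency: depth 3
ilp4, depth4 = dataflow_ilp(trace, latency=4)     # latency 4: depth 12
```

With a uniform 1-cycle latency the toy trace shows an ILP of 5/3; making every operation take 4 cycles stretches the critical path to 12 cycles and the measured ILP drops to 5/12. Nothing here says how a compiler would find such a schedule statically.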
Now, I'm not entirely familiar with all of the benchmarks they used. Can
you show that the ones with the highest degrees of parallelism could not be
split up into several worker threads? If that's an option, it will be a
lot cheaper to get high parallelism by using multiple CPUs with modest ILP
each, rather than by using one CPU with 100 ALUs and load/store units, even
if such a thing were technically feasible.
Read their conclusions section - they explicitly say
1) current imperative languages (and therefore of course also their
compilers) are entirely unsuited for getting high parallelism
2) speculative execution is important.