This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Faster compilation speed
On Sat, Aug 10, 2002, at 08:32:26AM -0500, Robert Lipe wrote:
> Linus Torvalds wrote:
>
> > One fundamental fact on modern hardware is that data cache locality is
> > good, and not being in the cache sucks. This is not likely to change.
>
> This is a fact.
> Measuring this sort of thing is possible. (Optimizing without
> measuring is seldom a good idea.) In the absence of processor pods
> and bus analyzers, has anyone thrown gcc at a tool like 'valgrind' or
> cachegrind?
I just did (I was forming the idea while reading the thread, but you beat me
to suggesting it before I implemented it).
I have tried a grand total of three files: two from today's mainline CVS
(updated from anonymous CVS about four hours ago) and one from Linux 2.5.30.
As my machine is not exactly one of the dual multi-gigahertz,
"HT"-interconnected (HyperTransport?) monsters with gobs of memory bandwidth
(and what else? 64 bits?) that Linus has been bragging about recently, please
bear with my lack of patience to run cachegrind over the whole of the
aforementioned packages...
Some detailed results here: http://www.chepelov.org/cyrille/gcc-valgrind
Excerpt:
java/parse.c:
==17875== I refs: 275,598,220
==17875== I1 misses: 43,600
==17875== L2i misses: 41,948
==17875== I1 miss rate: 0.1%
==17875== L2i miss rate: 0.1%
==17875==
==17875== D refs: 145,894,312 (94,095,162 rd + 51,799,150 wr)
==17875== D1 misses: 322,121 ( 259,431 rd + 62,690 wr)
==17875== L2d misses: 313,318 ( 251,817 rd + 61,501 wr)
==17875== D1 miss rate: 0.2% ( 0.2% + 0.1% )
==17875== L2d miss rate: 0.2% ( 0.2% + 0.1% )
==17875==
==17875== L2 refs: 365,721 ( 303,031 rd + 62,690 wr)
==17875== L2 misses: 355,266 ( 293,765 rd + 61,501 wr)
==17875== L2 miss rate: 0.0% ( 0.0% + 0.1% )
emit-rtl.c:
==17968== I refs: 2,315,492,628
==17968== I1 misses: 5,888,264
==17968== L2i misses: 5,481,716
==17968== I1 miss rate: 0.25%
==17968== L2i miss rate: 0.23%
==17968==
==17968== D refs: 1,172,342,347 (702,376,465 rd + 469,965,882 wr)
==17968== D1 misses: 7,920,482 ( 6,205,391 rd + 1,715,091 wr)
==17968== L2d misses: 7,134,597 ( 5,455,816 rd + 1,678,781 wr)
==17968== D1 miss rate: 0.6% ( 0.8% + 0.3% )
==17968== L2d miss rate: 0.6% ( 0.7% + 0.3% )
==17968==
==17968== L2 refs: 13,808,746 ( 12,093,655 rd + 1,715,091 wr)
==17968== L2 misses: 12,616,313 ( 10,937,532 rd + 1,678,781 wr)
==17968== L2 miss rate: 0.3% ( 0.3% + 0.3% )
linux/kernel/signal.c:
==22924==
==22924== I refs: 1,020,746
==22924== I1 misses: 1,030
==22924== L2i misses: 946
==22924== I1 miss rate: 0.10%
==22924== L2i miss rate: 0.09%
==22924==
==22924== D refs: 480,927 (335,166 rd + 145,761 wr)
==22924== D1 misses: 2,075 ( 1,535 rd + 540 wr)
==22924== L2d misses: 2,072 ( 1,532 rd + 540 wr)
==22924== D1 miss rate: 0.4% ( 0.4% + 0.3% )
==22924== L2d miss rate: 0.4% ( 0.4% + 0.3% )
==22924==
==22924== L2 refs: 3,105 ( 2,565 rd + 540 wr)
==22924== L2 misses: 3,018 ( 2,478 rd + 540 wr)
==22924== L2 miss rate: 0.2% ( 0.1% + 0.3% )
I don't want to fuel any kind of flamewar (after all, it's only software),
but the miss rates above don't seem too horrible (maybe they are, after all).
What cachegrind doesn't show (yet?) is whether the access pattern kills
opportunities for the memory interface to use burst transfers; after all,
SDRAM also has some form of "seek time", so something may be hidden there.
Also, I didn't spend much time trying to figure out the proper vg_annotate
include path, so some functions appear as unknown in the detailed cachegrind
outputs. Well, that's a start.
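As a sanity check on what those percentages mean (a miss rate is simply
misses divided by references), the java/parse.c D1 figure can be reproduced
from the raw counts; this one-liner is my own illustration, not cachegrind
output:

```shell
# D1 miss rate for java/parse.c: 322,121 D1 misses out of
# 145,894,312 data references.
awk 'BEGIN { printf "%.1f%%\n", 100 * 322121 / 145894312 }'
# prints 0.2%, matching the summary above
```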
-- Cyrille
--
Grumpf.