This is the mail archive of the
mailing list for the GCC project.
Re: A sick idea - mmapped file output
- To: zackw at Stanford dot EDU (Zack Weinberg)
- Subject: Re: A sick idea - mmapped file output
- From: Bourne-again Superuser <toor at dyson dot jdyson dot com>
- Date: Tue, 7 Nov 2000 20:30:41 -0500 (EST)
- Cc: gcc at gcc dot gnu dot org
Zack Weinberg said:
> What I found was that the total system and wall-clock time charged to
> the process scaled linearly with file size when read was used, and was
> constant when mmap was used. (As measured by getrusage - which
> *should* be counting time spent in page faults.) User mode time was
> the same either way. For small files, read was faster than mmap, but
> when the file got above four pages or so, mmap was faster. The gain
> was substantial for very large files. This is why we have
> MMAP_THRESHOLD in cppfiles.c.
> I'd be interested to know the details of your testing, and how it
> compares with your results.
A lot of my info is long gone or stashed in long-unused directory trees (I
haven't significantly played with VM code in 2 or 3 years now.) The tradeoffs
are sensitive to the VM environment, and (as you probably/obviously know)
general rules across diverse platforms are not going to be easy to
Perhaps an easy-to-evaluate test might be to find the crossover on
FreeBSD and Linux (AND NO, THIS ISN'T a LINUX vs. FreeBSD issue!!!) The
information collected would be for demonstrating (or determining) the
break-even point. If I get in the mood to do an investigation (I have
been playing with multimedia software issues in the last year or so),
it shouldn't be too awful hard for me to create a simple benchmark (given
the context from my previous stuff -- code and notes mostly scattered
amongst lots of disk drives :-).) The issue with these 'simple' benchmarks
is that
there are often so many variables (not only the obvious cache issues and
the differences in handling mmap vs. VFS-type I/O, but also the amount of
concurrency) that the actual 'advantage' is hard to determine.
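
A minimal sketch of what such a crossover benchmark might look like, assuming
a POSIX system. The function names (cpu_seconds, sum_with_read, sum_with_mmap,
time_scan) are hypothetical and are not taken from cppfiles.c or from any of
the earlier test code mentioned here. It only measures CPU time via getrusage
and does nothing about cache state or concurrency, so it illustrates the
method rather than settling the question.

```c
/* Sketch only: scan one file with read() and with mmap() and compare
   the CPU time charged to the process.  All names are hypothetical. */
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

/* User + system CPU time charged to this process so far, in seconds. */
static double cpu_seconds(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return (double)(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec)
         + (double)(ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1e6;
}

/* Sum every byte of the file using read(); returns -1 on error. */
static long sum_with_read(const char *path)
{
    unsigned char buf[65536];
    long sum = 0;
    ssize_t n;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++)
            sum += buf[i];
    close(fd);
    return n < 0 ? -1 : sum;
}

/* Sum every byte using mmap(); touching each byte forces the page
   faults, so their cost shows up in the measurement.  -1 on error. */
static long sum_with_mmap(const char *path)
{
    struct stat st;
    long sum = 0;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size > 0) {   /* mmap() rejects a zero-length mapping */
        unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                                MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        for (off_t i = 0; i < st.st_size; i++)
            sum += p[i];
        munmap(p, (size_t)st.st_size);
    }
    close(fd);
    return sum;
}

/* CPU seconds for `reps` passes of one scanner over one file,
   or -1.0 if any pass fails. */
static double time_scan(long (*scan)(const char *), const char *path,
                        int reps)
{
    double start = cpu_seconds();
    for (int i = 0; i < reps; i++)
        if (scan(path) < 0)
            return -1.0;
    return cpu_seconds() - start;
}
```

To look for the break-even point, one would call time_scan(sum_with_read,
path, reps) and time_scan(sum_with_mmap, path, reps) on files of increasing
size (one page, four pages, and upward) and see where the two curves cross.
Wall-clock time would need a separate timer, since getrusage reports only
CPU time.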
This election thing tonight is irritating, and if I can pull myself
away from my current obsession (the signal processing stuff), I
might take a look. It is still worrisome to develop a piece of
software to evaluate a performance measure, and then not take all
of the real-world situation into account... Pronouncements of more
or less total performance are often hard to truly verify, unless the
differences are substantial.
IMO, the actual amount of advantage is usually so small in the non-trivial
cases that it is more a matter of style and the interest of the developer,
and not 'performance', that is the reason for such a choice. It'd sure be
good if mmap was more commonly used, not because it is intrinsically better,
but because it would simply be more used, and the prejudice for and against
mmap-type I/O would be less of an issue. Mmap certainly provides a different
view of on-disk files than what read/write does -- and as we learn how to use
it 'better', it might become an even better tool.