This is the mail archive of the mailing list for the GCC project.
Re: A sick idea - mmapped file output
On Mon, Nov 06, 2000 at 02:26:49PM -0500, Bourne-again Superuser wrote:
> In rather exhaustive tests, I have found that there is sometimes some
> gain for using mmap for read operations. If there is any significant
> amount of usage of that read data, then the gain for mmap is proportionally
> very small. There are all sorts of tradeoffs (cache issues) and the like,
> but for data that is going to be significantly processed, the gain is
When I implemented mmapped read for cpplib, I benchmarked it
moderately thoroughly. cpp does do extensive processing of its input
(how much depends on the form of the input, of course) but it always
does one almost-linear scan through the entire file. I rigged up a
test harness around cpplib itself, so it would scan a file and throw
away the preprocessed output; then I generated files of increasing
size but similar form and ran each through the harness several
thousand times (to bulk up the times to the point where they were
measurable and squash the error term).
What I found was that the total system and wall-clock time charged to
the process scaled linearly with file size when read was used, and was
constant when mmap was used. (As measured by getrusage - which
*should* be counting time spent in page faults.) User mode time was
the same either way. For small files, read was faster than mmap, but
when the file got above four pages or so, mmap was faster. The gain
was substantial for very large files. This is why we have
MMAP_THRESHOLD in cppfiles.c.
I'd be interested to know the details of your testing, and how your
results compare with mine.