This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: #include_next and absolute pathnames
> Cc: neil at daikokuya dot co dot uk, gcc at gcc dot gnu dot org
> From: Zack Weinberg <zack at codesourcery dot com>
> Date: Mon, 03 Mar 2003 16:20:11 -0800
> User-Agent: Gnus/5.090016 (Oort Gnus v0.16) Emacs/21.2
> X-OriginalArrivalTime: 04 Mar 2003 00:20:13.0025 (UTC) FILETIME=[D2CCAD10:01C2E1E3]
>
> Geoff Keating <geoffk at geoffk dot org> writes:
> >> From: Neil Booth <neil at daikokuya dot co dot uk>
> >> I'd love to hear what your include improvements are. I had plans
> >> based on caching lookups, indexed by basename.
> >
> > I'm being a bit more ambitious. The idea is that we remove all the
> > existing stat-caching infrastructure, and move to a new scheme where
> > we cache the presence of files along the search path of a particular
> > name.
>
> This is similar to the way it used to work before I rewrote it in June
> 2000. See http://gcc.gnu.org/ml/gcc-patches/2000-06/msg00720.html.
> Please consider carefully the bugs which were fixed by that rewrite
> and make sure you are not reintroducing them. Please also make sure
> you are not making the code even harder to understand than it is
> already.
I can avoid the bugs, but I can't do much about the code-complexity
problem. If I leave the code as is, the performance is unacceptable in
some cases. I believe that Apple's local version of this patch gave a
15% speedup on one real-world testcase, between the failing open()
calls and the splay tree lookups. We've made the rest of GCC faster
since then, so the relative gain would be even larger now.
> > The cache is pre-filled using readdir. This helps a lot when using
> > lots of -I paths, because we don't have to keep stat()ing just to
> > find that there is no time.h there. This is done only at the
> > toplevel, so it'll find time.h but not sys/time.h.
>
> My gut feeling is that this is going to be a waste of time on the
> majority of inputs. If you've got numbers saying otherwise then
> I'm fine with it. Or, perhaps you could consider pre-filling only
> when the include path is long enough that it becomes worth it.
Yup. I'm not sure where the threshold will be; that'll have to wait
on timing numbers once I finally have a patch. I think timing results
on the local version of the patch found that readdir() didn't hurt
even "hello-world" programs.
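
For illustration only (this is not code from the patch under
discussion), the toplevel pre-fill could be sketched roughly as below.
The names dir_names, prefill_dir, may_contain and free_dir_names are
hypothetical, and the linear search stands in for whatever lookup
structure a real implementation would use.

```c
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache: the names found at the top level of ONE
   directory on the search path.  */
struct dir_names {
    char **names;
    size_t count;
};

/* Release everything held by CACHE and leave it empty.  */
void free_dir_names(struct dir_names *cache)
{
    for (size_t i = 0; i < cache->count; i++)
        free(cache->names[i]);
    free(cache->names);
    cache->names = NULL;
    cache->count = 0;
}

/* Read DIR_PATH once with readdir() and remember every entry name.
   Returns 0 on success, -1 if the directory cannot be read or memory
   runs out (OUT is left empty in that case).  */
int prefill_dir(const char *dir_path, struct dir_names *out)
{
    assert(dir_path != NULL && out != NULL);
    out->names = NULL;
    out->count = 0;

    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    size_t capacity = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (out->count == capacity) {
            size_t new_capacity = capacity ? capacity * 2 : 16;
            char **grown = realloc(out->names, new_capacity * sizeof *grown);
            if (grown == NULL)
                goto fail;
            out->names = grown;
            capacity = new_capacity;
        }
        size_t len = strlen(entry->d_name) + 1;
        char *copy = malloc(len);
        if (copy == NULL)
            goto fail;
        memcpy(copy, entry->d_name, len);
        out->names[out->count++] = copy;
    }
    closedir(dir);
    return 0;

fail:
    closedir(dir);
    free_dir_names(out);
    return -1;
}

/* Return 0 only when the cache proves NAME is absent.  Names with a
   '/' (such as sys/time.h) are outside the toplevel cache, so the
   answer for them is always "maybe" (1).  */
int may_contain(const struct dir_names *cache, const char *name)
{
    if (strchr(name, '/') != NULL)
        return 1;
    for (size_t i = 0; i < cache->count; i++)
        if (strcmp(cache->names[i], name) == 0)
            return 1;
    return 0;
}
```

With one such cache per -I directory, a lookup of time.h can skip
every directory for which may_contain() returns 0 without making any
system call, while sys/time.h still has to be tried the usual way.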
> Note that the existing code never uses stat() to find out if a file
> exists; always open(), because we're going to have to open it anyway
> if it does exist. Please preserve this.
There will be some rare cases where the new code looks to see if a
file exists even though it doesn't plan to open it, in order to keep
its data structures efficient. I'm planning to use open() for this,
though.
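
A minimal sketch of the open()-only lookup, again purely illustrative
and not taken from cpplib: open_on_path and its parameters are
hypothetical names. The start argument is there because #include_next
resumes the search in the directory after the one where the current
file was found, so a caller could pass the previous hit's index plus
one.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Try NAME in each of DIRS[START..NDIRS-1] in order, using open()
   itself as the existence test (no stat()).  On success, return the
   open file descriptor and store the index of the matching directory
   in *FOUND; return -1 if no directory has the file.  */
int open_on_path(const char *const *dirs, size_t ndirs, size_t start,
                 const char *name, size_t *found)
{
    char path[4096];

    assert(dirs != NULL && name != NULL);
    for (size_t i = start; i < ndirs; i++) {
        int n = snprintf(path, sizeof path, "%s/%s", dirs[i], name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;               /* name too long for the buffer: skip */
        int fd = open(path, O_RDONLY);
        if (fd >= 0) {
            if (found != NULL)
                *found = i;
            return fd;              /* already open, ready to read */
        }
    }
    return -1;
}
```

Every directory that lacks the file costs one failing open() here,
which is the overhead the lookup cache is meant to remove.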
> A somewhat pie-in-the-sky idea I've had for awhile is, hold open all
> the directories on the search path, plus the normal working directory,
> and fchdir() into each directory to access the files there -- this
> avoids having to concatenate path names in cpplib and should be less
> work for the kernel to boot. (If you can persuade your kernel
> developers to provide openrelative(dirfd, path, flags, mode) then that
> would be even niftier.) But it would have to be suitably
> conditionalized for the sake of OS-s without fchdir, and could run
> into problems with max-simultaneous-open-file limits.
The problem with that would be that instead of one syscall per lookup,
you'd end up with two: fchdir() and then open(). The new code
shouldn't rely so much on the kernel's path-parsing code, so it should
avoid the need for this.
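
To make the syscall count concrete, here is a small sketch with
hypothetical helper names. The second helper uses openat(), which
POSIX.1-2008 later standardized and which has roughly the shape of the
openrelative() wished for above; it is shown only for comparison and
is not something the code being discussed uses.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Two syscalls per lookup: fchdir() into the held-open directory,
   then open() the bare name.  Note the side effect: the process's
   working directory is changed.  */
int open_via_fchdir(int dirfd, const char *name)
{
    assert(name != NULL);
    if (fchdir(dirfd) != 0)
        return -1;
    return open(name, O_RDONLY);
}

/* One syscall per lookup: NAME is resolved relative to DIRFD and the
   working directory is left alone.  */
int open_via_openat(int dirfd, const char *name)
{
    assert(name != NULL);
    return openat(dirfd, name, O_RDONLY);
}
```

Both variants need one descriptor held open per search directory, so
the max-simultaneous-open-file concern applies to either.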
--
- Geoffrey Keating <geoffk at geoffk dot org>