This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: #include_next and absolute pathnames


Geoff Keating wrote:-

> I'm being a bit more ambitious.   The idea is that we remove all the
> existing stat-caching infrastructure, and move to a new scheme where
> we cache the presence of files along the search path of a particular
> name.
> 
> So if someone writes
> 
> #include <time.h>
> 
> we'd look up "time.h" in the cache, and get an entry like:
> 
> there is a "time.h" in /usr/lib/gcc-lib/.../include,
>    with contents <pointer>   // from fixincludes
> there is a "time.h" in /usr/include, which we haven't opened yet
> 
> and we'd work out that the <> searching starts earlier than 
> /usr/lib/gcc-lib/.../include, so the one we want is there, and we then
> look into the pointer to the information about its contents, and can
> see things like the multiple-inclusion-prevention macro.

That's roughly what I meant, but without the prefill you mention.

> The cache is pre-filled using readdir.  This helps a lot when using
> lots of -I paths, because we don't have to keep stat()ing just to find
> that there is no time.h there.  This is done only at the toplevel, so
> it'll find time.h but not sys/time.h.
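
A minimal sketch of the readdir() prefill described above, assuming a
fixed-size table of names per search directory. The names prefill_dir,
is_cached and NAME_LEN are hypothetical and are not taken from cpplib.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 256   /* hypothetical limit on a cached entry name */

/* Record up to max toplevel entry names of dir.
   Returns the number stored, or -1 if dir cannot be opened.  */
int prefill_dir(const char *dir, char names[][NAME_LEN], int max)
{
    assert(dir != NULL && names != NULL && max >= 0);

    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int n = 0;
    struct dirent *e;
    while (n < max && (e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;                 /* not useful for header lookup */
        if (strlen(e->d_name) >= NAME_LEN)
            continue;                 /* too long for this sketch */
        strcpy(names[n++], e->d_name);
    }
    closedir(d);
    return n;
}

/* Nonzero if name was seen at the toplevel of the prefilled directory.
   Only toplevel names are recorded, so "time.h" and "sys" can be found
   but "sys/time.h" cannot.  */
int is_cached(char names[][NAME_LEN], int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(names[i], name) == 0)
            return 1;
    return 0;
}
```

With a table like this per -I directory, a miss in is_cached() stands in
for the stat() that would otherwise fail. The cost is one readdir() pass
per directory, whether or not anything is ever included from it.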

I can imagine lots of cases where the readdir() fill is a loss, though.
With what I had in mind, we do the usual search just once per #include
basename.  After that you hit cache.
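
Roughly, as a sketch only: search the path the first time a name is
seen, remember the answer (including "not found"), and answer later
requests from the cache. The names find_header, name_entry, MAX_CACHED
and probe_count are hypothetical, and the sketch ignores that the
starting point of the search can differ between "" and <> includes and
for #include_next.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define MAX_CACHED 64   /* hypothetical fixed cache size */

/* One entry per header name seen so far.  */
struct name_entry {
    const char *name;   /* not copied; the caller keeps it alive */
    int dir_index;      /* index into the search path, -1 if not found */
};

static struct name_entry cache[MAX_CACHED];
static size_t cache_used;
static unsigned long probe_count;   /* stat() calls made, for illustration */

static int file_exists(const char *dir, const char *name)
{
    char path[4096];
    struct stat st;
    int len = snprintf(path, sizeof path, "%s/%s", dir, name);

    if (len < 0 || (size_t)len >= sizeof path)
        return 0;
    probe_count++;
    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}

/* Return the index of the first directory in dirs containing name,
   or -1.  The search path is walked only on the first request.  */
int find_header(const char *const *dirs, size_t ndirs, const char *name)
{
    assert(dirs != NULL && name != NULL);

    for (size_t i = 0; i < cache_used; i++)
        if (strcmp(cache[i].name, name) == 0)
            return cache[i].dir_index;      /* cache hit: no stat() */

    int found = -1;
    for (size_t d = 0; d < ndirs; d++)
        if (file_exists(dirs[d], name)) {
            found = (int)d;
            break;
        }

    if (cache_used < MAX_CACHED) {          /* remember misses as well */
        cache[cache_used].name = name;
        cache[cache_used].dir_index = found;
        cache_used++;
    }
    return found;
}
```

Nothing is read from a directory until some #include actually asks for a
name, so a translation unit with few includes pays for few probes.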

> The next trick, which is even better, is that if we have
> 
> #include <sys/time.h>
> 
> "sys" (the toplevel directory) is looked up in the cache, and only
> those places on the search path are searched using stat() or open().
> 
> To deal with cases where <sys/time.h> has
> 
> #include "types.h"
> 
> (trying to get the types.h in the current directory), some filename
> rearrangement is done so that sys/types.h is looked up in the current
> directory first.
> 
> Finally, this all flows naturally into a proper #import
> implementation.  The 'content' structure is put into an index by file
> size.  When a new file is encountered, the files of the same size are
> examined, and if it might matter whether one of those is the same file as
> the new file, checksums can be computed.  We can also quickly compare
> simplified pathnames or device/inode numbers before doing anything as
> expensive as a checksum, which also makes the multiple-inclusion macro
> stuff more efficient.
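
The ordering of checks described above might look something like the
following sketch: a size mismatch rules out identity, a matching
simplified pathname or device/inode pair confirms it, and only the
remaining cases need a checksum. The struct content layout and the names
cheap_compare and enum match are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

enum match { DIFFERENT, SAME, NEED_CHECKSUM };

/* Hypothetical per-file record, indexed elsewhere by size.  */
struct content {
    const char *path;   /* simplified pathname */
    off_t size;
    dev_t dev;
    ino_t ino;
};

/* Decide as much as possible without reading either file.  */
enum match cheap_compare(const struct content *a, const struct content *b)
{
    assert(a != NULL && b != NULL);

    if (a->size != b->size)
        return DIFFERENT;               /* cannot have the same contents */
    if (strcmp(a->path, b->path) == 0)
        return SAME;                    /* same simplified pathname */
    if (a->dev == b->dev && a->ino == b->ino)
        return SAME;                    /* same file via another name */
    return NEED_CHECKSUM;               /* same size, identity unknown */
}
```

A caller would compute checksums only for the NEED_CHECKSUM case, and
only when it matters whether the two files are the same.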

I'd like to see some performance numbers for various cases: no
#include at all, a single #include such as <stdio.h>, and heavy
inclusion.

Neil.

