This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: #import future (summary and proposed solutions)


Geoffrey Keating <geoffk@apple.com> writes:

> Possible solutions
> 
> These are the solutions that I've considered and not yet rejected.
> 
> 5. Change the semantics of #import so that it's a semantic property of the
>     token stream.
>     This is the patch I posted.  It changes #import so that "the same file"
>     means that the name by which it's referenced in the source is the same.

It should be pointed out that the C standard does not impose any
interpretation on the filename token sequence passed to #include.  So
any implementation of #import that attempts to simplify the filename
ends up in implementation-dependent territory and cannot be used
portably.  So the practical choice is between matching the exact byte
sequence of the name and matching the file contents.
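To make the "exact byte sequence" option concrete, here is a minimal sketch.  The helper name same_import_spelling is hypothetical and is not taken from any posted patch; it only shows that nothing about the name is simplified before comparing.

```c
#include <string.h>

/* Hypothetical helper: two #import operands name "the same file" only
   if their spellings match byte for byte.  No path simplification,
   case folding, or symlink resolution is attempted. */
int same_import_spelling(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Under this rule "foo.h" and "./foo.h" are different names even when they reach the same file on disk, which is strange but at least the same on every implementation.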

Additionally this thread has already shown many examples of how
filename matching with any other rule will still not meet the obvious
criteria of sameness.

> 6. Use checksums to determine "same file"-ness, but otherwise as before.
>     This would be OK, although not great, for the FSF semantics; the cost of
>     using checksums would only be required once #import was used, although it
>     would then apply to every new file seen.  Unfortunately, for the Apple
>     semantics, it would require checksumming every #included file at PCH
>     creation time, whether or not #import was used, in case someone later
>     #import-ed one of those files.  I'll try this out to see how bad it is.
>     [Of course, with this solution you would avoid checksumming any file more
>     than once per compiler run.]

And if you are checksumming the file, you must read all of the data
anyway, so the cost may easily be lost in the noise.

I would suggest that a file included with #import, or one that contains
"#pragma once", be treated as if it were wrapped by

#ifndef __CHECKSUM__XXXXX
#define __CHECKSUM__XXXXX
/* contents of the file */
#endif /* __CHECKSUM__XXXXX */

where XXXXX is the file's checksum.  This requires a well-defined checksum
algorithm and a well-defined stage of processing at which the checksum
is computed over the file.

With this the strange corner cases become weird but completely predictable.
So if you want Apple semantics, a file may also be manually wrapped with the
#ifndef __CHECKSUM__XXXXX guard.
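As a rough sketch of how an implementation might behave "as if" the guard were present: the code below assumes a 16-bit checksum, and the names guard_name and import_once are made up for illustration.  A bitmap of checksums already seen plays the role of the #ifndef test.

```c
#include <stdint.h>
#include <stdio.h>

/* One bit per possible 16-bit checksum value (assumed width). */
static unsigned char seen[65536 / 8];

/* Hypothetical: spell the guard macro that corresponds to a checksum. */
void guard_name(uint16_t sum, char *out, size_t outsize)
{
    snprintf(out, outsize, "__CHECKSUM__%04X", (unsigned)sum);
}

/* Hypothetical: return 1 the first time a checksum is seen (the file
   should be processed) and 0 on every later occurrence (skip it). */
int import_once(uint16_t sum)
{
    unsigned char mask = (unsigned char)(1u << (sum & 7));

    if (seen[sum >> 3] & mask)
        return 0;
    seen[sum >> 3] |= mask;
    return 1;
}
```

Two different files with the same checksum would be treated as one file here, which is exactly the predictable corner case: the behavior follows from the checksum alone, not from how the file was named or found.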

A "#pragma many" may be desirable to allow for multiple inclusion behavior.

A good definition is one in which the same code may be compiled with two
different compilers and the same results are achieved, and if not there is
a clear delineation of which result matches the specification.

I would suggest the checksum be applied after the file has been encoded
in UTF-8, after trigraphs have been expanded, and after the file has been
internally converted to use standard C line endings ("\n").  I guess I
mean the token stream before preprocessing, with tokens taken numerically
in their UTF-8 encodings.  UTF-8 has a nice byte-oriented representation
and should be able to represent anyone's text file.
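The normalization stage described above might look roughly like this.  It is only a sketch: it assumes the buffer has already been converted to UTF-8, and the name normalize_for_checksum is hypothetical.  It rewrites CR LF and lone CR as "\n" and replaces the nine standard C trigraphs.

```c
#include <stddef.h>

/* Map the third character of a trigraph to its replacement,
   or return 0 if the character does not complete a trigraph. */
static char trigraph_char(char c)
{
    switch (c) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Hypothetical helper: normalize buf[0..len) in place and return the
   new length.  The output never grows, so writing in place is safe. */
size_t normalize_for_checksum(char *buf, size_t len)
{
    size_t in = 0, out = 0;

    while (in < len) {
        char c = buf[in];

        if (c == '\r') {
            /* CR LF or a lone CR becomes a single newline. */
            buf[out++] = '\n';
            in++;
            if (in < len && buf[in] == '\n')
                in++;
        } else if (c == '?' && in + 2 < len && buf[in + 1] == '?'
                   && trigraph_char(buf[in + 2]) != 0) {
            /* Two question marks plus a valid third character. */
            buf[out++] = trigraph_char(buf[in + 2]);
            in += 3;
        } else {
            buf[out++] = c;
            in++;
        }
    }
    return out;
}
```

With this, a file saved with DOS line endings and the same file saved with Unix line endings would produce identical bytes, and therefore the same checksum.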

As for the checksum algorithm, I would suggest something well defined
and weak, like the IPv4 checksum.  Given that the algorithm is well
defined, a stronger checksum is not needed, as the text of the file
may be deterministically modified to remove false positives.  The
important point is having a well-defined algorithm so that false
positives are well defined, leading to deterministic behavior of #import.
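For reference, a minimal sketch of the Internet checksum as described in RFC 1071, which is what the IPv4 header uses.  The function name inet_checksum is my own, and applying it to whole source files is the assumption being illustrated, not an existing GCC facility.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement of the one's-complement sum of big-endian 16-bit
   words.  A trailing odd byte is padded on the right with zero. */
uint16_t inet_checksum(const unsigned char *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
        /* Fold the carry back in so large inputs cannot overflow. */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }
    if (i < len)
        sum += (uint32_t)data[i] << 8;
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)(~sum & 0xFFFFu);
}
```

Because the sum is commutative, swapping two aligned 16-bit words leaves the result unchanged, so "abcd" and "cdab" collide.  That is the kind of false positive meant above: it is easy to predict, and adding or changing a byte in one of the files removes it.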

Eric

