Merge cpplib and front end hashtables, part 1

Fergus Henderson <fjh@cs.mu.oz.au>
Thu May 17 12:58:00 GMT 2001


On 17-May-2001, Zack Weinberg <zackw@Stanford.EDU> wrote:
> On Fri, May 18, 2001 at 02:16:05AM +1000, Fergus Henderson wrote:
> > On 17-May-2001, Neil Booth <neil@daikokuya.demon.co.uk> wrote:
> > > So we need a way to specify it on a per-file basis, presumably in the
> > > file itself.
> > 
> > Here I disagree on two counts.
> > 
> > Firstly, I think in most cases it would be more convenient to
> > specify it on a per-directory basis rather than a per-file basis.
> 
> What sort of interface do you have in mind for this?

Oh, I was thinking along the lines of a command-line option;
perhaps a variant of `-I' that also specifies the encoding.

> At the moment I
> don't see any which doesn't violate the invariant that the meaning of
> a file must not depend on external information.

Hmm, yes.

> Also, I don't think per-directory is fine grained enough.  Suppose you
> have three third-party libraries all of which install headers in
> /usr/include, written in three different countries, and all the
> headers have comments in the native language of the country of origin.
> I don't think that's at all far-fetched.

Well, IMHO third-party libraries shouldn't be installing their headers in
/usr/include.  For one thing, this prevents having multiple versions
of each third-party library installed simultaneously.
Instead, each third-party library should install its headers in a
version-specific and hence package-specific directory.

(Each package should also come with a config script, e.g. `foo-config'
for package `foo', that you can use to find out which directory the
header files have been installed in.  Then you just need to set
your PATH to find the appropriate versions of each library.)
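A config script of the kind described above might look like the following
minimal sketch.  Only the `foo-config' name comes from the text; the install
prefix, the option names (loosely modeled on the later pkg-config convention),
and the function form are all assumptions for illustration -- an installed
script would dispatch on its "$1" the same way.

```shell
# Hypothetical foo-config logic for package `foo'.
# The version-specific install prefix below is an assumption.
foo_config() {
  prefix=/opt/foo-1.2
  case "$1" in
    --includedir) echo "$prefix/include" ;;       # where the headers live
    --cflags)     echo "-I$prefix/include" ;;     # flags for compiling
    --libs)       echo "-L$prefix/lib -lfoo" ;;   # flags for linking
    *)            echo "usage: foo-config --includedir|--cflags|--libs" >&2
                  return 1 ;;
  esac
}
```

A build could then ask `foo-config --cflags` for the right -I flag without
hard-coding any version-specific path.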

No doubt some people will want to put everything in /usr/include.
So a different approach may be needed for that case.

However, these different approaches to specifying the encoding are
not necessarily mutually exclusive.  GCC could support multiple methods,
e.g.

	- putting the encoding in the file itself
	- specifying the encoding of individual files via a command line option
	- specifying the encoding of directories via a command line option
	- guessing based on the file contents

perhaps in the above order of priority.
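The last option -- guessing from the file contents -- could be as simple as
scanning the first couple of lines for an Emacs/MULE-style coding tag.  This
is just a sketch: the helper name, the two-line window, and the utf-8
fallback are all assumptions, not anything GCC actually does.

```shell
# Sketch: guess a file's encoding from an Emacs-style tag such as
#   /* -*- coding: latin-1 -*- */
# in its first two lines; fall back to a default when no tag is found.
guess_encoding() {
  enc=$(head -n 2 "$1" \
        | sed -n 's/.*-\*-.*coding: *\([A-Za-z0-9_-]*\).*-\*-.*/\1/p' \
        | head -n 1)
  echo "${enc:-utf-8}"   # assumed default; a real compiler would pick its own
}
```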

> Note that MULE uses tags inside the file to determine character sets,
> and this seems to work well enough for most people, so I think it's
> not a problem in practice.

Fair enough.  I don't have much practical experience working with multiple
encodings; my main experience of it is with the many @#%^ problems that arise
regarding DOS-vs-Unix encoding of text files.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
                                    |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.


