This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: Merge cpplib and front end hashtables, part 1

To: Neil Booth <neil at daikokuya dot demon dot co dot uk>
Subject: Re: Merge cpplib and front end hashtables, part 1
From: Michael Meissner <meissner at cygnus dot com>
Date: Thu, 17 May 2001 03:11:57 -0400
Cc: Zack Weinberg <zackw at Stanford dot EDU>, gcc-patches at gcc dot gnu dot org
References: <20010512212945.A31175@daikokuya.demon.co.uk> <20010513175419.A20351@daikokuya.demon.co.uk> <20010513115202.C434@stanford.edu> <20010513212521.A28870@daikokuya.demon.co.uk> <20010516231557.E774@stanford.edu> <20010517074102.B26669@daikokuya.demon.co.uk>

On Thu, May 17, 2001 at 07:41:03AM +0100, Neil Booth wrote:
> Zack Weinberg wrote:-
> 
> > I put a brain dump on charset handling into the cpplib projects web
> > page.  It remains a pretty good statement of what I think our end goal
> > should be in terms of user-visible behavior.  It'd be reasonable to do
> > a subset of this stuff to begin with, then get better as things go on.
> 
> I don't understand how the user is going to communicate the encoding
> of a file to us.  My understanding is charset encoding is on a
> per-file basis; i.e. there is only one encoding per file.

No, it depends on the current locale as set by setlocale.  I.e., you could
do one setlocale, open a file and read it via the multibyte (mb) functions,
close the file, do a different setlocale, open the exact same file, and get a
different sequence of multibyte characters.
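
As a rough illustration of that point, here is a minimal sketch that decodes
the same bytes under a caller-supplied locale using mbrtowc.  The helper name
count_chars_in_locale is hypothetical, and which locale names are installed is
an assumption about the host system; this is not what cpplib does.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters that mbrtowc() decodes from
   buf[0..len) under the named locale.  Returns -1 if setlocale() rejects
   the locale name or the bytes are not valid in that locale's encoding.
   Note that it leaves the process locale changed.  */
long
count_chars_in_locale (const char *locale_name, const char *buf, size_t len)
{
  mbstate_t state;
  long count = 0;

  if (setlocale (LC_ALL, locale_name) == NULL)
    return -1;

  memset (&state, 0, sizeof state);	/* start in the initial shift state */
  while (len > 0)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, buf, len, &state);

      if (n == (size_t) -1 || n == (size_t) -2)
	return -1;		/* invalid or truncated sequence */
      if (n == 0)
	n = 1;			/* treat an embedded null as one byte */
      buf += n;
      len -= n;
      count++;
    }
  return count;
}
```

Calling it twice on one buffer with two different locale names (say, a
UTF-8 locale and a single-byte locale, if both happen to be installed) can
give two different counts, which is the behavior described above.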

> But since some header files are system header files, clearly the whole
> translation unit cannot be in a single charset.

Ummm, I know it is currently late at night for me, but for C89, IIRC, it was
the intention of the committee that the entire translation unit be in a single
charset and that the compiler does the equivalent of setlocale (LC_ALL, "").
Certainly the way I read the first stage of translation in C99's 5.1.1.2, the
compiler does logically translate everything into the source character set.

	5.1.1.2 Translation phases

	The precedence among the syntax rules of translation is specified by
	the following phases [5]

	    1.	Physical source file multibyte characters are mapped to the
		source character set (introducing new-line characters for
		end-of-line indicators) if necessary.  Trigraph sequences are
		replaced by corresponding single-character internal
		representations.
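
The trigraph half of phase 1 can be sketched as below.  This is only an
illustration under stated assumptions: replace_trigraphs and
trigraph_replacement are hypothetical names, the input is assumed to be
already in the source character set, and the end-of-line mapping is not
handled.  It is not cpplib's implementation.

```c
#include <stddef.h>

/* Map the third character of a candidate trigraph to its replacement,
   or return 0 if that character does not complete a trigraph.  */
static char
trigraph_replacement (char c)
{
  switch (c)
    {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Replace every trigraph in the null-terminated string S, in place.
   The result is never longer than the input.  */
void
replace_trigraphs (char *s)
{
  char *out = s;

  while (*s != '\0')
    {
      char r = 0;

      /* Two question marks followed by one of nine characters.  */
      if (s[0] == '?' && s[1] == '?')
	r = trigraph_replacement (s[2]);

      if (r != 0)
	{
	  *out++ = r;
	  s += 3;
	}
      else
	*out++ = *s++;
    }
  *out = '\0';
}
```

The comparisons are written character by character so that the sketch
itself contains no trigraph sequences and compiles the same way whether or
not the compiler has trigraphs enabled.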

> So we need a way to specify it on a per-file basis, presumably in the
> file itself.  But how can we grok what's in the file if we don't know
> what charset it's written in?  It seems like chicken-and-egg to me.

The characters needed for the C language must be present in any encoding, and I
believe they must have the exact same encoding in each (I don't recall exactly
where in the standard this is set down; it may be the section that describes
L"" strings).  Thus, for instance:

	"X"[0] == L"X"[0]

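That equality can be checked on a given implementation with a small sketch
like the following.  The names basic_chars_agree, BASIC_CHARS and WIDEN are
hypothetical, and the string covers only a sample of the basic character
set; it shows what one compiler does, not what the standard requires.

```c
#include <stddef.h>
#include <wchar.h>

/* A sample of basic source characters (hypothetical selection).  */
#define BASIC_CHARS "ABCXYZabcxyz0123456789{}[]#()<>%:;.*+-/^&|~!=,\"'"

/* Paste an L prefix onto a string literal after macro expansion.  */
#define WIDEN2(s) L##s
#define WIDEN(s) WIDEN2 (s)

/* Return 1 if every sampled character has the same value in the narrow
   and wide string literals, 0 otherwise.  */
int
basic_chars_agree (void)
{
  static const char narrow[] = BASIC_CHARS;
  static const wchar_t wide[] = WIDEN (BASIC_CHARS);
  size_t i;

  for (i = 0; i < sizeof narrow; i++)
    if ((wchar_t) narrow[i] != wide[i])
      return 0;
  return 1;
}
```
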
-- 
Michael Meissner, Red Hat, Inc.  (GCC group)
PMB 198, 174 Littleton Road #3, Westford, Massachusetts 01886, USA
Work:	  meissner@redhat.com		phone: +1 978-486-9304
Non-work: meissner@spectacle-pond.org	fax:   +1 978-692-4482
