This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: Universal Character Names, v2

From: martin at v dot loewis dot de (Martin v. Löwis)
To: Neil Booth <neil at daikokuya dot co dot uk>
Cc: Zack Weinberg <zack at codesourcery dot com>, gcc-patches at gcc dot gnu dot org
Date: 02 Dec 2002 01:35:11 +0100
Subject: Re: Universal Character Names, v2
References: <200211282334.gASNYdTA004058@mira.informatik.hu-berlin.de><87r8d5rq2b.fsf@egil.codesourcery.com><20021129071218.GB8045@daikokuya.co.uk><87u1hxbe0z.fsf@egil.codesourcery.com><20021202002441.GA3539@daikokuya.co.uk>

Neil Booth <neil@daikokuya.co.uk> writes:

> I've had more thoughts about arbitrary charsets.  Rather than converting
> to UTF-8 on a per-character basis, the obvious place is to convert
> a line-at-a-time from the new-line handler (plus a call when starting
> a buffer to get the process started).  

Would there be anything wrong with converting the entire *file*?

Some encodings have shift states that can extend past the end of a
line (although I think this is discouraged in many encodings), so you
might have difficulty identifying the line end before performing
the charset conversion.
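
To illustrate the concern, here is a minimal sketch in Python. It uses a made-up toy encoding, not any real charset and not anything from cpplib: the bytes 0x0E and 0x0F are assumed to switch a shift state on and off, and while shifted the bytes 'a'..'d' stand for Greek letters. The function name decode and the sample data are hypothetical. The sketch only shows that a line-at-a-time converter gives the same result as whole-file conversion if it carries the shift state across line ends.

```python
SO, SI = 0x0E, 0x0F  # shift-out / shift-in bytes of the hypothetical toy encoding


def decode(data, shifted=False):
    """Decode toy-encoded bytes; return (text, final shift state)."""
    out = []
    for b in data:
        if b == SO:
            shifted = True
        elif b == SI:
            shifted = False
        elif shifted and 0x61 <= b <= 0x64:
            # While shifted, the bytes 'a'..'d' map to alpha..delta.
            out.append(chr(0x03B1 + (b - 0x61)))
        else:
            # Everything else, including the newline byte, is plain ASCII.
            out.append(chr(b))
    return "".join(out), shifted


# The shift state is switched on in line 1 and only switched off in line 2.
data = b"\x0eab\ncd\x0f\n"

# Convert the entire file in one go.
whole, _ = decode(data)

# Convert line by line, starting every line in the unshifted state.
fresh = "\n".join(decode(line)[0] for line in data.split(b"\n"))

# Convert line by line, carrying the shift state over the line end.
pieces, state = [], False
for line in data.split(b"\n"):
    text, state = decode(line, state)
    pieces.append(text)
carried = "\n".join(pieces)

print(whole == carried)  # True
print(whole == fresh)    # False
```

In this toy encoding the newline byte means the same thing in both states, so the lines can still be split before conversion. An encoding where that does not hold would need the conversion to run before the line ends are located, which is the argument for converting the whole file.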

Regards,
Martin

Follow-Ups:
- Re: Universal Character Names, v2
  - From: Neil Booth
- Re: Universal Character Names, v2
  - From: Neil Booth

References:
- Re: Universal Character Names, v2
  - From: Zack Weinberg
- Re: Universal Character Names, v2
  - From: Neil Booth