This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: Universal Character Names, v2
- From: martin at v dot loewis dot de (Martin v. Löwis)
- To: Neil Booth <neil at daikokuya dot co dot uk>
- Cc: Zack Weinberg <zack at codesourcery dot com>, gcc-patches at gcc dot gnu dot org
- Date: 02 Dec 2002 01:35:11 +0100
- Subject: Re: Universal Character Names, v2
- References: <200211282334.gASNYdTA004058@mira.informatik.hu-berlin.de> <87r8d5rq2b.fsf@egil.codesourcery.com> <20021129071218.GB8045@daikokuya.co.uk> <87u1hxbe0z.fsf@egil.codesourcery.com> <20021202002441.GA3539@daikokuya.co.uk>
Neil Booth <neil@daikokuya.co.uk> writes:
> I've had more thoughts about arbitrary charsets. Rather than converting
> to UTF-8 on a per-character basis, the obvious place is to convert
> a line-at-a-time from the new-line handler (plus a call when starting
> a buffer to get the process started).
Would there be anything wrong with converting the entire *file*?
Some encodings have shift states that can extend beyond the line
end (although I think this is discouraged in many encodings), so you
might have difficulty identifying the line end before performing
the charset conversion.
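
To illustrate the general hazard of looking for line ends in unconverted bytes, here is a minimal Python sketch. It does not use a shift-state encoding; it uses UTF-16LE as a stand-in, because there the byte 0x0A can occur inside a character that is not a newline. The sample string and variable names are made up for the illustration and have nothing to do with the cpplib code.

```python
# U+010A (LATIN CAPITAL LETTER C WITH DOT ABOVE) encodes in UTF-16LE as the
# bytes 0x0A 0x01, so its first byte looks like an ASCII newline.
text = "a\u010Ab\nc"          # two logical lines: "a\u010Ab" and "c"
raw = text.encode("utf-16-le")

# Splitting the raw bytes at 0x0A before conversion finds a bogus line end
# inside U+010A in addition to the real one.
byte_lines = raw.split(b"\n")

# Converting the whole buffer first and splitting afterwards finds only the
# real line end.
text_lines = raw.decode("utf-16-le").split("\n")

print(len(byte_lines))   # 3 pieces, none of them a valid line
print(len(text_lines))   # 2 lines
```

Converting the whole buffer up front sidesteps the problem, at the cost of holding the converted file in memory.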
Regards,
Martin