This is the mail archive of the
fortran@gcc.gnu.org
mailing list for the GNU Fortran project.
Re: RFC Wide Characters in I/O
- From: Thomas Koenig <tkoenig at netcologne dot de>
- To: Jerry DeLisle <jvdelisle at verizon dot net>
- Cc: Fortran List <fortran at gcc dot gnu dot org>
- Date: Fri, 16 May 2008 21:00:22 +0200
- Subject: Re: RFC Wide Characters in I/O
- References: <482DD108.5080006@verizon.net>
Hi Jerry,
> Modify the front-end in trans-io.c to build a call to a new function called
> transfer_wide_character. This function is used if we have an ENCODING= specifier
> or a wide character to transfer. This will retain the existing
> transfer_character call to maintain compatibility.
We will need to look at the following eight cases:
a) kind=1 character and default encoding, formatted
b) kind=4 character and default encoding, formatted
c) kind=1 character and UTF-8 encoding, formatted
d) kind=4 character and UTF-8 encoding, formatted
e) kind=1 character, unformatted, default conversion (trivial :-)
f) kind=4 character, unformatted, default conversion
g) kind=1 character, unformatted, swap conversion (also trivial :-)
h) kind=4 character, unformatted, swap conversion
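Of the unformatted cases, h) is the only one that needs real work: each 4-byte character has to be byte-reversed on its own, not the record as a whole. A minimal sketch of that step, assuming the characters are already in a contiguous buffer (swap_char4 is a hypothetical name, not an existing libgfortran function):

```c
#include <stddef.h>

/* Hypothetical helper: reverse the byte order of each kind=4
   character in place.  buf holds nchars characters of 4 bytes each.  */
void
swap_char4 (unsigned char *buf, size_t nchars)
{
  for (size_t i = 0; i < nchars; i++)
    {
      unsigned char *p = buf + 4 * i;
      unsigned char t;

      /* Exchange the outer pair, then the inner pair.  */
      t = p[0]; p[0] = p[3]; p[3] = t;
      t = p[1]; p[1] = p[2]; p[2] = t;
    }
}
```

For kind=1 the same conversion is a no-op, which is why e) and g) are trivial.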
I would probably put the info about encoding into
dtp->u.p.current_unit->flags , and use transfer_wide_character like
you suggested.
> If we are given a wide character to transfer as unformatted, we simply transfer
> all the bytes as is. (I will confirm with the standard on this.)
I didn't find anything to the contrary.
> If the user has specified an ENCODING="default" and the kind is 1, we do what we
> do now and transfer as 8bit (mostly ASCII). If kind is greater than 1, I
> suggest we transfer each byte as is. So for kind=4, we would transfer 4 bytes.
> This would enable doing some packed 4x1 byte character stuff. I think the
> standard would allow this and that's why "default" is so loosely defined.
I think you're right, but maybe that's a point that could be raised on
c.l.f.
> If the user has specified an ENCODING="UTF-8" and the kind is less than 4,
> strictly speaking, that's an error,
Why? As far as I understand it (not much :-) the 128 ASCII code
points (0 to 127) are encoded identically in UTF-8. The problem starts
when converting values >= 128 to UTF-8, but we could either throw a
runtime error or assume iso-8859-1.
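To illustrate the iso-8859-1 option: since Latin-1 values map directly onto the Unicode code points 0 to 255, a kind=1 character needs at most two UTF-8 bytes. A rough sketch, with latin1_to_utf8 being a made-up name for illustration only:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode one kind=1 character, interpreted as
   ISO-8859-1, as UTF-8.  out must have room for 2 bytes.
   Returns the number of bytes written (1 or 2).  */
size_t
latin1_to_utf8 (unsigned char c, unsigned char *out)
{
  assert (out != NULL);

  if (c < 0x80)
    {
      /* ASCII is unchanged in UTF-8.  */
      out[0] = c;
      return 1;
    }

  /* 0x80..0xFF: two-byte sequence 110xxxxx 10xxxxxx.  */
  out[0] = (unsigned char) (0xC0 | (c >> 6));
  out[1] = (unsigned char) (0x80 | (c & 0x3F));
  return 2;
}
```

The runtime-error alternative would simply reject c >= 0x80 instead of taking the second branch.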
> For kind=4 and ENCODING="UTF-8", we do a complete conforming translation of the
> 4 byte hex to/from the variable width UTF-8 encoding.
Agreed.
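For reference, the output direction of that translation could look roughly like the following. This is only a sketch: ucs4_to_utf8 is a hypothetical name, and I am assuming we restrict ourselves to the Unicode range U+0000 to U+10FFFF (at most 4 bytes, surrogates rejected) rather than the longer 5 and 6 byte forms of the original UTF-8 definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one kind=4 character as UTF-8.
   out must have room for 4 bytes.  Returns the number of bytes
   written, or 0 if c is not a valid code point (assumption: only
   U+0000..U+10FFFF without surrogates is accepted).  */
size_t
ucs4_to_utf8 (uint32_t c, unsigned char *out)
{
  if (c < 0x80)
    {
      out[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (c >> 6));
      out[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      if (c >= 0xD800 && c <= 0xDFFF)
        return 0;		/* Surrogate, not a character.  */
      out[0] = (unsigned char) (0xE0 | (c >> 12));
      out[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  if (c <= 0x10FFFF)
    {
      out[0] = (unsigned char) (0xF0 | (c >> 18));
      out[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (c & 0x3F));
      return 4;
    }
  return 0;			/* Out of range.  */
}
```

Input would need the inverse, plus checks for truncated and overlong sequences.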
Thomas