This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.



Re: RFC Wide Characters in I/O


Hi Jerry,


> Modify the front-end in trans-io.c to build a call to a new function called 
> transfer_wide_character. This function is used if we have an ENCODING= specifier 
> or a wide character to transfer. This will retain the existing 
> transfer_character call to maintain compatibility.

We will need to look at the following eight cases:

a) kind=1 character and default encoding, formatted
b) kind=4 character and default encoding, formatted
c) kind=1 character and UTF-8 encoding, formatted
d) kind=4 character and UTF-8 encoding, formatted
e) kind=1 character, unformatted, default conversion (trivial :-)
f) kind=4 character, unformatted, default conversion
g) kind=1 character, unformatted, swap conversion (also trivial :-)
h) kind=4 character, unformatted, swap conversion

I would probably put the info about encoding into
dtp->u.p.current_unit->flags, and use transfer_wide_character like
you suggested.
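
To make the eight cases concrete, here is a rough sketch of how they
might collapse into three actions. All names here (encoding_t,
convert_t, xfer_action_t, classify_transfer) are made up for
illustration and are not libgfortran identifiers. Treating case b)
as "bytes as is" and case c) as a UTF-8 translation are assumptions
taken from the proposals in this thread, not settled behaviour.

```c
#include <stdbool.h>

/* Hypothetical types for illustration only; not libgfortran names.  */
typedef enum { ENC_DEFAULT, ENC_UTF8 } encoding_t;
typedef enum { CONV_NATIVE, CONV_SWAP } convert_t;

typedef enum
{
  XFER_BYTES_AS_IS,	/* Copy the bytes unchanged.  */
  XFER_SWAP_4,		/* Reverse the bytes of each 4-byte character.  */
  XFER_UTF8		/* Translate to or from UTF-8.  */
} xfer_action_t;

/* Map (kind, formatted, encoding, conversion) onto one action.
   Cases a), b), e), f), g): bytes as is.  Case h): swap.
   Cases c), d): UTF-8 translation (assumed for c).  */
xfer_action_t
classify_transfer (int kind, bool formatted, encoding_t enc, convert_t conv)
{
  if (!formatted)
    return (kind == 4 && conv == CONV_SWAP) ? XFER_SWAP_4 : XFER_BYTES_AS_IS;
  if (enc == ENC_UTF8)
    return XFER_UTF8;
  return XFER_BYTES_AS_IS;
}
```

Only cases c), d) and h) would then need code that does more than a
plain copy.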


> If we are given a wide character to transfer as unformatted, we simply transfer 
> all the bytes as is. (I will confirm with the standard on this.)

I didn't find anything to the contrary.
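
For the unformatted kind=4 cases f) and h), something along these
lines might do. This is a minimal sketch only: copy_char4 is a
hypothetical helper, not an existing libgfortran function, and it
assumes that swap conversion means reversing the four bytes of each
character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy NCHARS kind=4 characters from SRC to DST.
   If SWAP is nonzero, reverse the byte order within each 4-byte
   character; otherwise transfer all the bytes as is.  DST and SRC
   must not overlap.  */
void
copy_char4 (unsigned char *dst, const unsigned char *src,
	    size_t nchars, int swap)
{
  if (!swap)
    {
      memcpy (dst, src, nchars * 4);
      return;
    }

  for (size_t i = 0; i < nchars; i++)
    for (size_t b = 0; b < 4; b++)
      dst[4 * i + b] = src[4 * i + (3 - b)];
}
```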

> If the user has specified an ENCODING="default" and the kind is 1, we do what we 
> do now and transfer as 8bit (mostly ASCII).  If kind is greater than 1, I 
> suggest we transfer each byte as is.  So for kind=4, we would transfer 4 bytes. 
> This would enable doing some packed 4x1 byte character stuff.  I think the 
> standard would allow this and thats why "default" is so loosely defined.

I think you're right, but maybe that's a point that could be raised on
c.l.f.


> If the user has specified an ENCODING="UTF-8" and the kind is less than 4, 
> strictly speaking, thats an error,

Why?  As far as I understand it (not much :-) the 128 ASCII values
(0 to 127) are encoded identically in UTF-8.  The problem starts when
converting values >= 128 to UTF-8, but we could either throw a runtime
error or assume ISO-8859-1.
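
If we went for the second option, the conversion on output could look
roughly like this. It is a sketch under the assumption that kind=1
values >= 128 are ISO-8859-1, so each one is the code point of the
same number and needs two bytes in UTF-8. latin1_to_utf8 is a
made-up name, not something in libgfortran.

```c
#include <stddef.h>

/* Hypothetical helper: encode N bytes of kind=1 data as UTF-8,
   treating bytes >= 128 as ISO-8859-1.  OUT must have room for
   2 * N bytes.  Returns the number of bytes written.  */
size_t
latin1_to_utf8 (const unsigned char *in, size_t n, unsigned char *out)
{
  size_t o = 0;

  for (size_t i = 0; i < n; i++)
    {
      unsigned char c = in[i];

      if (c < 0x80)
	out[o++] = c;		/* ASCII passes through unchanged.  */
      else
	{
	  /* U+0080..U+00FF: two bytes, 110xxxxx 10xxxxxx.  */
	  out[o++] = (unsigned char) (0xC0 | (c >> 6));
	  out[o++] = (unsigned char) (0x80 | (c & 0x3F));
	}
    }
  return o;
}
```

For example, the two input bytes 0x41 0xE9 ("A" and e-acute in
ISO-8859-1) become the three bytes 0x41 0xC3 0xA9.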

> For kind=4 and ENCODING="UTF-8", we do a complete conforming translation of the 
> 4 byte hex to/from the variable width UTF-8 encoding.

Agreed.
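
For reference, the output half of that translation might be sketched
as below. utf8_encode is a hypothetical name, and the sketch assumes
we follow the modern UTF-8 definition (RFC 3629): at most four bytes,
nothing above U+10FFFF, and no surrogates. Whether out-of-range
values should raise a runtime error is left open here; the function
just reports them by returning 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one kind=4 character CP as UTF-8 into
   OUT.  Returns the number of bytes written (1 to 4), or 0 if CP is
   a surrogate or lies above U+10FFFF.  */
size_t
utf8_encode (uint32_t cp, unsigned char out[4])
{
  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp < 0x10000)
    {
      if (cp >= 0xD800 && cp <= 0xDFFF)
	return 0;		/* Surrogates are not valid characters.  */
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      out[0] = (unsigned char) (0xF0 | (cp >> 18));
      out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;			/* Outside the Unicode range.  */
}
```

Reading would be the inverse: look at the lead byte to get the
length, then collect six bits from each continuation byte.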

	Thomas

