This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [patch, fortran] Wide character I/O Part 1
- From: FX <fxcoudert at gmail dot com>
- To: Jerry DeLisle <jvdelisle at verizon dot net>
- Cc: Fortran List <fortran at gcc dot gnu dot org>, gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 29 May 2008 23:09:44 +0100
- Subject: Re: [patch, fortran] Wide character I/O Part 1
- References: <483E0FBE.8090301@verizon.net>
Here's my review. I don't think the patch is OK, unless I made a
mistake below (see the place where I talk about GFC_SIZE_OF_CHAR_KIND).
FX
@@ -327,6 +328,13 @@ gfc_build_io_library_fndecls (void)
void_type_node, 3, dt_parm_type,
pvoid_type_node, gfc_int4_type_node);
+ iocall[IOCALL_X_CHARACTER_WIDE] =
+ gfc_build_library_function_decl (get_identifier
+ (PREFIX("transfer_character_wide")),
+ void_type_node, 4, dt_parm_type,
+ pvoid_type_node, gfc_int4_type_node,
+ gfc_int4_type_node);
+
iocall[IOCALL_X_REAL] =
gfc_build_library_function_decl (get_identifier (PREFIX
("transfer_real")),
void_type_node, 3, dt_parm_type,
You give them gfc_int4_type_node type and they're just "int" in the
library version. I know there are a few other places like that lying
around, but we shouldn't introduce new ones :) I think the kind
argument should be an int, and the length should be a
gfc_charlen_type_node (which is really an integer(kind=4), but that's
clearer that way).
case BT_CHARACTER:
+ if (kind == 4)
+ {
+ if (se->string_length)
+ arg2 = se->string_length;
+ else
+ {
+ tmp = build_fold_indirect_ref (addr_expr);
+ gcc_assert (TREE_CODE (TREE_TYPE (tmp)) == ARRAY_TYPE);
+ arg2 = TYPE_MAX_VALUE (TYPE_DOMAIN (TREE_TYPE (tmp)));
+ }
Even though it's probable safe already, I'd be tempted to fold_convert
() arg2 into a gfc_charlen_type_node, just to make sure.
/* This is the offset (in bytes) required to cast from logical(8)* to
logical(4)*. and still get the same result. Will be 0 for
little-endian
- machines and 4 for big-endian machines. */
+ machines and 1 for big-endian machines. */
int l8_to_l4_offset = 0;
The thing is, you've corrected the values, but it's not "the offset
(in bytes) required to cast from logical(8) to logical(4)". I suggest
also remording the start of the comment, and maybe renaming that
constant into "big_endian". But, if you don't want to do that last
part right now, I can understand :)
+ for (n = 0; n < m; n++, dest += 4, s++)
+ dest[endian_off] = *s;
+
+ for (n = 0; n < length - (int) w; n++, dest++)
+ dest[endian_off] = ' ';
In the second for loop, shouldn't that be "dest += 4" also? And also,
maybe try to make sure it's tested somewhere in the testcases...
@@ -1406,10 +1422,23 @@ formatted_transfer (st_parameter_dt *dtp
tmp = (char *) p;
/* Big loop over all the elements. */
- for (elem = 0; elem < nelems; elem++)
+ if (type == BT_CHARACTER && kind == 4)
{
- dtp->u.p.item_count++;
- formatted_transfer_scalar (dtp, type, tmp + size*elem, kind,
size);
+ size_t stride;
+ stride = size * 4;
+ for (elem = 0; elem < nelems; elem++)
+ {
+ dtp->u.p.item_count++;
+ formatted_transfer_scalar (dtp, type, tmp + stride*elem, kind,
size);
+ }
+ }
+ else
+ {
+ for (elem = 0; elem < nelems; elem++)
+ {
+ dtp->u.p.item_count++;
+ formatted_transfer_scalar (dtp, type, tmp + size*elem, kind,
size);
+ }
}
I'd prefer having a single loop and use the GFC_SIZE_OF_CHAR_KIND
macro. Something like that:
/* Big loop over all the elements. */
size_t stride = (type == BT_CHARACTER ? size *
GFC_SIZE_OF_CHAR_KIND(kind) : size;
for (elem = 0; elem < nelems; elem++)
{
dtp->u.p.item_count++;
formatted_transfer_scalar (dtp, type, tmp + stride*elem, kind,
size);
}
(not tested). I actually see you've done that in list_formatted_write
() already, so it's probably that you forgot it here.
+/* The primary difference between write_a_char4 and write_a is
that we have to
+ deal with writing from the first byte of the 4-byte character
and take care
+ of endianess. This currently implements encoding="default"
which means we
+ ignore the upper 3 bytes of the character. TODO: Implement
encoding="UTF-8"
+ which will process all 4 bytes and translate to the encoded
output. */
Well, we shouldn't really ignore them as we do want to check them and
potentially translate into a '?'. See below.
+ if (source[kk + endian_off] == '\n')
That doesn't work if, for example, you have a wide character c such
that (c % 255) == '\n'. You need to correctly extract source as a
gfc_char4_t and compare that to '\n' directly.
+ for (j = k = 0; j < bytes; j++, k += 4)
+ p[j] = source[k + endian_off];
Here, shouldn't you check for too large characters and print '?'
instead? See below.
+ for (j = k =0; j < bytes; j++, k += 4)
+ p[j] = source[k + endian_off];
Same here.
+ for (j = k = 0; j < wlen; j++, k += 4)
+ p[j] = source[k + endian_off];
+ else
+ {
+ memset (p, ' ', wlen - len);
+ for (j = k = wlen - len; j < wlen; j++, k += 4)
+ p[j] = source[k + endian_off];
And here (twice).
if (d == ' ')
- memcpy (p, source, length);
+ for (i = 0; i < length; i++)
+ p[i] = (unsigned int) source[i * kind] > 255 ?
+ '?' : source[i * kind + endian_off];
I think you should make a macro of that check:
#define FOLD_TO_CHAR1(x) ((x) > 255 ? '?' : (unsigned char) x)
Furthermore, I don't follow the logic here: source[i * kind] might
not be the significant thing. Hum... that one is a mess, and I think
we can't do it without having two loops depending on kind value. For
gfc_char4_t, you need to extract the value from source like that:
*(gfc_char4_t *)&source[i * kind]
and call the FOLD_TO_CHAR1 macro on that. For kind 1, just do the
usual thing.
- *p++ = source[i];
- if (source[i] == d)
+ *p++ = (unsigned int) source[i * kind] > 255 ?
+ '?' : source[i * kind + endian_off];
+ if ((unsigned int) source[i * kind] <= 255
+ && source[i * kind + endian_off] == d)
*p++ = d;
Same here.
FX
--
François-Xavier Coudert
http://www.homepages.ucl.ac.uk/~uccafco/