This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [patch, fortran] Wide character I/O Part 1

From: Jerry DeLisle <jvdelisle at verizon dot net>
To: FX <fxcoudert at gmail dot com>
Cc: Fortran List <fortran at gcc dot gnu dot org>, gcc-patches <gcc-patches at gcc dot gnu dot org>
Date: Thu, 29 May 2008 16:57:54 -0700
Subject: Re: [patch, fortran] Wide character I/O Part 1
References: <483E0FBE.8090301@verizon.net> <1BB99FEE-C1EC-4EDB-83D4-8D0FD7B54099@gmail.com>

FX wrote:

Here's my review. I don't think the patch is OK, unless I made a mistake below (see the place where I talk about GFC_SIZE_OF_CHAR_KIND).

FX

Well, it works pretty good so far. :)

Thanks for quick review, see my replies below.

@@ -327,6 +328,13 @@ gfc_build_io_library_fndecls (void)
                      void_type_node, 3, dt_parm_type,
                      pvoid_type_node, gfc_int4_type_node);
+ iocall[IOCALL_X_CHARACTER_WIDE] = + gfc_build_library_function_decl (get_identifier + (PREFIX("transfer_character_wide")), + void_type_node, 4, dt_parm_type, + pvoid_type_node, gfc_int4_type_node, + gfc_int4_type_node); + iocall[IOCALL_X_REAL] = gfc_build_library_function_decl (get_identifier (PREFIX("transfer_real")), void_type_node, 3, dt_parm_type,
You give them gfc_int4_type_node type and they're just "int" in the library version. I know there are a few other places like that lying around, but we shouldn't introduce new ones :) I think the kind argument should be an int, and the length should be a gfc_charlen_type_node (which is really an integer(kind=4), but that's clearer that way).

OK, no problem, I was following what was there already. :)

     case BT_CHARACTER:
+      if (kind == 4)
+    {
+      if (se->string_length)
+        arg2 = se->string_length;
+      else
+        {
+          tmp = build_fold_indirect_ref (addr_expr);
+          gcc_assert (TREE_CODE (TREE_TYPE (tmp)) == ARRAY_TYPE);
+          arg2 = TYPE_MAX_VALUE (TYPE_DOMAIN (TREE_TYPE (tmp)));
+        }

Even though it's probable safe already, I'd be tempted to fold_convert() arg2 into a gfc_charlen_type_node, just to make sure.

Let me give this a try, again I just followed what was done before.

/* This is the offset (in bytes) required to cast from logical(8)* to logical(4)*. and still get the same result. Will be 0 for little-endian - machines and 4 for big-endian machines. */ + machines and 1 for big-endian machines. */ int l8_to_l4_offset = 0;
The thing is, you've corrected the values, but it's not "the offset (in bytes) required to cast from logical(8) to logical(4)". I suggest also remording the start of the comment, and maybe renaming that constant into "big_endian". But, if you don't want to do that last part right now, I can understand :)

Ha! Someone actually noticed this. I meant to mention in my post that we really ought to rename this. I was holding off because it is used in a few other places not related to my patch. I will just go ahead and fix it.

+  for (n = 0; n < m; n++, dest += 4, s++)
+      dest[endian_off] = *s;
+
+  for (n = 0; n < length - (int) w; n++, dest++)
+      dest[endian_off] = ' ';
In the second for loop, shouldn't that be "dest += 4" also? And also, maybe try to make sure it's tested somewhere in the testcases...

Yes, good catch.

@@ -1406,10 +1422,23 @@ formatted_transfer (st_parameter_dt *dtp
   tmp = (char *) p;
/* Big loop over all the elements. */ - for (elem = 0; elem < nelems; elem++) + if (type == BT_CHARACTER && kind == 4) { - dtp->u.p.item_count++; - formatted_transfer_scalar (dtp, type, tmp + size*elem, kind, size); + size_t stride; + stride = size * 4; + for (elem = 0; elem < nelems; elem++) + { + dtp->u.p.item_count++; + formatted_transfer_scalar (dtp, type, tmp + stride*elem, kind, size); + } + } + else + { + for (elem = 0; elem < nelems; elem++) + { + dtp->u.p.item_count++; + formatted_transfer_scalar (dtp, type, tmp + size*elem, kind, size); + } }
I'd prefer having a single loop and use the GFC_SIZE_OF_CHAR_KIND macro. Something like that:

/* Big loop over all the elements. */ size_t stride = (type == BT_CHARACTER ? size * GFC_SIZE_OF_CHAR_KIND(kind) : size; for (elem = 0; elem < nelems; elem++) { dtp->u.p.item_count++; formatted_transfer_scalar (dtp, type, tmp + stride*elem, kind, size); }

(not tested). I actually see you've done that in list_formatted_write() already, so it's probably that you forgot it here.

OK, I had them separate just for working through the logic of what I was trying to do (proof of principle) Good to refine now.

+/* The primary difference between write_a_char4 and write_a is that we have to + deal with writing from the first byte of the 4-byte character and take care + of endianess. This currently implements encoding="default" which means we + ignore the upper 3 bytes of the character. TODO: Implement encoding="UTF-8" + which will process all 4 bytes and translate to the encoded output. */
Well, we shouldn't really ignore them as we do want to check them and potentially translate into a '?'. See below.

I need to reword that. I am not ignoring and I am trying to check the upper three bytes and see if they are not hex 00,00,00 and if they are, transfer '?' See below:

+ if (source[kk + endian_off] == '\n')
That doesn't work if, for example, you have a wide character c such that (c % 255) == '\n'. You need to correctly extract source as a gfc_char4_t and compare that to '\n' directly.
+          for (j = k = 0; j < bytes; j++, k += 4)
+            p[j] = source[k + endian_off];
Here, shouldn't you check for too large characters and print '?' instead? See below.
+      for (j = k =0; j < bytes; j++, k += 4)
+        p[j] = source[k + endian_off];
Same here.
+    for (j = k = 0; j < wlen; j++, k += 4)
+      p[j] = source[k + endian_off];
+      else
+    {
+      memset (p, ' ', wlen - len);
+      for (j = k = wlen - len; j < wlen; j++, k += 4)
+        p[j] = source[k + endian_off];
And here (twice).
   if (d == ' ')
-    memcpy (p, source, length);
+    for (i = 0; i < length; i++)

The following is the check that transfers the '?' I will update if I missed some spots. Having a macro for this is OK, but a nit. :)

+      p[i] = (unsigned int) source[i * kind] > 255 ?
+         '?' : source[i * kind + endian_off];
I think you should make a macro of that check:

#define FOLD_TO_CHAR1(x) ((x) > 255 ? '?' : (unsigned char) x)

Furthermore, I don't follow the logic here: source[i * kind] might not be the significant thing.

I am not sure I understand your question here. i is the i'th character and multiplying by kind gives the index into the source string. source is a character byte pointer. When kind == 1, it has no effect. I am grabbing every fourth byte for kind = 4.

Hum... that one is a mess, and I think we

can't do it without having two loops depending on kind value. For gfc_char4_t, you need to extract the value from source like that:

*(gfc_char4_t *)&source[i * kind]

and call the FOLD_TO_CHAR1 macro on that. For kind 1, just do the usual thing.
-      *p++ = source[i];
-      if (source[i] == d)
+      *p++ = (unsigned int) source[i * kind] > 255 ?
+         '?' : source[i * kind + endian_off];
+      if ((unsigned int) source[i * kind] <= 255
+          && source[i * kind + endian_off] == d)
         *p++ = d;

I will take your comments and see what I can do with them. ;)

Cheers,

Jerry

References:
- [patch, fortran] Wide character I/O Part 1
  - From: Jerry DeLisle
- Re: [patch, fortran] Wide character I/O Part 1
  - From: FX

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]