This is the mail archive of the
fortran@gcc.gnu.org
mailing list for the GNU Fortran project.
Re: [patch, fortran] Wide character I/O Part 1.1 Round 2
- From: FX <fxcoudert at gmail dot com>
- To: Jerry DeLisle <jvdelisle at verizon dot net>
- Cc: Fortran List <fortran at gcc dot gnu dot org>, gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 9 Jun 2008 00:13:43 +0100
- Subject: Re: [patch, fortran] Wide character I/O Part 1.1 Round 2
- Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:to:mime-version:content-type :resent-date:message-id:cc:content-transfer-encoding:resent-to:from :subject:resent-from:resent-message-id:date:x-mailer; bh=NrMqpOQI0bTc7rUU90Q+kU34dqeauD+BqgG122u0axM=; b=aahOwAZc3QFL7sLHibKJ7WxJOuGEcQDxTi2/L5znNXzdRGayXUyU2KywnmF8taIP+3 VA70vBtvsia58M1SHstqihjxkLgj3FcqJ5PNcyGUzSowh7BXKyi+X5l3LKi3qQmCp3v3 aPgCyRmaIxsjGb/uP42qVzKeiit+gKlMPJABo=
- Domainkey-signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=to:mime-version:content-type:resent-date:message-id:cc :content-transfer-encoding:resent-to:from:subject:resent-from :resent-message-id:date:x-mailer; b=iNwhOC/fKsVNAuFzX95DT2mQoNKzRpKobp8VnEClDrIhXEjvwfGsNPk2hc6zK5iYV/ 6NhyRDwjKMiC2hXlDG+Gf3Pj+lChQgUPcH0A7mY/2RgUCvJUCQ/+SaS5NQbbg5Lx8+AK 8ZVr3O5Aav7yKjH4Ghg7RrY8s6jQ3D5MYYCxU=
Hi Jerry,
Sorry I took so long to review your patch. Still not fully OK for me,
but we're definitely very close to it. We discussed it on IRC
already, but for the record: I suggest you simplify things by dumping
endian_off completely (it's not needed, and adds loads from memory
and additions) and access things by casting them to the right type.
Here's the detailed review:
+ for (i = j = 0; i < m; i++, j += 4)
+ q[j + endian_off] = dtp->u.p.saved_string[i];
If you do that, you need to be absolutely sure that all bytes in the
saved_string has prevously been zero'ed. Is it so? But I don't
understand why you don't want to do it that way:
for (i = j = 0; i < m; i++, j++)
((gfc_char4_t *) q)[j] = (unsigned char) dtp->u.p.saved_string
[i];
+ for (i = m, j = m * 4; i < (int) size - m; i++, j += 4)
+ q[j + endian_off] = ' ';
Same here, if you do it that way, you need to be sure that q was
previously zero'ed out. (And it may break in the future is someone
decides to not zero this buffer after all.) I think it'd be best
rewritten as:
for (i = j = m; i < (int) size - m; i++, j++)
((gfc_char4_t *) q)[j] = (unsigned char) ' ';
+ for (n = 0; n < m; n++, dest += 4, s++)
+ dest[endian_off] = *s;
+
+ for (n = 0; n < length - (int) w; n++, dest += 4)
+ dest[endian_off] = ' ';
Same again.
+ /* Handle wide chracters. */
+ if (type == BT_CHARACTER && kind == 4)
+ {
+ sz = (size_t) kind;
+ for (i = 0; i < size * nelems; i++)
+ {
+ read_block_direct (dtp, buffer, &sz);
+ reverse_memcpy (p, buffer, kind);
+ p += kind;
+ }
+ return;
+ }
Couldn't you just do the same thing as for complex, and have it use
the same loop as all others with tweaking the values of nelems and
size? Something like:
if (type == BT_CHARACTER && kind != 1)
{
nelems *= size;
size = kind;
}
+ /* Handle wide chracters. */
+ if (type == BT_CHARACTER && kind == 4)
+ {
+ sz = (size_t) kind;
+ for (i = 0; i < size * nelems; i++)
+ {
+ reverse_memcpy(buffer, p, sz);
+ p += sz;
+ write_buf (dtp, buffer, sz);
+ }
+ return;
+ }
Same thing here: do "nelems *= size" and "size = kind", and I believe
the "standard" a few lines down will do fine (and avoid code
duplication).
+/* This macro casts a kind=1 character to a kind=4. */
+#define GFC_CHAR4(x) (*(gfc_char4_t *)&x)
The comment is wrong: it takes an element of an array, and returns
the 4 bytes that start with this element, as an character(kind=4).
But that's a bit convoluted...
+ /* Scan the source string looking for '\n' and convert it if
found. */
+ for (i = kk = 0; i < wlen; i++, kk += 4)
+ {
+ if (GFC_CHAR4(source[kk]) == '\n')
It's weird to do it that way; I think the code gives the good results
in all cases (so I'm not objecting to it strongly), but I think it's
certainly hard to read. I'd be tempted to have kk go by increments of
1 instead of 4, and do:
if (((gfc_char4_t *)source)[kk] == (unsigned char) '\n')
(I'm not even sure if the "unsigned char" cast is needed, but it
can't hurt.)
+ {
+ /* Write out the previously scanned characters in the
string. */
+ if (bytes > 0)
+ {
+ p = write_block (dtp, bytes);
+ if (p == NULL)
+ return;
+ for (j = k = 0; j < bytes; j++, k += 4)
The you'd also make k go by increments of 1.
+ p[j] = GFC_CHAR4(source[k]) > 255 ?
+ '?' : source[k + endian_off];
And that would be:
p[j] = (((gfc_char4_t *) source)[k] > 255 ? '?' : (unsigned char)
((gfc_char4_t *) source)[k]);
Once again, remove the need for endian_off and let unsigned
arithmetics wrapping do its job.
+ for (j = k = 0; j < bytes; j++, k += 4)
+ p[j] = GFC_CHAR4(source[k]) > 255 ? '?' : source[k +
endian_off];
And again, same thing here.
+ for (j = k = 0; j < wlen; j++, k += 4)
+ p[j] = GFC_CHAR4(source[k]) > 255 ?
+ '?' : source[k + endian_off];
And there. (Macro?)
+ p[j] = GFC_CHAR4(source[k]) > 255 ?
+ '?' : source[k + endian_off];
And there. (Macro, for sure!)
+ for (i = j = 0; i < length; i++, j += 4)
+ if (GFC_CHAR4(source[j]) == (gfc_char4_t) d)
+ extra++;
Same here. Actually, I believe you could have a macro that looks up
for a character in a char4 string:
#define CHAR4_AT_POS(string,pos) (((gfc_char4_t *) (string))[pos])
That way, you could use directly CHAR4_AT_POS(source, j)
and same thing everywhere you used GFC_CHAR4. That'd be way clearer
(at least to me, but other opinions are welcome!).
+ for (i = j = 0; i < length; i++, j += 4)
+ p[i] = GFC_CHAR4(source[j]) > 255 ?
+ '?' : source[j + endian_off];
Again.
+ for (i = j = 0; i < length; i++, j += 4)
+ {
+ *p++ = GFC_CHAR4(source[j]) > 255 ?
+ '?' : source[i * kind + endian_off];
+ if (GFC_CHAR4(source[j]) == (gfc_char4_t) d)
+ *p++ = d;
+ }
And again.
FX
--
François-Xavier Coudert
http://www.homepages.ucl.ac.uk/~uccafco/