This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.
TRIM and TRIM_LEN
- From: Daniel Kraft <d at domob dot eu>
- To: Fortran List <fortran at gcc dot gnu dot org>
- Date: Fri, 16 Jan 2009 19:08:08 +0100
- Subject: TRIM and TRIM_LEN
Hi,
I've just written a CSV parser because I need to import data originating
from MS Access into my MySQL database, and doing so via CSV seems to be
the best way. It is not very sophisticated code and probably not
implemented very cleverly, but anyway, my importer beats MS Access's CSV
export by an order of magnitude; so gfortran's I/O is surely not the
worst possible :)
On the other hand, to work around the unknown-string-length problem, I
use strings with a fixed "buffer size" of 1024 or 256 characters for
whole lines and for the CSV tokens read. I then TRIM those and
concatenate them back together to form SQL statements. Because TRIM (or
TRIM_LEN, for that matter) has to skip over all those trailing blanks,
profiling showed that I'm spending 30% of the time in libgfortran's trim
and another 30% in libgfortran's trim_len. (Comments on how to do this
better are welcome, though the situation is OK for me at the moment
since I won't be doing many imports.)
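For reference, here is a minimal C sketch of the kind of byte-at-a-time scan and copy described above. The names naive_trim_len and naive_trim are hypothetical, and this is not the actual libgfortran source. It only assumes that TRIM_LEN (presumably the code behind the LEN_TRIM intrinsic) walks backwards one character at a time, and that TRIM allocates and copies its result, as this mail describes.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical TRIM_LEN-style helper: length of s[0..len) without
   trailing blanks, found by stepping back one byte at a time.
   For a 1024-byte buffer holding a 5-character token, the loop
   steps over 1019 trailing blanks before it stops. */
static size_t naive_trim_len(const char *s, size_t len)
{
    while (len > 0 && s[len - 1] == ' ')
        len--;
    return len;
}

/* Hypothetical TRIM-style helper: scans like naive_trim_len, then
   allocates a fresh buffer and copies the non-blank prefix into it.
   Returns NULL on allocation failure; otherwise the caller owns the
   result and must free it. */
static char *naive_trim(const char *s, size_t len, size_t *out_len)
{
    size_t n = naive_trim_len(s, len);
    char *dest = malloc(n > 0 ? n : 1);   /* avoid malloc(0) */
    if (dest == NULL)
        return NULL;
    memcpy(dest, s, n);
    *out_len = n;
    return dest;
}
```

With a 1024-character line buffer holding a short token, nearly all of the work in both functions is the blank-skipping loop, which is consistent with the profile above.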
A look at the code gave me some ideas for improvement:
1) Does TRIM really have to allocate and copy the trimmed string to a
new memory location? In theory, it should be possible to just set the
length to a smaller value, shouldn't it? But this may be problematic
in some situations; I don't know... What do you think?
2) A minor one: the TRIM code could use the TRIM_LEN code to calculate
the trimmed length, so we don't have to duplicate this code.
3) Inspired by how glibc does strlen, we could do the search over
trailing blanks in four-byte steps (or eight-byte steps on 64-bit
systems). Just take 0x20202020 and compare it against the string cast
to int* (the trailing bytes need special handling, of course, until we
reach 4-byte alignment). This should work nicely and could speed up the
search by up to 4 or 8 times. At least for KIND=1 we could do this, and
use the naive search code for wide strings.
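A rough sketch of idea 3 for KIND=1 strings follows. It assumes the word size is that of uintptr_t (typically 4 bytes on 32-bit targets, 8 on 64-bit), and the function name trim_len_wordwise is hypothetical; this is not libgfortran code. It handles the unaligned tail byte by byte, then compares whole aligned words against a word made entirely of blanks, and finishes byte by byte inside the last word. Instead of casting to int*, it uses memcpy to stay clear of C's aliasing rules; compilers generally turn an aligned fixed-size memcpy into a single load.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical word-at-a-time TRIM_LEN for single-byte strings:
   returns the length of s[0..len) without trailing blanks. */
static size_t trim_len_wordwise(const char *s, size_t len)
{
    /* A word whose every byte is ' ' (0x20): 0x20202020 on 32-bit,
       0x2020202020202020 on 64-bit. */
    const uintptr_t blanks = (uintptr_t)-1 / 0xFF * 0x20;

    /* 1. Byte-wise scan until the end position s + len is word-aligned. */
    while (len > 0 && ((uintptr_t)(s + len) % sizeof(uintptr_t)) != 0) {
        if (s[len - 1] != ' ')
            return len;
        len--;
    }

    /* 2. Skip whole aligned words that consist only of blanks.
       Every byte of the pattern is 0x20, so byte order does not matter. */
    while (len >= sizeof(uintptr_t)) {
        uintptr_t w;
        memcpy(&w, s + len - sizeof w, sizeof w);
        if (w != blanks)
            break;
        len -= sizeof w;
    }

    /* 3. Finish byte-wise inside the last, partly non-blank word
       (or in a string shorter than one word). */
    while (len > 0 && s[len - 1] == ' ')
        len--;
    return len;
}
```

Whether this actually beats the byte loop depends on the compiler and target, so it would need benchmarking against a workload like the one profiled above.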
I suspect that TRIM / TRIM_LEN *is* in fact called very often on strings
far longer than the trimmed result, because my use case may be common;
so this could really help. What do you think about these ideas? If you
don't have any arguments against them, I will try to prepare a patch if
I find the time.
Cheers,
Daniel
--
Done: Arc-Bar-Cav-Rog-Sam-Tou-Val-Wiz
To go: Hea-Kni-Mon-Pri-Ran