This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.



TRIM and TRIM_LEN


Hi,

I've just written a CSV parser because I need to import data originating from MS Access into my MySQL database, and going via CSV seems to be the best way to do so. It is not very sophisticated code and probably not implemented very cleverly, but anyway my importer beats MS Access' CSV export by an order of magnitude; so gfortran's IO is surely not the worst possible :)

On the other hand, to work around the unknown-string-length problem, I use strings with a fixed "buffer size" of 1024 or 256 characters for whole lines and for the CSV tokens read. I then TRIM those and concatenate the results to form SQL statements. Because TRIM (or TRIM_LEN, for that matter) has to skip over all those trailing blanks (up to 1024 per string), profiling showed that I'm spending 30% of the time in libgfortran's trim and another 30% in libgfortran's trim_len. (Comments on how to do this better are welcome, though the situation is OK for me at the moment; I won't be doing many imports.)
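To make the cost concrete, here is a minimal C sketch of the byte-at-a-time approach, assuming the library routines behave roughly like this. The names naive_len_trim and naive_trim are hypothetical and are not libgfortran's actual functions. LEN_TRIM walks back over every trailing blank, and TRIM then allocates and copies the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not libgfortran's source): LEN_TRIM for a
   blank-padded string of length len. It steps back one byte at a time,
   so a 1024-character buffer holding a 5-character token costs about a
   thousand byte comparisons. */
size_t naive_len_trim(const char *s, size_t len)
{
    assert(s != NULL || len == 0);
    while (len > 0 && s[len - 1] == ' ')
        len--;
    return len;
}

/* Hypothetical copy-out TRIM: scan for the trimmed length, then allocate
   and copy that many bytes. The caller must free() the result.
   Returns NULL if the allocation fails. */
char *naive_trim(const char *s, size_t len, size_t *out_len)
{
    size_t n = naive_len_trim(s, len);
    char *dest = malloc(n > 0 ? n : 1);   /* avoid malloc(0) */
    if (dest == NULL)
        return NULL;
    if (n > 0)
        memcpy(dest, s, n);
    *out_len = n;
    return dest;
}
```

Calling naive_trim on a blank-padded 1024-byte buffer that starts with "Smith" returns a freshly allocated 5-byte copy, so every call pays for both the long scan and an allocation plus copy. That would be consistent with trim and trim_len both showing up prominently in the profile.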

A look at the code gave me some ideas for improvement:

1) Does TRIM really have to allocate and copy the trimmed string to a new memory location? In theory, it should be possible to keep pointing at the original data and just set the length to the smaller value, shouldn't it? But this may be problematic in some situations, I don't know... What do you think?

2) A minor one: the TRIM code could reuse the TRIM_LEN code to compute the trimmed length, so we don't have to duplicate that code.

3) Inspired by how glibc does strlen, we could scan the trailing blanks in four-byte steps (or eight-byte steps on 64-bit systems): take 0x20202020 and compare it against the string cast to int* (with special handling for the trailing bytes, of course, until we reach 4-byte alignment). This should work nicely and could speed up the search by up to a factor of 4 or 8. We could do this at least for KIND=1 and keep the naive search code for wide strings.
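The three ideas can be sketched together in C. This is a hypothetical sketch, not a patch against libgfortran; the names fast_len_trim, str_view and trim_view are made up for illustration. Instead of literally casting to int*, it loads words with memcpy, which avoids C's strict-aliasing rules and which GCC typically compiles to a single load when optimizing. The comparison value is the blank byte 0x20 replicated across a uintptr_t, i.e. 0x20202020 on 32-bit and 0x2020202020202020 on 64-bit targets. Only KIND=1 is handled; wide strings would keep the element-by-element loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Idea 3 (hypothetical): LEN_TRIM for KIND=1 strings that skips trailing
   blanks one machine word at a time. */
size_t fast_len_trim(const char *s, size_t len)
{
    assert(s != NULL || len == 0);

    /* 0x0101...01 * 0x20 gives a word whose every byte is a blank. */
    const uintptr_t blanks = (uintptr_t)-1 / 0xFF * 0x20;

    /* Step 1: go byte by byte until the end of the remaining string
       (s + len) is word-aligned. */
    while (len > 0 && (uintptr_t)(s + len) % sizeof(uintptr_t) != 0) {
        if (s[len - 1] != ' ')
            return len;
        len--;
    }

    /* Step 2: skip whole words that consist only of blanks. memcpy is
       used instead of a pointer cast to stay clear of aliasing rules. */
    while (len >= sizeof(uintptr_t)) {
        uintptr_t w;
        memcpy(&w, s + len - sizeof w, sizeof w);
        if (w != blanks)
            break;          /* some byte in this word is not a blank */
        len -= sizeof w;
    }

    /* Step 3: finish the last partial word byte by byte. */
    while (len > 0 && s[len - 1] == ' ')
        len--;
    return len;
}

/* Ideas 1 and 2 (hypothetical): TRIM reuses the LEN_TRIM routine and,
   instead of allocating a copy, returns a (pointer, length) view of the
   original storage. The view is only valid while the source string is
   neither modified nor freed. */
typedef struct {
    const char *data;
    size_t len;
} str_view;

str_view trim_view(const char *s, size_t len)
{
    str_view v = { s, fast_len_trim(s, len) };
    return v;
}
```

Because whole words are only tested for equality with the all-blank pattern, byte order does not matter. Whether a view like trim_view is acceptable inside the compiler-generated code is exactly the open question from point 1.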

I suspect that TRIM / TRIM_LEN *is* in fact called very often on strings far longer than their trimmed length, because my use case is probably a common one; so this could really help. What do you think about these ideas? If there are no arguments against them, I will try to prepare a patch when I find the time.

Cheers,
Daniel

--
Done:  Arc-Bar-Cav-Rog-Sam-Tou-Val-Wiz
To go: Hea-Kni-Mon-Pri-Ran

