On Thu, Dec 25, 2008 at 09:32:57AM -0800, Jerry DeLisle wrote:
This is a Merry Christmas patch.
This patch recovers the performance lost in this regression by creating a
stream read_char function, which is simply a trimmed-down version of sread
(fd_read). I was actually surprised when I saw the test results. I
suspect that the simplification allows some better optimizations.
The patch also refactors next_char in list_read.c to eliminate gotos and
inline a small portion of the "done:" code. The refactoring of
next_char alone gains 2.8% over current trunk. The use of the new
read_char function gains significant additional performance.
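
For readers without the patch at hand, here is a minimal sketch of the general idea of a single-character read path: one bounds check and one load, with no general-purpose length handling. It is not the code from the patch. The names sketch_stream, sketch_read_char and sketch_count_lines are hypothetical, and the in-memory buffer stands in for whatever buffering libgfortran actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stream; illustrative only, not the
   libgfortran stream layout.  */
typedef struct
{
  const char *data;   /* backing bytes */
  size_t len;         /* total number of bytes */
  size_t pos;         /* index of the next unread byte */
} sketch_stream;

/* Return the next byte as a non-negative value, or -1 at end of data.  */
int
sketch_read_char (sketch_stream *s)
{
  assert (s != NULL);
  if (s->pos >= s->len)
    return -1;
  return (unsigned char) s->data[s->pos++];
}

/* Count newline characters one byte at a time, in the spirit of the
   countlines.f test case.  */
size_t
sketch_count_lines (sketch_stream *s)
{
  size_t lines = 0;
  int c;

  while ((c = sketch_read_char (s)) != -1)
    if (c == '\n')
      lines++;
  return lines;
}
```

A caller would fill in a sketch_stream with a pointer, a length from strlen, and a starting position of 0, then call sketch_count_lines on it.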
Using the countlines.f test case in the PR for comparison, averaged over 5 runs:
gfortran 4.3: 3.357 seconds
gfortran 4.4 current trunk: 3.821 seconds
gfortran 4.4 patched: 3.164 seconds
This is a 5.7% improvement over 4.3 for this test case and a 17%
improvement over current trunk.
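
The percentages above can be checked with a few lines of C. This is only a sketch of the arithmetic, assuming "improvement" means the reduction in run time relative to the slower baseline; the function name percent_improvement is made up for illustration.

```c
#include <assert.h>

/* Percentage by which patched_s improves on baseline_s, where both
   arguments are run times in seconds.  */
double
percent_improvement (double baseline_s, double patched_s)
{
  assert (baseline_s > 0.0);
  return 100.0 * (baseline_s - patched_s) / baseline_s;
}
```

With the timings quoted above, percent_improvement (3.357, 3.164) comes out at roughly 5.7 and percent_improvement (3.821, 3.164) at roughly 17.2.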
I also believe this refactoring will make further
improvements. I don't know the status of Janne's patch so this patch may
end up being short-lived. However, it is not very intrusive, in the sense
that it mostly reorganizes our existing code paths in simple ways.
Since it involves a regression, I think it would be OK for 4.4.
Regression tested on x86-64.
OK to commit?
Jerry
Jerry,
I am seeing about a 10% performance improvement with the patch when
using...
gfortran -O countlines.f
to compile the testcase and using the temp4 file created by the maketemp4.f
program in the PR. I used the average of the last five of ten runs each time to
minimize the effects of any disk caching. What did you use for the test file? I
noticed the temp4 file has identical lines. It is probably fair to keep the same
line length, but we should randomize the contents of the lines.
Jack
ps This was on x86_64-apple-darwin10.
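
As a rough illustration of the randomization suggested above, the sketch below writes lines of identical length but random printable contents. It is not a replacement for maketemp4.f from the PR. The function names, the character range and the seed handling are all assumptions, and the line count and length are left as parameters because the actual temp4 layout is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Fill buf with len random printable characters, then '\n' and '\0'.
   buf must hold at least len + 2 bytes.  */
void
random_line (char *buf, size_t len)
{
  size_t i;

  assert (buf != NULL);
  for (i = 0; i < len; i++)
    buf[i] = (char) ('!' + rand () % 94);   /* '!' (33) through '~' (126) */
  buf[len] = '\n';
  buf[len + 1] = '\0';
}

/* Write nlines lines of equal length and random contents to path.
   Returns 0 on success and -1 on failure.  */
int
write_random_file (const char *path, size_t nlines, size_t len,
                   unsigned seed)
{
  char *buf;
  FILE *fp;
  size_t i;
  int status = 0;

  buf = malloc (len + 2);
  if (buf == NULL)
    return -1;
  fp = fopen (path, "w");
  if (fp == NULL)
    {
      free (buf);
      return -1;
    }
  srand (seed);   /* fixed seed keeps the test file reproducible */
  for (i = 0; i < nlines; i++)
    {
      random_line (buf, len);
      if (fputs (buf, fp) == EOF)
        {
          status = -1;
          break;
        }
    }
  if (fclose (fp) != 0)
    status = -1;
  free (buf);
  return status;
}
```

A fixed seed keeps timing runs comparable between compilers, since every run then reads the same file.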