This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.
Re: libstdc++/4150: catastrophic performance decrease in C++ code
>>>>> "Loren" == Loren James Rittle <rittle@latour.rsch.comm.mot.com> writes:
> Under the current architecture (which I have only ever tweaked for
> performance, compliance and QoS of interactive cases not dictated by
> the standard), the whole reason for the backup to the point before the
> read is that until a character is actually consumed by the
> higher-layer of libstdc++-v3 IO, the lower-layer C stdio file-pointer
> must not appear to move forward w.r.t. other C stdio. Granted it
> seems less-than-ideal to always use that algorithm even when not
> sync'd to stdio.
Then I suppose we should use a buffer size of 0.
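
To make the zero-buffer idea concrete, here is a minimal sketch of a stream buffer that keeps no get area of its own, so the C stdio file pointer only moves when a character is really consumed. The class name unbuffered_stdio_buf is hypothetical and this is not the libstdc++-v3 code; it only assumes a FILE* that other C stdio calls may also be reading.

```cpp
#include <cassert>
#include <cstdio>
#include <streambuf>

// Hypothetical illustration class; not part of libstdc++-v3.
// It never sets up a get area, so every sgetc()/sbumpc() call
// is forwarded to underflow()/uflow() below.
class unbuffered_stdio_buf : public std::streambuf {
public:
    explicit unbuffered_stdio_buf(std::FILE* f) : file_(f) {
        assert(f != nullptr);
    }

protected:
    // Peek: read one character, then push it straight back so the
    // FILE position does not appear to move for other C stdio users.
    int_type underflow() override {
        int c = std::getc(file_);
        if (c == EOF)
            return traits_type::eof();
        std::ungetc(c, file_);
        return traits_type::to_int_type(static_cast<char>(c));
    }

    // Consume: read one character and leave the FILE position advanced.
    int_type uflow() override {
        int c = std::getc(file_);
        if (c == EOF)
            return traits_type::eof();
        return traits_type::to_int_type(static_cast<char>(c));
    }

private:
    std::FILE* file_;
};
```

With this shape, a peek through the stream buffer followed by a getc on the same FILE* sees the same character, which is the property the backup step is meant to preserve. The cost is one stdio call (or two, for a peek) per character.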
> It is why I told RTH the other day in an e-mail that I thought some
> basic re-architecture would be required to solve all performance
> issues related to outstanding libstdc++-v3 PRs. When the higher-layer
> knows it will consume more than X characters in sync'd IO cases, it
> should be able to pull more than 1 but fewer than X characters from the lower-layer (current
> architecture limits us to pulls of 1 character). Or, if the
> higher-layer knows it is looking for a newline character (another very
> common case), it should be able to use the C stdio optimized routine
> to pull >1 character from the lower layer (bounded only by newline or
> the provided buffer size, aka the fgets function call). Under a
> re-architecture, it seems to me that only when the higher-layer of
> libstdc++-v3 is in a scanning mode not directly supported by libc that
> it must conditionally pull 1 character at a time through the layer
> when sync'd to stdio.
Makes sense to me.
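
As a rough sketch of the newline case, the lower layer could hand back a whole line through fgets instead of one character per call. The helper name pull_line and the 64-byte chunk size are assumptions made for illustration; nothing like this exists in the current sources, and fgets cannot represent embedded NUL characters, which a real implementation would have to handle.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: pull characters from a C stdio stream up to and
// including the next '\n', using fgets so that libc does the scanning.
// Returns false only if nothing at all could be read.
bool pull_line(std::FILE* f, std::string& out) {
    assert(f != nullptr);
    out.clear();
    char buf[64];  // arbitrary chunk size for the sketch
    while (std::fgets(buf, sizeof buf, f)) {
        out += buf;  // fgets always NUL-terminates what it stored
        if (out.back() == '\n')
            return true;  // stopped exactly after the newline
    }
    return !out.empty();  // last line without a trailing newline, or EOF
}
```

Because fgets stops right after the newline, the file pointer seen by other C stdio calls ends up exactly past the characters that were consumed, so no backup step is needed for this case.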
> Now, I actually have no idea if the abstraction layer dictated by the
> standard even allows these optimizations. I looked at this situation more
> than 6 months ago and I actually think not.
Why not? It seems to me that the optimizations you suggest would conform
fine to the spec for xs{put,get}n. The spec for basic_streambuf::xsgetn
talks about implementation "as if" by repeated calls to sbumpc, but then
also says that derived classes can provide more efficient implementations.
To optimize getline, we'd need to introduce a virtual helper function in
streambuf, but I don't see any reason why that would violate the standard.
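
For the xsgetn side, a derived class override along these lines is what I have in mind. This is only a sketch under the assumption of an unbuffered stream buffer over a FILE*; the class name bulk_stdio_buf is hypothetical, and the real filebuf would also have to deal with its own buffer and with codecvt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <streambuf>

// Hypothetical illustration class; not part of libstdc++-v3.
class bulk_stdio_buf : public std::streambuf {
public:
    explicit bulk_stdio_buf(std::FILE* f) : file_(f) {
        assert(f != nullptr);
    }

protected:
    // Single-character peek: read and push back.
    int_type underflow() override {
        int c = std::getc(file_);
        if (c == EOF)
            return traits_type::eof();
        std::ungetc(c, file_);
        return traits_type::to_int_type(static_cast<char>(c));
    }

    // Single-character consume.
    int_type uflow() override {
        int c = std::getc(file_);
        if (c == EOF)
            return traits_type::eof();
        return traits_type::to_int_type(static_cast<char>(c));
    }

    // Bulk read: same observable result as n calls to sbumpc(),
    // but done with a single fread on the underlying FILE*.
    std::streamsize xsgetn(char* s, std::streamsize n) override {
        if (n <= 0)
            return 0;
        std::size_t got =
            std::fread(s, 1, static_cast<std::size_t>(n), file_);
        return static_cast<std::streamsize>(got);
    }

private:
    std::FILE* file_;
};
```

Since the caller has asked for n characters, all of them count as consumed, so the C stdio file pointer legitimately ends up n characters further along (or at end of file if fewer were available).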
> With your patch (plus the removal of the related _GLIBCPP_AVOID_FSEEK
> region in src/ios.cc), I see one automatic regression here:
> assertion "(off_2 == (off_1 + 2 + 1 + 1))" failed: file
> "[...]/27_io/filebuf_virtuals.cc", line 428
> FAIL: 27_io/filebuf_virtuals.cc execution test
Yep, I'm aware of that. I knew that the patch I posted was incomplete; it
was meant more as a concrete illustration of my proposal. I'm still
working on it.
> [1] I don't know if this is widely known information thus I want to
> make sure you tested my patch to enable _GLIBCPP_AVOID_FSEEK on
> Linux properly. If you bootstrap all of gcc, then when
> libstdc++-v3 is built, it will be built with flags set by
> top-level Makefile (nominally, `-O2 -g'). If you later run make
> in libstdc++-v3, it will rebuild (some/all?) files with `-O0 -g'
> (except stuff built in libmath which appears to get top-level
> flags)... IMHO, the only way to test performance patches in
> libstdc++-v3, is to `rm -rf <target>/libstdc++-v3' and rerun make
> at top-level. This way libstdc++-v3 is built exactly as when it
> is bootstrapped.
I'm aware of the difference; in all cases, I was building without
optimization, on the assumption that the calls to the C layer would be
where we were spending our time. So I was comparing apples to apples, but
perhaps not the most useful apples. :)
Jason