This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.



Re: [Patch] Batch of basic_string correctness and performance work


Paolo Carlini wrote:

Gawain Bolton wrote:

Hi Paolo,

I am seriously worried about performance improvements obtained by inlining functions. I'm not disputing the performance measurements you made below, but this type of test is very much contrived, as I'm sure you'll agree.


Which "type of test"? I'm sure you don't really think I have measured and tested only that couple of testcases in the performance testsuite...

The type of contrived performance testing I'm referring to is the 21_strings/string_append.cc test, where you have a for loop executed 10 000 000 times.
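For readers without the testsuite at hand, the kind of tight append loop being discussed can be sketched roughly as follows. This is only an illustration under stated assumptions: it is not the contents of 21_strings/string_append.cc, and the helper names append_in_loop and time_append_ms are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical helper, not taken from the libstdc++ testsuite:
// appends one character `iterations` times and returns the result.
std::string append_in_loop(std::size_t iterations)
{
  std::string s;
  for (std::size_t i = 0; i < iterations; ++i)
    s.push_back('x');
  return s;
}

// Hypothetical helper: times a single run of the loop above and
// returns the elapsed wall-clock time in milliseconds.
long long time_append_ms(std::size_t iterations)
{
  const auto start = std::chrono::steady_clock::now();
  const std::string s = append_in_loop(iterations);
  const auto stop = std::chrono::steady_clock::now();
  assert(s.size() == iterations); // sanity check on the result
  return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

Calling time_append_ms(10000000) uses the loop count quoted above. A loop like this keeps the same few instructions and branches hot for the whole run, which is exactly why its numbers may not carry over to a large application.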



Also, perhaps more important, those functions are inlined because they are small, much smaller than before, and reserve is still out of line. I cannot seriously believe that you don't want to inline this:


 void
 push_back(_CharT __c)
 {
   const size_type __len = 1 + this->size();
   if (__len > this->capacity() || _M_rep()->_M_is_shared())
     this->reserve(__len);
   traits_type::assign(_M_data()[this->size()], __c);
   _M_rep()->_M_set_length_and_sharable(__len);
 }

which is basically a conditional and 4-5 assignments (plus a non-inline call to reserve), and is now intrinsically 4 times faster.
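The shape described above, a small inline fast path that calls an out-of-line routine only when it has to grow, can be sketched in a self-contained way. The ToyBuffer class below is hypothetical and is not libstdc++'s basic_string: it has no sharing or reference counting, the growth policy is an arbitrary assumption, and SKETCH_NOINLINE is a made-up macro wrapping GCC's noinline attribute.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical macro: ask GCC-compatible compilers not to inline.
#if defined(__GNUC__)
#  define SKETCH_NOINLINE __attribute__((noinline))
#else
#  define SKETCH_NOINLINE
#endif

// Hypothetical toy buffer, not libstdc++'s basic_string.
class ToyBuffer
{
public:
  ToyBuffer() : data_(nullptr), size_(0), capacity_(0) {}
  ~ToyBuffer() { delete[] data_; }
  ToyBuffer(const ToyBuffer&) = delete;
  ToyBuffer& operator=(const ToyBuffer&) = delete;

  std::size_t size() const { return size_; }
  std::size_t capacity() const { return capacity_; }
  const char* data() const { return data_; }

  // Fast path: one test, one store, one length update.
  inline void push_back(char c)
  {
    const std::size_t len = size_ + 1;
    if (len > capacity_)
      grow(len);            // rare path, kept out of line
    data_[size_] = c;
    size_ = len;
  }

private:
  SKETCH_NOINLINE void grow(std::size_t min_capacity);

  char* data_;
  std::size_t size_;
  std::size_t capacity_;
};

// Slow path: reallocate and copy. The doubling policy is an assumption.
void ToyBuffer::grow(std::size_t min_capacity)
{
  assert(min_capacity > capacity_);
  std::size_t new_capacity = capacity_ ? capacity_ * 2 : 16;
  if (new_capacity < min_capacity)
    new_capacity = min_capacity;
  char* new_data = new char[new_capacity];
  if (size_ != 0)
    std::memcpy(new_data, data_, size_);
  delete[] data_;
  data_ = new_data;
  capacity_ = new_capacity;
}
```

With this split, every call site that inlines push_back carries the comparison and the two stores, while the allocation and copy code exists only once. Whether that per-call-site cost is acceptable is the point under dispute in this thread; the sketch does not settle it.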

Yes, this is exactly the type of code I'm against inlining. It contains 2 tests which will never benefit from branch prediction, which for modern CPUs is a major handicap.


Also, this function has a non-negligible number of instructions. These additional instructions will impact the CPU's code cache.


As for the sizes, sometimes they are slightly smaller, sometimes slightly bigger, and in any case we are talking about differences of order << 1% in the static stripped executable. For instance, string_append.cc, which basically only uses basic_string, is less than 0.5% bigger.

Please, let's look at the difference in terms of the number of octets or instructions involved. Looking at a percentage increase is silly, as it depends on the application. Furthermore, looking at the total size of a statically linked executable drastically underestimates the size increase.



That said, all those append functions (and operator+=) perform much better in their present form anyway, even when not inlined (see, for instance, append(const _CharT*, size_type) or, a better example, append(const basic_string&, size_type, size_type), which I purposely kept out of line). Therefore, please provide a little bit of evidence that one of your applications would benefit from moving the functions out of line and I will happily do that!

Evidence? How about "Optimizing Pixomatic for Modern x86 Processors: Part III" in DDJ, October 2004 (http://www.ddj.com/articles/2004/0410), which clearly describes the non-obvious effect branch prediction has on performance. Although the article is Pentium-specific, the principles of branch prediction are equally applicable to superscalar processor architectures in general.


I think that perhaps libstdc++ developers need a guideline on inlining. One suggestion: "No function containing more than two C++ statements shall be inlined."

And yes, of course, template functions and template-based classes are exceptions to the rule.

Cheers,


Gawain



