This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.



Optimising std::find on x86 and PPC


Hello,

I recently tried changing the main loop of the random-access overload of std::find from:

difference_type __trip_count = (__last - __first) >> 2;
for (; __trip_count > 0; --__trip_count) { if (*__first == __val) return __first; ++__first; (4 times) }


to:

Iterator __newlast = __last - (__last - __first) % 4;
for (; __first < __newlast;) { if (*__first == __val) return __first; ++__first; (4 times) }


This knocked about 30% off the time taken on x86. (In a final version I'd replace the % 4 with some kind of masking and/or shifting.)
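For anyone who wants to try this outside the library, here is a rough standalone sketch of the two loop shapes. The names find_counted and find_bounded are made up for illustration and are not the libstdc++ code; the remainder loop after each main loop is my assumption about how the last 0 to 3 elements get handled. The second version uses & 3 in place of % 4, which gives the same result as long as the distance is non-negative.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Variant A (hypothetical name): main loop driven by a trip counter.
template <typename RandomIt, typename T>
RandomIt find_counted(RandomIt first, RandomIt last, const T& val)
{
    typename std::iterator_traits<RandomIt>::difference_type
        trip_count = (last - first) >> 2;   // number of 4-element blocks
    for (; trip_count > 0; --trip_count) {
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
    }
    // Assumed tail handling: check the remaining 0..3 elements one by one.
    for (; first != last; ++first)
        if (*first == val) return first;
    return last;
}

// Variant B (hypothetical name): main loop driven by an iterator comparison.
template <typename RandomIt, typename T>
RandomIt find_bounded(RandomIt first, RandomIt last, const T& val)
{
    // End of the last full 4-element block; & 3 equals % 4 for distances >= 0.
    RandomIt newlast = last - ((last - first) & 3);
    while (first < newlast) {
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
    }
    // Assumed tail handling: check the remaining 0..3 elements one by one.
    for (; first != last; ++first)
        if (*first == val) return first;
    return last;
}
```

Both functions should return the same iterator as std::find for any valid random-access range, so they can be timed against each other on a large std::vector to compare the two loop forms on a given target.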

Unfortunately, a quick test on Mac OS X by Andrew Pinski (thank you!) found that this slightly worsened performance on PPC in terms of both space and time, as the new version no longer uses the specialised "count" operator.

Seeing as so many functions use find internally, it seems silly not to include some kind of improvement. At the same time, of course, we don't want to damage the OS X performance. I suspect the reason the loop optimisers can't deal with this by themselves is that we manually unroll this loop by 4 iterations, which confuses them.

I'd be grateful to hear from anyone with more knowledge than me who knows either how I could tweak this code so it optimises well on both x86 and PPC, or how hard it might be to poke the optimiser so that one version of this code can be efficient on both processors.

The last option, of course, is to start having different code for different processors. I'd imagine that would only be a route of last resort, however.

Chris


