This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
Re: Optimising std::find on x86 and PPC
- From: Gabriel Dos Reis <gdr at integrable-solutions dot net>
- To: Chris Jefferson <caj at cs dot york dot ac dot uk>
- Cc: libstdc++ <libstdc++ at gcc dot gnu dot org>
- Date: 14 Dec 2004 16:52:25 +0100
- Subject: Re: Optimising std::find on x86 and PPC
- Organization: Integrable Solutions
- References: <41BEF98C.3000403@cs.york.ac.uk>
Chris Jefferson <caj@cs.york.ac.uk> writes:
| Hello,
|
| I recently tried changing the std::find random_access overload to
| change the main loop from:
|
| difference_type __trip_count = (__last - __first) >> 2;
| for (; __trip_count > 0; --__trip_count) { if (*__first == __val) return
| __first; ++__first; (4 times) }
|
| to:
|
| Iterator __newlast = __last - (__last - __first) % 4;
| for (; __first < __newlast;) { if (*__first == __val) return __first;
| ++__first; (4 times) }
|
| This knocked about 30% off the time taken on x86 (Note that in a final
| version I'd change the %4 into some kind of &ing and/or shifting).
|
| Unfortunately, a quick test on Mac OS X by Andrew Pinski (thank you!)
| found that this slightly decreased performance in terms of both
| space and time on PPC, as this new version will no longer use the
| specialised "count" operator.
|
| Seeing as so many functions use find internally, it seems silly not to
| include some kind of improvement. However, at the same time, of course
| we don't want to damage the OSX performance. I suspect the reason the
| loop optimisers can't deal with this by themselves is that we
| unroll this loop 4 iterations manually, which I suspect confuses them.
|
| If anyone with more knowledge than me knows either how I could tweak
| this code so it optimises well on both x86 and PPC, or how hard it
| might be to poke the optimiser so one version of this code can be
| efficient on both processors, I would like to hear about it.
|
| The last option of course is to start having different code for
| different processors. I'd imagine doing so would only be a route of
| last resort, however.
This is a compiler issue. You really do not want to clutter the
library with platform-specific hacks because of compiler deficiencies.
Convince middle-end and back-end people; you'll get better benefits.
-- Gaby
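
For readers following the thread, the two loop shapes from the quoted
message can be written out in full roughly as below. This is only a
sketch, not the libstdc++ source: the names find_counted and
find_bounded are hypothetical, the "(4 times)" shorthand is expanded by
hand, and a tail loop is assumed for the up to three elements left
over. The second variant uses the "&" form of "% 4" that the message
mentions for a final version, which is valid only for a non-negative
distance.

```cpp
#include <cassert>
#include <iterator>

// Variant A (hypothetical name): counted loop, driven by a trip count.
template <typename RandomIt, typename T>
RandomIt find_counted(RandomIt first, RandomIt last, const T& val)
{
    typedef typename std::iterator_traits<RandomIt>::difference_type diff_t;
    diff_t trip_count = (last - first) >> 2;  // number of 4-element groups
    for (; trip_count > 0; --trip_count) {
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
    }
    // Assumed tail handling: at most 3 elements remain.
    for (; first != last; ++first)
        if (*first == val) return first;
    return last;
}

// Variant B (hypothetical name): loop bounded by an iterator comparison.
template <typename RandomIt, typename T>
RandomIt find_bounded(RandomIt first, RandomIt last, const T& val)
{
    typedef typename std::iterator_traits<RandomIt>::difference_type diff_t;
    const diff_t n = last - first;
    assert(n >= 0);                            // [first, last) must be valid
    const RandomIt new_last = last - (n & 3);  // n & 3 == n % 4 when n >= 0
    for (; first < new_last;) {
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
    }
    // Assumed tail handling: at most 3 elements remain.
    for (; first != last; ++first)
        if (*first == val) return first;
    return last;
}
```

Both functions return the same iterator for any valid range; they
differ only in how the unrolled loop terminates. Which one compiles to
faster code depends on the target and the compiler, which is the point
under discussion in this thread.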