This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Optimize manual byte swap implementations v3


Mark Mitchell wrote:
> Andreas Krebbel wrote:
>
>> The timings look much better now:
>> 8.05s (without patch)   8.097s (with patch)   +0.58%
>
> Certainly better, but still significant.
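Assuming the first figure is the compile time without the patch and the second the time with it, the quoted percentage is the relative increase:

$$\frac{8.097\,\mathrm{s} - 8.05\,\mathrm{s}}{8.05\,\mathrm{s}} = \frac{0.047}{8.05} \approx 0.0058 = 0.58\,\%$$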

This is only the cost of walking over the SSA statements looking for OR operations, plus a bit of extra work when an OR is found. With the tweaks I added after the first reviews, I don't think this can be done significantly faster. Please also consider that, as Richard pointed out, this could also serve as the basis for further optimizations on vectors. The vector optimizations need the statement walk as well, so in the end the only added compile-time overhead will be the check for whether a statement is an OR operation.
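To make "a bit of extra work when an OR is found" concrete, here is a hedged sketch of one way such a check can be done: track, for each byte of the result, which byte of the source value it came from, and compare the final layout against the byte-reversed one. This is an illustration only, not the code in the patch; the representation and every name in it (`sym_t`, `sym_or`, `sym_classify`, `track_classic_swap`, ...) are hypothetical, and it is restricted to 32-bit values and whole-byte shifts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of "symbolic number" byte tracking, not GCC's code.
   Byte i of n names the source byte (1 = least significant) that lands
   in byte i of the result; 0 means that byte is known to be zero. */
typedef struct {
    uint32_t n;
    bool ok;            /* false once an operation can't be tracked */
} sym_t;

typedef enum { NO_MATCH, IDENTITY, BSWAP32 } sym_kind;

#define SYM_IDENTITY 0x04030201u    /* every byte stays in place */
#define SYM_BSWAP    0x01020304u    /* byte order fully reversed */

static const sym_t SYM_FAIL = { 0, false };

/* Start tracking an unmodified 32-bit source value. */
static sym_t sym_source(void) { sym_t s = { SYM_IDENTITY, true }; return s; }

/* Only whole-byte shifts keep the model exact. */
static sym_t sym_shl(sym_t a, unsigned bits)
{
    if (!a.ok || bits % 8 != 0 || bits >= 32) return SYM_FAIL;
    a.n <<= bits;               /* bytes shifted out of the top are lost */
    return a;
}

static sym_t sym_shr(sym_t a, unsigned bits)  /* logical (unsigned) shift */
{
    if (!a.ok || bits % 8 != 0 || bits >= 32) return SYM_FAIL;
    a.n >>= bits;               /* vacated top bytes become known zero */
    return a;
}

/* AND with a constant: each mask byte must be 0x00 or 0xFF. */
static sym_t sym_and(sym_t a, uint32_t mask)
{
    if (!a.ok) return SYM_FAIL;
    for (unsigned i = 0; i < 4; i++) {
        uint32_t m = (mask >> (8 * i)) & 0xFFu;
        if (m == 0x00u)
            a.n &= ~(0xFFu << (8 * i));     /* byte cleared */
        else if (m != 0xFFu)
            return SYM_FAIL;                /* partial byte: give up */
    }
    return a;
}

/* OR: fails if the operands put different source bytes in one slot. */
static sym_t sym_or(sym_t a, sym_t b)
{
    if (!a.ok || !b.ok) return SYM_FAIL;
    for (unsigned i = 0; i < 4; i++) {
        uint32_t x = (a.n >> (8 * i)) & 0xFFu;
        uint32_t y = (b.n >> (8 * i)) & 0xFFu;
        if (x != 0 && y != 0 && x != y) return SYM_FAIL;
    }
    a.n |= b.n;
    return a;
}

/* Compare the tracked layout against the known patterns. */
static sym_kind sym_classify(sym_t r)
{
    if (!r.ok) return NO_MATCH;
    if (r.n == SYM_BSWAP) return BSWAP32;
    if (r.n == SYM_IDENTITY) return IDENTITY;
    return NO_MATCH;
}

/* Track (x >> 24) | ((x >> 8) & 0xff00) | ((x << 8) & 0xff0000) | (x << 24). */
static sym_t track_classic_swap(void)
{
    sym_t x = sym_source();
    return sym_or(sym_or(sym_shr(x, 24), sym_and(sym_shr(x, 8), 0xFF00u)),
                  sym_or(sym_and(sym_shl(x, 8), 0xFF0000u), sym_shl(x, 24)));
}
```

A partial byte mask, or two different source bytes colliding in the same position, makes the tracking give up early, which keeps the extra work small for ORs that are not byte swaps.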


> What is the performance improvement provided by this pass on what
> benchmarks on what CPUs?  You may have indicated this elsewhere, in
> which case I apologize for asking a redundant question; please just
> point me at the URL.

Unfortunately I'm not able to come up with performance numbers. I agree that this is a special-purpose optimization, but byte swaps occur much more often than one might think at first glance. A quick scan reveals that, with the default configuration on x86_64, the Linux kernel uses over 3000 bswaps (sha*, wpa (WLAN), ...), libc over 200 and OpenSSH about 10. Admittedly, the optimization doesn't help x86 for these packages, since they already make heavy use of inline assembly to implement bswaps.


But how would you recommend that an application developer implement a byte swap that is neither tied to GCC (or even to a specific GCC version) through the bswap builtins, nor tied to GCC and a specific CPU through inline assembly? I think the only way is to use a combination of ORs, ANDs and SHIFTs. With the bswap pass, such a compiler- and platform-independent implementation can be used without losing performance when compiling with GCC on Linux for an architecture that can do the swap more efficiently.
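For reference, a portable implementation of the kind described above might look like the following sketch. The function names `swap32` and `swap64` are hypothetical; the code uses only `<stdint.h>` fixed-width types, masks, shifts and ORs, and makes no claim about which exact formulations a given GCC version's bswap pass recognizes.

```c
#include <stdint.h>

/* Portable 32-bit byte swap: no builtins, no inline assembly. */
static uint32_t swap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24)   /* byte 0 -> byte 3 */
         | ((x & 0x0000FF00u) << 8)    /* byte 1 -> byte 2 */
         | ((x & 0x00FF0000u) >> 8)    /* byte 2 -> byte 1 */
         | ((x & 0xFF000000u) >> 24);  /* byte 3 -> byte 0 */
}

/* 64-bit swap built from two 32-bit swaps of the swapped halves. */
static uint64_t swap64(uint64_t x)
{
    return ((uint64_t)swap32((uint32_t)x) << 32)
         | swap32((uint32_t)(x >> 32));
}
```

Because nothing here depends on GCC builtins or inline assembly, any C99 compiler produces the correct result; mapping it onto a single instruction, where the target has one, is left to the optimizer.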

Another point is other programming languages: the bswap optimizer is quite capable of optimizing bswaps implemented in Fortran, Java and other languages. The Integer.reverseBytes method in the Java library is already detected, and with the bswap builtins enabled for Java, the library can benefit from hardware instructions.

I know this is no replacement for hard performance numbers, so I'll keep looking for an appropriate benchmark or application.

Bye,

-Andreas-

