This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] Optimize manual byte swap implementations -refreshed
- From: Michael Meissner <meissner at linux dot vnet dot ibm dot com>
- To: Andreas Krebbel <krebbel at linux dot vnet dot ibm dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Mon, 15 Jun 2009 13:04:40 -0400
- Subject: Re: [PATCH] Optimize manual byte swap implementations -refreshed
- References: <20090612102811.GA10314@bart>
On Fri, Jun 12, 2009 at 12:28:11PM +0200, Andreas Krebbel wrote:
> here is a refreshed version of the bswap optimization pass. I again
> tried to do some measurements with builds of the Linux kernel. This
> time I've also tried to evaluate how much of the time is consumed by
> just walking the statements.
> Unfortunately I must admit that there is quite a deviation in the
> results. At least for the statement walk measurements the standard
> deviation is so high that you should look at it with care. My
> statistics teacher probably would kick me through the hallway for
> these numbers ;)
> With the latest version - after integrating some more comments from
> Richard - the overhead went down to 0.24%.
> I've built the Linux kernel with -j4 (version 2.6.28) 5 times. The
> timings show the total time spent in user space measured with the
> \time command - not the bash builtin.
> x86_64 Intel Quad Core 9550 8GB 2.83 GHz
> GCC svn revision: 147107
>               clean              stmt walk only       optimized
>               3599.21s           3599.23s             3607.28s
>               3604.17s           3609.16s             3608.33s
>               3600.32s           3601.75s             3610.47s
>               3600.49s           3608.81s             3611.62s
>               3601.26s           3604.6s              3611.51s
> std. dev.     +-1.87s +-0.05%    +-4.34s +-0.12%      +-1.95s +-0.05%
> mean          3601.09            3604.71 +0.10%       3609.84 +0.24%
> Bootstrapped on x86_64. No regressions.
> Ok for mainline?
I'm just wondering out loud whether it would be useful to add tests of bswap
from a memory location in addition to a register (rs6000's main bswap is a
load from and store to memory, and from s390.md, it looks like the s390 also
has a bswap from memory). Most of the other bswap targets, like the x86, are
register only, so presumably it would work there also.
Now, I can certainly put in extra powerpc tests, but I'm wondering if it would
be useful to add the tests as global tests.
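
To make the register versus memory distinction concrete, here is a rough
sketch of the kind of source such tests might contain. It is an illustration
only and not part of the patch: the function names are hypothetical, and
whether the pass recognizes each form, or whether a given target then picks a
byte-reversing load or store, is something the real tests would have to check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hand-written 32-bit byte swap on a value that is already in a register. */
uint32_t swap32_reg(uint32_t x)
{
    return ((x & 0x000000ffu) << 24)
         | ((x & 0x0000ff00u) << 8)
         | ((x & 0x00ff0000u) >> 8)
         | ((x & 0xff000000u) >> 24);
}

/* Same swap, but the operand is loaded from memory first.  This is the
   shape a target with a byte-reversing load could in principle combine.  */
uint32_t swap32_load(const uint32_t *p)
{
    assert(p != NULL);
    return swap32_reg(*p);
}

/* Same swap, with the result stored straight back to memory.  */
void swap32_store(uint32_t *dst, uint32_t x)
{
    assert(dst != NULL);
    *dst = swap32_reg(x);
}
```

All three functions compute the same result; only where the operand comes from
and where the result goes differs, which is the part a register-only test
would not exercise.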
Also, getting back to the issue raised in my latest bswap patches, I wonder
whether it would be useful to move bswap16 to generic, and add bswaphi support
to your patches? I know the powerpc would find it useful, and IIRC, you could
do a 16-bit rotate on x86.
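
For reference, the 16-bit case is small enough to write out. The sketch below
is only an illustration with hypothetical function names, not code from either
patch set. It shows that a manual 16-bit byte swap and a rotate by 8 bits
compute the same value, which is why a 16-bit rotate can serve as bswaphi.

```c
#include <stdint.h>

/* Manual 16-bit byte swap written with masks and shifts.  */
uint16_t swap16_manual(uint16_t x)
{
    return (uint16_t)(((x & 0x00ffu) << 8) | ((x & 0xff00u) >> 8));
}

/* The same operation expressed as a rotate by 8 bits.  The operand is
   promoted to int before shifting, so the cast back to uint16_t drops
   the bits shifted above bit 15.  */
uint16_t swap16_rotate(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}
```

Both functions return 0x3412 for an input of 0x1234.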
Michael Meissner, IBM
4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA