[PATCH] Optimize manual byte swap implementations v3
Richard Guenther
rguenther@suse.de
Fri Jun 5 09:20:00 GMT 2009
On Fri, 5 Jun 2009, Andreas Krebbel wrote:
> > What is lacking is to have a look at the performance overhead. I would
> > consider the kernel as some kind of worst case scenario and I will try
> > to do some benchmarking on my x86_64 machine with it.
>
> I've built the Linux kernel with -j4 (version 2.6.28) 5 times with and
> without the bswap patch:
>
> x86_64 Intel Quad Core 9550 8GB 2.83 GHz
>
> GCC svn revision: 147107
>
> clean:
> user system elapsed
> 3602.04 314.83 17:02.93
> 3604.14 314.14 17:01.82
> 3602.02 315.20 17:07.43
> 3603.38 314.94 17:04.00
> 3603.64 314.67 17:03.13
>
> 3603.04
>
> patched:
> user system elapsed
> 3619.95 314.29 17:10.17
> 3618.13 316.58 17:11.39
> 3616.37 316.41 17:09.57
> 3619.10 318.04 17:12.32
> 3622.39 316.74 17:15.53
>
> 3619.19 +0.45%
>
> In the process 4786 byte swap implementations have been recognized and
> folded. Although the x86 kernel code uses an inline assembly for byte
> swaps the software implementation nevertheless stays visible to
> GCC. So I think this is a valid test also representative for other
> targets.
>
> I think the run-time of the pass is dominated by the time needed to
> walk over the ssa statements. I doubt that tweaks in the recognition
> of bswap will buy a lot. Merging it with sincos would help here but
> would cost us the flexibility to move the pass to somewhere else. I
> can certainly do that if this is considered the way to go.
I think this is reasonable overhead, and indeed for kernel compiles
most of the compile-time spent anywhere is walking statements.
I just noticed ...
+ if ((int)TREE_INT_CST_LOW (TYPE_SIZE (lhs_type)) != n->size * BITS_PER_UNIT)
+ return false;
shouldn't this test against TYPE_PRECISION (lhs_type) instead of
TYPE_SIZE?
For walking speed we are probably not good at CSEing the
repeated checks of gimple_assign_rhs_class (it does a non-inline
table lookup), so you might want to factor that call in find_bswap_1
(and use TYPE_PRECISION in there instead of TYPE_SIZE).
Do you really need TODO_update_ssa?
I suggest you re-post the final version of the patch (there were
some comments from others in the thread).
Thanks,
Richard.
More information about the Gcc-patches
mailing list