This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] Optimize manual byte swap implementations v3
- From: Richard Guenther <rguenther at suse dot de>
- To: Andreas Krebbel <krebbel at linux dot vnet dot ibm dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Fri, 5 Jun 2009 11:19:55 +0200 (CEST)
- Subject: Re: [PATCH] Optimize manual byte swap implementations v3
- References: <20090209145520.GA32536@bart> <49931E43.firstname.lastname@example.org> <499414EA.email@example.com> <firstname.lastname@example.org> <email@example.com> <49971B86.firstname.lastname@example.org> <499727DB.email@example.com> <49986FE4.firstname.lastname@example.org> <email@example.com> <49F566D0.firstname.lastname@example.org> <20090605083856.GA6148@bart>
On Fri, 5 Jun 2009, Andreas Krebbel wrote:
> > What is lacking is to have a look at the performance overhead. I would
> > consider the kernel as some kind of worst case scenario and I will try
> > to do some benchmarking on my x86_64 machine with it.
> I've built the Linux kernel with -j4 (version 2.6.28) 5 times with and
> without the bswap patch:
> x86_64 Intel Quad Core 9550 8GB 2.83 GHz
> GCC svn revision: 147107
> user system elapsed
> 3602.04 314.83 17:02.93
> 3604.14 314.14 17:01.82
> 3602.02 315.20 17:07.43
> 3603.38 314.94 17:04.00
> 3603.64 314.67 17:03.13
> user system elapsed
> 3619.95 314.29 17:10.17
> 3618.13 316.58 17:11.39
> 3616.37 316.41 17:09.57
> 3619.10 318.04 17:12.32
> 3622.39 316.74 17:15.53
> 3619.19 +0.45%
> In the process 4786 byte swap implementations have been recognized and
> folded. Although the x86 kernel code uses an inline assembly for byte
> swaps the software implementation nevertheless stays visible to
> GCC. So I think this is a valid test also representative for other
> I think the run-time of the pass is dominated by the time needed to
> walk over the ssa statements. I doubt that tweaks in the recognition
> of bswap will buy a lot. Merging it with sincos would help here but
> would cost us the flexibility to move the pass to somewhere else. I
> can certainly do that if this is considered the way to go.
I think this is reasonable overhead, and indeed for kernel compiles
most of the compile-time spent anywhere is walking statements.
I just noticed ...
+ if ((int)TREE_INT_CST_LOW (TYPE_SIZE (lhs_type)) != n->size * BITS_PER_UNIT)
+ return false;
shouldn't this test against TYPE_PRECISION (lhs_type) instead of
For walking speed we are probably not good at CSEing the
repeated checks of gimple_assign_rhs_class (it does a non-inline
table lookup), so you might want to factor that call in find_bswap_1
(and use TYPE_PRECISION in there instead of TYPE_SIZE).
Do you really need TODO_update_ssa?
I suggest you re-post the final version of the patch (there were
some comments from others in the thread).