This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Optimize manual byte swap implementations v3


On Fri, 5 Jun 2009, Andreas Krebbel wrote:

> > What is lacking is to have a look at the performance overhead. I would  
> > consider the kernel as some kind of worst case scenario and I will try  
> > to do some benchmarking on my x86_64 machine with it.
> 
> I've built the Linux kernel with -j4 (version 2.6.28) 5 times with and
> without the bswap patch:
> 
> x86_64 Intel Quad Core 9550 8GB 2.83 GHz
> 
> GCC svn revision: 147107
> 
> clean:
> user	system	elapsed
> 3602.04	314.83	17:02.93
> 3604.14	314.14	17:01.82
> 3602.02	315.20	17:07.43
> 3603.38	314.94	17:04.00
> 3603.64	314.67	17:03.13
> 
> 3603.04	
> 
> patched:
> user	system	elapsed
> 3619.95	314.29	17:10.17
> 3618.13	316.58	17:11.39
> 3616.37	316.41	17:09.57
> 3619.10	318.04	17:12.32
> 3622.39	316.74	17:15.53
> 
> 3619.19 +0.45%
> 
> In the process 4786 byte swap implementations have been recognized and
> folded.  Although the x86 kernel code uses an inline assembly for byte
> swaps the software implementation nevertheless stays visible to
> GCC. So I think this is a valid test also representative for other
> targets.
> 
> I think the run-time of the pass is dominated by the time needed to
> walk over the ssa statements.  I doubt that tweaks in the recognition
> of bswap will buy a lot.  Merging it with sincos would help here but
> would cost us the flexibility to move the pass to somewhere else.  I
> can certainly do that if this is considered the way to go.

I think this is reasonable overhead, and indeed for kernel compiles
most of the compile-time spent anywhere is walking statements.

I just noticed ...

+  if ((int)TREE_INT_CST_LOW (TYPE_SIZE (lhs_type)) != n->size * BITS_PER_UNIT)
+    return false;

shouldn't this test against TYPE_PRECISION (lhs_type) instead of 
TYPE_SIZE?

For walking speed we are probably not good at CSEing the
repeated checks of gimple_assign_rhs_class (it does a non-inline
table lookup), so you might want to factor that call in find_bswap_1
(and use TYPE_PRECISION in there instead of TYPE_SIZE).

Do you really need TODO_update_ssa?

I suggest you re-post the final version of the patch (there were
some comments from others in the thread).

Thanks,
Richard.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]