[PATCH] Optimize BB sorting in domwalk

Jeff Law law@redhat.com
Mon Jul 24 22:14:00 GMT 2017


On 07/24/2017 05:29 AM, Alexander Monakov wrote:
> Profiling uses of qsort in GCC reveals that a significant portion of calls
> comes from domwalk.c where child nodes in dominator tree are reordered
> according to postorder numbering.  However we know that in dominator trees
> the vast majority of nodes have 0, 2 or 3 children, and handling those
> cases separately allows to avoid qsort overhead.
> 
> I don't have trunk figures handy, but on gcc-6 release-checking tramp3d
> compilation this would eliminate about 63% of _all_ qsort calls by count,
> or about 27% by time.  Call count profile looks like:
> 
> N=0	N=1	N=2	N=3	N=4	N=5	N=6 	...	Total
> 16111   9406    228607  91870   20154   8317    3823            406971
> 
> With this patch it would look like
> 
> N=0	N=1	N=2	N=3	N=4	N=5	N=6	...	Total
> 16111   9406    32200   34132   16966   8022    3702            149177
> 
> While at it, I've also added a clarifying comment to bb_postorder (it
> actually gives reverse postorder), simplified casts in cmp_bb_postorder
> and changed it to compute the result in a branchless fashion.
> 
> Bootstrapped and regtested on x86-64, also verified that gcc/*.o files are
> unchanged except for domwalk.o and *-checksum.o.  OK for trunk?
> 
> 	* domwalk.c (cmp_bb_postorder): Simplify.
> 	(sort_bbs_postorder): New function.  Use it...
> 	(dom_walker::walk): ...here to optimize common cases.
Is it really beneficial to write cmp_bb_postorder without simple and
trivial to understand conditionals?  And if it is, then doesn't that
actually suggest that we should have a match.pd pattern to turn the
branchy code into a branchless sequence?

As Uli noted, we should be using std::swap.

Can you please repost ?

THanks,
Jeff



More information about the Gcc-patches mailing list