[PATCH] Optimize BB sorting in domwalk
Jeff Law
law@redhat.com
Mon Jul 24 22:14:00 GMT 2017
On 07/24/2017 05:29 AM, Alexander Monakov wrote:
> Profiling uses of qsort in GCC reveals that a significant portion of calls
> comes from domwalk.c where child nodes in dominator tree are reordered
> according to postorder numbering. However we know that in dominator trees
> the vast majority of nodes have 0, 2 or 3 children, and handling those
> cases separately allows to avoid qsort overhead.
>
> I don't have trunk figures handy, but on gcc-6 release-checking tramp3d
> compilation this would eliminate about 63% of _all_ qsort calls by count,
> or about 27% by time. Call count profile looks like:
>
> N=0 N=1 N=2 N=3 N=4 N=5 N=6 ... Total
> 16111 9406 228607 91870 20154 8317 3823 406971
>
> With this patch it would look like
>
> N=0 N=1 N=2 N=3 N=4 N=5 N=6 ... Total
> 16111 9406 32200 34132 16966 8022 3702 149177
>
> While at it, I've also added a clarifying comment to bb_postorder (it
> actually gives reverse postorder), simplified casts in cmp_bb_postorder
> and changed it to compute the result in a branchless fashion.
>
> Bootstrapped and regtested on x86-64, also verified that gcc/*.o files are
> unchanged except for domwalk.o and *-checksum.o. OK for trunk?
>
> * domwalk.c (cmp_bb_postorder): Simplify.
> (sort_bbs_postorder): New function. Use it...
> (dom_walker::walk): ...here to optimize common cases.
Is it really beneficial to write cmp_bb_postorder without simple and
trivial to understand conditionals? And if it is, then doesn't that
actually suggest that we should have a match.pd pattern to turn the
branchy code into a branchless sequence?
As Uli noted, we should be using std::swap.
Can you please repost ?
THanks,
Jeff
More information about the Gcc-patches
mailing list