This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] [RFC, GCC 4.8] Optimize conditional moves from adjacent memory locations


On Fri, Feb 10, 2012 at 12:46 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> I was looking at the routelookup EEMBC benchmark and it has code of the form:
>
> Â while ( this_node->cmpbit > next_node->cmpbit )
> Â Â{
> Â Â Âthis_node = next_node;
>
> Â Â Âif ( proute->dst_addr & (0x1 << this_node->cmpbit) )
> Â Â Â Â next_node = this_node->rlink;
> Â Â Âelse
> Â Â Â Â next_node = this_node->llink;
> Â Â}

Hmm, this looks like we could do this on the tree level better as we
have more information about this_node there.  Like we know that we
load from this_node->cmpbit before we do either of the branches.  So
can move both of those loads before the branch and then we get the
ifcvt for free.

Thanks,
Andrew Pinski



>
> This is where you have a binary tree/trie and you are iterating going down
> either the right link or left link until you find a stopping condition. ÂThe
> code in ifcvt.c does not handle optimizing these cases for conditional move
> since the load might trap, and generates code that does if-then-else with loads
> and jumps.
>
> However, since the two elements are next to each other in memory, they are
> likely in the same cache line, particularly with aligned stacks and malloc
> returning aligned pointers. ÂExcept in unusual circumstances where the pointer
> is not aligned, this means it is much faster to optimize it as:
>
> Â while ( this_node->cmpbit > next_node->cmpbit )
> Â Â{
> Â Â Âthis_node = next_node;
>
> Â Â Ârtmp = this_node->rlink;
> Â Â Âltmp = this_node->llink;
> Â Â Âif ( proute->dst_addr & (0x1 << this_node->cmpbit) )
> Â Â Â Â next_node = rtmp;
> Â Â Âelse
> Â Â Â Â next_node = ltmp;
> Â Â}
>
> So I wrote some patches to do this optimization. ÂIn ifcvt.c I added a new hook
> that allows the backend to try and do conditional moves if the machine
> independent code doesn't handle the special cases that the machine might have.
>
> Then in rs6000.c I used that hook to see if the conditional moves are adjacent,
> and do the optimization.
>
> I will note that this type of code comes up quite frequently since binary trees
> and tries are common data structure. ÂThe file splay-tree.c in libiberty is one
> place in the compiler tree that has conditional adjacent memory moves.
>
> So I would like comments on the patch before the 4.8 tree opens up. ÂI feel
> even if we decide not to add the adjacent memory move patch, the hook is
> useful, and I have some other ideas for using it for the powerpc.
>
> I was thinking about rewriting the rs6000 dependent parts to make it a normal
> optimization available to all ports. ÂIs this something we want as a normal
> option? ÂAt the moment, I'm not sure it should be part of -O3 because it is
> possible for a trap to occur if the pointer straddles a page boundary and the
> test condition would guard against loading up the second value. ÂHowever,
> -Ofast might be an appropriate place to do this optimization.
>
> At this time I don't have test cases, but I would add them for the real
> submission. ÂI have bootstraped the compiler on powerpc with this option
> enabled and it passed the bootstrap and had no regressions in make check. ÂI
> will do a spec run over the weekend as well.
>
> In addition to libibery/splay-tree.c the following files in gcc have adjacent
> conditional moves that this code would optimize:
>
> Â Â Â Â Â Âcfg.c
> Â Â Â Â Â Âc-typeck.c
> Â Â Â Â Â Âdf-scan.c
> Â Â Â Â Â Âfold-const.c
> Â Â Â Â Â Âgraphds.c
> Â Â Â Â Â Âira-emit.c
> Â Â Â Â Â Âomp-low.c
> Â Â Â Â Â Ârs6000.c
> Â Â Â Â Â Âtree-cfg.c
> Â Â Â Â Â Âtree-ssa-dom.c
> Â Â Â Â Â Âtree-ssa-loop-ivops.c
> Â Â Â Â Â Âtree-ssa-phiopt.c
> Â Â Â Â Â Âtree-ssa-uncprop.c
>
> 2012-02-10 ÂMichael Meissner Â<meissner@linux.vnet.ibm.com>
>
> Â Â Â Â* target.def (cmove_md_extra): New hook that is called from
> Â Â Â Âifcvt.c to allow the backend to generate additional conditional
> Â Â Â Âmoves that aren't handled by the machine independent code. ÂAdd
> Â Â Â Âsupport to call the hook at the appropriate places.
> Â Â Â Â* targhooks.h (default_cmove_md_extra): Likewise.
> Â Â Â Â* targhooks.c (default_cmove_md_extra): Likewise.
> Â Â Â Â* target.h (enum ifcvt_pass): Likewise.
> Â Â Â Â* ifcvt.c (find_if_header): Likewise.
> Â Â Â Â(noce_find_if_block): Likewise.
> Â Â Â Â(struct noce_if_info): Likewise.
> Â Â Â Â(noce_process_if_block): Likewise.
> Â Â Â Â(cond_move_process_if_block): Likewise.
> Â Â Â Â(if_convert): Likewise.
> Â Â Â Â(rest_of_handle_if_conversion): Likewise.
> Â Â Â Â(rest_of_handle_if_after_combine): Likewise.
> Â Â Â Â(rest_of_handle_if_after_reload): Likewise.
> Â Â Â Â* doc/tm.texi (TARGET_CMOVE_MD_EXTRA): Likewise.
> Â Â Â Â* doc/tm.texi.in (TARGET_CMOVE_MD_EXTRA): Likewise.
>
> Â Â Â Â* config/rs6000/rs6000.opt (-mcmove-adjacent-memory): Add support
> Â Â Â Âfor a new switch to optimize conditional moves where each side is
> Â Â Â Âa memory reference and the memory locations are adjacent.
> Â Â Â Â* config/rs6000/rs6000.c (rs6000_cmove_md_extra): Likewise.
> Â Â Â Â(TARGET_CMOVE_MD_EXTRA): Likewise.
> Â Â Â Â(rs6000_decompose_offsettable_memref): Likewise.
>
> --
> Michael Meissner, IBM
> 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
> meissner@linux.vnet.ibm.com   fax +1 (978) 399-6899


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]