This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] [RFC, GCC 4.8] Optimize conditional moves from adjacent memory locations


I was looking at the routelookup EEMBC benchmark and it has code of the form:

   while ( this_node->cmpbit > next_node->cmpbit )
    {
      this_node = next_node;

      if ( proute->dst_addr & (0x1 << this_node->cmpbit) )
         next_node = this_node->rlink;
      else
         next_node = this_node->llink;
    }

This is where you have a binary tree/trie and you are iterating going down
either the right link or left link until you find a stopping condition.  The
code in ifcvt.c does not handle optimizing these cases for conditional move
since the load might trap, and generates code that does if-then-else with loads
and jumps.

However, since the two elements are next to each other in memory, they are
likely in the same cache line, particularly with aligned stacks and malloc
returning aligned pointers.  Except in unusual circumstances where the pointer
is not aligned, this means it is much faster to optimize it as:

   while ( this_node->cmpbit > next_node->cmpbit )
    {
      this_node = next_node;

      rtmp = this_node->rlink;
      ltmp = this_node->llink;
      if ( proute->dst_addr & (0x1 << this_node->cmpbit) )
         next_node = rtmp;
      else
         next_node = ltmp;
    }

So I wrote some patches to do this optimization.  In ifcvt.c I added a new hook
that allows the backend to try and do conditional moves if the machine
independent code doesn't handle the special cases that the machine might have.

Then in rs6000.c I used that hook to see if the conditional moves are adjacent,
and do the optimization.

I will note that this type of code comes up quite frequently since binary trees
and tries are common data structure.  The file splay-tree.c in libiberty is one
place in the compiler tree that has conditional adjacent memory moves.

So I would like comments on the patch before the 4.8 tree opens up.  I feel
even if we decide not to add the adjacent memory move patch, the hook is
useful, and I have some other ideas for using it for the powerpc.

I was thinking about rewriting the rs6000 dependent parts to make it a normal
optimization available to all ports.  Is this something we want as a normal
option?  At the moment, I'm not sure it should be part of -O3 because it is
possible for a trap to occur if the pointer straddles a page boundary and the
test condition would guard against loading up the second value.  However,
-Ofast might be an appropriate place to do this optimization.

At this time I don't have test cases, but I would add them for the real
submission.  I have bootstraped the compiler on powerpc with this option
enabled and it passed the bootstrap and had no regressions in make check.  I
will do a spec run over the weekend as well.

In addition to libibery/splay-tree.c the following files in gcc have adjacent
conditional moves that this code would optimize:

	    cfg.c
	    c-typeck.c
	    df-scan.c
	    fold-const.c
	    graphds.c
	    ira-emit.c
	    omp-low.c
	    rs6000.c
	    tree-cfg.c
	    tree-ssa-dom.c
	    tree-ssa-loop-ivops.c
	    tree-ssa-phiopt.c
	    tree-ssa-uncprop.c

2012-02-10  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* target.def (cmove_md_extra): New hook that is called from
	ifcvt.c to allow the backend to generate additional conditional
	moves that aren't handled by the machine independent code.  Add
	support to call the hook at the appropriate places.
	* targhooks.h (default_cmove_md_extra): Likewise.
	* targhooks.c (default_cmove_md_extra): Likewise.
	* target.h (enum ifcvt_pass): Likewise.
	* ifcvt.c (find_if_header): Likewise.
	(noce_find_if_block): Likewise.
	(struct noce_if_info): Likewise.
	(noce_process_if_block): Likewise.
	(cond_move_process_if_block): Likewise.
	(if_convert): Likewise.
	(rest_of_handle_if_conversion): Likewise.
	(rest_of_handle_if_after_combine): Likewise.
	(rest_of_handle_if_after_reload): Likewise.
	* doc/tm.texi (TARGET_CMOVE_MD_EXTRA): Likewise.
	* doc/tm.texi.in (TARGET_CMOVE_MD_EXTRA): Likewise.

	* config/rs6000/rs6000.opt (-mcmove-adjacent-memory): Add support
	for a new switch to optimize conditional moves where each side is
	a memory reference and the memory locations are adjacent.
	* config/rs6000/rs6000.c (rs6000_cmove_md_extra): Likewise.
	(TARGET_CMOVE_MD_EXTRA): Likewise.
	(rs6000_decompose_offsettable_memref): Likewise.

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meissner@linux.vnet.ibm.com	fax +1 (978) 399-6899

Attachment: gcc-power7.patch322b
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]