This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] [RFC, GCC 4.8] Optimize conditional moves from adjacent memory locations


On Fri, Feb 10, 2012 at 10:02 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Fri, Feb 10, 2012 at 12:46 PM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
>> I was looking at the routelookup EEMBC benchmark and it has code of the form:
>>
>> ? while ( this_node->cmpbit > next_node->cmpbit )
>> ? ?{
>> ? ? ?this_node = next_node;
>>
>> ? ? ?if ( proute->dst_addr & (0x1 << this_node->cmpbit) )
>> ? ? ? ? next_node = this_node->rlink;
>> ? ? ?else
>> ? ? ? ? next_node = this_node->llink;
>> ? ?}
>
> Hmm, this looks like we could do this on the tree level better as we
> have more information about this_node there. ?Like we know that we
> load from this_node->cmpbit before we do either of the branches. ?So
> can move both of those loads before the branch and then we get the
> ifcvt for free.

Indeed.  But note that the transform is not valid as *this_node may cross
a page boundary and thus either pointer load may trap if the other does not
(well, unless the C standard (and thus our middle-end) would require that
iff ptr->component does not trap that *ptr does not trap either - we would
require a operand_equal_p (get_base_address ()) for both addresses).

Joseph, can you clarify what the C standard specifies here?

Thanks,
Richard.
> Thanks,
> Andrew Pinski
>
>
>
>>
>> This is where you have a binary tree/trie and you are iterating going down
>> either the right link or left link until you find a stopping condition. ?The
>> code in ifcvt.c does not handle optimizing these cases for conditional move
>> since the load might trap, and generates code that does if-then-else with loads
>> and jumps.
>>
>> However, since the two elements are next to each other in memory, they are
>> likely in the same cache line, particularly with aligned stacks and malloc
>> returning aligned pointers. ?Except in unusual circumstances where the pointer
>> is not aligned, this means it is much faster to optimize it as:
>>
>> ? while ( this_node->cmpbit > next_node->cmpbit )
>> ? ?{
>> ? ? ?this_node = next_node;
>>
>> ? ? ?rtmp = this_node->rlink;
>> ? ? ?ltmp = this_node->llink;
>> ? ? ?if ( proute->dst_addr & (0x1 << this_node->cmpbit) )
>> ? ? ? ? next_node = rtmp;
>> ? ? ?else
>> ? ? ? ? next_node = ltmp;
>> ? ?}
>>
>> So I wrote some patches to do this optimization. ?In ifcvt.c I added a new hook
>> that allows the backend to try and do conditional moves if the machine
>> independent code doesn't handle the special cases that the machine might have.
>>
>> Then in rs6000.c I used that hook to see if the conditional moves are adjacent,
>> and do the optimization.
>>
>> I will note that this type of code comes up quite frequently since binary trees
>> and tries are common data structure. ?The file splay-tree.c in libiberty is one
>> place in the compiler tree that has conditional adjacent memory moves.
>>
>> So I would like comments on the patch before the 4.8 tree opens up. ?I feel
>> even if we decide not to add the adjacent memory move patch, the hook is
>> useful, and I have some other ideas for using it for the powerpc.
>>
>> I was thinking about rewriting the rs6000 dependent parts to make it a normal
>> optimization available to all ports. ?Is this something we want as a normal
>> option? ?At the moment, I'm not sure it should be part of -O3 because it is
>> possible for a trap to occur if the pointer straddles a page boundary and the
>> test condition would guard against loading up the second value. ?However,
>> -Ofast might be an appropriate place to do this optimization.
>>
>> At this time I don't have test cases, but I would add them for the real
>> submission. ?I have bootstraped the compiler on powerpc with this option
>> enabled and it passed the bootstrap and had no regressions in make check. ?I
>> will do a spec run over the weekend as well.
>>
>> In addition to libibery/splay-tree.c the following files in gcc have adjacent
>> conditional moves that this code would optimize:
>>
>> ? ? ? ? ? ?cfg.c
>> ? ? ? ? ? ?c-typeck.c
>> ? ? ? ? ? ?df-scan.c
>> ? ? ? ? ? ?fold-const.c
>> ? ? ? ? ? ?graphds.c
>> ? ? ? ? ? ?ira-emit.c
>> ? ? ? ? ? ?omp-low.c
>> ? ? ? ? ? ?rs6000.c
>> ? ? ? ? ? ?tree-cfg.c
>> ? ? ? ? ? ?tree-ssa-dom.c
>> ? ? ? ? ? ?tree-ssa-loop-ivops.c
>> ? ? ? ? ? ?tree-ssa-phiopt.c
>> ? ? ? ? ? ?tree-ssa-uncprop.c
>>
>> 2012-02-10 ?Michael Meissner ?<meissner@linux.vnet.ibm.com>
>>
>> ? ? ? ?* target.def (cmove_md_extra): New hook that is called from
>> ? ? ? ?ifcvt.c to allow the backend to generate additional conditional
>> ? ? ? ?moves that aren't handled by the machine independent code. ?Add
>> ? ? ? ?support to call the hook at the appropriate places.
>> ? ? ? ?* targhooks.h (default_cmove_md_extra): Likewise.
>> ? ? ? ?* targhooks.c (default_cmove_md_extra): Likewise.
>> ? ? ? ?* target.h (enum ifcvt_pass): Likewise.
>> ? ? ? ?* ifcvt.c (find_if_header): Likewise.
>> ? ? ? ?(noce_find_if_block): Likewise.
>> ? ? ? ?(struct noce_if_info): Likewise.
>> ? ? ? ?(noce_process_if_block): Likewise.
>> ? ? ? ?(cond_move_process_if_block): Likewise.
>> ? ? ? ?(if_convert): Likewise.
>> ? ? ? ?(rest_of_handle_if_conversion): Likewise.
>> ? ? ? ?(rest_of_handle_if_after_combine): Likewise.
>> ? ? ? ?(rest_of_handle_if_after_reload): Likewise.
>> ? ? ? ?* doc/tm.texi (TARGET_CMOVE_MD_EXTRA): Likewise.
>> ? ? ? ?* doc/tm.texi.in (TARGET_CMOVE_MD_EXTRA): Likewise.
>>
>> ? ? ? ?* config/rs6000/rs6000.opt (-mcmove-adjacent-memory): Add support
>> ? ? ? ?for a new switch to optimize conditional moves where each side is
>> ? ? ? ?a memory reference and the memory locations are adjacent.
>> ? ? ? ?* config/rs6000/rs6000.c (rs6000_cmove_md_extra): Likewise.
>> ? ? ? ?(TARGET_CMOVE_MD_EXTRA): Likewise.
>> ? ? ? ?(rs6000_decompose_offsettable_memref): Likewise.
>>
>> --
>> Michael Meissner, IBM
>> 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
>> meissner@linux.vnet.ibm.com ? ? fax +1 (978) 399-6899


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]