This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation


On 01/04/2016 07:32 AM, Ajit Kumar Agarwal wrote:


-----Original Message-----
From: Jeff Law [mailto:law@redhat.com]
Sent: Wednesday, December 23, 2015 12:06 PM
To: Ajit Kumar Agarwal; Richard Biener
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

On 12/11/2015 02:11 AM, Ajit Kumar Agarwal wrote:

Mibench/EEMBC benchmarks (Target Microblaze)

Automotive_qsort1 (4.03%), Office_ispell (4.29%), Office_stringsearch1 (3.5%), Telecom_adpcm_d (1.37%), ospfv2_lite (1.35%).
I'm having a real tough time reproducing any of these results.  In fact, I'm having a tough time seeing cases where path splitting even applies to the Mibench/EEMBC benchmarks mentioned above.

In the very few cases where split-paths might apply, the net resulting assembly code I get is the same with and without split-paths.

How consistent are these results?

I am consistently getting the gains for Office_ispell, Office_stringsearch1, and Telecom_adpcm_d. I ran it again today and we see gains in the same benchmark tests
with the split-path changes.

What functions are being affected that in turn impact performance?

For office_ispell, the functions are "linit (linit, funcdef_no=0, decl_uid=2535, cgraph_uid=0, symbol_order=2)" in lookup.c, and
                                    "checkfile (checkfile, funcdef_no=1, decl_uid=2478, cgraph_uid=1, symbol_order=4)",
                                    "correct (correct, funcdef_no=2, decl_uid=2503, cgraph_uid=2, symbol_order=5)",
                                    "askmode (askmode, funcdef_no=24, decl_uid=2464, cgraph_uid=24, symbol_order=27)"
                                    in correct.c.

For office_stringsearch1, the function is "bmhi_search (bmhi_search, funcdef_no=1, decl_uid=2178, cgraph_uid=1, symbol_order=5)"
in bmhisrch.c.
In linit there are two path splitting opportunities. Neither is a case where path splitting exposes any CSE or DCE opportunities at the tree level. In fact, in both cases no operands from the predecessors feed into the join block (the block we duplicate to split the path).

There's a path splitting opportunity in correct.c::givehelp which, AFAICT, is totally uninteresting from a performance standpoint. Ironically, it is one of the few cases where path splitting actually results in something that is better optimized at the tree level. We'd easily get the same result by sinking a statement down through a PHI, in a manner similar to what's been suggested for 64700.
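For readers following along, here is a hypothetical C-level sketch (not taken from ispell or any of the benchmarks discussed) of the kind of diamond where duplicating the join block into each arm exposes a tree-level simplification: once the join's statement is copied into each predecessor, each copy sees a known value and folds.

```c
/* Hypothetical sketch, not from the benchmarks discussed.  The join
   block ("return t + b;") has an operand `t` defined differently in
   each arm of the diamond.  If the pass duplicates the join into both
   arms, each copy sees a constant `t`, so constant propagation folds
   the addition away -- the CSE/DCE-style win path splitting aims for. */
int diamond(int a, int b)
{
    int t;
    if (a > 0)
        t = 1;        /* then arm */
    else
        t = 2;        /* else arm */
    return t + b;     /* join block: candidate for duplication */
}
```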

In correct.c::checkfile, correct.c::correct, and correct.c::askmode, the path splitting opportunities do not lead to any further simplifications at the gimple level.

Similarly for bmhisrch.c::bmhi_search.


So when I look across all these examples, the only one where path splitting exposes a CSE/DCE opportunity that actually matters for performance is adpcm_code.

The rest, AFAICT, benefit at a much lower level: a diamond in the CFG requires an unconditional branch from the end of one arm around the other arm to reach the join point, and path splitting eliminates that branch. So there's a small gain from that alone. The gain may be even larger on the MicroBlaze because of its exposed delay slot architecture -- one less slot to fill. It may also result in better code layouts, which help simplistic branch predictors.
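The branch-elimination effect can be sketched at the source level (hypothetical functions, not from any benchmark): `before` has a shared join that the then-arm must jump over the else-arm to reach; `after` shows the shape once the join is duplicated into each arm.

```c
/* Hypothetical before/after sketch of the control-flow effect only.
   In `before`, the generated code for the then-arm typically ends with
   an unconditional jump over the else-arm to reach the shared join.
   In `after`, the join ("t - 1") has been duplicated into each arm, so
   both paths fall through to their own exit and the jump disappears. */
int before(int a, int b)
{
    int t;
    if (a > 0)
        t = b * 2;    /* ends with a jump over the else arm */
    else
        t = b + 3;
    return t - 1;     /* shared join block */
}

int after(int a, int b)
{
    if (a > 0)
        return b * 2 - 1;   /* join copy 1: falls through, no jump */
    return b + 3 - 1;       /* join copy 2 */
}
```

Both versions compute the same values; only the branch structure differs.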

So I find myself wondering if the primary effect we're looking for most of the time is really the elimination of that unconditional branch. And if that's the case, then we're looking at a very different costing heuristic -- one that favors very small join blocks rather than larger ones (which supposedly help expose CSE/DCE, but in reality don't for the benchmarks I've looked at).

And if that's the case, then we may really be looking at something that belongs at the RTL level rather than at the tree/gimple level. Sadly, it's harder to do things like duplicate blocks at the RTL level.

Anyway, I'm going to ponder some more.

jeff

