This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation


On 01/04/2016 07:32 AM, Ajit Kumar Agarwal wrote:

I am consistently getting the gains for office_ispell and office_stringsearch1, telcom_adpcm_d. I ran it again today and we see gains in the same bench mark tests
with the split path changes.

What functions are being affected that in turn impact performance?

For office_ispell: The function are Function "linit (linit, funcdef_no=0, decl_uid=2535, cgraph_uid=0, symbol_order=2) for lookup.c file".
                                    "Function checkfile (checkfile, funcdef_no=1, decl_uid=2478, cgraph_uid=1, symbol_order=4)"
                                    " Function correct (correct, funcdef_no=2, decl_uid=2503, cgraph_uid=2, symbol_order=5)"
                                    " Function askmode (askmode, funcdef_no=24, decl_uid=2464, cgraph_uid=24, symbol_order=27)"
                                    for correct.c file.

For office_stringsearch1: The function is Function "bmhi_search (bmhi_search, funcdef_no=1, decl_uid=2178, cgraph_uid=1, symbol_order=5)"
for bmhisrch.c file.
So I can see split-paths affecting adpcm & lookup. I don't see it affecting correct.c or bmhisrch.c.

That's progress though. It's likely one of one or more of the flags is critical, so thanks for passing those along.

I'm going to focus on adpcm for the moment, in particular adpcm_coder. It appears the key blocks are:


;;   basic block 14, loop depth 1, count 0, freq 9100, maybe hot
;;    prev block 13, next block 15, flags: (NEW, REACHABLE)
;;    pred:       12 [100.0%]  (FALLTHRU,EXECUTABLE)
;;                13 [100.0%]  (FALLTHRU,EXECUTABLE)
  # valpred_12 = PHI <valpred_54(12), valpred_55(13)>
  _112 = MAX_EXPR <valpred_12, -32768>;
  valpred_18 = MIN_EXPR <_112, 32767>;
  delta_56 = delta_7 | iftmp.1_114;
  _57 = indexTable[delta_56];
  index_58 = _57 + index_107;
  _113 = MIN_EXPR <index_58, 88>;
  index_111 = MAX_EXPR <_113, 0>;
  step_59 = stepsizeTable[index_111];
  if (bufferstep_93 != 0)
    goto <bb 15>;
  else
    goto <bb 16>;
;;    succ:       15 [50.0%]  (TRUE_VALUE,EXECUTABLE)
;;                16 [50.0%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 15, loop depth 1, count 0, freq 4550, maybe hot
;;    prev block 14, next block 16, flags: (NEW, REACHABLE)
;;    pred:       14 [50.0%]  (TRUE_VALUE,EXECUTABLE)
  _60 = delta_56 << 4;
  goto <bb 17>;
;;    succ:       17 [100.0%]  (FALLTHRU,EXECUTABLE)

;;   basic block 16, loop depth 1, count 0, freq 4550, maybe hot
;;    prev block 15, next block 17, flags: (NEW, REACHABLE)
;;    pred:       14 [50.0%]  (FALSE_VALUE,EXECUTABLE)
  outp_62 = outp_83 + 1;
  _63 = (signed char) delta_56;
  _65 = (signed char) outputbuffer_90;
  _66 = _63 | _65;
  *outp_83 = _66;
;;    succ:       17 [100.0%]  (FALLTHRU,EXECUTABLE)

;;   basic block 17, loop depth 1, count 0, freq 9100, maybe hot
;;    prev block 16, next block 18, flags: (NEW, REACHABLE)
;;    pred:       15 [100.0%]  (FALLTHRU,EXECUTABLE)
;;                16 [100.0%]  (FALLTHRU,EXECUTABLE)
  # outp_3 = PHI <outp_83(15), outp_62(16)>
  # outputbuffer_21 = PHI <_60(15), outputbuffer_90(16)>
  _109 = bufferstep_93 ^ 1;
  _98 = _109 & 1;
  ivtmp.11_68 = ivtmp.11_105 + 2;
  if (ivtmp.11_68 != _116)
    goto <bb 4>;
  else
    goto <bb 18>;


Block #17 is the join point that we're going to effectively copy into blocks #15 and #16. Doing so in turn exposes bufferstep_93 as the constant 0 in block #16, which in turn allows elimination of a couple statements in the extended version of block #16 and we propagate the constant 1 for bufferstep_93 to the top of the loop when reached via block #16. So we save a few instructions. However, I think we're actually doing a fairly poor job here.

bufferstep is a great example of a flip-flop variable and its value is statically computable based on the path from the prior loop iteration which, if exploited would allow the FSM threader to eliminate the conditional at the end of bb14. I'm going to have to play with that.

Anyway, it's late and I want to rip this test apart a bit more and see how it interacts with the heuristic that I've cobbled together as well as see what it would take to have DOM or VRP get data on bufferstep_93 on the true path out of BB14 after a path-split.

Jeff


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]