[PATCH v3] combine: perform jump threading at the end

Ilya Leoshkevich iii@linux.ibm.com
Mon Sep 17 08:54:00 GMT 2018


> On 14.09.2018, at 23:35, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Hi!
> 
> On Mon, Sep 10, 2018 at 12:59:19PM +0200, Ilya Leoshkevich wrote:
>> Consider the following RTL:
>> 
>> (code_label 11 10 26 4 2 (nil) [1 uses])
>> (note 26 11 12 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
>> (insn 12 26 15 4 (set (reg:SI 65)
>>        (if_then_else:SI (eq (reg:CCZ 33 %cc)
>>                (const_int 0 [0]))
>>            (const_int 1 [0x1])
>>            (const_int 0 [0]))) "pr80080-4.c":9 1674 {*movsicc})
>> (insn 15 12 16 4 (parallel [
>>            (set (reg:CCZ 33 %cc)
>>                (compare:CCZ (reg:SI 65)
>>                    (const_int 0 [0])))
>>            (clobber (scratch:SI))
>>        ]) "pr80080-4.c":9 1216 {*tstsi_cconly_extimm})
>> (jump_insn 16 15 17 4 (set (pc)
>>        (if_then_else (ne (reg:CCZ 33 %cc)
>>                (const_int 0 [0]))
>>            (label_ref:DI 23)
>>            (pc))) "pr80080-4.c":9 1897 {*cjump_64})
>> 
>> Combine simplifies this into:
>> 
>> (code_label 11 10 26 4 2 (nil) [1 uses])
>> (note 26 11 12 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
>> (note 12 26 15 4 NOTE_INSN_DELETED)
>> (note 15 12 16 4 NOTE_INSN_DELETED)
>> (jump_insn 16 15 17 4 (set (pc)
>>        (if_then_else (eq (reg:CCZ 33 %cc)
>>                (const_int 0 [0]))
>>            (label_ref:DI 23)
>>            (pc))) "pr80080-4.c":9 1897 {*cjump_64})
>> 
>> opening up the possibility to perform jump threading.  Since this
>> happens infrequently, perform jump threading only when there is a
>> changed basic block, whose sole side effect is a trailing jump.
> 
> So this happens because now there is *only* a conditional jump in this BB?

Yes.

> Could you please show generated code before and after this patch?
> I mean generated assembler code.  What -S gives you.

Before:

foo4:
.LFB0:
	.cfi_startproc
	lt	%r1,0(%r2)
	jne	.L2
	lhi	%r3,1
	cs	%r1,%r3,0(%r2)
.L2:
	jne	.L5
	br	%r14
.L5:
	jg	bar

After:

foo4:
.LFB0:
	.cfi_startproc
	lt	%r1,0(%r2)
	jne	.L4
	lhi	%r3,1
	cs	%r1,%r3,0(%r2)
	jne	.L4
	br	%r14
.L4:
	jg	bar
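
For context, foo4 is roughly of the following shape (a hypothetical
reconstruction of the testcase, not the literal pr80080-4.c source):

extern void bar (int *);

void
foo4 (int *x)
{
  int expected = 0;

  /* The s390 backend expands this into a load-and-test (lt) that skips
     the compare-and-swap (cs) when *x is already nonzero; both paths
     then branch on the resulting condition code.  */
  if (!__atomic_compare_exchange_n (x, &expected, 1, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
    bar (x);
}

When .L2 is reached via the first jne, the condition code is still
nonzero, so the second jne is always taken; threading therefore
redirects the first jne straight to the jg bar block (.L4 in the second
listing).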

>> +/* Return true iff the only side effect of BB is its trailing jump_insn.  */
>> +
>> +static bool
>> +is_single_jump_bb (basic_block bb)
>> +{
>> +  rtx_insn *end = BB_END (bb);
>> +  rtx_insn *insn;
>> +
>> +  if (!JUMP_P (end))
>> +    return false;
>> +
>> +  for (insn = BB_HEAD (bb); insn != end; insn = NEXT_INSN (insn))
>> +    if (INSN_P (insn) && side_effects_p (PATTERN (insn)))
>> +      return false;
>> +  return true;
>> +}
> 
> Hrm, so it is more than that.

The intention of this check is to match the "Short circuit cases where
block B contains some side effects, as we can't safely bypass it" check
in thread_jump ().  I've just noticed that I got it wrong: I should also
check whether the JUMP_INSN itself has side effects, as is done there.
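
Something along these lines is what I have in mind now (an untested
sketch; unlike the version in the patch, the loop also covers the
trailing jump, mirroring thread_jump ()):

/* Return true iff the only side effect of BB is its trailing jump_insn.
   Untested sketch: the loop also inspects the trailing jump itself, so
   a jump with side effects is rejected as well.  */

static bool
is_single_jump_bb (basic_block bb)
{
  rtx_insn *end = BB_END (bb);
  rtx_insn *insn;

  if (!JUMP_P (end))
    return false;

  for (insn = BB_HEAD (bb); insn != NEXT_INSN (end); insn = NEXT_INSN (insn))
    if (INSN_P (insn) && side_effects_p (PATTERN (insn)))
      return false;
  return true;
}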

> Why does the existing jump threading not work for you; should it happen
> at another time?

We call cleanup_cfg (CLEANUP_THREADING) only once, during the "jump"
pass, which runs before combine.  There is also a "jump2" pass, which
runs afterwards, and after a discussion with Ulrich Weigand I tried
moving jump threading there (a sketch of that change is shown after the
example below).  While this had the desired effect on the testcase, the
code got worse in other places.  Example from GemsFDTD:

Before:

103e96c:       b9 04 00 31             lgr     %r3,%r1
103e970:       a7 18 00 00             lhi     %r1,0
103e974:       e3 20 f0 d0 00 0d       dsg     %r2,208(%r15)
103e97a:       b9 20 00 c3             cgr     %r12,%r3
103e97e:       a7 29 ff ff             lghi    %r2,-1
103e982:       ec 12 00 01 00 42       lochih  %r1,1
103e988:       e3 20 f0 f8 00 82       xg      %r2,248(%r15)
103e98e:       1a 15                   ar      %r1,%r5
103e990:       b9 e9 c0 72             sgrk    %r7,%r2,%r12
103e994:       b3 c1 00 e2             ldgr    %f14,%r2
103e998:       b9 f8 a0 21             ark     %r2,%r1,%r10
103e99c:       12 99                   ltr     %r9,%r9
103e99e:       a7 18 00 00             lhi     %r1,0
103e9a2:       ec 1c 00 01 00 42       lochile %r1,1
103e9a8:       50 10 f1 28             st      %r1,296(%r15)
103e9ac:       c2 9d 00 00 00 00       cfi     %r9,0
103e9b2:       c0 c4 00 01 28 4e       jgle    1063a4e
103e9b8:       e5 4c f1 08 00 00       mvhi    264(%r15),0
103e9be:       eb 14 00 03 00 0d       sllg    %r1,%r4,3

After:

103e96c:       b9 04 00 31             lgr     %r3,%r1       
103e970:       e5 4c f1 18 00 01       mvhi    280(%r15),1
103e976:       a7 18 00 00             lhi     %r1,0         
103e97a:       e3 20 f0 d0 00 0d       dsg     %r2,208(%r15) 
103e980:       b9 20 00 c3             cgr     %r12,%r3      
103e984:       a7 29 ff ff             lghi    %r2,-1        
103e988:       ec 12 00 01 00 42       lochih  %r1,1         
103e98e:       e3 20 f0 f8 00 82       xg      %r2,248(%r15) 
103e994:       1a 15                   ar      %r1,%r5       
103e996:       b3 c1 00 e2             ldgr    %f14,%r2
103e99a:       b9 e9 c0 72             sgrk    %r7,%r2,%r12  
103e99e:       b9 f8 a0 21             ark     %r2,%r1,%r10  
103e9a2:       c2 9d 00 00 00 00       cfi     %r9,0         
103e9a8:       c0 c4 00 01 28 51       jgle    1063a4a
103e9ae:       e5 4c f1 18 00 00       mvhi    280(%r15),0
103e9b4:       e5 4c f1 08 00 00       mvhi    264(%r15),0   
103e9ba:       eb 14 00 03 00 0d       sllg    %r1,%r4,3     

Diff:

                lgr     %rn,%rn
-               mvhi    m(%rx),1
                lhi     %rn,0

                ar      %rn,%rn
-               ldgr    %fn,%rn
                sgrk    %rn,%rn,%rn
+               ldgr    %fn,%rn
                ark     %rn,%rn,%rn
+               ltr     %rn,%rn
+               lhi     %rn,0
+               lochile %rn,1
+               st      %rn,m(%rx)
                cfi     %rn,0

                mvhi    m(%rx),0
-               mvhi    m(%rx),0
                sllg    %rn,%rn,3

The code used to compare %r9 with 0, jump based on the comparison
result, and set the local variable in the branch.  Now it compares %r9
with 0 twice.  I haven't looked at the details yet, but apparently we
are missing some optimization because jump threading is done too late.
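
For reference, the jump2 experiment mentioned above was essentially the
following (quoted from memory; the exact set of CLEANUP_* flags passed
by pass_jump2::execute in cfgcleanup.c may differ):

  /* pass_jump2::execute: additionally request jump threading, which is
     normally only requested by the first "jump" pass.  */
  cleanup_cfg ((flag_crossjumping ? CLEANUP_CROSSJUMP : 0)
	       | (flag_thread_jumps ? CLEANUP_THREADING : 0));
  return 0;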

> 
> Segher


