This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH v3] combine: perform jump threading at the end
- From: Ilya Leoshkevich <iii at linux dot ibm dot com>
- To: Segher Boessenkool <segher at kernel dot crashing dot org>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, krebbel at linux dot ibm dot com, rdapp at linux dot ibm dot com, richard dot guenther at gmail dot com
- Date: Mon, 17 Sep 2018 10:50:58 +0200
- Subject: Re: [PATCH v3] combine: perform jump threading at the end
- References: <20180910105919.19602-1-iii@linux.ibm.com> <20180914213551.GW23155@gate.crashing.org>
> Am 14.09.2018 um 23:35 schrieb Segher Boessenkool <segher@kernel.crashing.org>:
>
> Hi!
>
> On Mon, Sep 10, 2018 at 12:59:19PM +0200, Ilya Leoshkevich wrote:
>> Consider the following RTL:
>>
>> (code_label 11 10 26 4 2 (nil) [1 uses])
>> (note 26 11 12 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
>> (insn 12 26 15 4 (set (reg:SI 65)
>> (if_then_else:SI (eq (reg:CCZ 33 %cc)
>> (const_int 0 [0]))
>> (const_int 1 [0x1])
>> (const_int 0 [0]))) "pr80080-4.c":9 1674 {*movsicc})
>> (insn 15 12 16 4 (parallel [
>> (set (reg:CCZ 33 %cc)
>> (compare:CCZ (reg:SI 65)
>> (const_int 0 [0])))
>> (clobber (scratch:SI))
>> ]) "pr80080-4.c":9 1216 {*tstsi_cconly_extimm})
>> (jump_insn 16 15 17 4 (set (pc)
>> (if_then_else (ne (reg:CCZ 33 %cc)
>> (const_int 0 [0]))
>> (label_ref:DI 23)
>> (pc))) "pr80080-4.c":9 1897 {*cjump_64})
>>
>> Combine simplifies this into:
>>
>> (code_label 11 10 26 4 2 (nil) [1 uses])
>> (note 26 11 12 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
>> (note 12 26 15 4 NOTE_INSN_DELETED)
>> (note 15 12 16 4 NOTE_INSN_DELETED)
>> (jump_insn 16 15 17 4 (set (pc)
>> (if_then_else (eq (reg:CCZ 33 %cc)
>> (const_int 0 [0]))
>> (label_ref:DI 23)
>> (pc))) "pr80080-4.c":9 1897 {*cjump_64})
>>
>> opening up the possibility to perform jump threading. Since this
>> happens infrequently, perform jump threading only when there is a
>> changed basic block, whose sole side effect is a trailing jump.
>
> So this happens because now there is *only* a conditional jump in this BB?
Yes.
> Could you please show generated code before and after this patch?
> I mean generated assembler code. What -S gives you.
Before:
foo4:
.LFB0:
.cfi_startproc
lt %r1,0(%r2)
jne .L2
lhi %r3,1
cs %r1,%r3,0(%r2)
.L2:
jne .L5
br %r14
.L5:
jg bar
After:
foo4:
.LFB0:
.cfi_startproc
lt %r1,0(%r2)
jne .L4
lhi %r3,1
cs %r1,%r3,0(%r2)
jne .L4
br %r14
.L4:
jg bar
>> +/* Return true iff the only side effect of BB is its trailing jump_insn. */
>> +
>> +static bool
>> +is_single_jump_bb (basic_block bb)
>> +{
>> + rtx_insn *end = BB_END (bb);
>> + rtx_insn *insn;
>> +
>> + if (!JUMP_P (end))
>> + return false;
>> +
>> + for (insn = BB_HEAD (bb); insn != end; insn = NEXT_INSN (insn))
>> + if (INSN_P (insn) && side_effects_p (PATTERN (insn)))
>> + return false;
>> + return true;
>> +}
>
> Hrm, so it is more than that.
The intention of this check is to match the „Short circuit cases where
block B contains some side effects, as we can't safely bypass it“ one in
thread_jump (). I've just noticed that I got it wrong - I should also
check whether JUMP_INSN itself has side-effects, like it's done there.
> Why does the existing jump threading not work for you; should it happen
> at another time?
We call cleanup_cfg (CLEANUP_THREADING) only once - during the „jump“
pass, which happens before combine. There is also „jump2“ pass, which
happens afterwards, and after discussion with Ulrich Weigand I tried to
move jump threading there. While this change had the desired effect on
the testcase, the code got worse in another places. Example from
GemsFDTD:
Before:
103e96c: b9 04 00 31 lgr %r3,%r1
103e970: a7 18 00 00 lhi %r1,0
103e974: e3 20 f0 d0 00 0d dsg %r2,208(%r15)
103e97a: b9 20 00 c3 cgr %r12,%r3
103e97e: a7 29 ff ff lghi %r2,-1
103e982: ec 12 00 01 00 42 lochih %r1,1
103e988: e3 20 f0 f8 00 82 xg %r2,248(%r15)
103e98e: 1a 15 ar %r1,%r5
103e990: b9 e9 c0 72 sgrk %r7,%r2,%r12
103e994: b3 c1 00 e2 ldgr %f14,%r2
103e998: b9 f8 a0 21 ark %r2,%r1,%r10
103e99c: 12 99 ltr %r9,%r9
103e99e: a7 18 00 00 lhi %r1,0
103e9a2: ec 1c 00 01 00 42 lochile %r1,1
103e9a8: 50 10 f1 28 st %r1,296(%r15)
103e9ac: c2 9d 00 00 00 00 cfi %r9,0
103e9b2: c0 c4 00 01 28 4e jgle 1063a4e
103e9b8: e5 4c f1 08 00 00 mvhi 264(%r15),0
103e9be: eb 14 00 03 00 0d sllg %r1,%r4,3
After:
103e96c: b9 04 00 31 lgr %r3,%r1
103e970: e5 4c f1 18 00 01 mvhi 280(%r15),1
103e976: a7 18 00 00 lhi %r1,0
103e97a: e3 20 f0 d0 00 0d dsg %r2,208(%r15)
103e980: b9 20 00 c3 cgr %r12,%r3
103e984: a7 29 ff ff lghi %r2,-1
103e988: ec 12 00 01 00 42 lochih %r1,1
103e98e: e3 20 f0 f8 00 82 xg %r2,248(%r15)
103e994: 1a 15 ar %r1,%r5
103e996: b3 c1 00 e2 ldgr %f14,%r2
103e99a: b9 e9 c0 72 sgrk %r7,%r2,%r12
103e99e: b9 f8 a0 21 ark %r2,%r1,%r10
103e9a2: c2 9d 00 00 00 00 cfi %r9,0
103e9a8: c0 c4 00 01 28 51 jgle 1063a4a
103e9ae: e5 4c f1 18 00 00 mvhi 280(%r15),0
103e9b4: e5 4c f1 08 00 00 mvhi 264(%r15),0
103e9ba: eb 14 00 03 00 0d sllg %r1,%r4,3
Diff:
lgr %rn,%rn
- mvhi m(%rx),1
lhi %rn,0
ar %rn,%rn
- ldgr %fn,%rn
sgrk %rn,%rn,%rn
+ ldgr %fn,%rn
ark %rn,%rn,%rn
+ ltr %rn,%rn
+ lhi %rn,0
+ lochile %rn,1
+ st %rn,m(%rx)
cfi %rn,0
mvhi m(%rx),0
- mvhi m(%rx),0
sllg %rn,%rn,3
The code used to compare %r9 with 0, jump based on the comparison
result, and set the local variable in the branch. Now it compares %r9
with 0 twice - I didn’t look at the details yet, but apparently we’re
missing some optimization because jump threading is done too late.
>
> Segher