This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Worse code generated by PRE


The optimization does look bad -- splitting backedge to allow
expression hoisting rarely removes any redundancy -- unless the loop
is really short trip counted. Besides it introduces extra copy, jump
instruction and increases register pressure.

David

On Wed, Sep 29, 2010 at 5:55 AM, Bingfeng Mei <bmei@broadcom.com> wrote:
> Richard,
> Here is the test code.
> typedef short int16_t;
> typedef unsigned short uint16_t;
>
> void MemSet16(
> ? ? ? ? ? ? ? ?int16_t ? ? ? ? *pBuf, ?/* Buffer address */
> ? ? ? ? ? ? ? ?int16_t ? ? ? ? Val, ? ?/* Value to be set */
> ? ? ? ? ? ? ? ?uint16_t ? ? ? ?Bytes ? /* Total size in bytes */
> ? ? ? ? ? ? ? ?)
>
> {
> ? ? ? ?uint16_t Idx;
>
> ? ? ? ?for(Idx=0; Idx<(Bytes>>1); Idx++)
> ? ? ? ? ? ? ? ?*pBuf++ = Val;
> }
>
> I grepped insert_into_preds_of_block and found it is called only by
> tree-ssa-pre.c. Actually, I am referring to RTL PRE pass in gcse.c
> and lcm.c.
>
>
> Before PRE:
>
>
> ;; Start of basic block ( 2) -> 3
> ;; bb 3 artificial_defs: { }
> ;; bb 3 artificial_uses: { u9(55){ }u10(57){ }u11(62){ }}
> ;; lr ?in ? ? ? ?55 [r55] 57 [r57] 62 [__arg_pointer_register__] 113 114 115
> ;; lr ?use ? ? ? 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 113 114
> ;; lr ?def ? ? ? 110 118 119 120 121
> ;; live ?in ? ? ?55 [r55] 57 [r57] 62 [__arg_pointer_register__] 113 114 115
> ;; live ?gen ? ? 110 118 119 120 121
> ;; live ?kill
>
> ;; Pred edge ?2 [91.0%] ?(fallthru)
> (note 34 33 35 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
>
> (insn 35 34 36 3 tst.c:4 (set (reg/f:SI 118)
> ? ? ? ?(plus:SI (reg/v/f:SI 114 [ pBuf ])
> ? ? ? ? ? ?(const_int 2 [0x2]))) 273 {addsi3} (nil))
>
> (insn 36 35 37 3 tst.c:4 (set (reg:HI 119)
> ? ? ? ?(plus:HI (reg:HI 113 [ D.3441 ])
> ? ? ? ? ? ?(const_int -1 [0xffffffffffffffff]))) 276 {addhi3} (expr_list:REG_DEAD (reg:HI 113 [ D.3441 ])
> ? ? ? ?(nil)))
>
> (insn 37 36 38 3 tst.c:4 (set (reg:SI 120)
> ? ? ? ?(zero_extend:SI (reg:HI 119))) 1056 {zero_extendhisi2} (expr_list:REG_DEAD (reg:HI 119)
> ? ? ? ?(nil)))
>
> (insn 38 37 39 3 tst.c:4 (set (reg:SI 121)
> ? ? ? ?(ashift:SI (reg:SI 120)
> ? ? ? ? ? ?(const_int 1 [0x1]))) 389 {ashlsi3} (expr_list:REG_DEAD (reg:SI 120)
> ? ? ? ?(nil)))
>
> (insn 39 38 43 3 tst.c:4 (set (reg/f:SI 110 [ D.3464 ])
> ? ? ? ?(plus:SI (reg/f:SI 118)
> ? ? ? ? ? ?(reg:SI 121))) 273 {addsi3} (expr_list:REG_DEAD (reg:SI 121)
> ? ? ? ?(expr_list:REG_DEAD (reg/f:SI 118)
> ? ? ? ? ? ?(nil))))
> ;; End of basic block 3 -> ( 4)
> ;; lr ?out ? ? ? 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115
> ;; live ?out ? ? 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115
>
>
> ;; Succ edge ?4 [100.0%] ?(fallthru)
>
> ;; Start of basic block ( 4 3) -> 4
> ;; bb 4 artificial_defs: { }
> ;; bb 4 artificial_uses: { u18(55){ }u19(57){ }u20(62){ }}
> ;; lr ?in ? ? ? ?55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115
> ;; lr ?use ? ? ? 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115
> ;; lr ?def ? ? ? 114 122
> ;; live ?in ? ? ?55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115
> ;; live ?gen ? ? 114 122
> ;; live ?kill
>
> ;; Pred edge ?4 [91.0%]
> ;; Pred edge ?3 [100.0%] ?(fallthru)
> (code_label 43 39 40 4 3 "" [1 uses])
>
> (note 40 43 41 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
>
> (insn 41 40 42 4 tst.c:14 (set (mem:HI (reg/v/f:SI 114 [ pBuf ]) [2 *pBuf+0 S2 A16])
> ? ? ? ?(reg/v:HI 115 [ Val ])) 236 {*movhhi} (nil))
>
> (insn 42 41 44 4 tst.c:14 (set (reg/v/f:SI 114 [ pBuf ])
> ? ? ? ?(plus:SI (reg/v/f:SI 114 [ pBuf ])
> ? ? ? ? ? ?(const_int 2 [0x2]))) 273 {addsi3} (nil))
>
> (insn 44 42 45 4 tst.c:13 (set (reg:BI 122)
> ? ? ? ?(ne:BI (reg/v/f:SI 114 [ pBuf ])
> ? ? ? ? ? ?(reg/f:SI 110 [ D.3464 ]))) 1006 {cmp_simode} (nil))
>
> (jump_insn 45 44 48 4 tst.c:13 (set (pc)
> ? ? ? ?(if_then_else (ne (reg:BI 122)
> ? ? ? ? ? ? ? ?(const_int 0 [0x0]))
> ? ? ? ? ? ?(label_ref 43)
> ? ? ? ? ? ?(pc))) 1085 {cbranchbi4} (expr_list:REG_DEAD (reg:BI 122)
> ? ? ? ?(expr_list:REG_BR_PROB (const_int 9100 [0x238c])
> ? ? ? ? ? ?(expr_list:REG_PRED_WIDTH (const_int 4 [0x4])
> ? ? ? ? ? ? ? ?(nil))))
> ?-> 43)
> ;; End of basic block 4 -> ( 4 5)
> ;; lr ?out ? ? ? 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115
> ;; live ?out ? ? 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115
>
>
> After PRE:
>
> ;; Start of basic block ( 2) -> 3
> ;; bb 3 artificial_defs: { }
> ;; bb 3 artificial_uses: { u9(55){ }u10(57){ }u11(62){ }}
> ;; lr ?in ? ? ? ?55 [r55] 57 [r57] 62 [__arg_pointer_register__] 113 114 115
> ;; lr ?use ? ? ? 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 113 114
> ;; lr ?def ? ? ? 110 118 119 120 121
> ;; live ?in ? ? ?55 [r55] 57 [r57] 62 [__arg_pointer_register__] 113 114 115
> ;; live ?gen ? ? 110 118 119 120 121
> ;; live ?kill
>
> ;; Pred edge ?2 [91.0%] ?(fallthru)
> (note 34 33 35 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
>
> (insn 35 34 53 3 tst.c:4 (set (reg/f:SI 123 [ pBuf ])
> ? ? ? ?(plus:SI (reg/v/f:SI 114 [ pBuf ])
> ? ? ? ? ? ?(const_int 2 [0x2]))) 273 {addsi3} (nil))
>
> (insn 53 35 36 3 tst.c:4 (set (reg/f:SI 118)
> ? ? ? ?(reg/f:SI 123 [ pBuf ])) -1 (nil))
>
> (insn 36 53 37 3 tst.c:4 (set (reg:HI 119)
> ? ? ? ?(plus:HI (reg:HI 113 [ D.3441 ])
> ? ? ? ? ? ?(const_int -1 [0xffffffffffffffff]))) 276 {addhi3} (expr_list:REG_DEAD (reg:HI 113 [ D.3441 ])
> ? ? ? ?(nil)))
>
> (insn 37 36 38 3 tst.c:4 (set (reg:SI 120)
> ? ? ? ?(zero_extend:SI (reg:HI 119))) 1056 {zero_extendhisi2} (expr_list:REG_DEAD (reg:HI 119)
> ? ? ? ?(nil)))
>
> (insn 38 37 39 3 tst.c:4 (set (reg:SI 121)
> ? ? ? ?(ashift:SI (reg:SI 120)
> ? ? ? ? ? ?(const_int 1 [0x1]))) 389 {ashlsi3} (expr_list:REG_DEAD (reg:SI 120)
> ? ? ? ?(nil)))
>
> (insn 39 38 43 3 tst.c:4 (set (reg/f:SI 110 [ D.3464 ])
> ? ? ? ?(plus:SI (reg/f:SI 118)
> ? ? ? ? ? ?(reg:SI 121))) 273 {addsi3} (expr_list:REG_DEAD (reg:SI 121)
> ? ? ? ?(expr_list:REG_DEAD (reg/f:SI 118)
> ? ? ? ? ? ?(nil))))
> ;; End of basic block 3 -> ( 4)
> ;; lr ?out ? ? ? 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115
> ;; live ?out ? ? 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115
>
>
> ;; Succ edge ?4 [100.0%] ?(fallthru)
>
> ;; Start of basic block ( 6 3) -> 4
> ;; bb 4 artificial_defs: { }
> ;; bb 4 artificial_uses: { u18(55){ }u19(57){ }u20(62){ }}
> ;; lr ?in ? ? ? ?55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115
> ;; lr ?use ? ? ? 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115
> ;; lr ?def ? ? ? 114 122
> ;; live ?in ? ? ?55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115
> ;; live ?gen ? ? 114 122
> ;; live ?kill
>
> ;; Pred edge ?6 [100.0%] ?(fallthru)
> ;; Pred edge ?3 [100.0%] ?(fallthru)
> (code_label 43 39 40 4 3 "" [0 uses])
>
> (note 40 43 41 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
>
> (insn 41 40 51 4 tst.c:14 (set (mem:HI (reg/v/f:SI 114 [ pBuf ]) [2 *pBuf+0 S2 A16])
> ? ? ? ?(reg/v:HI 115 [ Val ])) 236 {*movhhi} (nil))
>
> (insn 51 41 44 4 tst.c:14 (set (reg/v/f:SI 114 [ pBuf ])
> ? ? ? ?(reg/f:SI 123 [ pBuf ])) -1 (expr_list:REG_EQUAL (plus:SI (reg/v/f:SI 114 [ pBuf ])
> ? ? ? ? ? ?(const_int 2 [0x2]))
> ? ? ? ?(nil)))
>
> (insn 44 51 45 4 tst.c:13 (set (reg:BI 122)
> ? ? ? ?(ne:BI (reg/v/f:SI 114 [ pBuf ])
> ? ? ? ? ? ?(reg/f:SI 110 [ D.3464 ]))) 1006 {cmp_simode} (nil))
>
> (jump_insn 45 44 55 4 tst.c:13 (set (pc)
> ? ? ? ?(if_then_else (ne (reg:BI 122)
> ? ? ? ? ? ? ? ?(const_int 0 [0x0]))
> ? ? ? ? ? ?(label_ref:SI 55)
> ? ? ? ? ? ?(pc))) 1085 {cbranchbi4} (expr_list:REG_DEAD (reg:BI 122)
> ? ? ? ?(expr_list:REG_BR_PROB (const_int 9100 [0x238c])
> ? ? ? ? ? ?(expr_list:REG_PRED_WIDTH (const_int 4 [0x4])
> ? ? ? ? ? ? ? ?(nil))))
> ?-> 55)
> ;; End of basic block 4 -> ( 6 5)
> ;; lr ?out ? ? ? 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115
> ;; live ?out ? ? 55 [r55] 57 [r57] 62 [__arg_pointer_register__] 110 114 115
>
>
> ;; Succ edge ?6 [91.0%]
> ;; Succ edge ?5 [9.0%] ?(fallthru)
>
> ;; Start of basic block ( 4) -> 6
> ;; bb 6 artificial_defs: { }
> ;; bb 6 artificial_uses: { u-1(55){ }u-1(57){ }u-1(62){ }}
>
> ;; Pred edge ?4 [91.0%]
> (code_label 55 45 54 6 4 "" [1 uses])
>
> (note 54 55 52 6 [bb 6] NOTE_INSN_BASIC_BLOCK)
>
> (insn 52 54 48 6 (set (reg/f:SI 123 [ pBuf ])
> ? ? ? ?(plus:SI (reg/v/f:SI 114 [ pBuf ])
> ? ? ? ? ? ?(const_int 2 [0x2]))) 273 {addsi3} (nil))
> ;; End of basic block 6 -> ( 4)
>
>
>
>
> Thanks,
> Bingfeng
>
>
>
>
>> -----Original Message-----
>> From: Richard Guenther [mailto:richard.guenther@gmail.com]
>> Sent: 29 September 2010 13:33
>> To: Bingfeng Mei
>> Cc: gcc@gcc.gnu.org
>> Subject: Re: Worse code generated by PRE
>>
>> On Wed, Sep 29, 2010 at 2:16 PM, Bingfeng Mei <bmei@broadcom.com> wrote:
>> > Hello,
>> > I have been examining a significant performance regression
>> > between 4.5 and 4.4 in our port. I found that Partial Redundancy
>> > Elimination introduced in 4.5 causes the issue. The following
>> > pseudo code explains the problem:
>> >
>> > BB 3:
>> > r118 <- ?r114 + 2
>> >
>> > BB 4:
>> > R114 <- ?r114 + 2
>> > ...
>> > Conditional jump to BB 4
>> >
>> > After PRE
>> >
>> > BB 3:
>> > r123 <- ?r114 + 2
>> > r118 <- ?r123
>> >
>> > BB 4:
>> > r114 <- r123
>> > conditional jump to BB 5
>> >
>> > BB5:
>> > r123 <- r114 + 2
>> > jump to BB 4
>> >
>> >
>> > A simple loop (BB 4) is divided into two basic blocks (BB 4 & 5).
>> > An extra jump instruction is introduced. On some targets, this
>> > jump can be removed by bb-reorder pass. On our target, it cannot
>> > be reordered due to complex doloop_end pattern we generate later.
>> > Additionally, since bb-reorder is done in very late phase, the code
>> > miss some optimization opportunity such as auto_inc_dec. I don't
>> > see any benefit here to do PRE like this. Maybe we should exclude
>> > such case in the first place? I read the relevant text in
>> > "Advanced Compiler Design Implementation", the example used is linear
>> > CFG and it doesn't mention how to handle loop case.
>>
>> PRE basically sinks the computation into the latch block (possibly
>> creating that). ?Note that without a testcase it's hard to tell whether
>> this is ok in general. ?PRE tries to avoid generation of new induction
>> variables and cross-iteration data-dependences, see
>> insert_into_preds_of_block.
>> Note that 4.4 in principle performs the same optimization (you might
>> figure that PRE in 4.4 is generally disabled for -Os but enabled in 4.5,
>> but only for hot execution traces following existing practice to tune
>> code-size/performance on a fine-grained basis).
>>
>> Richard.
>>
>> > Any suggestion is greatly appreciated.
>> >
>> > Thanks,
>> > Bingfeng Mei
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>
>
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]