This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



RE: Why is this not optimized?


Thanks for the reply. I will look at the patch. As far as cost is concerned, I don't think fwprop really needs to understand the pipeline model. As long as the rtx costs after the optimization are lower than before it, I think that is good enough. Of course, it won't be better in every case, but it should be better in general.

Cheers,
Bingfeng

-----Original Message-----
From: Bin.Cheng [mailto:amker.cheng@gmail.com] 
Sent: 15 May 2014 06:59
To: Bingfeng Mei
Cc: gcc@gcc.gnu.org
Subject: Re: Why is this not optimized?

On Wed, May 14, 2014 at 9:14 PM, Bingfeng Mei <bmei@broadcom.com> wrote:
> Hi,
> I am looking at some code for our target that is not optimized as expected. For the following RTX, I expect the source of insn 17 to be propagated into insn 20, and insn 17 to be eliminated as a result. On our target, the result would be one predicated xor instruction instead of two instructions. Initially, I thought the fwprop pass should do this.
>
> (insn 17 16 18 3 (set (reg/v:HI 102 [ crc ])
>         (xor:HI (reg/v:HI 108 [ crc ])
>             (const_int 16386 [0x4002]))) coremark.c:1632 725 {xorhi3}
>      (nil))
> (insn 18 17 19 3 (set (reg:BI 113)
>         (ne:BI (reg:QI 101 [ D.4446 ])
>             (const_int 1 [0x1]))) 1397 {cmp_qimode}
>      (nil))
> (jump_insn 19 18 55 3 (set (pc)
>         (if_then_else (ne (reg:BI 113)
>                 (const_int 0 [0]))
>             (label_ref 23)
>             (pc))) 1477 {cbranchbi4}
>      (expr_list:REG_DEAD (reg:BI 113)
>         (expr_list:REG_BR_PROB (const_int 7100 [0x1bbc])
>             (expr_list:REG_PRED_WIDTH (const_int 1 [0x1])
>                 (nil))))
>  -> 23)
> (note 55 19 20 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
> (insn 20 55 23 4 (set (reg:HI 112 [ crc ])
>         (reg/v:HI 102 [ crc ])) 502 {fp_movhi}
>      (expr_list:REG_DEAD (reg/v:HI 102 [ crc ])
>         (nil)))
> (code_label 23 20 56 5 2 "" [1 uses])
>
>
> But it can't. First, propagate_rtx_1 returns false because PR_CAN_APPEAR is not set, so can_appear is false when the
> following code is executed:
>
>   if (x == old_rtx)
>     {
>       *px = new_rtx;
>       return can_appear;
>     }
>
> Even if I force PR_CAN_APPEAR to be set in flags, fwprop still won't go ahead in
> try_fwprop_subst, because old_cost is 0 (a REG-only rtx) and set_src_cost (SET_SRC (set),
> speed) is greater than 0. So the change is deemed not profitable, which is not correct
> IMO.
The fwprop pass is too conservative about propagation
opportunities outside of memory references; it just gives up in many
places.  Also, as in your case, it doesn't seem to take into
consideration that the original insn can be removed after propagation.
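
To make the cost argument concrete, here is a minimal sketch in plain C. It is
not GCC code: the struct, its fields, and both function names are hypothetical,
and the integer costs stand in for whatever the target's rtx cost hook would
report. The first function mirrors the use-site-only comparison described
above; the second additionally credits the defining insn when the propagated
register has exactly one use, which is the case where the definition becomes
dead. Letting ties favour propagation in the second function is an assumption
of this sketch, on the grounds that one insn disappears.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one forward-propagation candidate.
   None of these names exist in GCC.  */
struct candidate
{
  int use_src_old_cost;  /* cost of the use's source before; 0 for a bare REG */
  int use_src_new_cost;  /* cost of the use's source after substitution */
  int def_src_cost;      /* cost of the defining insn's source */
  int n_uses;            /* number of uses of the defined register */
};

/* Use-site-only rule: reject whenever the substituted source is
   costlier than the original source.  */
bool
profitable_use_only (const struct candidate *c)
{
  return c->use_src_new_cost <= c->use_src_old_cost;
}

/* Variant that also counts the defining insn when this is its only
   use, since the definition can be deleted after propagation.
   Ties are accepted (an assumption of this sketch).  */
bool
profitable_with_dead_def (const struct candidate *c)
{
  assert (c->n_uses >= 1);

  int before = c->use_src_old_cost;
  int after = c->use_src_new_cost;

  if (c->n_uses == 1)
    before += c->def_src_cost;

  return after <= before;
}
```

With illustrative costs of 0 for the bare REG and 4 for the xor, a single-use
candidate such as insn 17 into insn 20 is rejected by the first rule and
accepted by the second. With two uses the second rule rejects it as well,
because the defining insn has to stay.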

Wei Mi once sent a patch re-implementing the fwprop pass at
https://gcc.gnu.org/ml/gcc-patches/2013-03/msg00617.html .
I also did some experiments and worked out a local patch doing similar
work to handle cases exactly like yours.
The problem is that even though one instruction can be saved (as in your
case), it's not always a win, because it tends to generate more complex
instructions, and such insns are more vulnerable to pipeline
hazards.  Unfortunately, it's practically impossible for fwprop to
understand the pipeline risk.

Thanks,
bin
>
> If fwprop is not the place to do this optimization, where should it be done? I am working with an up-to-date GCC 4.8.
>
> Thanks,
> Bingfeng Mei



-- 
Best Regards.

