[PATCH,x86] Fix combine for condditional instructions.

Uros Bizjak ubizjak@gmail.com
Thu Dec 13 10:20:00 GMT 2012


On Thu, Dec 13, 2012 at 10:51 AM, Richard Biener
<richard.guenther@gmail.com> wrote:

>>>> I assume that this is not right way for fixing such simple performance
>>>> anomaly since we need to do redundant work - combine load to
>>>> conditional and then split it back in peephole2? Does it look
>>>> reasonable? Why we should produce non-efficient instrucction that must
>>>> be splitted later?
>>>
>>> Well, either don't allow this instruction variant from the start, or allow
>>> the extra freedom for register allocation this creates.  It doesn't make
>>> sense to just reject it being generated by combine - that doesn't address
>>> when it materializes in another way.
>>
>> Please check the attached patch, it implements this limitation in a correct way:
>> - keeps memory operands for -Os or cold parts of the executable
>> - doesn't increase register pressure
>> - handles all situations where memory operand can propagate into RTX
>>
>> Yuri, can you please check if this patch fixes the performance problem for you?
>>
>> BTW: I would really like to add some TARGET_USE_CMOVE_WITH_MEMOP
>> target macro and conditionalize new peephole2 patterns on it.
>
> Looks good to me.  I believe optimize_insn_for_speed_p ()
> only works reliable during RTL expansion as it relies on
> crtl->maybe_hot_insn_p to be better than function granular.  I'm quite sure
> split does not adjust this (it probably should, as those predicates are
> definitely the correct ones to use), via rtl_profile_for_bb ().  I
> think passes that
> do not adjust this get what is left over by previous passes (instead of the
> default).
>
> Honza, I think the pass manager should call default_rtl_profile () before each
> RTL pass to avoid this, no?

Please note that we have plenty of existing peephole2s that use
optimize_insn_for_speed_p predicate. It is assumed to work ...

Uros.



More information about the Gcc-patches mailing list