[PATCH] PR target/70155: Use SSE for TImode load/store

H.J. Lu hjl.tools@gmail.com
Tue Apr 26 16:31:00 GMT 2016


On Tue, Apr 26, 2016 at 9:20 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2016-04-26 19:12 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>:
>> On Tue, Apr 26, 2016 at 9:07 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>> 2016-04-26 18:39 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>:
>>>> On Tue, Apr 26, 2016 at 8:21 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>> 2016-04-26 18:12 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>:
>>>>>> On Tue, Apr 26, 2016 at 8:05 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>>>> 2016-04-26 17:55 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>:
>>>>>>>> On Tue, Apr 26, 2016 at 7:15 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>>>>>> 2016-04-26 17:07 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>:
>>>>>>>>>> On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>>>>>>>> 2016-04-25 18:27 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>:
>>>>>>>>>>>>
>>>>>>>>>>>> Ilya, can you take a look?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> H.J.
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Algorithmic part of the patch looks OK to me except the following piece of code.
>>>>>>>>>>>
>>>>>>>>>>> +/* Check REF's chain to add new insns into a queue
>>>>>>>>>>> +   and find registers requiring conversion.  */
>>>>>>>>>>>
>>>>>>>>>>> Comment is wrong because you don't have any conversions required for
>>>>>>>>>>> your candidates.
>>>>>>>>>>
>>>>>>>>>> I will fix it.
>>>>>>>>>>
>>>>>>>>>>> +
>>>>>>>>>>> +void
>>>>>>>>>>> +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref ref)
>>>>>>>>>>> +{
>>>>>>>>>>> +  df_link *chain;
>>>>>>>>>>> +
>>>>>>>>>>> +  gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
>>>>>>>>>>> +             || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
>>>>>>>>>>> +  add_to_queue (DF_REF_INSN_UID (ref));
>>>>>>>>>>> +
>>>>>>>>>>> +  for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next)
>>>>>>>>>>> +    {
>>>>>>>>>>> +      unsigned uid = DF_REF_INSN_UID (chain->ref);
>>>>>>>>>>> +
>>>>>>>>>>> +      if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref)))
>>>>>>>>>>> +       continue;
>>>>>>>>>>> +
>>>>>>>>>>> +      if (!DF_REF_REG_MEM_P (chain->ref))
>>>>>>>>>>> +       continue;
>>>>>>>>>>>
>>>>>>>>>>> I believe here you wrongly jump to the next ref intead of actually adding it
>>>>>>>>>>> to a queue.  You may just use
>>>>>>>>>>>
>>>>>>>>>>> gcc_assert (!DF_REF_REG_MEM_P (chain->ref));
>>>>>>>>>>>
>>>>>>>>>>> because you should'n have a candidate used in address operand.
>>>>>>>>>>
>>>>>>>>>> I will update.
>>>>>>>>>>
>>>>>>>>>>> +
>>>>>>>>>>> +      if (bitmap_bit_p (insns, uid))
>>>>>>>>>>> +       continue;
>>>>>>>>>>> +
>>>>>>>>>>> +      if (bitmap_bit_p (candidates, uid))
>>>>>>>>>>> +       add_to_queue (uid);
>>>>>>>>>>>
>>>>>>>>>>> Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses and defs
>>>>>>>>>>> out of candidates list are allowed?
>>>>>>>>>>
>>>>>>>>>> That would be wrong since there are
>>>>>>>>>>
>>>>>>>>>>  while (!bitmap_empty_p (queue))
>>>>>>>>>>     {
>>>>>>>>>>       insn_uid = bitmap_first_set_bit (queue);
>>>>>>>>>>       bitmap_clear_bit (queue, insn_uid);
>>>>>>>>>>       bitmap_clear_bit (candidates, insn_uid);
>>>>>>>>>>       add_insn (candidates, insn_uid);
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>> An instruction is a candidate and the bit is cleared when
>>>>>>>>>> analyze_register_chain is called.
>>>>>>>>>
>>>>>>>>> You clear candidates bit but the first thing you do in add_insn is set
>>>>>>>>> insns bit.
>>>>>>>>> Thus you should hit:
>>>>>>>>>
>>>>>>>>>       if (bitmap_bit_p (insns, uid))
>>>>>>>>>         continue;
>>>>>>>>>
>>>>>>>>> For handled candidates.
>>>>>>>>>
>>>>>>>>> Probably it would be more clear if we keep this clear/set pair
>>>>>>>>> together?  E.g. move
>>>>>>>>> bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn.
>>>>>>>>>
>>>>>>>>
>>>>>>>> After we started processing candidates, we only use candidates
>>>>>>>> to check if an instruction is a candidate, not to check if an
>>>>>>>> instruction is NOT a candidate.
>>>>>>>
>>>>>>> I don't see how it's related to what I said.  My point is that
>>>>>>> when you analyze added insn you shouldn't reach insns which are both
>>>>>>> not in candidates and not in current scalar_chain_64.  That's why I
>>>>>>> think you miss an assert in scalar_chain_64::analyze_register_chain.
>>>>>>
>>>>>> Since all candidates will be processed by
>>>>>>
>>>>>>  while (!bitmap_empty_p (queue))
>>>>>>     {
>>>>>>       insn_uid = bitmap_first_set_bit (queue);
>>>>>>       bitmap_clear_bit (queue, insn_uid);
>>>>>>       bitmap_clear_bit (candidates, insn_uid);
>>>>>>       add_insn (candidates, insn_uid);
>>>>>>     }
>>>>>
>>>>> This process only queue, not all candidates.  analyze_register_chain
>>>>> fills queue bitmap to build a chain.
>>>>>
>>>>>>
>>>>>> I will change to
>>>>>>
>>>>>> /* Check REF's chain to add new insns into a queue.  */
>>>>>>
>>>>>> void
>>>>>> timode_scalar_chain::analyze_register_chain (bitmap candidates,
>>>>>>                                              df_ref ref)
>>>>>> {
>>>>>>     gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
>>>>>>               || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
>>>>>>   add_to_queue (DF_REF_INSN_UID (ref));
>>>>>> }
>>>>>>
>>>>>
>>>>> The purpose of analyze_register_chain is to collect all register defs
>>>>> or uses into chain's queue.  You can't just remove a loop over ref's
>>>>> chain then.
>>>>
>>>> That is true for DImode conversion.  For TImode conversion,
>>>> there are no register conversions required:
>>>>
>>>> https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01394.html
>>>>
>>>> All it does it to call add_to_queue.
>>>
>>> No conversion means no mark_dual_mode_def calls in analyze_register_chain
>>> but it still may call add_to_queue multiple times in its loop. E.g.
>>>
>>> 1: r1 = 1
>>> ...
>>> 5: r1 = 2
>>> ..
>>> 10: r2 = r1
>>>
>>> When you add #10 into a chain you should add both #1 and #5 into a queue.
>>> That's what analyze_register_chain is supposed to do.
>>
>> Since both i1 and i5 are candidates, they are added to queue.
>
> By who?  They will just go into another chain.
>

scalar_chain::build is called on all candidates, which adds all
candidate instructions to insns bitmap.  All candidate instructions
will be converted.

-- 
H.J.



More information about the Gcc-patches mailing list