This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] PR target/70155: Use SSE for TImode load/store


On Tue, Apr 26, 2016 at 9:27 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2016-04-26 19:20 GMT+03:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
>> 2016-04-26 19:12 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>:
>>> On Tue, Apr 26, 2016 at 9:07 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>> 2016-04-26 18:39 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>:
>>>>> On Tue, Apr 26, 2016 at 8:21 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>>> 2016-04-26 18:12 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>:
>>>>>>> On Tue, Apr 26, 2016 at 8:05 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>>>>> 2016-04-26 17:55 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>:
>>>>>>>>> On Tue, Apr 26, 2016 at 7:15 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>>>>>>> 2016-04-26 17:07 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>:
>>>>>>>>>>> On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>>>>>>>>> 2016-04-25 18:27 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ilya, can you take a look?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> H.J.
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Algorithmic part of the patch looks OK to me except the following piece of code.
>>>>>>>>>>>>
>>>>>>>>>>>> +/* Check REF's chain to add new insns into a queue
>>>>>>>>>>>> +   and find registers requiring conversion.  */
>>>>>>>>>>>>
>>>>>>>>>>>> Comment is wrong because you don't have any conversions required for
>>>>>>>>>>>> your candidates.
>>>>>>>>>>>
>>>>>>>>>>> I will fix it.
>>>>>>>>>>>
>>>>>>>>>>>> +
>>>>>>>>>>>> +void
>>>>>>>>>>>> +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref ref)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +  df_link *chain;
>>>>>>>>>>>> +
>>>>>>>>>>>> +  gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
>>>>>>>>>>>> +             || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
>>>>>>>>>>>> +  add_to_queue (DF_REF_INSN_UID (ref));
>>>>>>>>>>>> +
>>>>>>>>>>>> +  for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next)
>>>>>>>>>>>> +    {
>>>>>>>>>>>> +      unsigned uid = DF_REF_INSN_UID (chain->ref);
>>>>>>>>>>>> +
>>>>>>>>>>>> +      if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref)))
>>>>>>>>>>>> +       continue;
>>>>>>>>>>>> +
>>>>>>>>>>>> +      if (!DF_REF_REG_MEM_P (chain->ref))
>>>>>>>>>>>> +       continue;
>>>>>>>>>>>>
>>>>>>>>>>>> I believe here you wrongly jump to the next ref intead of actually adding it
>>>>>>>>>>>> to a queue.  You may just use
>>>>>>>>>>>>
>>>>>>>>>>>> gcc_assert (!DF_REF_REG_MEM_P (chain->ref));
>>>>>>>>>>>>
>>>>>>>>>>>> because you should'n have a candidate used in address operand.
>>>>>>>>>>>
>>>>>>>>>>> I will update.
>>>>>>>>>>>
>>>>>>>>>>>> +
>>>>>>>>>>>> +      if (bitmap_bit_p (insns, uid))
>>>>>>>>>>>> +       continue;
>>>>>>>>>>>> +
>>>>>>>>>>>> +      if (bitmap_bit_p (candidates, uid))
>>>>>>>>>>>> +       add_to_queue (uid);
>>>>>>>>>>>>
>>>>>>>>>>>> Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses and defs
>>>>>>>>>>>> out of candidates list are allowed?
>>>>>>>>>>>
>>>>>>>>>>> That would be wrong since there are
>>>>>>>>>>>
>>>>>>>>>>>  while (!bitmap_empty_p (queue))
>>>>>>>>>>>     {
>>>>>>>>>>>       insn_uid = bitmap_first_set_bit (queue);
>>>>>>>>>>>       bitmap_clear_bit (queue, insn_uid);
>>>>>>>>>>>       bitmap_clear_bit (candidates, insn_uid);
>>>>>>>>>>>       add_insn (candidates, insn_uid);
>>>>>>>>>>>     }
>>>>>>>>>>>
>>>>>>>>>>> An instruction is a candidate and the bit is cleared when
>>>>>>>>>>> analyze_register_chain is called.
>>>>>>>>>>
>>>>>>>>>> You clear candidates bit but the first thing you do in add_insn is set
>>>>>>>>>> insns bit.
>>>>>>>>>> Thus you should hit:
>>>>>>>>>>
>>>>>>>>>>       if (bitmap_bit_p (insns, uid))
>>>>>>>>>>         continue;
>>>>>>>>>>
>>>>>>>>>> For handled candidates.
>>>>>>>>>>
>>>>>>>>>> Probably it would be more clear if we keep this clear/set pair
>>>>>>>>>> together?  E.g. move
>>>>>>>>>> bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> After we started processing candidates, we only use candidates
>>>>>>>>> to check if an instruction is a candidate, not to check if an
>>>>>>>>> instruction is NOT a candidate.
>>>>>>>>
>>>>>>>> I don't see how it's related to what I said.  My point is that
>>>>>>>> when you analyze added insn you shouldn't reach insns which are both
>>>>>>>> not in candidates and not in current scalar_chain_64.  That's why I
>>>>>>>> think you miss an assert in scalar_chain_64::analyze_register_chain.
>>>>>>>
>>>>>>> Since all candidates will be processed by
>>>>>>>
>>>>>>>  while (!bitmap_empty_p (queue))
>>>>>>>     {
>>>>>>>       insn_uid = bitmap_first_set_bit (queue);
>>>>>>>       bitmap_clear_bit (queue, insn_uid);
>>>>>>>       bitmap_clear_bit (candidates, insn_uid);
>>>>>>>       add_insn (candidates, insn_uid);
>>>>>>>     }
>>>>>>
>>>>>> This process only queue, not all candidates.  analyze_register_chain
>>>>>> fills queue bitmap to build a chain.
>>>>>>
>>>>>>>
>>>>>>> I will change to
>>>>>>>
>>>>>>> /* Check REF's chain to add new insns into a queue.  */
>>>>>>>
>>>>>>> void
>>>>>>> timode_scalar_chain::analyze_register_chain (bitmap candidates,
>>>>>>>                                              df_ref ref)
>>>>>>> {
>>>>>>>     gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
>>>>>>>               || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
>>>>>>>   add_to_queue (DF_REF_INSN_UID (ref));
>>>>>>> }
>>>>>>>
>>>>>>
>>>>>> The purpose of analyze_register_chain is to collect all register defs
>>>>>> or uses into chain's queue.  You can't just remove a loop over ref's
>>>>>> chain then.
>>>>>
>>>>> That is true for DImode conversion.  For TImode conversion,
>>>>> there are no register conversions required:
>>>>>
>>>>> https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01394.html
>>>>>
>>>>> All it does it to call add_to_queue.
>>>>
>>>> No conversion means no mark_dual_mode_def calls in analyze_register_chain
>>>> but it still may call add_to_queue multiple times in its loop. E.g.
>>>>
>>>> 1: r1 = 1
>>>> ...
>>>> 5: r1 = 2
>>>> ..
>>>> 10: r2 = r1
>>>>
>>>> When you add #10 into a chain you should add both #1 and #5 into a queue.
>>>> That's what analyze_register_chain is supposed to do.
>>>
>>> Since both i1 and i5 are candidates, they are added to queue.
>>
>> By who?  They will just go into another chain.
>
> BTW you can just reuse existing analyze_register_chain and make
> mark_dual_mode_def
> virtual instead.  Just put gcc_unreachable () into its TI version.
>

I will take a look.

-- 
H.J.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]