This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] PR target/70155: Use SSE for TImode load/store


2016-04-26 22:50 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>:
> On Tue, Apr 26, 2016 at 11:42 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Apr 26, 2016 at 9:33 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Tue, Apr 26, 2016 at 9:27 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>> 2016-04-26 19:20 GMT+03:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
>>>>> 2016-04-26 19:12 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>:
>>>>>> On Tue, Apr 26, 2016 at 9:07 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>>>> 2016-04-26 18:39 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>:
>>>>>>>> On Tue, Apr 26, 2016 at 8:21 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>>>>>> 2016-04-26 18:12 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>:
>>>>>>>>>> On Tue, Apr 26, 2016 at 8:05 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>>>>>>>> 2016-04-26 17:55 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>:
>>>>>>>>>>>> On Tue, Apr 26, 2016 at 7:15 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>>>>>>>>>> 2016-04-26 17:07 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>:
>>>>>>>>>>>>>> On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>>>>>>>>>>>> 2016-04-25 18:27 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ilya, can you take a look?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> H.J.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Algorithmic part of the patch looks OK to me except the following piece of code.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +/* Check REF's chain to add new insns into a queue
>>>>>>>>>>>>>>> +   and find registers requiring conversion.  */
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Comment is wrong because you don't have any conversions required for
>>>>>>>>>>>>>>> your candidates.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I will fix it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>> +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref ref)
>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>> +  df_link *chain;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +  gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
>>>>>>>>>>>>>>> +             || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
>>>>>>>>>>>>>>> +  add_to_queue (DF_REF_INSN_UID (ref));
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +  for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next)
>>>>>>>>>>>>>>> +    {
>>>>>>>>>>>>>>> +      unsigned uid = DF_REF_INSN_UID (chain->ref);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +      if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref)))
>>>>>>>>>>>>>>> +       continue;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +      if (!DF_REF_REG_MEM_P (chain->ref))
>>>>>>>>>>>>>>> +       continue;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I believe here you wrongly jump to the next ref intead of actually adding it
>>>>>>>>>>>>>>> to a queue.  You may just use
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> gcc_assert (!DF_REF_REG_MEM_P (chain->ref));
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> because you should'n have a candidate used in address operand.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I will update.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +      if (bitmap_bit_p (insns, uid))
>>>>>>>>>>>>>>> +       continue;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +      if (bitmap_bit_p (candidates, uid))
>>>>>>>>>>>>>>> +       add_to_queue (uid);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses and defs
>>>>>>>>>>>>>>> out of candidates list are allowed?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That would be wrong since there are
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  while (!bitmap_empty_p (queue))
>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>       insn_uid = bitmap_first_set_bit (queue);
>>>>>>>>>>>>>>       bitmap_clear_bit (queue, insn_uid);
>>>>>>>>>>>>>>       bitmap_clear_bit (candidates, insn_uid);
>>>>>>>>>>>>>>       add_insn (candidates, insn_uid);
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> An instruction is a candidate and the bit is cleared when
>>>>>>>>>>>>>> analyze_register_chain is called.
>>>>>>>>>>>>>
>>>>>>>>>>>>> You clear candidates bit but the first thing you do in add_insn is set
>>>>>>>>>>>>> insns bit.
>>>>>>>>>>>>> Thus you should hit:
>>>>>>>>>>>>>
>>>>>>>>>>>>>       if (bitmap_bit_p (insns, uid))
>>>>>>>>>>>>>         continue;
>>>>>>>>>>>>>
>>>>>>>>>>>>> For handled candidates.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Probably it would be more clear if we keep this clear/set pair
>>>>>>>>>>>>> together?  E.g. move
>>>>>>>>>>>>> bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> After we started processing candidates, we only use candidates
>>>>>>>>>>>> to check if an instruction is a candidate, not to check if an
>>>>>>>>>>>> instruction is NOT a candidate.
>>>>>>>>>>>
>>>>>>>>>>> I don't see how it's related to what I said.  My point is that
>>>>>>>>>>> when you analyze added insn you shouldn't reach insns which are both
>>>>>>>>>>> not in candidates and not in current scalar_chain_64.  That's why I
>>>>>>>>>>> think you miss an assert in scalar_chain_64::analyze_register_chain.
>>>>>>>>>>
>>>>>>>>>> Since all candidates will be processed by
>>>>>>>>>>
>>>>>>>>>>  while (!bitmap_empty_p (queue))
>>>>>>>>>>     {
>>>>>>>>>>       insn_uid = bitmap_first_set_bit (queue);
>>>>>>>>>>       bitmap_clear_bit (queue, insn_uid);
>>>>>>>>>>       bitmap_clear_bit (candidates, insn_uid);
>>>>>>>>>>       add_insn (candidates, insn_uid);
>>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> This process only queue, not all candidates.  analyze_register_chain
>>>>>>>>> fills queue bitmap to build a chain.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I will change to
>>>>>>>>>>
>>>>>>>>>> /* Check REF's chain to add new insns into a queue.  */
>>>>>>>>>>
>>>>>>>>>> void
>>>>>>>>>> timode_scalar_chain::analyze_register_chain (bitmap candidates,
>>>>>>>>>>                                              df_ref ref)
>>>>>>>>>> {
>>>>>>>>>>     gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref))
>>>>>>>>>>               || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref)));
>>>>>>>>>>   add_to_queue (DF_REF_INSN_UID (ref));
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The purpose of analyze_register_chain is to collect all register defs
>>>>>>>>> or uses into chain's queue.  You can't just remove a loop over ref's
>>>>>>>>> chain then.
>>>>>>>>
>>>>>>>> That is true for DImode conversion.  For TImode conversion,
>>>>>>>> there are no register conversions required:
>>>>>>>>
>>>>>>>> https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01394.html
>>>>>>>>
>>>>>>>> All it does it to call add_to_queue.
>>>>>>>
>>>>>>> No conversion means no mark_dual_mode_def calls in analyze_register_chain
>>>>>>> but it still may call add_to_queue multiple times in its loop. E.g.
>>>>>>>
>>>>>>> 1: r1 = 1
>>>>>>> ...
>>>>>>> 5: r1 = 2
>>>>>>> ..
>>>>>>> 10: r2 = r1
>>>>>>>
>>>>>>> When you add #10 into a chain you should add both #1 and #5 into a queue.
>>>>>>> That's what analyze_register_chain is supposed to do.
>>>>>>
>>>>>> Since both i1 and i5 are candidates, they are added to queue.
>>>>>
>>>>> By who?  They will just go into another chain.
>>>>
>>>> BTW you can just reuse existing analyze_register_chain and make
>>>> mark_dual_mode_def
>>>> virtual instead.  Just put gcc_unreachable () into its TI version.
>>>>
>>>
>>
>> Here is the updated patch which does that.  Ok for trunk if there
>> is no regressions on x86-64?
>>
>
> CSE works with SSE constants now.  Here is the updated patch.
> OK for trunk if there are no regressions on x86-64?

Looks OK to me.

Thanks,
Ilya

>
>
> --
> H.J.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]