This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Atom: Scheduler improvements for better imul placement


2012/4/12 Andrey Belevantsev <abel@ispras.ru>:
> On 12.04.2012 17:54, Richard Guenther wrote:
>>
>> 2012/4/12 Andrey Belevantsev<abel@ispras.ru>:
>>>
>>> On 12.04.2012 16:38, Richard Guenther wrote:
>>>>
>>>>
>>>> On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin <izamyatin@gmail.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther
>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov <amonakov@ispras.ru>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Can atom execute two IMUL in parallel?  Or what exactly is the
>>>>>>>> pipeline behavior?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> As I understand from Intel's optimization reference manual, the
>>>>>>> behavior is as follows: if the instruction immediately following IMUL
>>>>>>> has shorter latency, execution is stalled for 4 cycles (which is
>>>>>>> IMUL's latency); otherwise, a 4-or-more cycles latency instruction can
>>>>>>> be issued after IMUL without a stall.
>>>>>>> In other words, IMUL is pipelined with respect to other long-latency
>>>>>>> instructions, but not to short-latency instructions.
>>>>>>
>>>>>>
>>>>>>
>>>>>> It seems to be modeled in the pipeline description though:
>>>>>>
>>>>>> ;;; imul insn has 5 cycles latency
>>>>>> (define_reservation "atom-imul-32"
>>>>>>                     "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
>>>>>>                      atom-port-0")
>>>>>>
>>>>>> ;;; imul instruction excludes other non-FP instructions.
>>>>>> (exclusion_set "atom-eu-0, atom-eu-1"
>>>>>>                "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")
>>>>>>
>>>>>
>>>>> The main idea is quite simple:
>>>>>
>>>>> If we are about to schedule an IMUL instruction (it is at the top of
>>>>> the ready list), we try to find a producer of another (independent)
>>>>> IMUL instruction that is in the ready list too. The goal is to schedule
>>>>> such a producer so that another IMUL gets into the ready list and two
>>>>> successive IMUL instructions get scheduled.
>>>>
>>>>
>>>>
>>>> Why does that not happen without your patch?  Does it never happen
>>>> without your patch or does it merely not happen for one EEMBC benchmark
>>>> (can you provide a testcase?)?
>>>
>>>
>>>
>>> It does not happen because the scheduler by itself does not do such
>>> specific reordering.  That said, it is easy to imagine the cases where
>>> this patch will make things worse rather than better.
>>
>>
>> That surprises me.  What is so specific about this reordering?
>
>
> I mean that the scheduler does things like "sort the ready list according to
> a number of heuristics and to the insn priority, then choose the insn that
> would allow the maximum of ready insns to be issued on the current cycle".
> The heuristics are rather general.  The scheduler does not do things like
> "if some random insn is ready, then choose some other random insn from the
> ready list and schedule it" (which is what the patch does). This is what
> scheduler hooks are for, to account for some target-specific heuristic.
>
> The problem is that this particular implementation looks somewhat like
> overkill and is also motivated by a single benchmark.  Testing on a wider
> set of benchmarks and checking the compile-time hit would make the
> motivation more convincing.

Yeah, and especially looking _why_ the generic heuristics are not working
and if they could be improved.  After all the issue seems to be properly
modeled in the DFA.

Richard.

> Andrey
>
>
>>
>>> Igor, why not try different subtler mechanisms like adjust_priority,
>>> which gets called when an insn is added to the ready list?  E.g. increase
>>> the producer's priority.
>>>
>>> The patch as is misses checks for NONDEBUG_INSN_P.  Also, why bail out
>>> when you have more than one imul in the ready list?  Don't you want to
>>> bump the priority of the other imul found?
>>>
>>> Andrey
>>>
>>>
>>>>
>>>>> And the MD only allows us to prefer scheduling of successive IMUL
>>>>> instructions, i.e. if an IMUL was just scheduled and the ready list
>>>>> contains another IMUL instruction, then it will be chosen as the best
>>>>> candidate for scheduling.
>>>>>
>>>>>
>>>>>> at least from my very limited guessing of what the above does.  So, did
>>>>>> you analyze why the scheduler does not properly schedule things for you?
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>>
>>>>>>> From reading the patch, I could not understand the link between
>>>>>>> pipeline behavior and what the patch appears to do.
>>>>>>>
>>>>>>> Hope that helps.
>>>>>>>
>>>>>>> Alexander
>>>
>>>
>>>
>
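
To make the stall rule quoted above concrete, here is a toy cycle count.
It is only a sketch of the rule as Alexander paraphrases it (a shorter-latency
instruction immediately after IMUL costs a 4-cycle stall, a 4-or-more cycle
one does not).  The function name, the one-issue-per-cycle assumption and
the example latencies are all hypothetical; this is not GCC's DFA and not a
measurement of real hardware.

```python
IMUL_LATENCY = 4  # cycles, the figure used in the quoted description


def count_cycles(seq):
    """Count issue cycles for a list of (name, latency) pairs.

    Toy rule: one instruction issues per cycle; if the instruction right
    after an IMUL has latency below IMUL_LATENCY, add a 4-cycle stall.
    """
    cycles = 0
    prev_name = None
    for name, latency in seq:
        if prev_name == "imul" and latency < IMUL_LATENCY:
            cycles += IMUL_LATENCY  # stall behind the preceding IMUL
        cycles += 1  # issue slot for this instruction
        prev_name = name
    return cycles


# Two hypothetical orderings of the same four independent instructions.
interleaved = [("imul", 4), ("add", 1), ("imul", 4), ("add", 1)]
paired = [("imul", 4), ("imul", 4), ("add", 1), ("add", 1)]

print(count_cycles(interleaved), count_cycles(paired))
```

Under these assumptions the paired order pays the stall once instead of
twice, which is the effect the patch is after when it tries to get two
IMULs scheduled back to back.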

