This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: A question regarding bundling and NOPs insertion for VLIW architecture
- From: Alexander Monakov <amonakov at ispras dot ru>
- To: Revital1 Eres <ERES at il dot ibm dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Tue, 11 May 2010 15:57:46 +0400 (MSD)
- Subject: Re: A question regarding bundling and NOPs insertion for VLIW architecture
- References: <OF98B3C739.81B1A541-ONC2257720.003B2F37-C2257720.003BB037@il.ibm.com>
On Tue, 11 May 2010, Revital1 Eres wrote:
>
> Hello,
>
> I have a question regarding the process of bundling and NOP insertion for a
> VLIW architecture, and I would appreciate your answer:
>
> I am calling the second scheduler from the machine reorg pass, similar to
> what is done for IA64. I now want to handle bundling and NOP insertion for a
> VLIW architecture with an issue rate of 4, and I want to make sure I
> understand the process:
>
> IIUC, I can use the insns that the scheduler marked with TImode to indicate
> a new cycle, so the question is how many NOPs to insert after that cycle, if
> any. I noticed the following approach, used for c6x, which is described in:
> http://archiv.tu-chemnitz.de/pub/2004/0176/data/index.html
>
> "NOP Insertion and Parallel Scheduling
> If the scheduler is run, it checks dependencies and tries to schedule the
> instructions so as to minimize the processing cycles. The hooks
> TARGET_SCHED_REORDER(2) are considered to reorder the instructions in the
> ready queue in case the back end wants to override the default rules. I
> used the hooks to memorize the program cycle in which each instruction is
> scheduled. This value is stored in a hash table I created for that purpose.
> From the cycle information I can later determine how many NOPs have to be
> inserted between two instructions. This value then overrides the attribute
> value."
>
> IA64 seems to have a much more complicated approach to bundling and NOP
> insertion, and I wonder whether the reason is IA64-specific issues, or
> whether there is something I'm missing in the approach mentioned above.
From skimming the paper, I understand that the target processor is a 4-wide
VLIW with few or no instruction issue constraints (that is, constraints on
which insn type may go in which bundle slot) and a non-interlocked pipeline,
which requires NOP insertion to avoid violating dependencies. IA64 is
different in both regards.
Bundling in ia64 is complicated because not all combinations of insn types
are possible in a bundle (a bundle contains three insns), and instruction
issue boundaries can fall in the middle of a bundle (the ia64 architecture
uses stop bits to indicate parallel issue boundaries, and some bundle kinds
have a stop bit between two of their slots). Incidentally, ia64 does not need
NOP insertion to avoid data dependency violations, because it uses
scoreboarding to track register dependencies. Thus, NOP insertion is only
needed to satisfy bundling constraints.
I think the ia64 port in GCC uses dynamic programming to perform bundling
because it would be much harder to extract the instruction placement from the
automaton (which I think tracks all of the mentioned constraints internally).
Alexander