Patch for AMD Dispatch Scheduler

Tue Aug 24 23:22:00 GMT 2010

Dispatch scheduling is a new BD feature. It is composed of two parts: the scheduling part and the alignment part.

The scheduling part (this patch) arranges instructions to maximize the throughput of the hardware dispatcher. It makes sure dispatch widow boundaries are roughly observed. It is roughly, because the lengths of instructions, in number of bytes, are not known at the scheduling time. In x86 some instruction lengths may not be known until assembly time where information such as branch offsets are computed. Scheduling part is called once before register allocation and once after register allocation.

The alignment part (not in this patch) makes sure dispatch widows align at the correct boundaries.

Dispatch Scheduling is implemented as an extension to Haifa Scheduler pass. Scheduler is programed to follow x86-BD dispatching rules during the scheduling.

2 GCC hook functions are used to communicate from the machine independent part to the machine dependent parts of the scheduler.

A new command line flag –mdispatch-scheduler is defined. This option sets flag_dispatch_scheduling. To perform dispatch scheduling “-march=bdver1” and Haifa Scheduling flags must all be selected on the command line.

Testing
-------

Self compile ran with “–mdispatch-scheduling -fschedule-insns -fsched-pressure –O2". Dispatch scheduling flag was manually set on in the self compile to exercise the new code. No new test added for this implementation. Make check of i386 tests passes. No difference in the number of failures with and without the dispatch flag.

ChangeLog
---------

2010-08-12  Reza Yazdani  <reza.yazdani@amd.com>

    * tm.texi.in (TARGET_SCHED_DISPATCH): New.
    (TARGET_SCHED_DISPATCH_DO): New.
    * tm.texi: Regererated.
    * hooks.c (hook_bool_rtx_int_false): New.
    (hook_void_rtx_int): New.
    * hooks.h (hook_bool_rtx_int_false): New.
    (hook_void_rtx_int): New.
    * target.def (dispatch): Defined.
    (dispatch_do): Defined.
    * haifa-sched.c (ready_remove_first_dispatch): New.
    (number_in_ready): New.
    (get_ready_element): New.
    * sched-init.h (get_ready_element): Declared.
    (number_in_ready): Declared.
    (debug_ready_dispatch): Declared.
    (debug_dispatch_window): Declared.
    * i386.opt (-mdispatch-scheduler): Declared.
    (flag_dispatch_scheduling): Declared.
    * i386.c (has_dispatch): New.
    (get_mem_group): New.
    (is_cmp): New.
    (dispatch_violation): New.
    (is_branch): New.
    (is_prefetch): New.
    (init_window): New.
    (allocate_window): New.
    (init_dispatch_sched): New.
    (is_end_basic_block): New.
    (process_end_window): New.
    (allocate_next_window): New.
    (find_constant_1): New.
    (find_constant): New.
    (get_num_immediate): New.
    (has_immediate): New.
    (get_insn_path): New.
    (dispatch_group): New.
    (count_num_restricted): New.
    (fits_dispatch_window): New.
    (add_insn_window): New.
    (add_to_dispatch_window): New.
    (debug_dispatch_window_file): New.
    (debug_dispatch_window): New.
    (debug_insn_dispatch_info_file): New.
    (debug_ready_dispatch): New.
    (do_dispatch): New.
    (has_dispatch): New.

Reza Yazdani
--------------------------------------------------------------------------
Previous communications regarding this patch:

There were complains that there are too many hooks in my implementation. I changed the interface and only two hooks are used in the current implementation. One for boolean functions and one for action routines.

RY

--- On Fri, 8/20/10, Vladimir N. Makarov <vmakarov@redhat.com> wrote:

> From: Vladimir N. Makarov <vmakarov@redhat.com>
> Subject: Re: Patch for AMD Dispatch Scheduler
> To: "reza yazdani" <yazdani_reza@yahoo.com>
> Cc: gcc-patches@gcc.gnu.org, jh@suse.cz, ubizjak@gmail.com, sebpop@gmail.com
> Date: Friday, August 20, 2010, 8:29 AM
> On 08/12/2010 08:27 PM, reza yazdani
> wrote:
> > Dispatch scheduling is a new BD feature. It is
> composed of two parts: the scheduling part and the alignment
> part.
> > 
> > The scheduling part (this patch) arranges instructions
> to maximize the throughput of the hardware dispatcher. It
> makes sure dispatch widow boundaries are roughly observed.
> It is roughly, because the lengths of instructions, in
> number of bytes, are not known at the scheduling time. In
> x86 some instruction lengths may not be known until assembly
> time where information such as branch offsets are computed.
> Scheduling part is called once before register allocation
> and once after register allocation.
> > 
> > The alignment part (not in this patch) makes sure
> dispatch widows align at the correct boundaries.
> > 
> > Dispatch Scheduling is implemented as an extension to
> Haifa Scheduler pass. Scheduler is programed to follow
> x86-BD dispatching rules during the scheduling.
> > 
>   I thought about this patch for a long time. 
> The insn scheduler is
> very machine-dependent pass and it is hard to use the same
> model for
> so different targets.  The insn scheduler reached a
> point where it has
> too many hooks some of them is duplicated.  We need to
> do some work to
> decrease number of hooks and their renaming.  For
> example, code for
> add_to_dispatch_window call can be hidden in variable_issue
> or
> dispatch_init in issue_rate or
> dfa_{pre,post}_cycle_insn.  Of course,
> the work is not for you because it needs a lot of knowledge
> about GCC
> insn scheduler.
> 
>   As for the scheduler part of the patch.  I
> think it is ok for now.
> But if you used dfa pipeline hazard recognizer as Richard
> Henderson
> mentioned, you would not need function
> ready_remove_first_dispatch and
> you could use first cycle multi-pass insn scheduling
> (although it is
> not necessary for your target because constraints do not
> depend on the
> insn order in the window as for Itanium but it might change
> in future
> who knows).  You could also use modulo-scheduling
> which might be more
> important for OOO processors than insn scheduling. 
> You would not need
> a new option too for dispatch scheduling.  Writing for
> dfa description
> of a new processor would require nontraditional programming
> like
> defining new attributes for md insns and usage of them in
> dfa
> description.  I think it is too late for GCC4.6 to
> rewrite this all.
> But I think it is worth to be considered for next GCC
> release.  I guess you
> could avoid the current implementation if you discussed the
> approach with the GCC
> community before starting the implementation.
> 
>   On the other hand, the current implementation of dfa
> pipeline hazard recognizer
> has own disadvantage for x86/x86_64.  The recognizer
> is considered to be too big
> because of numerous automata.
> 
>   The scheduler part of patch is ok for the trunk.
> 
> Thanks.
> 
> 
> > A new command line flag –mdispatch-scheduler is
> defined. This option sets flag_dispatch_scheduling. To
> perform dispatch scheduling “-march=bdver1” and Haifa
> Scheduling flags must all be selected on the command line.
> > 
> 
> 
------------------------------------
From: "Richard Henderson" <rth@redhat.com>

On 08/12/2010 05:27 PM, reza yazdani wrote:
> The scheduling part (this patch) arranges instructions to maximize
> the throughput of the hardware dispatcher. It makes sure dispatch
> widow boundaries are roughly observed. It is roughly, because the
> lengths of instructions, in number of bytes, are not known at the
> scheduling time. In x86 some instruction lengths may not be known
> until assembly time where information such as branch offsets are
> computed. Scheduling part is called once before register allocation
> and once after register allocation.

I'm a bit confused how this "dispatch" scheduling is different
from other scheduling, and why it needs so many new hooks.  The
whole process smells very similar to "bundling" like we'd do on
ia64 for instance.

For instance, if I compare the structure of your new
ready_remove_first_dispatch function to choose_ready, I see
many similarities.  It sure looks like the multipass_dfa
scheduling hooks can be made to do what you want.

Can you explain what's fundamentally different about your bits?

---------------------------------------------------------------
From:"Uros Bizjak" <ubizjak@gmail.com>

On Fri, Aug 13, 2010 at 2:27 AM, reza yazdani <yazdani_reza@yahoo.com> wrote:
> Dispatch scheduling is a new BD feature. It is composed of two parts: the scheduling part and the alignment part.
>
> The scheduling part (this patch) arranges instructions to maximize the throughput of the hardware dispatcher. It makes sure dispatch widow boundaries are roughly observed. It is roughly, because the lengths of instructions, in number of bytes, are not known at the scheduling time. In x86 some instruction lengths may not be known until assembly time where information such as branch offsets are computed. Scheduling part is called once before register allocation and once after register allocation.
>
> The alignment part (not in this patch) makes sure dispatch widows align at the correct boundaries.
>
> Dispatch Scheduling is implemented as an extension to Haifa Scheduler pass. Scheduler is programed to follow x86-BD dispatching rules during the scheduling.
>
> A new command line flag –mdispatch-scheduler is defined. This option sets flag_dispatch_scheduling. To perform dispatch scheduling “-march=bdver1” and Haifa Scheduling flags must all be selected on the command line.
>
> Testing
> -------
>
> Self compile ran with “–mdispatch-scheduling –O3”. No new test added for this implementation. Make check of i386 tests passes. No difference in the number of failures with and without the dispatch flag.

+/* Number of allowable groups in a dispatch window.  It is an array
+   indexed by dispatch_group enum.  100 is used as a big number,
+   because the number of these kind of operations does not have any
+   effect in dispatch window, but we need them for other reasons in
+   the table.  */
+static int num_allowable_groups[disp_last] =
+{
+  0, 2, 1, 1, 2, 4, 4, 2, 1, BIG, BIG
+};

Some invalid number, like -1 doesn't fit there?

+/* Returns decode type on an AMDFAM10 machine which can be
+   "DIRECT", "DOUBLE", "VECTOR" which are decoded
+   to 1, 2 or more than 2 micro-ops respectively for INSN.  */
+static enum attr_amdfam10_decode
+dispatch_group_amdfam10_decode (rtx insn)

Space between comment and function declaration.

+/* Return true if INSN is a prefetch instruction.  */
+static bool
+is_prefetch (rtx insn)
+{
+  return ((strcmp (GET_RTX_NAME (GET_CODE (insn)), "prefetch_sse") == 0)
+         || (strcmp (GET_RTX_NAME (GET_CODE (insn)),
+                     "prefetch_sse_rex") == 0)
+         || (strcmp (GET_RTX_NAME (GET_CODE (insn)),
+                     "prefetch_3dnow") == 0)
+         || (strcmp (GET_RTX_NAME (GET_CODE (insn)),
+                     "prefetch_3dnow_rex") == 0));
+}

No! Please introduce "prefetch" type and handle it elsewhere instead
of the call to is_prefetch. The names already changed in the
mainline...

+static void
+process_end_window (void)
+{
+  gcc_assert (dispatch_window_list->num_insn <= MAX_INSN);
+  if (dispatch_window_list->next)
+    {
+      gcc_assert (dispatch_window_list1->num_insn <= MAX_INSN);
+      gcc_assert (dispatch_window_list->window_size +
dispatch_window_list1->window_size <= 48);
+      init_window (1);
+    }
+  init_window (0);
+}

Watch for line lengths (there are multiple violations in the patch).

+static void
+find_con (const_rtx in_rtx, int *imm, int *imm32, int *imm64)

find_constant, please ?
+  code = GET_CODE (in_rtx);
+  if (code == CONST_INT)
+    {
+      (*imm)++;
+      (*imm32)++;
+    }
+  else if (code == CONST_DOUBLE)
+    {
+      (*imm)++;
+      (*imm64)++;
+    }

This will work in the wrong way on 64bit hosts, where CONST_INT also
covers 64bit immediates. Try to compile:

long long test (long long a)
{
  return a + 0x1122334455667788ll;
}

#(insn:TI 6 3 23 imm.c:2 (set (reg:DI 0 ax [61])
#        (const_int 1234605616436508552 [0x1122334455667788])) 89
{*movdi_1_rex64} (expr_list:REG_EQUIV (const_int 1234605616436508552
[0x1122334455667788])
#        (nil)))
    movabsq    $1234605616436508552, %rax    # 6    *movdi_1_rex64/3    [length = 10]

+static int
+get_num_imm (rtx insn, int *imm, int *imm32, int *imm64)

get_num_immediates...
Also, this function should be rewritten to use for_each_rtx to call
find_con, see many examples in gcc source.

+static bool
+has_imm (rtx insn)

has_immediate

+/* Get bytes length of INSN.
+   This function is very similar to the static function min_insn_size
+   in i386.c.  The main difference is the use of get_attr_length.  */
+
+static int
+get_insn_length (rtx insn)

Huh? min_insn_size also uses get_attr_length. This function is almost
exact copy of min_insn_size (minus some early discards and "important
cases" that are present in min_insn_size and not here). Please remove
this function.

+static enum insn_path
+get_insn_path (rtx insn)
+{
+  enum attr_amdfam10_decode path = dispatch_group_amdfam10_decode (insn);
+
+  if ((int)path == 0)
+    return path_single;
+
+  if ((int)path == 1)
+    return path_double;
+
+  return path_multi;

Just inline dispatch_group_amdfam10_decode (this is the only user).
Also, don't cast attributes, you can use
AMDFAM10_DECODE_{DIRECT,VECTOR,DOUBLE} defines.

+static int
+count_num_restricted (rtx insn, dispatch_windows *window_list)

Watch line lengths in this function!

+static void
+print_dispatch_window (FILE *file, int window_num)

Please also make this a DEBUG_FUNCTION, the convention is to name it
"debug_dispatch_window_file".

+  fprintf (file, "  num_imm = %d, num_imm_32 = %d, num_imm_64 = %d,\
+ imm_size = %d\n",
+          list->num_imm, list->num_imm_32, list->num_imm_64, list->imm_size);

No multiline strings... If it can't be split, it is tolerable for the
string can go over 72 character limit.

+/* Print to stderr a dispatch window.  */

Please print to stdout, as all other debug functions do.

+DEBUG_FUNCTION void
+debug_print_dispatch_window (int window_num)

debug_dispatch_window

+static void
+print_insn_dispatch_info (FILE *file, rtx insn)

Similar to above, make this a DEBUG_FUNCTION and rename it to
debug_insn_dispatch_info_file. Also, watch for multiline strings.

+DEBUG_FUNCTION void
+debug_print_insn_dispatch_info (rtx insn)
+{
+  print_insn_dispatch_info (stderr, insn);
+}

As above, print to stdout and rename this function to debug_insn_dispatch_info.

Thanks,
Uros.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dispatch-diff.txt
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20100824/cfd5c8b7/attachment.txt>