This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack)


On 05/20/16 11:09, Alexander Monakov wrote:

This patch implements '-msoft-stack' code generation variant for NVPTX.  The
goal is to avoid relying on '.local' memory space for placement of automatic
data, and instead have an explicitely-maintained stack pointer (which can be
set up to point to preallocated global memory space).  This allows to have
stack data accessible from all threads and modifiable with atomic
instructions.  This also allows to implement variable-length stack allocation
(for 'alloca' and C99 VLAs).

Each warp has its own 'soft stack' pointer.  It lives in shared memory array
called __nvptx_stacks at index %tid.y (like OpenACC, OpenMP is offloading is
going to use launch geometry such that %tid.y gives the warp index).  It is
retrieved in function prologue (if the function needs a stack frame) and may
also be written there (if the function is non-leaf, so that its callees see
the updated stack pointer), and restored prior to returning.

Startup code is responsible for setting up the initial soft-stack pointer. For
-mmainkernel testing it is libgcc's __main, for OpenMP offloading it's the
kernel region entry code.

ah, that's much more understandable, thanks. Presumably this doesn't support worker-single mode (in OpenACC parlance, I don't know what the OpenMP version of that is?) And neither would it support calls from vector-partitioned code (I think that's SIMD in OpenMP-land?). It seems like we should reject the combination of -msoft-stack -fopenacc?

2016-05-19  Alexander Monakov  <amonakov@ispras.ru>

2016-05-19  Alexander Monakov  <amonakov@ispras.ru>

2016-05-19  Alexander Monakov  <amonakov@ispras.ru>

2016-05-19  Alexander Monakov  <amonakov@ispras.ru>

2016-03-15  Alexander Monakov  <amonakov@ispras.ru>

2015-12-14  Alexander Monakov  <amonakov@ispras.ru>

2015-12-09  Alexander Monakov  <amonakov@ispras.ru>

why so many changelogs? The on-branch development history is irrelvant for trunk -- the usual single changelog style should be followed.


diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 2d4dad1..700c4b0 100644

@@ -992,16 +1091,56 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)


+  else if (need_frameptr || cfun->machine->has_varadic || cfun->calls_alloca)
+    {
+      /* Maintain 64-bit stack alignment.  */

This block needs a more descriptive comment -- it appears to be doing a great deal more than maintaining 64-bit stack alignment!

+      int keep_align = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
+      sz = ROUND_UP (sz, keep_align);
+      int bits = POINTER_SIZE;
+      fprintf (file, "\t.reg.u%d %%frame;\n", bits);
+      fprintf (file, "\t.reg.u32 %%fstmp0;\n");
+      fprintf (file, "\t.reg.u%d %%fstmp1;\n", bits);
+      fprintf (file, "\t.reg.u%d %%fstmp2;\n", bits);

Some of these register names appear to be long lived -- and referenced in other functions. It would be better to give those more descriptive names, or even give them hard-regs. You should certainly do so for those that are already hard regs (%frame & %stack) -- is it more feasible to augment init_frame to initialize them?

@@ -1037,6 +1178,10 @@ nvptx_output_return (void)
 {
   machine_mode mode = (machine_mode)cfun->machine->return_mode;

+  if (cfun->machine->using_softstack)
+    fprintf (asm_out_file, "\tst.shared.u%d [%%fstmp2], %%fstmp1;\n",

See note above about obscure reg names.

  Since ptx is a virtual target, we just define a few
    hard registers for special purposes and leave pseudos unallocated.
@@ -200,6 +205,7 @@ struct GTY(()) machine_function
   bool is_varadic;  /* This call is varadic  */
   bool has_varadic;  /* Current function has a varadic call.  */
   bool has_chain; /* Current function has outgoing static chain.  */
+  bool using_softstack; /* Need to restore __nvptx_stacks[tid.y].  */

Comment should describe what the attribute is, not what it implies. In this case I think it's /* Current function has a soft stack frame. */


diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 33a4862..e5650b6 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md


+(define_insn "set_softstack_insn"
+  [(unspec [(match_operand 0 "nvptx_register_operand" "R")] UNSPEC_ALLOCA)]
+  "TARGET_SOFT_STACK"
+{
+  return (cfun->machine->using_softstack
+	  ? "%.\\tst.shared%t0\\t[%%fstmp2], %0;"
+	  : "");
+})

Is this alloca related (UNSPEC_ALLOCA) or restore related (invoked in restore_stack_block), or stack setting (as insn name suggests). Things seem inconsistently named. Comments would be good.


 (define_expand "restore_stack_block"
   [(match_operand 0 "register_operand" "")
    (match_operand 1 "register_operand" "")]
   ""
 {
+  if (TARGET_SOFT_STACK)
+    {
+      emit_move_insn (operands[0], operands[1]);
+      emit_insn (gen_set_softstack_insn (operands[0]));

Is it necessary to store the soft stack here? Only restore_stack_block seems defined, and not save_stack_block. Why the asymmetry?


diff --git a/gcc/testsuite/gcc.target/nvptx/softstack.c b/gcc/testsuite/gcc.target/nvptx/softstack.c
new file mode 100644
index 0000000..73e60f2
--- /dev/null


This test is ok, but probably wise to check varadic generates the expected code?

Are changes needed in nvptx's crt0.s (or an auxilary file in libc or libgcc) to make this work for testing? Have you tested an nvptx-elf target with this flag on?

nathan


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]