This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] nvptx: implement automatic storage in custom stacks
- From: Bernd Schmidt <bschmidt at redhat dot com>
- To: Alexander Monakov <amonakov at ispras dot ru>, gcc-patches at gcc dot gnu dot org
- Cc: Jakub Jelinek <jakub at redhat dot com>, Dmitry Melnik <dm at ispras dot ru>
- Date: Thu, 12 Nov 2015 15:40:37 +0100
- Subject: Re: [PATCH] nvptx: implement automatic storage in custom stacks
- References: <alpine dot LNX dot 2 dot 20 dot 1511121609140 dot 3050 at monopod dot intra dot ispras dot ru>
> I'm proposing the following patch as a step towards resolving the issue with
> inaccessibility of stack storage (.local memory) in PTX to other threads than
> the one using that stack. The idea is to have preallocated stacks, and have
> __nvptx_stacks[] array in shared memory hold current stack pointers. Each
> thread is maintaining __nvptx_stacks[tid.y] as its stack pointer, thus for
> OpenMP the intent is to preallocate on a per-warp basis (not per-thread).
> For OpenMP SIMD regions we'll have to ensure that conflicting accesses are not
> introduced.
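The scheme described above can be modelled on the host as a rough sketch. This is plain C, not PTX, and it is not the patch's implementation: the names NUM_WARPS, STACK_BYTES, stack_storage, stacks, init_stacks, frame_alloc and frame_release are all hypothetical. It assumes stacks grow downward from the top of a preallocated region and that the array slot is indexed by a warp number standing in for tid.y.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only for this sketch.  */
enum { NUM_WARPS = 4, STACK_BYTES = 1024 };

/* Preallocated backing storage, one region per warp.  */
static unsigned char stack_storage[NUM_WARPS][STACK_BYTES];

/* Models __nvptx_stacks[]: the current stack pointer of each warp.  */
static unsigned char *stacks[NUM_WARPS];

/* Point every slot at the top of its region (stacks grow downward).  */
static void
init_stacks (void)
{
  for (unsigned w = 0; w < NUM_WARPS; w++)
    stacks[w] = stack_storage[w] + STACK_BYTES;
}

/* Carve SIZE bytes, aligned to ALIGN (a power of two), out of the
   stack belonging to WARP and return the new stack pointer.  */
static void *
frame_alloc (unsigned warp, size_t size, size_t align)
{
  assert (warp < NUM_WARPS);
  assert (align != 0 && (align & (align - 1)) == 0);
  uintptr_t sp = (uintptr_t) stacks[warp];
  sp = (sp - size) & ~(uintptr_t) (align - 1);
  assert (sp >= (uintptr_t) stack_storage[warp]);  /* no overflow */
  stacks[warp] = (unsigned char *) sp;
  return stacks[warp];
}

/* Restore the stack pointer saved before frame_alloc.  */
static void
frame_release (unsigned warp, void *saved_sp)
{
  assert (warp < NUM_WARPS);
  stacks[warp] = (unsigned char *) saved_sp;
}
```

A function prologue in this model would save stacks[warp], call frame_alloc, and hand the saved value back to frame_release in the epilogue. Because the frame lives in ordinary preallocated memory rather than .local, its address is meaningful to other threads.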
This is of course really ugly; I'd propose we keep it on an nvptx-OpenMP
specific branch for now until we know that this is really going somewhere.
> I've run it through make -k check-c regtesting. These are new fails, all
> mysterious:
These would have to be investigated first.
+ sz = (sz + keep_align - 1) & ~(keep_align - 1);
Use the ROUND_UP macro.
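To illustrate the equivalence, here is a small standalone sketch. SKETCH_CEIL and SKETCH_ROUND_UP are local stand-ins written for this example, not copies of GCC's definitions, and the two helper functions are hypothetical. The open-coded mask form is only valid for power-of-two alignments, where both forms should agree.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for this sketch; not GCC's own macros.  */
#define SKETCH_CEIL(x, y) (((x) + (y) - 1) / (y))
#define SKETCH_ROUND_UP(x, f) (SKETCH_CEIL (x, f) * (f))

/* The open-coded form from the patch hunk.  ALIGN must be a
   nonzero power of two for the mask trick to be correct.  */
static size_t
round_up_mask (size_t sz, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (sz + align - 1) & ~(align - 1);
}

/* The same rounding expressed through a named macro.  */
static size_t
round_up_macro (size_t sz, size_t align)
{
  assert (align != 0);
  return SKETCH_ROUND_UP (sz, align);
}
```

With a named macro the intent is visible at the call site and the alignment expression is written once instead of twice.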
+ fprintf (file, "\tmul%s.u32 %%fstmp1, %%fstmp0, %d;\n",
+ bits == 64 ? ".wide" : "", bits);
Use a shift.
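A sketch of what that could look like, assuming the constant factor is always a power of two. The helpers log2_exact and format_shift are hypothetical names, and only the 32-bit case is covered; the mul.wide form also widens the result, which a plain shl.b32 does not do, so the 64-bit case would presumably need a separate conversion first.

```c
#include <assert.h>
#include <stdio.h>

/* Return log2 of V.  V must be a nonzero power of two.  */
static int
log2_exact (unsigned v)
{
  assert (v != 0 && (v & (v - 1)) == 0);
  int n = 0;
  while (v >>= 1)
    n++;
  return n;
}

/* Hypothetical helper: write a 32-bit PTX left shift by log2 (FACTOR)
   into BUF, in place of a multiply by FACTOR.  Returns the value
   snprintf returns.  */
static int
format_shift (char *buf, size_t len, unsigned factor)
{
  return snprintf (buf, len, "\tshl.b32 %%fstmp1, %%fstmp0, %d;\n",
                   log2_exact (factor));
}
```

For unsigned values, x << log2_exact (f) gives the same result as x * f whenever f is a power of two and the product does not overflow.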
+
+ if (need_softstack_decl)
+ {
+ fprintf (asm_out_file, ".extern .shared .u64 __nvptx_stacks[];\n;");
+ }
Lose excess braces.
+.global .u64 %__softstack[16384];
Maybe declare it as .u8 so you don't have two different constants for the
stack size?
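The two constants in question are 16384 (elements of .u64) and 131072 (bytes), which describe the same size since 16384 * 8 = 131072. A rough sketch of driving both emitted lines from one byte count follows. SOFTSTACK_BYTES and both helper functions are hypothetical, and the explicit .align 8 is an assumption on my part, since a byte array would otherwise not be guaranteed the alignment the .u64 declaration had.

```c
#include <assert.h>
#include <stdio.h>

/* Single source of truth for the stack size, in bytes (hypothetical).  */
#define SOFTSTACK_BYTES 131072

/* Emit the storage declaration as a byte array.  */
static int
emit_softstack_decl (char *buf, size_t len)
{
  assert (buf != NULL && len > 0);
  return snprintf (buf, len, ".global .align 8 .u8 %%__softstack[%d];\n",
                   SOFTSTACK_BYTES);
}

/* Emit the instruction that advances the pointer to the top of the stack.  */
static int
emit_stack_top (char *buf, size_t len)
{
  assert (buf != NULL && len > 0);
  return snprintf (buf, len, "\tadd.u64 %%stackptr, %%stackptr, %d;\n",
                   SOFTSTACK_BYTES);
}
```

Changing the stack size then means editing one number rather than keeping an element count and a byte count in sync.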
+ .reg .u64 %stackptr;
+ mov.u64 %stackptr, %__softstack;
+ cvta.global.u64 %stackptr, %stackptr;
+ add.u64 %stackptr, %stackptr, 131072;
+ st.shared.u64 [__nvptx_stacks], %stackptr;
+
I'm guessing you have other missing pieces for setting this up for
multiple threads.
Bernd