This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] nvptx: implement automatic storage in custom stacks
- From: Alexander Monakov <amonakov at ispras dot ru>
- To: Bernd Schmidt <bschmidt at redhat dot com>
- Cc: gcc-patches at gcc dot gnu dot org, Jakub Jelinek <jakub at redhat dot com>, Dmitry Melnik <dm at ispras dot ru>
- Date: Thu, 12 Nov 2015 17:59:42 +0300 (MSK)
- Subject: Re: [PATCH] nvptx: implement automatic storage in custom stacks
- References: <alpine dot LNX dot 2 dot 20 dot 1511121609140 dot 3050 at monopod dot intra dot ispras dot ru> <5644A4E5 dot 2040908 at redhat dot com>
On Thu, 12 Nov 2015, Bernd Schmidt wrote:
> > I've run it through make -k check-c regtesting. These are new fails, all
> > mysterious:
>
> These would have to be investigated first.
Any specific suggestions? The PTX code emitted by GCC differs only in the
prologue/epilogue, so I think whatever is broken is unlikely to be caused by
this change. I can give it another try after upgrading the CUDA driver and
cuda-gdb from 7.0 to the latest version.
> > + sz = (sz + keep_align - 1) & ~(keep_align - 1);
>
> Use the ROUND_UP macro.
OK, thanks.
> > + fprintf (file, "\tmul%s.u32 %%fstmp1, %%fstmp0, %d;\n",
> > + bits == 64 ? ".wide" : "", bits);
>
> Use a shift.
I think mul is acceptable here: the PTX JIT handles it properly, judging by
what I saw while investigating in cuda-gdb. With a shift, the 64-bit case would
also need an extra instruction for the widening integer conversion. Do you
insist?
> > +
> > + if (need_softstack_decl)
> > + {
> > + fprintf (asm_out_file, ".extern .shared .u64 __nvptx_stacks[];\n;");
> > + }
>
> Lose excess braces.
OK.
> > +.global .u64 %__softstack[16384];
>
> Maybe declare it as .u8 so you don't have two different constants for the stack
> size?
OK, with ".align 8" to ensure 64-bit alignment.
> > + .reg .u64 %stackptr;
> > + mov.u64 %stackptr, %__softstack;
> > + cvta.global.u64 %stackptr, %stackptr;
> > + add.u64 %stackptr, %stackptr, 131072;
> > + st.shared.u64 [__nvptx_stacks], %stackptr;
> > +
>
> I'm guessing you have other missing pieces for setting this up for multiple
> threads.
This is crt0.s, which is linked in only for single-threaded testing with
-mmainkernel; for OpenMP, the intention is to handle it in the file that
implements libgomp_nvptx_main.
Thanks.
Alexander