[Bug target/85519] [nvptx, openacc, openmp, testsuite] Recursive tests may fail due to thread stack limit

vries at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Wed Apr 25 08:41:00 GMT 2018


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85519

--- Comment #1 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #0)
> All these solutions work until the next failure shows up. It would be nice
> to fix this more definitely in some way, but I'm not sure how.

We could try to figure out the frame size of the recursive function.

Using GOMP_DEBUG=1 we see the JIT compile/link log:
...
Link log warning : Stack size for entry function 'main$_omp_fn$0' cannot be statically determined
info    : 0 bytes gmem
info    : Function properties for 'main$_omp_fn$0':
info    : used 8 registers, 0 stack, 0 bytes smem, 328 bytes cmem[0], 0 bytes lmem
...
but the stack size is only shown for the offloading region, not for individual
functions.

Using GOMP_NVPTX_SAVE_TEMPS=1, we could get the cubin and dump its resource
usage:
...
$ cuobjdump -res-usage gomp-nvptx.*.cubin  

Resource usage:
 Common:
  GLOBAL:0
 Function rec:
  REG:8 STACK:0 SHARED:0 LOCAL:0 TEXTURE:0 SURFACE:0 SAMPLER:0
 Function main$_omp_fn$0:
  REG:8 STACK:UNKNOWN SHARED:0 LOCAL:0 CONSTANT[0]:328 TEXTURE:0 SURFACE:0 SAMPLER:0
...
but the STACK entry for rec shows up as 0.

Finally, using nvdisasm (or GOMP_NVPTX_DISASM=1) we find the info:
...
$ nvdisasm gomp-nvptx.*.cubin
        //----- nvinfo : EIATTR_FRAME_SIZE
        .align          4
        /*0000*/        .byte   0x04, 0x11
        /*0002*/        .short  (.L_6 - .L_5)
        .align          4
.L_5:
        /*0004*/        .word   index@(rec)
        /*0008*/        .word   0x00000010             <<<<<<<<


        //----- nvinfo : EIATTR_FRAME_SIZE
        .align          4
.L_6:
        /*000c*/        .byte   0x04, 0x11
        /*000e*/        .short  (.L_8 - .L_7)
        .align          4
.L_7:
        /*0010*/        .word   index@(main$_omp_fn$0)
        /*0014*/        .word   0x00000000


        //----- nvinfo : EIATTR_MIN_STACK_SIZE
        .align          4
.L_8:
        /*0018*/        .byte   0x04, 0x12
        /*001a*/        .short  (.L_10 - .L_9)
        .align          4
.L_9:
        /*001c*/        .word   index@(main$_omp_fn$0)
        /*0020*/        .word   0xffffffff
.L_10:
...
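Scraping that output is not hard. As a rough illustration (in Python rather than tcl, and not part of dejagnu), here is a sketch that collects the EIATTR_FRAME_SIZE value per function from nvdisasm text shaped like the excerpt above. It assumes this textual layout stays the same, which is an assumption on my part rather than a documented interface:

```python
import re

# Header line of each nvinfo record, e.g. "//----- nvinfo : EIATTR_FRAME_SIZE".
NVINFO_RE = re.compile(r"//-----\s+nvinfo\s*:\s*(\w+)")
# Function the record applies to, e.g. ".word   index@(rec)".
INDEX_RE = re.compile(r"\.word\s+index@\((.+?)\)")
# Payload word following the index, e.g. ".word   0x00000010".
VALUE_RE = re.compile(r"\.word\s+(0x[0-9a-fA-F]+)")


def parse_frame_sizes(disasm: str) -> dict:
    """Map function name -> EIATTR_FRAME_SIZE (bytes) from nvdisasm text.

    Assumes the record layout seen in the excerpt above; other nvinfo
    records (e.g. EIATTR_MIN_STACK_SIZE) are skipped.
    """
    sizes = {}
    attr = None
    func = None
    for line in disasm.splitlines():
        m = NVINFO_RE.search(line)
        if m:
            # A new record starts; forget any half-parsed state.
            attr, func = m.group(1), None
            continue
        if attr != "EIATTR_FRAME_SIZE":
            continue
        m = INDEX_RE.search(line)
        if m:
            func = m.group(1)
            continue
        m = VALUE_RE.search(line)
        if m and func is not None:
            sizes[func] = int(m.group(1), 16)
            func = None
    return sizes


SAMPLE = """\
        //----- nvinfo : EIATTR_FRAME_SIZE
        .align          4
        /*0000*/        .byte   0x04, 0x11
        /*0002*/        .short  (.L_6 - .L_5)
        .align          4
.L_5:
        /*0004*/        .word   index@(rec)
        /*0008*/        .word   0x00000010
        //----- nvinfo : EIATTR_FRAME_SIZE
        .align          4
.L_6:
        /*000c*/        .byte   0x04, 0x11
        /*000e*/        .short  (.L_8 - .L_7)
        .align          4
.L_7:
        /*0010*/        .word   index@(main$_omp_fn$0)
        /*0014*/        .word   0x00000000
        //----- nvinfo : EIATTR_MIN_STACK_SIZE
        .align          4
.L_8:
        /*0018*/        .byte   0x04, 0x12
        /*001a*/        .short  (.L_10 - .L_9)
        .align          4
.L_9:
        /*001c*/        .word   index@(main$_omp_fn$0)
        /*0020*/        .word   0xffffffff
"""

print(parse_frame_sizes(SAMPLE))  # {'rec': 16, 'main$_omp_fn$0': 0}
```

The EIATTR_MIN_STACK_SIZE record (0xffffffff for main$_omp_fn$0) is deliberately ignored, so it cannot overwrite the frame size of 0.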


So, we could write some tcl function to get the frame size for a function, and
xfail or skip the test if the frame size is larger than a given constant x, but
AFAIK dejagnu is not set up for this. The best we could do is to add a dg-final
check and emit something like:
...
PASS: rec.c dg-nvptx-frame-size-check main$_omp_fn$0 0
FAIL: rec.c dg-nvptx-frame-size-check rec 8
...
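A sketch of the logic behind those lines, again in Python purely for illustration. dg-nvptx-frame-size-check is a hypothetical directive, and the sketch assumes the trailing number is the limit x in bytes, which fits the frame sizes above (0 for main$_omp_fn$0, 0x10 = 16 for rec):

```python
def frame_size_check(testname, frame_sizes, func, limit):
    """Return a dejagnu-style result line for one function.

    Hypothetical directive: PASS if FUNC's frame size is at most LIMIT
    bytes, FAIL if it is larger, UNRESOLVED if no frame size was found.
    """
    size = frame_sizes.get(func)
    if size is None:
        status = "UNRESOLVED"
    elif size <= limit:
        status = "PASS"
    else:
        status = "FAIL"
    return f"{status}: {testname} dg-nvptx-frame-size-check {func} {limit}"


# Frame sizes from the nvdisasm output above.
frame_sizes = {"rec": 0x10, "main$_omp_fn$0": 0x0}
for func, limit in [("main$_omp_fn$0", 0), ("rec", 8)]:
    print(frame_size_check("rec.c", frame_sizes, func, limit))
```

This prints exactly the PASS and FAIL lines shown above.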


Or, going for a more precise check:
...
FAIL: rec.c dg-nvptx-stack-size-check main$_omp_fn$0,rec=65 (peak stack size 1048 is larger than stack size limit 1024)
...
where you then check that frame-size (main$_omp_fn$0) + 65 * frame-size (rec)
is smaller than the size obtained via cudaThreadGetLimit(&size, cudaLimitStackSize).
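The arithmetic for that check, sketched in Python with the frame sizes from the nvdisasm output above. The name[=count] spec syntax just mirrors the hypothetical message format, 1024 is a placeholder for the limit that would really be queried from the device, and a peak equal to the limit is treated as fitting. The sketch only sums EIATTR_FRAME_SIZE values; whether the driver needs extra headroom beyond that is not modelled:

```python
def parse_spec(spec):
    """Split "main$_omp_fn$0,rec=65" into [("main$_omp_fn$0", 1), ("rec", 65)]."""
    counts = []
    for item in spec.split(","):
        name, sep, count = item.partition("=")
        counts.append((name, int(count) if sep else 1))
    return counts


def stack_size_check(testname, frame_sizes, spec, limit):
    """Sum frame size * count over SPEC and compare against LIMIT (bytes)."""
    peak = sum(frame_sizes[name] * count for name, count in parse_spec(spec))
    line = f"{testname} dg-nvptx-stack-size-check {spec}"
    if peak <= limit:
        return f"PASS: {line}"
    return (f"FAIL: {line} (peak stack size {peak} is larger than "
            f"stack size limit {limit})")


frame_sizes = {"rec": 0x10, "main$_omp_fn$0": 0x0}
# 1024 is a placeholder; the real limit would come from the device.
print(stack_size_check("rec.c", frame_sizes, "main$_omp_fn$0,rec=65", 1024))
```

With these inputs the peak is 0 + 65 * 16 = 1040 bytes, so it prints a FAIL line reporting peak stack size 1040 against the limit 1024.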

Presumably formulating the peak stack composition gets more involved with
openmp test cases which have a more complicated call stack.
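One possible formulation: treat the call graph as acyclic apart from direct self-recursion, give each directly recursive function an assumed maximum depth, and take the maximum over call paths of the summed frame sizes. A Python sketch under those assumptions; indirect recursion is rejected rather than handled, and helper and leaf below are hypothetical functions with made-up frame sizes:

```python
def peak_stack(entry, calls, frame_sizes, depth):
    """Conservative peak stack estimate (bytes) for the call tree rooted at ENTRY.

    calls:       caller -> list of callees; a self-edge marks direct recursion
    frame_sizes: function -> frame size in bytes (e.g. from EIATTR_FRAME_SIZE)
    depth:       directly recursive function -> assumed maximum nesting depth
    Assumes the innermost recursive invocation may call any of its callees;
    indirect (mutual) recursion is rejected rather than modelled.
    """
    memo = {}
    visiting = set()

    def visit(func):
        if func in memo:
            return memo[func]
        if func in visiting:
            raise ValueError(f"indirect recursion through {func} is not handled")
        visiting.add(func)
        callees = calls.get(func, [])
        copies = depth[func] if func in callees else 1
        below = max((visit(c) for c in callees if c != func), default=0)
        visiting.discard(func)
        memo[func] = copies * frame_sizes[func] + below
        return memo[func]

    return visit(entry)


frame_sizes = {"main$_omp_fn$0": 0x0, "rec": 0x10}
print(peak_stack("main$_omp_fn$0",
                 {"main$_omp_fn$0": ["rec"], "rec": ["rec"]},
                 frame_sizes, {"rec": 65}))  # 1040

# Hypothetical larger graph: helper, leaf and their frame sizes are made up.
calls = {"main$_omp_fn$0": ["rec", "helper"], "rec": ["rec", "leaf"],
         "helper": [], "leaf": []}
frame_sizes.update({"helper": 64, "leaf": 32})
print(peak_stack("main$_omp_fn$0", calls, frame_sizes, {"rec": 65}))  # 1072
```

In the second case the deepest path is 65 nested rec frames plus one leaf frame (1040 + 32), which exceeds the path through helper.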

