Created attachment 34041
asyncwait-2.c
When compiling an OpenACC test case with the gomp-4_0-branch, I run into:
...
$ ./lean-c/install/bin/gcc asyncwait-2.c -fopenacc -flto --param lto-min-partition=900
/tmp/ccRhKrwN.ltrans0.ltrans.o:(__gnu_offload_funcs+0x0): undefined reference to `main._omp_fn.20'
/tmp/ccRhKrwN.ltrans0.ltrans.o:(__gnu_offload_funcs+0x8): undefined reference to `main._omp_fn.19'
/tmp/ccRhKrwN.ltrans1.ltrans.o: In function `main':
ccRhKrwN.ltrans1.o:(.text+0xe19): undefined reference to `cuStreamCreate'
collect2: error: ld returned 1 exit status
...
Note that I'm using patch https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00085.html on top of gomp-4_0-branch; otherwise any OpenACC test case fails in LTO when processing OpenACC builtins.
I only run into this with -flto-partition=balanced.
From the exe.wpa.000i.cgraph dump:
...
Total unit size: 2034, partition size: 1000
Step 0: added main._omp_fn.0/24, size 22, cost 1/0 best 1/0, step 0
Step 1: added main._omp_fn.1/23, size 44, cost 2/0 best 2/0, step 1
...
Step 17: added main._omp_fn.17/7, size 694, cost 18/0 best 18/0, step 17
Step 18: added main._omp_fn.18/6, size 735, cost 19/0 best 19/0, step 18
Step 19: added main._omp_fn.19/5, size 775, cost 20/0 best 19/0, step 18
Step 20: added main._omp_fn.20/4, size 816, cost 21/0 best 19/0, step 18
Step 21: added main/3, size 2034, cost 53/21 best 19/0, step 18
Unwinding 3 insertions to step 18
New partition
Step 19: added main._omp_fn.19/5, size 40, cost 1/21 best 1/21, step 19
Step 20: added main._omp_fn.20/4, size 81, cost 2/21 best 2/21, step 20
Step 21: added main/3, size 1299, cost 72/23 best 2/21, step 20
Privatizing symbol name: main._omp_fn.0 -> main._omp_fn.0.lto_priv.0
Promoting as hidden: main._omp_fn.0
Privatizing symbol name: main._omp_fn.1 -> main._omp_fn.1.lto_priv.1
Promoting as hidden: main._omp_fn.1
...
In .exe.ltrans0.s, main._omp_fn.18 is privatized, but exported as global hidden:
...
	.text
	.globl	main._omp_fn.18.lto_priv.18
	.hidden	main._omp_fn.18.lto_priv.18
	.type	main._omp_fn.18.lto_priv.18, @function
main._omp_fn.18.lto_priv.18:
...
In .exe.ltrans1.s, it is referenced, and declared as hidden:
...
	.hidden	main._omp_fn.18.lto_priv.18
...
Conversely, in .exe.ltrans1.s, main._omp_fn.20 is not privatized:
...
	.type	main._omp_fn.20, @function
main._omp_fn.20:
...
But in .exe.ltrans0.s, main._omp_fn.20 is referenced, and not declared:
...
.omp_func_table.4851:
	.quad	main._omp_fn.20
...
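To make the split easier to read off the dump above, here is a minimal Python sketch that replays the "Step", "Unwinding" and "New partition" lines and reports which symbols end up in which partition. It is not part of GCC: the function name `replay_partition_dump` and the constant `SAMPLE_DUMP` are hypothetical, and the line format is assumed from the excerpt in this comment only.

```python
import re

# Line shapes assumed from the dump excerpt in this report, nothing more.
STEP_RE = re.compile(r"Step \d+: added (\S+?)/\d+,")
UNWIND_RE = re.compile(r"Unwinding (\d+) insertions to step \d+")

# Excerpt copied from the dump quoted in this comment.
SAMPLE_DUMP = """
Step 17: added main._omp_fn.17/7, size 694, cost 18/0 best 18/0, step 17
Step 18: added main._omp_fn.18/6, size 735, cost 19/0 best 19/0, step 18
Step 19: added main._omp_fn.19/5, size 775, cost 20/0 best 19/0, step 18
Step 20: added main._omp_fn.20/4, size 816, cost 21/0 best 19/0, step 18
Step 21: added main/3, size 2034, cost 53/21 best 19/0, step 18
Unwinding 3 insertions to step 18
New partition
Step 19: added main._omp_fn.19/5, size 40, cost 1/21 best 1/21, step 19
Step 20: added main._omp_fn.20/4, size 81, cost 2/21 best 2/21, step 20
Step 21: added main/3, size 1299, cost 72/23 best 2/21, step 20
"""


def replay_partition_dump(lines):
    """Return a list of partitions, each a list of symbol names."""
    partitions = [[]]
    for raw in lines:
        line = raw.strip()
        step = STEP_RE.match(line)
        if step:
            # A symbol was tentatively added to the current partition.
            partitions[-1].append(step.group(1))
            continue
        unwind = UNWIND_RE.match(line)
        if unwind:
            # The last N tentative insertions are dropped again.
            count = int(unwind.group(1))
            if count:
                del partitions[-1][-count:]
            continue
        if line == "New partition":
            partitions.append([])
    return partitions


if __name__ == "__main__":
    for index, symbols in enumerate(replay_partition_dump(SAMPLE_DUMP.splitlines())):
        print(index, symbols)
```

On the excerpt, this puts main._omp_fn.17 and main._omp_fn.18 in partition 0 and main._omp_fn.19, main._omp_fn.20 and main in partition 1, which matches the two functions named in the undefined references.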
I tried to reproduce this issue using trunk gcc and OpenMP:
gcc -fopenmp -flto -flto-partition=balanced -lgfortran -save-temps libgomp/testsuite/libgomp.fortran/target2.f90
But all functions are privatized. For example, __target2_MOD_foo._omp_fn.3.lto_priv.5 is exported as global hidden in partition 1 and referenced in the offload table in partition 0, as planned.
We should figure out why in your case main._omp_fn.19 and main._omp_fn.20 were not marked as global...
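One way to look for such symbols in the -save-temps output is to scan the ltrans assembly for data references to functions that another partition defines without a .globl directive. The following Python sketch does that under stated assumptions: the helper name `find_unexported_refs` and the sample contents are hypothetical (the sample is condensed from the snippets quoted in this report), and it only considers `.quad` references and `.type sym, @function` definitions, so direct calls are not checked.

```python
import re


def find_unexported_refs(asm_by_partition):
    """asm_by_partition maps a partition name to its assembly text.

    Returns (symbol, referencing partition, defining partition) tuples for
    symbols referenced via .quad in one partition and defined in another
    partition without a .globl directive.
    """
    globl, defined, refs = {}, {}, {}
    for name, text in asm_by_partition.items():
        globl[name] = set(re.findall(r"^\s*\.globl\s+(\S+)", text, re.M))
        defined[name] = set(
            re.findall(r"^\s*\.type\s+(\S+),\s*@function", text, re.M))
        refs[name] = set(re.findall(r"^\s*\.quad\s+(\S+)", text, re.M))

    problems = []
    for name, wanted in refs.items():
        # Only symbols not defined locally need to come from elsewhere.
        for sym in sorted(wanted - defined[name]):
            for other in asm_by_partition:
                if other == name:
                    continue
                if sym in defined[other] and sym not in globl[other]:
                    problems.append((sym, name, other))
    return problems


# Condensed from the assembly snippets quoted in this report.
SAMPLE_ASM = {
    "ltrans0.s": """
        .globl  main._omp_fn.18.lto_priv.18
        .hidden main._omp_fn.18.lto_priv.18
        .type   main._omp_fn.18.lto_priv.18, @function
main._omp_fn.18.lto_priv.18:
.omp_func_table.4851:
        .quad   main._omp_fn.20
""",
    "ltrans1.s": """
        .hidden main._omp_fn.18.lto_priv.18
        .type   main._omp_fn.20, @function
main._omp_fn.20:
""",
}

if __name__ == "__main__":
    for sym, user, owner in find_unexported_refs(SAMPLE_ASM):
        print(f"{sym}: referenced in {user}, local in {owner}")
```

On the condensed sample, the only symbol reported is main._omp_fn.20, referenced from the function table in ltrans0.s and defined without .globl in ltrans1.s.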
Can you reproduce this with the trunk patch kit I posted internally? gomp-4_0-branch is somewhat out of date wrt offloading.
> Can you reproduce this with the trunk patch kit I posted internally?
> gomp-4_0-branch is somewhat out of date wrt offloading.
No, it does not reproduce that way. The split falls somewhat differently:
...
Total unit size: 1929, partition size: 900
Step 0: added main._omp_fn.20/24, size 37, cost 1/0 best 1/0, step 0
Step 1: added main._omp_fn.19/23, size 73, cost 2/0 best 2/0, step 1
Step 2: added main._omp_fn.18/22, size 110, cost 3/0 best 3/0, step 2
...
Step 17: added main._omp_fn.3/7, size 660, cost 18/0 best 18/0, step 17
Step 18: added main._omp_fn.2/6, size 696, cost 19/0 best 18/0, step 17
Step 19: added main._omp_fn.1/5, size 714, cost 20/0 best 18/0, step 17
Step 20: added main._omp_fn.0/4, size 732, cost 21/0 best 18/0, step 17
Step 21: added main/3, size 1929, cost 53/21 best 18/0, step 17
Unwinding 4 insertions to step 17
New partition
Step 18: added main._omp_fn.2/6, size 36, cost 1/21 best 1/21, step 18
Step 19: added main._omp_fn.1/5, size 54, cost 2/21 best 2/21, step 19
Step 20: added main._omp_fn.0/4, size 72, cost 3/21 best 3/21, step 20
Step 21: added main/3, size 1269, cost 71/24 best 3/21, step 20
...
But I think the main difference is that the offload table and main (using the offload table) are now in the same partition. I don't know whether that's by design or accident.
(In reply to vries from comment #4)
> But I think the main difference is that the offload table and main (using
> the offload table) are now in the same partition. I don't know whether
> that's by design or accident.
What do you mean by "main (using the offload table)"?
The design was to have the offload table in the first partition (number zero), and the table should be used only in libgomp through the GOMP_offload_register function.
Created attachment 34058
ltrans0.s
Created attachment 34059
ltrans1.s
(In reply to Ilya Verbin from comment #5)
> (In reply to vries from comment #4)
> > But I think the main difference is that the offload table and main (using
> > the offload table) are now in the same partition. I don't know whether
> > that's by design or accident.
>
> What do you mean by "main (using the offload table)"?
> The design was to have the offload table in the first partition (number
> zero),
It seems to be in partition 1:
...
$ grep -c OFFLOAD_TABLE ltrans1.s ltrans0.s
ltrans1.s:33
ltrans0.s:0
...
> and the table should be used only in libgomp through the
> GOMP_offload_register function.
It's used like this, in main:
...
	movl	$__OFFLOAD_TABLE__, %esi
...
(In reply to vries from comment #8)
> (In reply to Ilya Verbin from comment #5)
> > (In reply to vries from comment #4)
> > > But I think the main difference is that the offload table and main (using
> > > the offload table) are now in the same partition. I don't know whether
> > > that's by design or accident.
> > What do you mean by "main (using the offload table)"?
> It's used like this, in main:
> 	movl	$__OFFLOAD_TABLE__, %esi
Ah, I see, this is something OpenACC-specific: for some reason it passes __OFFLOAD_TABLE__ to all functions.
Anyway, this is just a weak symbol that points to the start of the offload table. It's defined by mkoffload when all partitions are ready. I don't think it could affect the LTO partitioning or the functions' visibility.
Can't reproduce this with current trunk.