This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.



libgomp: ... error: the launch timed out and was terminated


Hi,

I have installed gcc-6.3.0 with support for OpenACC and OpenMP for an
NVIDIA Quadro K2200 on my "SUSE Linux Enterprise Server 12.2 (x86_64)"
system. We use CUDA Toolkit 8.0. Unfortunately, I get an error when
computing "pi" with 1,000,000,000 iterations using OpenACC with "gcc",
while I get the expected result with OpenMP, or with OpenACC from "pgcc".
The program works fine with "gcc" and OpenACC if I use only 100,000,000
iterations. Let's begin with the working versions.

loki OpenACC 232  gcc -fopenmp -o pi_gcc_openmp pi_OpenACC_OpenMP.c
loki OpenACC 233 pi_gcc_openmp 1000000000
Using OpenMP.
pi = 3.1415926536
loki OpenACC 234


loki OpenACC 228 pgcc -Mcuda=cuda8.0 -ta=nvidia -Minfo=all -o pi_pgcc_openacc pi_OpenACC_OpenMP.c
main:
     44, Accelerator kernel generated
         Generating Tesla code
         49, #pragma acc loop gang, vector(1024) /* blockIdx.x threadIdx.x */
             Generating reduction(+:pi)
loki OpenACC 229 pi_pgcc_openacc 1000000000
Using OpenACC.
pi = 3.1415926536
loki OpenACC 230

By the way, does "gcc" have a similar command line option to "-Minfo=all"
to show which optimizations have been used and what kind of code it has
generated?


"gcc -v" shows how I configured "gcc".

loki OpenACC 214 gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-6.3.0_accel/libexec/gcc/x86_64-pc-linux-gnu/6.3.0/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-6.3.0/configure --prefix=/usr/local/gcc-6.3.0_accel --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --enable-offload-targets=nvptx-none=/usr/local/gcc-6.3.0_accel/bin --with-cuda-driver=/usr/local/cuda/ --enable-languages=c,c++,fortran,lto --enable-nls --enable-threads=posix --with-gmp-lib=/usr/local/lib64 --with-gmp-include=/usr/local/include --with-mpfr-lib=/usr/local/lib64 --with-mpfr-include=/usr/local/include --with-mpc-lib=/usr/local/lib64 --with-mpc-include=/usr/local/include --with-isl-lib=/usr/local/lib64 --with-isl-include=/usr/local/include
Thread model: posix
gcc version 6.3.0 (GCC)
loki OpenACC 215


loki OpenACC 201 /usr/bin/time -p pi_gcc_openacc
Using OpenACC.
pi = 3.1415926536
real 1.91
user 0.00
sys 0.00


loki OpenACC 202 /usr/bin/time -p pi_gcc_openacc 1000000000
Using OpenACC.

libgomp: cuStreamSynchronize error: the launch timed out and was terminated

libgomp: cuMemFreeHost error: the launch timed out and was terminated
real 39.33
user 0.00
sys 0.00
loki OpenACC 203


"dmesg -T" shows the following messages for the above problem.

...
[Tue Feb 14 13:27:33 2017] NVRM: Xid (PCI:0000:03:00): 8, Channel 0000002f
[Tue Feb 14 13:27:33 2017] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[Tue Feb 14 13:27:35 2017] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[Tue Feb 14 13:27:37 2017] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[Tue Feb 14 13:27:39 2017] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[Tue Feb 14 13:27:41 2017] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[Tue Feb 14 13:27:43 2017] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[Tue Feb 14 13:27:45 2017] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[Tue Feb 14 13:27:47 2017] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[Tue Feb 14 13:27:49 2017] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[Tue Feb 14 13:27:51 2017] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[Tue Feb 14 13:27:54 2017] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[Tue Feb 14 13:27:55 2017] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[Tue Feb 14 13:27:57 2017] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
[Tue Feb 14 13:27:59 2017] Modules linked in: joydev st binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache fuse af_packet snd_hda_codec_hdmi nvidia_drm(POEX) nvidia_modeset(POEX) iscsi_ibft iscsi_boot_sysfs nvidia_uvm(POEX) snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel intel_rapl x86_pkg_temp_thermal snd_hda_codec msr snd_hda_core intel_powerclamp snd_hwdep coretemp iTCO_wdt nvidia(POEX) iTCO_vendor_support snd_pcm kvm_intel kvm snd_timer snd irqbypass e1000e crct10dif_pclmul crc32_pclmul mei_me sb_edac crc32c_intel lpc_ich ptp pcspkr edac_core i2c_i801 mei pps_core soundcore shpchp mfd_core wmi cryptd fujitsu_laptop fjes video processor button nfsd auth_rpcgss nfs_acl lockd grace sunrpc ext4 crc16 jbd2 mbcache sr_mod sd_mod cdrom hid_generic usbhid drm_kms_helper ahci xhci_pci syscopyarea xhci_hcd
[Tue Feb 14 13:27:59 2017]  sysfillrect
[Tue Feb 14 13:27:59 2017]  libahci
[Tue Feb 14 13:27:59 2017]  ehci_pci
[Tue Feb 14 13:27:59 2017]  sysimgblt
[Tue Feb 14 13:27:59 2017]  ata_generic
[Tue Feb 14 13:27:59 2017]  fb_sys_fops
[Tue Feb 14 13:27:59 2017]  ehci_hcd
[Tue Feb 14 13:27:59 2017]  libata
[Tue Feb 14 13:27:59 2017]  drm
[Tue Feb 14 13:27:59 2017]  usbcore
[Tue Feb 14 13:27:59 2017]  usb_common
[Tue Feb 14 13:27:59 2017]  sg
[Tue Feb 14 13:27:59 2017]  dm_multipath
[Tue Feb 14 13:27:59 2017]  dm_mod
[Tue Feb 14 13:27:59 2017]  scsi_dh_rdac
[Tue Feb 14 13:27:59 2017]  scsi_dh_emc
[Tue Feb 14 13:27:59 2017]  scsi_dh_alua
[Tue Feb 14 13:27:59 2017]  scsi_mod
[Tue Feb 14 13:27:59 2017]  autofs4

[Tue Feb 14 13:27:59 2017] Supported: No, Proprietary modules are loaded
[Tue Feb 14 13:27:59 2017] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P OEL X 4.4.38-93-default #1
[Tue Feb 14 13:27:59 2017] Hardware name: FUJITSU CELSIUS R940/D3358-A1, BIOS V5.0.0.9 R1.8.0 for D3358-A1x 04/02/2015
[Tue Feb 14 13:27:59 2017] task: ffffffff81c11500 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[Tue Feb 14 13:27:59 2017] RIP: 0010:[<ffffffff8105c632>] [<ffffffff8105c632>] read_hpet+0xb2/0xd0
[Tue Feb 14 13:27:59 2017] RSP: 0018:ffff88048fa03c10  EFLAGS: 00000246
[Tue Feb 14 13:27:59 2017] RAX: 00000000c9213c35 RBX: ffff88048fa03c38 RCX: 0000000000000246
[Tue Feb 14 13:27:59 2017] RDX: c9213c3500000000 RSI: c9213c2c00000000 RDI: 0000000000000246
[Tue Feb 14 13:27:59 2017] RBP: 000000000cc95cc8 R08: ffff8804ccb1b358 R09: ffff8804ccb1b368
[Tue Feb 14 13:27:59 2017] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8804ccb1b5c0
[Tue Feb 14 13:27:59 2017] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880876b90008
[Tue Feb 14 13:27:59 2017] FS: 0000000000000000(0000) GS:ffff88048fa00000(0000) knlGS:0000000000000000
[Tue Feb 14 13:27:59 2017] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Tue Feb 14 13:27:59 2017] CR2: 00007f251dbcb000 CR3: 0000000001c0a000 CR4: 00000000001406f0
[Tue Feb 14 13:27:59 2017] Stack:
[Tue Feb 14 13:27:59 2017] ffffffff810f0241 ffff88048fa03c58 ffff8804ccb1b344 ffffffff810f02da
[Tue Feb 14 13:27:59 2017] ffffffff810f0315 0000000058a2f7d0 ffffffff810f0241 ffff8804ccb1b340
[Tue Feb 14 13:27:59 2017] ffffffffa154f059 ffffffff810f02da ffffffff810f0315 ffff8804ccb1b348
[Tue Feb 14 13:27:59 2017] Call Trace:
[Tue Feb 14 13:27:59 2017]  [<ffffffff810f0241>] __getnstimeofday64+0x31/0xc0
[Tue Feb 14 13:27:59 2017]  [<ffffffff810f02da>] getnstimeofday64+0xa/0x30
[Tue Feb 14 13:27:59 2017]  [<ffffffff810f0315>] do_gettimeofday+0x15/0x50
[Tue Feb 14 13:27:59 2017] [<ffffffffa154f059>] os_get_current_time+0x19/0x30 [nvidia]
[Tue Feb 14 13:27:59 2017]  [<ffffffffa1ab3684>] _nv016773rm+0x14/0x40 [nvidia]
[Tue Feb 14 13:27:59 2017] DWARF2 unwinder stuck at _nv016773rm+0x14/0x40 [nvidia]

[Tue Feb 14 13:27:59 2017] Leftover inexact backtrace:

[Tue Feb 14 13:27:59 2017]  <IRQ>
[Tue Feb 14 13:27:59 2017]  [<ffffffffa1a61dfe>] ? _nv020684rm+0x26e/0x3e0 [nvidia]
[Tue Feb 14 13:27:59 2017]  [<ffffffffa1a62af7>] ? _nv020667rm+0x77/0x80 [nvidia]
[Tue Feb 14 13:27:59 2017]  [<ffffffffa181816e>] ? _nv010966rm+0xbe/0x1b0 [nvidia]
[Tue Feb 14 13:27:59 2017]  [<ffffffffa17e4b4d>] ? _nv009952rm+0x11d/0x160 [nvidia]
[Tue Feb 14 13:27:59 2017]  [<ffffffffa198021c>] ? _nv018671rm+0xec/0x2a0 [nvidia]
[Tue Feb 14 13:27:59 2017]  [<ffffffffa197a731>] ? _nv018650rm+0x1e1/0x400 [nvidia]
[Tue Feb 14 13:27:59 2017]  [<ffffffffa197a77d>] ? _nv018650rm+0x22d/0x400 [nvidia]
[Tue Feb 14 13:27:59 2017]  [<ffffffffa197a197>] ? _nv018651rm+0x637/0x760 [nvidia]
[Tue Feb 14 13:27:59 2017]  [<ffffffffa197d7ae>] ? _nv018696rm+0xde/0xf0 [nvidia]
[Tue Feb 14 13:27:59 2017]  [<ffffffffa197d770>] ? _nv018696rm+0xa0/0xf0 [nvidia]
[Tue Feb 14 13:27:59 2017]  [<ffffffffa197faf7>] ? _nv018698rm+0x417/0x590 [nvidia]
[Tue Feb 14 13:27:59 2017]  [<ffffffffa197d5d1>] ? _nv018697rm+0x51/0x150 [nvidia]
[Tue Feb 14 13:27:59 2017]  [<ffffffffa1a5ca22>] ? _nv016920rm+0x192/0xfd0 [nvidia]
[Tue Feb 14 13:27:59 2017]  [<ffffffffa1abdb6b>] ? rm_run_rc_callback+0x9b/0xe0 [nvidia]
[Tue Feb 14 13:27:59 2017]  [<ffffffffa1544eb0>] ? nvidia_isr_kthread_bh+0x10/0x10 [nvidia]
[Tue Feb 14 13:27:59 2017]  [<ffffffffa1544f0e>] ? nvidia_rc_timer+0x5e/0x80 [nvidia]
[Tue Feb 14 13:27:59 2017]  [<ffffffff810e7910>] ? call_timer_fn+0x30/0x120
[Tue Feb 14 13:27:59 2017] [<ffffffffa1544eb0>] ? nvidia_isr_kthread_bh+0x10/0x10 [nvidia]
[Tue Feb 14 13:27:59 2017]  [<ffffffff810e91ef>] ? run_timer_softirq+0x1ff/0x2b0
[Tue Feb 14 13:27:59 2017]  [<ffffffff81080912>] ? __do_softirq+0xe2/0x2e0
[Tue Feb 14 13:27:59 2017]  [<ffffffff81080da5>] ? irq_exit+0xe5/0xf0
[Tue Feb 14 13:27:59 2017]  [<ffffffff815e517c>] ? reschedule_interrupt+0x8c/0xa0
[Tue Feb 14 13:27:59 2017]  <EOI>
[Tue Feb 14 13:27:59 2017]  [<ffffffff814aa4a7>] ? cpuidle_enter_state+0xd7/0x250
[Tue Feb 14 13:27:59 2017]  [<ffffffff810be9db>] ? cpu_startup_entry+0x29b/0x380
[Tue Feb 14 13:27:59 2017]  [<ffffffff81d6f0c1>] ? start_kernel+0x4c5/0x4d0
[Tue Feb 14 13:27:59 2017]  [<ffffffff81d6ea00>] ? set_init_arg+0x50/0x50
[Tue Feb 14 13:27:59 2017] [<ffffffff81d6e120>] ? early_idt_handler_array+0x120/0x120
[Tue Feb 14 13:27:59 2017]  [<ffffffff81d6e719>] ? x86_64_start_kernel+0x147/0x156
[Tue Feb 14 13:27:59 2017] Code: 05 b4 76 f8 00 8b 80 f0 00 00 00 48 89 c2 89 05 29 ca ba 00 48 c7 c7 40 90 c0 81 48 c1 e2 20 c6 07 00 0f 1f 40 00 48 89 cf 57 9d <0f> 1f 44 00 00 48 c1 ea 20 eb 83 48 89 cf 57 9d 0f 1f 44 00 00
[Tue Feb 14 13:27:59 2017] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[Tue Feb 14 13:27:59 2017] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[Tue Feb 14 13:28:03 2017] perf interrupt took too long (10484 > 10000), lowering kernel.perf_event_max_sample_rate to 12500
loki root 108



Does somebody know if something is wrong with my program, my environment, or
my gcc installation? Thank you very much for any help in advance.


Kind regards

Siegmar

/* gcc -fopenacc -foffload=nvptx-none -o pi_gcc_openacc pi_OpenACC_OpenMP.c
 * gcc -fopenmp -o pi_gcc_openmp pi_OpenACC_OpenMP.c
 * gcc -o pi_gcc_cpu pi_OpenACC_OpenMP.c
 *
 * pgcc -Mcuda=cuda8.0 -ta=nvidia -Minfo=all -o pi_pgcc_openacc pi_OpenACC_OpenMP.c
 * pgcc -mp -Mcuda=cuda8.0 -ta=nvidia -Minfo=all -o pi_pgcc_openmp pi_OpenACC_OpenMP.c
 * pgcc -Minfo=all -o pi_pgcc_cpu pi_OpenACC_OpenMP.c
 *
 * /usr/bin/time -p 
 *
 */


#include <stdio.h>
#include <stdlib.h>

//#define N 1000000000			/* error for gcc with OpenACC	*/
#define N 100000000			/* default number of intervals	*/


#define f(x) (4.0 / (1.0 + (x) * (x)))	/* function to compute "pi"	*/
#define vl   1024			/* vector length		*/

int main (int argc, char *argv[])
{
  double pi = 0.0,
	 h;				/* width of an interval		*/
  int	 n;				/* number of intervals		*/

  if (argc == 2)
  {
    n = atoi (argv[1]);
  }
  else
  {
    n = N;
  }
  h = 1.0 / (double) n;
  #ifdef _OPENMP
    printf ("Using OpenMP.\n");
    #pragma omp parallel for default(none) shared(h, n) reduction(+:pi)
  #elif defined(_OPENACC)
    printf ("Using OpenACC.\n");
    #pragma acc parallel vector_length(vl) 
    #pragma acc loop reduction(+:pi)
  #else
    printf ("Using neither OpenACC nor OpenMP.\n");
  #endif
  for (long i = 0; i < n; i++)
  {
    double x;

    x = (h * (double) i) + (h / 2);	/* midpoint of i-th interval	*/
    pi += h * f(x);			/* tangent-trapezoidal rule	*/
  }
  printf ("pi = %.10f\n", pi);

  return EXIT_SUCCESS;
}
