The nvptx target is usually built on x86_64-linux-gnu, but a web search shows that these GPUs are also used in aarch64-linux-gnu and powerpc64le-linux-gnu systems. Building the nvptx offload compiler on powerpc64le gives reasonable test results for libgomp, and I see at least one powerpc64le-related commit:

2015-10-09  James Norris  <jnorris@codesourcery.com>

        * config/rs6000/rs6000.c (rs6000_offload_options): New.
        (TARGET_OFFLOAD_OPTIONS): New.

However, even with an aarch64_offload_options hook added, things do not look as good for AArch64. Two types of issues are still triggered in the testsuite:

FAIL: libgomp.c/../libgomp.c-c++-common/for-11.c (test for excess errors)
Excess errors:
lto1: fatal error: nvptx-none - 0-bit integer numbers unsupported (mode 'SI')

and:

UNRESOLVED: libgomp.c/../libgomp.c-c++-common/for-9.c compilation failed to produce executable
UNSUPPORTED: libgomp.c/../libgomp.c-c++-common/function-not-offloaded-aux.c
spawn -ignore SIGHUP /home/ubuntu/gcc-10-10.1.0/build/./gcc/xgcc -B/home/ubuntu/gcc-10-10.1.0/build/./gcc/ -B/usr/aarch64-linux-gnu/bin/ -B/usr/aarch64-linux-gnu/lib/ -isystem /usr/aarch64-linux-gnu/include -isystem /usr/aarch64-linux-gnu/sys-include -isystem /home/ubuntu/gcc-10-10.1.0/build/sys-include -fchecking=1 offload_device_nonshared_as411951.c -B/home/ubuntu/gcc-10-10.1.0/build/aarch64-linux-gnu/./libgomp/ -B/home/ubuntu/gcc-10-10.1.0/build/aarch64-linux-gnu/./libgomp/.libs -I/home/ubuntu/gcc-10-10.1.0/build/aarch64-linux-gnu/./libgomp -I../../../../src/libgomp/testsuite/../../include -I../../../../src/libgomp/testsuite/.. -Lno -fmessage-length=0 -fno-diagnostics-show-caret -Wno-hsa -fdiagnostics-color=never -B/home/ubuntu/gcc-10-10.1.0/debian/tmp-nvptx/usr/libexec/gcc/aarch64-linux-gnu/10 -B/home/ubuntu/gcc-10-10.1.0/debian/tmp-nvptx/usr/bin -fopenmp -L/home/ubuntu/gcc-10-10.1.0/build/aarch64-linux-gnu/./libgomp/.libs -lm -o offload_device_nonshared_as411951.exe
lto1: internal compiler error: bytecode stream: string too long for the string table
0x62559f string_for_index
        ../../src-nvptx/gcc/data-streamer-in.c:53
0x62559f bp_unpack_indexed_string(data_in*, bitpack_d*, unsigned int*)
        ../../src-nvptx/gcc/data-streamer-in.c:97
0x87a39b lto_input_mode_table(lto_file_decl_data*)
        ../../src-nvptx/gcc/lto-streamer-in.c:1685
0x5a076f lto_file_finalize
        ../../src-nvptx/gcc/lto/lto-common.c:2217
0x5a076f lto_create_files_from_ids
        ../../src-nvptx/gcc/lto/lto-common.c:2240
0x5a076f lto_file_read
        ../../src-nvptx/gcc/lto/lto-common.c:2295
0x5a076f read_cgraph_and_symbols(unsigned int, char const**)
        ../../src-nvptx/gcc/lto/lto-common.c:2747
0x58f523 lto_main()
        ../../src-nvptx/gcc/lto/lto.c:625
Please submit a full bug report,
with preprocessed source if appropriate.
A patch for the target hook has been posted at https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550534.html
(Remove powerpc64le-linux-gnu from the summary as this PR is only about aarch64-linux and GCC is known to work on powerpc64le-linux-gnu.)

(In reply to Matthias Klose from comment #1)
> patch for the target hook posted at
> https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550534.html

This patch has been committed on Fri Jul 24 16:17:44 2020 +0200 as https://gcc.gnu.org/g:29a14a1a907947fe9e43bce62d3468559f17da97
I think this issue and #111937 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111937) have the same root cause: aarch64 also sets NUM_POLY_INT_COEFFS to 2, which makes it incompatible with the default value for nvptx (which is 1).
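To illustrate why such a mismatch can derail stream reading, here is a minimal sketch, not GCC code. It assumes a toy record format in which a size is written as N coefficients followed by one unrelated field; `toy_poly_int`, `write_record` and `read_next_field_naively` are hypothetical names, and `N` stands in for NUM_POLY_INT_COEFFS.

```cpp
#include <cstdint>
#include <vector>

// Toy stand-in for a poly_int: N plays the role of NUM_POLY_INT_COEFFS.
template <unsigned N>
struct toy_poly_int {
  int64_t coeffs[N];
};

// Writer (host side in this sketch): emits all N coefficients,
// then one unrelated field that follows the size in the record.
template <unsigned N>
std::vector<int64_t> write_record(const toy_poly_int<N> &size,
                                  int64_t next_field) {
  std::vector<int64_t> stream;
  for (unsigned i = 0; i < N; ++i)
    stream.push_back(size.coeffs[i]);
  stream.push_back(next_field);
  return stream;
}

// Naive reader (accelerator side in this sketch): assumes the stream was
// written with its own N, skips N coefficients and returns the next value.
template <unsigned N>
int64_t read_next_field_naively(const std::vector<int64_t> &stream) {
  return stream.at(N);
}
```

With a record written for N = 2, a reader built for N = 1 returns the second coefficient instead of the field that follows, so everything after that point in the stream is misinterpreted.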
*** Bug 114174 has been marked as a duplicate of this bug. ***
Confirmed.
*** Bug 111937 has been marked as a duplicate of this bug. ***
I ran into the same issue both with GCC 12.3.0 and 14.1.0 on a GH200 system. However, the message with 14.1.0 is a bit different:

Segmentation fault
0xafc473 crash_signal
        ../.././gcc/toplev.cc:319
0x145ce3b pp_quoted_string
        ../.././gcc/pretty-print.cc:2284
0x145e333 pp_format(pretty_printer*, text_info*, urlifier const*)
        ../.././gcc/pretty-print.cc:1634
0x144b003 diagnostic_context::report_diagnostic(diagnostic_info*)
        ../.././gcc/diagnostic.cc:1611
0x144b3cf diagnostic_impl
        ../.././gcc/diagnostic.cc:1774
0x144d4b7 fatal_error(unsigned int, char const*, ...)
        ../.././gcc/diagnostic.cc:2217
0x9ad95f lto_input_mode_table(lto_file_decl_data*)
        ../.././gcc/lto-streamer-in.cc:2121
0x67f2bf lto_file_finalize
        ../.././gcc/lto/lto-common.cc:2278
0x67f2bf lto_create_files_from_ids
        ../.././gcc/lto/lto-common.cc:2302
0x67f2bf lto_file_read
        ../.././gcc/lto/lto-common.cc:2357
0x67f2bf read_cgraph_and_symbols(unsigned int, char const**)
        ../.././gcc/lto/lto-common.cc:2805
0x66adff lto_main()
        ../.././gcc/lto/lto.cc:656
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
nvptx mkoffload: fatal error: aarch64-unknown-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: /p/usersoftware/[...]/easybuild/jedi/software/GCCcore/14.1.0/libexec/gcc/aarch64-unknown-linux-gnu/14.1.0//accel/nvptx-none/mkoffload returned 1 exit status
compilation terminated.
/p/usersoftware/[...]/easybuild/jedi/software/binutils/2.42-GCCcore-14.1.0/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
The master branch has been updated by Prathamesh Kulkarni <prathamesh3492@gcc.gnu.org>:

https://gcc.gnu.org/g:38900247f3880d6eca2e364a000e5898f8deae64

commit r15-2801-g38900247f3880d6eca2e364a000e5898f8deae64
Author: Prathamesh Kulkarni <prathameshk@nvidia.com>
Date:   Wed Aug 7 23:45:38 2024 +0530

    Partially support streaming of poly_int for offloading.

    When offloading is enabled, the patch streams out host
    NUM_POLY_INT_COEFFS, and changes streaming in as follows:

    if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
    {
      for (i = 0; i < host_num_poly_int_coeffs; i++)
        poly_int.coeffs[i] = stream_in coeff;
      for (; i < NUM_POLY_INT_COEFFS; i++)
        poly_int.coeffs[i] = 0;
    }
    else
    {
      for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
        poly_int.coeffs[i] = stream_in coeff;

      /* Ensure that degree of poly_int <= accel NUM_POLY_INT_COEFFS.  */
      for (; i < host_num_poly_int_coeffs; i++)
        {
          val = stream_in coeff;
          if (val != 0)
            error ();
        }
    }

    gcc/ChangeLog:
            PR ipa/96265
            PR ipa/111937
            * data-streamer-in.cc (streamer_read_poly_uint64): Remove code for
            streaming, and call poly_int_read_common instead.
            (streamer_read_poly_int64): Likewise.
            * data-streamer.cc (host_num_poly_int_coeffs): Conditionally define
            new variable if ACCEL_COMPILER is defined.
            * data-streamer.h (host_num_poly_int_coeffs): Declare.
            (poly_int_read_common): New function template.
            (bp_unpack_poly_value): Remove code for streaming and call
            poly_int_read_common instead.
            * lto-streamer-in.cc (lto_input_mode_table): Stream-in host
            NUM_POLY_INT_COEFFS into host_num_poly_int_coeffs if ACCEL_COMPILER
            is defined.
            * lto-streamer-out.cc (lto_write_mode_table): Stream out
            NUM_POLY_INT_COEFFS if offloading is enabled.
            * poly-int.h (MAX_NUM_POLY_INT_COEFFS_BITS): New macro.
            * tree-streamer-in.cc (lto_input_ts_poly_tree_pointers): Adjust
            streaming-in of poly_int.

    Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
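The pseudocode in the commit message can be sketched as a small, self-contained function. This is an illustration under stated assumptions, not the actual poly_int_read_common: the stream is modeled as a vector of integers, `toy_poly_int` and `read_poly_int` are hypothetical names, `AccelN` stands in for the accelerator's NUM_POLY_INT_COEFFS, and the error path simply returns `false` instead of emitting a diagnostic.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a poly_int with AccelN coefficients on the accelerator.
template <unsigned AccelN>
struct toy_poly_int {
  int64_t coeffs[AccelN];
};

// Reads one poly_int that the host wrote with host_n coefficients.
// Returns false if the value needs more coefficients than the accelerator
// has, i.e. if any of the extra host coefficients is non-zero.
template <unsigned AccelN>
bool read_poly_int(const std::vector<int64_t> &stream, std::size_t &pos,
                   unsigned host_n, toy_poly_int<AccelN> &out) {
  unsigned i = 0;
  if (host_n <= AccelN) {
    // Host has fewer (or equally many) coefficients: zero-extend.
    for (; i < host_n; ++i)
      out.coeffs[i] = stream.at(pos++);
    for (; i < AccelN; ++i)
      out.coeffs[i] = 0;
    return true;
  }
  // Host has more coefficients: keep the first AccelN ...
  for (; i < AccelN; ++i)
    out.coeffs[i] = stream.at(pos++);
  // ... and consume the rest, which must all be zero.
  bool representable = true;
  for (; i < host_n; ++i)
    if (stream.at(pos++) != 0)
      representable = false;
  return representable;
}
```

In this sketch the reader always consumes exactly `host_n` values, so the stream position stays in sync with the writer even when the value turns out not to be representable.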
Thanks a lot for the patch, Prathamesh Kulkarni. There seems to be some progress, which is great to see!

I've tried your patch. I applied it to the latest snapshot and also to GCC 14.2.0 and GCC 14.1.0 to see what happens. In general, all three versions seem to get a bit further towards working offloading. The GCC 15 snapshot seems closest, but now fails with an unknown argument error. In all cases, I built GCC with the EasyBlock of EasyBuild, though I'm not sure whether that is why the flag is there.

GCC 14.2.0 (built with EasyBuild, applied patch):
====

```console
$ gcc -fopenmp -foffload=nvptx-none test.c
lto1: internal compiler error: in lto_read_decls, at lto/lto-common.cc:1970
0x68110f lto_read_decls
        ../.././gcc/lto/lto-common.cc:1970
0x68110f lto_file_finalize
        ../.././gcc/lto/lto-common.cc:2292
0x68110f lto_create_files_from_ids
        ../.././gcc/lto/lto-common.cc:2302
0x68110f lto_file_read
        ../.././gcc/lto/lto-common.cc:2357
0x68110f read_cgraph_and_symbols(unsigned int, char const**)
        ../.././gcc/lto/lto-common.cc:2805
0x66b13f lto_main()
        ../.././gcc/lto/lto.cc:656
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
nvptx mkoffload: fatal error: aarch64-unknown-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: /p/usersoftware/cstpa/reuter1/EasyBuild/easybuild/jedi/software/GCCcore/14.2.0/libexec/gcc/aarch64-unknown-linux-gnu/14.2.0//accel/nvptx-none/mkoffload returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
```

---

GCC 14.1.0 (built with EasyBuild, applied patch):
====

```console
$ gcc -fopenmp -foffload=nvptx-none test.c
lto1: internal compiler error: in lto_read_decls, at lto/lto-common.cc:1970
0x680eaf lto_read_decls
        ../.././gcc/lto/lto-common.cc:1970
0x680eaf lto_file_finalize
        ../.././gcc/lto/lto-common.cc:2292
0x680eaf lto_create_files_from_ids
        ../.././gcc/lto/lto-common.cc:2302
0x680eaf lto_file_read
        ../.././gcc/lto/lto-common.cc:2357
0x680eaf read_cgraph_and_symbols(unsigned int, char const**)
        ../.././gcc/lto/lto-common.cc:2805
0x66aebf lto_main()
        ../.././gcc/lto/lto.cc:656
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
nvptx mkoffload: fatal error: aarch64-unknown-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: /p/usersoftware/cstpa/reuter1/EasyBuild/easybuild/jedi/software/GCCcore/14.1.0/libexec/gcc/aarch64-unknown-linux-gnu/14.1.0//accel/nvptx-none/mkoffload returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
```

The compiler no longer segfaults, but compilation still fails in LTO. There is no difference in behavior between 14.1.0 and 14.2.0.

---

GCC 15.0.0 (gcc-15-20240804, built with EasyBuild using adapted GCC 14.2.0 EasyConfig and the patch applied):
====

```console
$ gcc -fopenmp -foffload=nvptx-none test.c
gcc: error: unrecognized command-line option ‘-m64’
nvptx mkoffload: fatal error: gcc returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: /p/usersoftware/cstpa/reuter1/EasyBuild/easybuild/jedi/software/GCCcore/15.0.0/libexec/gcc/aarch64-unknown-linux-gnu/15.0.0//accel/nvptx-none/mkoffload returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
$ gcc --version
gcc (GCC) 15.0.0 20240804 (experimental)
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```
Created attachment 58875
Verbose compile output after rebuilding the latest GCC snapshot with the patch applied

New error message when trying to build OpenMP offload code on aarch64 with the latest GCC snapshot and the patch applied. The build mainly failed due to 'unrecognized command-line option ‘-m64’'.
Hi,
Yes, those two errors are expected.
I posted an RFC discussion about AArch64/nvptx offloading issues here:
https://gcc.gnu.org/pipermail/gcc/2024-July/244466.html

For the unrecognized command-line option -m64, I have a WIP patch posted upstream:
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659866.html

Thanks,
Prathamesh
> Hi,
> Yes, those two errors are expected.
> I posted RFC discussion about AArch64/nvptx offloading issues here:
> https://gcc.gnu.org/pipermail/gcc/2024-July/244466.html
>
> For the unrecognized command line -m64 option, I have a WIP patch posted upstream:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659866.html
>
> Thanks,
> Prathamesh

Thanks a lot for the update and your work towards resolving these issues. It's really appreciated. I was not aware of the ongoing discussion and the WIP patch. I'll keep an eye on those two and continue to test new patches when they're pushed to master. There's not much more I can do unfortunately, as I'm not familiar with GCC internals at all.
commit r15-3034-gdb2e9a2a46f64b037494e8300c46f2d90a9fa55c
Author: Prathamesh Kulkarni <prathameshk@nvidia.com>
Date:   Tue Aug 20 12:54:02 2024 +0530

    [optc-save-gen.awk] Fix streaming of command line options for offloading.

    The patch modifies optc-save-gen.awk to generate if (!lto_stream_offload_p)
    check before streaming out target-specific opt in cl_optimization_stream_out,
    when offloading is enabled.

    Also, it modifies cl_optimization_stream_in to issue an error during build
    time if accelerator backend defines a target-specific Optimization option.
    This restriction currently is in place to maintain consistency for streaming
    of Optimization options between host and accelerator. A proper fix would be
    to merge target-specific Optimization options for host and accelerators
    enabled for offloading.

    gcc/ChangeLog:
            * optc-save-gen.awk: New array var_target_opt. Use it to generate
            if (!lto_stream_offload_p) check in cl_optimization_stream_out,
            and generate a diagnostic with #error if accelerator backend uses
            Optimization for target-specific options in cl_optimization_stream_in.

    Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
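The effect of the generated check can be sketched roughly as follows. This is a simplified model, not the code that optc-save-gen.awk emits: `toy_option` and `stream_out_options` are hypothetical, and only the flag name `lto_stream_offload_p` is taken from the commit message.

```cpp
#include <string>
#include <vector>

// Hypothetical model of one Optimization option to be streamed.
struct toy_option {
  std::string name;
  int value;
  bool target_specific;  // defined by the host backend, not shared
};

// Streams option values in order. When streaming for offloading,
// target-specific options are skipped, so the accelerator compiler
// never has to interpret options it does not know about.
std::vector<int> stream_out_options(const std::vector<toy_option> &opts,
                                    bool lto_stream_offload_p) {
  std::vector<int> out;
  for (const toy_option &opt : opts) {
    if (opt.target_specific && lto_stream_offload_p)
      continue;
    out.push_back(opt.value);
  }
  return out;
}
```

The reader side must apply the same rule, which is presumably why the commit makes it a build-time error for an accelerator backend to define its own target-specific Optimization options.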
commit r15-3093-g792adb8d222d0d1d16b182871e105f47823b8e72
Author: Prathamesh Kulkarni <prathameshk@nvidia.com>
Date:   Thu Aug 22 19:25:20 2024 +0530

    Recompute TYPE_MODE and DECL_MODE for aggregate type for accelerator.

    The patch streams out VOIDmode for aggregate types with offloading enabled,
    and recomputes appropriate TYPE_MODE and DECL_MODE while streaming-in on accel
    side. The rationale for this change is to avoid streaming out host-specific
    modes that may be used for aggregate types, which may not be representable on
    the accelerator. For example, AArch64 uses OImode for ARRAY_TYPE whose size is
    256-bits, and nvptx doesn't have OImode, and thus ends up emitting an error
    from lto_input_mode_table.

    gcc/ChangeLog:
            * lto-streamer-in.cc: (lto_read_tree_1): Set DECL_MODE (expr) to
            TYPE_MODE (TREE_TYPE (expr)) if TREE_TYPE (expr) is aggregate type
            and offloading is enabled.
            * stor-layout.cc (layout_type): Move computation of mode for
            ARRAY_TYPE from ...
            (compute_array_mode): ... to here.
            * stor-layout.h (compute_array_mode): Declare.
            * tree-streamer-in.cc: Include stor-layout.h.
            (unpack_ts_common_value_fields): Call compute_array_mode if
            offloading is enabled.
            * tree-streamer-out.cc (pack_ts_fixed_cst_value_fields): Stream out
            VOIDmode if decl has aggregate type and offloading is enabled.
            (pack_ts_type_common_value_fields): Stream out VOIDmode for
            aggregate type if offloading is enabled.

    Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
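The idea of streaming a placeholder mode and recomputing it on the accelerator can be sketched like this. It is a toy model under loud assumptions: `toy_mode`, `toy_type`, `mode_to_stream`, `accel_compute_mode` and `stream_in_type` are hypothetical, and the accelerator's mode selection (32 and 64 bits get an integer mode, everything else a block mode) is invented for illustration and is not what compute_array_mode does in detail.

```cpp
// Hypothetical machine modes: OI exists only on the host in this sketch.
enum class toy_mode { VOID, SI, DI, OI, BLK };

struct toy_type {
  bool aggregate;
  unsigned size_bits;
  toy_mode mode;
};

// Host side: with offloading enabled, do not stream the host's mode for
// aggregate types; send a VOID placeholder instead.
toy_mode mode_to_stream(const toy_type &t, bool offloading) {
  return (offloading && t.aggregate) ? toy_mode::VOID : t.mode;
}

// Accelerator side (assumed policy): pick a mode the accelerator has.
toy_mode accel_compute_mode(unsigned size_bits) {
  switch (size_bits) {
    case 32: return toy_mode::SI;
    case 64: return toy_mode::DI;
    default: return toy_mode::BLK;
  }
}

// Accelerator side: rebuild the type, recomputing the mode when the
// placeholder was streamed for an aggregate.
toy_type stream_in_type(bool aggregate, unsigned size_bits, toy_mode streamed) {
  toy_type t{aggregate, size_bits, streamed};
  if (aggregate && streamed == toy_mode::VOID)
    t.mode = accel_compute_mode(size_bits);
  return t;
}
```

In this model a 256-bit host array with mode OI arrives on the accelerator with a block mode, so the accelerator never sees a mode it cannot represent.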
commit r15-3488-gae88e91938af364ef5613e5461b12b484b578bc5
Author: Prathamesh Kulkarni <prathameshk@nvidia.com>
Date:   Thu Sep 5 18:52:53 2024 +0530

    Avoid ICE when passing VLA vector to accelerator.

    gcc/ChangeLog:
            * gimplify.cc (omp_add_variable): Check if decl size is not
            poly_int_tree_p.
            (gimplify_adjust_omp_clauses): Likewise.
            * omp-low.cc (scan_sharing_clauses): Likewise.
            (lower_omp_target): Likewise.

    Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
The master branch has been updated by Prathamesh Kulkarni <prathamesh3492@gcc.gnu.org>:

https://gcc.gnu.org/g:e783a4a683762487cb003ae48235f3d44875de1b

commit r15-3571-ge783a4a683762487cb003ae48235f3d44875de1b
Author: Prathamesh Kulkarni <prathameshk@nvidia.com>
Date:   Tue Sep 10 21:01:58 2024 +0530

    Pass host specific ABI opts from mkoffload.

    The patch adds an option -foffload-abi-host-opts, which
    is set by host in TARGET_OFFLOAD_OPTIONS, and mkoffload then passes its value
    to host_compiler.

    gcc/ChangeLog:
            PR target/96265
            * common.opt (foffload-abi-host-opts): New option.
            * config/aarch64/aarch64.cc (aarch64_offload_options): Pass
            -foffload-abi-host-opts.
            * config/i386/i386-options.cc (ix86_offload_options): Likewise.
            * config/rs6000/rs6000.cc (rs6000_offload_options): Likewise.
            * config/nvptx/mkoffload.cc (offload_abi_host_opts): Define.
            (compile_native): Append offload_abi_host_opts to argv_obstack.
            (main): Handle option -foffload-abi-host-opts.
            * config/gcn/mkoffload.cc (offload_abi_host_opts): Define.
            (compile_native): Append offload_abi_host_opts to argv_obstack.
            (main): Handle option -foffload-abi-host-opts.
            * lto-wrapper.cc (merge_and_complain): Handle
            -foffload-abi-host-opts.
            (append_compiler_options): Likewise.
            * opts.cc (common_handle_option): Likewise.

    Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
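A rough sketch of the mechanism, not the actual mkoffload code: instead of hard-coding an ABI flag such as -m64 (which the AArch64 host compiler rejects, as seen in the earlier error), the wrapper takes the host's ABI option from its own command line and forwards it. The function name `build_host_compiler_args`, the joined `-foffload-abi-host-opts=<value>` spelling and the `-mabi=lp64` value used below are assumptions for illustration.

```cpp
#include <string>
#include <vector>

// Builds the argument list for the host compiler from the arguments the
// offload wrapper received. Any value passed via the (assumed) joined
// form -foffload-abi-host-opts=<value> is forwarded verbatim.
std::vector<std::string> build_host_compiler_args(
    const std::vector<std::string> &mkoffload_args) {
  const std::string prefix = "-foffload-abi-host-opts=";
  std::vector<std::string> host_args = {"gcc"};
  for (const std::string &arg : mkoffload_args) {
    // Match only arguments that start with the prefix.
    if (arg.compare(0, prefix.size(), prefix) == 0)
      host_args.push_back(arg.substr(prefix.size()));
  }
  return host_args;
}
```

Because the host backend decides what to put into the option, each host target can pass whatever ABI flag its own compiler driver understands.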
Good news! I built GCC trunk on a GH200 system via EasyBuild and tried a few examples. A very basic example (more or less a "Hello World") worked just fine. Afterwards, I tried a few of our offload examples used in our internal CI. Those worked fine as well, both for a single GPU (our login node) and on four GPUs (one of our compute nodes). This is just a small sample size, but a huge step towards offloading support on aarch64.
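For readers who want to try something similar, a very small offload test might look like the sketch below. It is not the test code used above, which is not shown in this report; `offloaded_sum` is a hypothetical name. Without `-fopenmp`, or when no device is available, the loop simply runs on the host.

```cpp
// Sums 0 .. n-1. With OpenMP offloading enabled, the loop is a candidate
// for execution on the accelerator; otherwise it runs on the host.
long offloaded_sum(int n) {
  long sum = 0;
#pragma omp target teams distribute parallel for reduction(+ : sum) map(tofrom : sum)
  for (int i = 0; i < n; ++i)
    sum += i;
  return sum;
}
```

Called from a `main` function and built along the lines of the command used earlier in this report (`-fopenmp -foffload=nvptx-none`), `offloaded_sum(100)` should give 4950 regardless of where the loop ends up running.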
The master branch has been updated by Prathamesh Kulkarni <prathamesh3492@gcc.gnu.org>:

https://gcc.gnu.org/g:ae88da5e070659d37b3c3daa4b880531769183bf

commit r15-4133-gae88da5e070659d37b3c3daa4b880531769183bf
Author: Prathamesh Kulkarni <prathameshk@nvidia.com>
Date:   Tue Oct 8 12:38:31 2024 +0530

    Recompute TYPE_MODE and DECL_MODE for vector_type for accelerator.

    gcc/ChangeLog:
            PR ipa/96265
            * lto-streamer-in.cc (lto_read_tree_1): Set TYPE_MODE and DECL_MODE
            for vector_type if offloading is enabled.
            (lto_input_mode_table): Remove handling of vector modes.
            * tree-streamer-out.cc (pack_ts_decl_common_value_fields): Stream out
            VOIDmode for vector_type if offloading is enabled.
            (pack_ts_decl_common_value_fields): Likewise.

    Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>