[og12] [committed] Port remaining OG11 patches

Kwok Cheung Yeung kcy@codesourcery.com
Fri Jul 1 16:55:46 GMT 2022


Hello

I have committed the following forward-ports from OG11 onto the 
devel/omp/gcc-12 branch to bring OG12 up-to-parity with OG11 again.

12d14a9a255 amdgcn: libgomp plugin USM implementation
687640af27a amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK
cbc3dd01de8 amdgcn: Support XNACK mode
1d4e24c9fb4 Fix gfortran.dg/gomp/num-teams-2.f90

Kwok
-------------- next part --------------
From 1d4e24c9fb40d4df7e742d7631a29329d5fb7c84 Mon Sep 17 00:00:00 2001
From: Tobias Burnus <tobias@codesourcery.com>
Date: Mon, 27 Jun 2022 13:26:43 +0200
Subject: [PATCH 1/4] Fix gfortran.dg/gomp/num-teams-2.f90

OG11 contrary to mainline issues an error for resolve_positive_int_expr
(-> OG11 commit a14b3f29681da1d2465e15f98b8cf8d5c64a2c3c). Update
testcase accordingly.

gcc/testsuite/
	* gfortran.dg/gomp/num-teams-2.f90: Use dg-error not dg-warning.
---
 gcc/testsuite/ChangeLog.omp                    |  4 ++++
 gcc/testsuite/gfortran.dg/gomp/num-teams-2.f90 | 12 ++++++------
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/ChangeLog.omp b/gcc/testsuite/ChangeLog.omp
index 301085fba5d..99ed6f0e467 100644
--- a/gcc/testsuite/ChangeLog.omp
+++ b/gcc/testsuite/ChangeLog.omp
@@ -1,3 +1,7 @@
+2022-06-27  Tobias Burnus  <tobias@codesourcery.com>
+
+	* gfortran.dg/gomp/num-teams-2.f90: Use dg-error not dg-warning.
+
 2022-05-12  Jakub Jelinek  <jakub@redhat.com>
 
 	Backport from mainline:
diff --git a/gcc/testsuite/gfortran.dg/gomp/num-teams-2.f90 b/gcc/testsuite/gfortran.dg/gomp/num-teams-2.f90
index e7814a11a5a..f3031481d4a 100644
--- a/gcc/testsuite/gfortran.dg/gomp/num-teams-2.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/num-teams-2.f90
@@ -9,13 +9,13 @@ subroutine foo (i)
   !$omp teams num_teams (6 : 4)		! { dg-warning "NUM_TEAMS lower bound at .1. larger than upper bound at .2." }
   !$omp end teams
 
-  !$omp teams num_teams (-7)		! { dg-warning "INTEGER expression of NUM_TEAMS clause at .1. must be positive" }
+  !$omp teams num_teams (-7)		! { dg-error "INTEGER expression of NUM_TEAMS clause at .1. must be positive" }
   !$omp end teams
 
-  !$omp teams num_teams (i : -7)		! { dg-warning "INTEGER expression of NUM_TEAMS clause at .1. must be positive" }
+  !$omp teams num_teams (i : -7)		! { dg-error "INTEGER expression of NUM_TEAMS clause at .1. must be positive" }
   !$omp end teams
 
-  !$omp teams num_teams (-7 : 8)		! { dg-warning "INTEGER expression of NUM_TEAMS clause at .1. must be positive" }
+  !$omp teams num_teams (-7 : 8)		! { dg-error "INTEGER expression of NUM_TEAMS clause at .1. must be positive" }
   !$omp end teams
 end
 
@@ -25,13 +25,13 @@ subroutine bar (i)
   !$omp target teams num_teams (6 : 4)	! { dg-warning "NUM_TEAMS lower bound at .1. larger than upper bound at .2." }
   !$omp end target teams
 
-  !$omp target teams num_teams (-7)	! { dg-warning "INTEGER expression of NUM_TEAMS clause at .1. must be positive" }
+  !$omp target teams num_teams (-7)	! { dg-error "INTEGER expression of NUM_TEAMS clause at .1. must be positive" }
   !$omp end target teams
 
-  !$omp target teams num_teams (i : -7)	! { dg-warning "INTEGER expression of NUM_TEAMS clause at .1. must be positive" }
+  !$omp target teams num_teams (i : -7)	! { dg-error "INTEGER expression of NUM_TEAMS clause at .1. must be positive" }
   !$omp end target teams
 
-  !$omp target teams num_teams (-7 : 8)	! { dg-warning "INTEGER expression of NUM_TEAMS clause at .1. must be positive" }
+  !$omp target teams num_teams (-7 : 8)	! { dg-error "INTEGER expression of NUM_TEAMS clause at .1. must be positive" }
   !$omp end target teams
 end
 end module
-- 
2.25.1

-------------- next part --------------
From cbc3dd01de8788587a2b641efcb838058303b5ab Mon Sep 17 00:00:00 2001
From: Andrew Stubbs <ams@codesourcery.com>
Date: Fri, 10 Jun 2022 15:15:49 +0100
Subject: [PATCH 2/4] amdgcn: Support XNACK mode

The XNACK feature allows memory load instructions to restart safely following
a page-miss interrupt.  This is useful for shared-memory devices, like APUs,
and to implement OpenMP Unified Shared Memory.

To support the feature we must be able to set the appropriate meta-data and
set the load instructions to early-clobber.  When the port supports scheduling
of s_waitcnt instructions there will be further requirements.

gcc/ChangeLog:

	* config/gcn/gcn-hsa.h (XNACKOPT): New macro.
	(ASM_SPEC): Use XNACKOPT.
	* config/gcn/gcn-opts.h (enum sram_ecc_type): Rename to ...
	(enum hsaco_attr_type): ... this, and generalize the names.
	(TARGET_XNACK): New macro.
	* config/gcn/gcn-valu.md (gather<mode>_insn_1offset<exec>):
	Add xnack compatible alternatives.
	(gather<mode>_insn_2offsets<exec>): Likewise.
	* config/gcn/gcn.c (gcn_option_override): Permit -mxnack for devices
	other than Fiji.
	(gcn_expand_epilogue): Remove early-clobber problems.
	(output_file_start): Emit xnack attributes.
	(gcn_hsa_declare_function_name): Obey -mxnack setting.
	* config/gcn/gcn.md (xnack): New attribute.
	(enabled): Rework to include "xnack" attribute.
	(*movbi): Add xnack compatible alternatives.
	(*mov<mode>_insn): Likewise.
	(*mov<mode>_insn): Likewise.
	(*mov<mode>_insn): Likewise.
	(*movti_insn): Likewise.
	* config/gcn/gcn.opt (-mxnack): Add the "on/off/any" syntax.
	(sram_ecc_type): Rename to ...
	(hsaco_attr_type: ... this.)
	* config/gcn/mkoffload.c (SET_XNACK_ANY): New macro.
	(TEST_XNACK): Delete.
	(TEST_XNACK_ANY): New macro.
	(TEST_XNACK_ON): New macro.
	(main): Support the new -mxnack=on/off/any syntax.
---
 gcc/ChangeLog.omp           |  31 ++++++++++
 gcc/config/gcn/gcn-hsa.h    |   3 +-
 gcc/config/gcn/gcn-opts.h   |  10 ++--
 gcc/config/gcn/gcn-valu.md  |  29 ++++-----
 gcc/config/gcn/gcn.cc       |  34 ++++++-----
 gcc/config/gcn/gcn.md       | 113 +++++++++++++++++++++++-------------
 gcc/config/gcn/gcn.opt      |  18 +++---
 gcc/config/gcn/mkoffload.cc |  19 ++++--
 8 files changed, 171 insertions(+), 86 deletions(-)

diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index 24e22e19ae8..7af24604841 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,3 +1,34 @@
+2022-06-10  Andrew Stubbs  <ams@codesourcery.com>
+
+	* config/gcn/gcn-hsa.h (XNACKOPT): New macro.
+	(ASM_SPEC): Use XNACKOPT.
+	* config/gcn/gcn-opts.h (enum sram_ecc_type): Rename to ...
+	(enum hsaco_attr_type): ... this, and generalize the names.
+	(TARGET_XNACK): New macro.
+	* config/gcn/gcn-valu.md (gather<mode>_insn_1offset<exec>):
+	Add xnack compatible alternatives.
+	(gather<mode>_insn_2offsets<exec>): Likewise.
+	* config/gcn/gcn.c (gcn_option_override): Permit -mxnack for devices
+	other than Fiji.
+	(gcn_expand_epilogue): Remove early-clobber problems.
+	(output_file_start): Emit xnack attributes.
+	(gcn_hsa_declare_function_name): Obey -mxnack setting.
+	* config/gcn/gcn.md (xnack): New attribute.
+	(enabled): Rework to include "xnack" attribute.
+	(*movbi): Add xnack compatible alternatives.
+	(*mov<mode>_insn): Likewise.
+	(*mov<mode>_insn): Likewise.
+	(*mov<mode>_insn): Likewise.
+	(*movti_insn): Likewise.
+	* config/gcn/gcn.opt (-mxnack): Add the "on/off/any" syntax.
+	(sram_ecc_type): Rename to ...
+	(hsaco_attr_type: ... this.)
+	* config/gcn/mkoffload.c (SET_XNACK_ANY): New macro.
+	(TEST_XNACK): Delete.
+	(TEST_XNACK_ANY): New macro.
+	(TEST_XNACK_ON): New macro.
+	(main): Support the new -mxnack=on/off/any syntax.
+
 2022-06-30  Kwok Cheung Yeung  <kcy@codesourcery.com>
 
 	* dwarf2cfi.cc (get_cfa_from_loc_descr): Check op against DW_OP_bregx.
diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index b3079cebb43..fd08947574f 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -81,12 +81,13 @@ extern unsigned int gcn_local_sym_hash (const char *name);
 /* In HSACOv4 no attribute setting means the binary supports "any" hardware
    configuration.  The name of the attribute also changed.  */
 #define SRAMOPT "msram-ecc=on:-mattr=+sramecc;msram-ecc=off:-mattr=-sramecc"
+#define XNACKOPT "mxnack=on:-mattr=+xnack;mxnack=off:-mattr=-xnack"
 
 /* Use LLVM assembler and linker options.  */
 #define ASM_SPEC  "-triple=amdgcn--amdhsa "  \
 		  "%:last_arg(%{march=*:-mcpu=%*}) " \
 		  "%{!march=*|march=fiji:--amdhsa-code-object-version=3} " \
-		  "%{" NO_XNACK "mxnack:-mattr=+xnack;:-mattr=-xnack} " \
+		  "%{" NO_XNACK XNACKOPT "}" \
 		  "%{" NO_SRAM_ECC SRAMOPT "} " \
 		  "-filetype=obj"
 #define LINK_SPEC "--pie --export-dynamic"
diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
index b62dfb45f59..07ddc79cda3 100644
--- a/gcc/config/gcn/gcn-opts.h
+++ b/gcc/config/gcn/gcn-opts.h
@@ -48,11 +48,13 @@ extern enum gcn_isa {
 #define TARGET_M0_LDS_LIMIT (TARGET_GCN3)
 #define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS)
 
-enum sram_ecc_type
+#define TARGET_XNACK (flag_xnack != HSACO_ATTR_OFF)
+
+enum hsaco_attr_type
 {
-  SRAM_ECC_OFF,
-  SRAM_ECC_ON,
-  SRAM_ECC_ANY
+  HSACO_ATTR_OFF,
+  HSACO_ATTR_ON,
+  HSACO_ATTR_ANY
 };
 
 #endif
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index abe46201344..ec114db9dd1 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -741,13 +741,13 @@
     {})
 
 (define_insn "gather<mode>_insn_1offset<exec>"
-  [(set (match_operand:V_ALL 0 "register_operand"		   "=v")
+  [(set (match_operand:V_ALL 0 "register_operand"		   "=v,&v")
 	(unspec:V_ALL
-	  [(plus:<VnDI> (match_operand:<VnDI> 1 "register_operand" " v")
+	  [(plus:<VnDI> (match_operand:<VnDI> 1 "register_operand" " v, v")
 			(vec_duplicate:<VnDI>
-			  (match_operand 2 "immediate_operand"	   " n")))
-	   (match_operand 3 "immediate_operand"			   " n")
-	   (match_operand 4 "immediate_operand"			   " n")
+			  (match_operand 2 "immediate_operand"	   " n, n")))
+	   (match_operand 3 "immediate_operand"			   " n, n")
+	   (match_operand 4 "immediate_operand"			   " n, n")
 	   (mem:BLK (scratch))]
 	  UNSPEC_GATHER))]
   "(AS_FLAT_P (INTVAL (operands[3]))
@@ -777,7 +777,8 @@
     return buf;
   }
   [(set_attr "type" "flat")
-   (set_attr "length" "12")])
+   (set_attr "length" "12")
+   (set_attr "xnack" "off,on")])
 
 (define_insn "gather<mode>_insn_1offset_ds<exec>"
   [(set (match_operand:V_ALL 0 "register_operand"		   "=v")
@@ -802,17 +803,18 @@
    (set_attr "length" "12")])
 
 (define_insn "gather<mode>_insn_2offsets<exec>"
-  [(set (match_operand:V_ALL 0 "register_operand"			"=v")
+  [(set (match_operand:V_ALL 0 "register_operand"		     "=v,&v")
 	(unspec:V_ALL
 	  [(plus:<VnDI>
 	     (plus:<VnDI>
 	       (vec_duplicate:<VnDI>
-		 (match_operand:DI 1 "register_operand"			"Sv"))
+		 (match_operand:DI 1 "register_operand"		     "Sv,Sv"))
 	       (sign_extend:<VnDI>
-		 (match_operand:<VnSI> 2 "register_operand"		" v")))
-	     (vec_duplicate:<VnDI> (match_operand 3 "immediate_operand" " n")))
-	   (match_operand 4 "immediate_operand"				" n")
-	   (match_operand 5 "immediate_operand"				" n")
+		 (match_operand:<VnSI> 2 "register_operand"	     " v, v")))
+	     (vec_duplicate:<VnDI> (match_operand 3 "immediate_operand"
+								     " n, n")))
+	   (match_operand 4 "immediate_operand"			     " n, n")
+	   (match_operand 5 "immediate_operand"			     " n, n")
 	   (mem:BLK (scratch))]
 	  UNSPEC_GATHER))]
   "(AS_GLOBAL_P (INTVAL (operands[4]))
@@ -831,7 +833,8 @@
     return buf;
   }
   [(set_attr "type" "flat")
-   (set_attr "length" "12")])
+   (set_attr "length" "12")
+   (set_attr "xnack" "off,on")])
 
 (define_expand "scatter_store<mode><vnsi>"
   [(match_operand:DI 0 "register_operand")
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 2ef4cee58fd..15aa8c138ee 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -170,9 +170,14 @@ gcn_option_override (void)
 	acc_lds_size = 32768;
     }
 
-  /* The xnack option is a placeholder, for now.  */
-  if (flag_xnack)
-    sorry ("XNACK support");
+  /* gfx908 "Fiji" does not support XNACK.  */
+  if (gcn_arch == PROCESSOR_FIJI)
+    {
+      if (flag_xnack == HSACO_ATTR_ON)
+	error ("-mxnack=on is incompatible with -march=fiji");
+      /* Allow HSACO_ATTR_ANY silently because that's the default.  */
+      flag_xnack = HSACO_ATTR_OFF;
+    }
 }
 
 /* }}}  */
@@ -3188,17 +3193,19 @@ gcn_expand_epilogue (void)
       /* Assume that an exit value compatible with gcn-run is expected.
          That is, the third input parameter is an int*.
 
-         We can't allocate any new registers, but the kernarg_reg is
-         dead after this, so we'll use that.  */
+         We can't allocate any new registers, but the dispatch_ptr and
+	 kernarg_reg are dead after this, so we'll use those.  */
+      rtx dispatch_ptr_reg = gen_rtx_REG (DImode, cfun->machine->args.reg
+					  [DISPATCH_PTR_ARG]);
       rtx kernarg_reg = gen_rtx_REG (DImode, cfun->machine->args.reg
 				     [KERNARG_SEGMENT_PTR_ARG]);
       rtx retptr_mem = gen_rtx_MEM (DImode,
 				    gen_rtx_PLUS (DImode, kernarg_reg,
 						  GEN_INT (16)));
       set_mem_addr_space (retptr_mem, ADDR_SPACE_SCALAR_FLAT);
-      emit_move_insn (kernarg_reg, retptr_mem);
+      emit_move_insn (dispatch_ptr_reg, retptr_mem);
 
-      rtx retval_mem = gen_rtx_MEM (SImode, kernarg_reg);
+      rtx retval_mem = gen_rtx_MEM (SImode, dispatch_ptr_reg);
       set_mem_addr_space (retval_mem, ADDR_SPACE_SCALAR_FLAT);
       emit_move_insn (retval_mem,
 		      gen_rtx_REG (SImode, SGPR_REGNO (RETURN_VALUE_REG)));
@@ -5247,11 +5254,12 @@ static void
 output_file_start (void)
 {
   /* In HSACOv4 no attribute setting means the binary supports "any" hardware
-     configuration.  In GCC binaries, this is true for SRAM ECC, but not
-     XNACK.  */
-  const char *xnack = (flag_xnack ? ":xnack+" : ":xnack-");
-  const char *sram_ecc = (flag_sram_ecc == SRAM_ECC_ON ? ":sramecc+"
-			  : flag_sram_ecc == SRAM_ECC_OFF ? ":sramecc-"
+     configuration.  */
+  const char *xnack = (flag_xnack == HSACO_ATTR_ON ? ":xnack+"
+		       : flag_xnack == HSACO_ATTR_OFF ? ":xnack-"
+		       : "");
+  const char *sram_ecc = (flag_sram_ecc == HSACO_ATTR_ON ? ":sramecc+"
+			  : flag_sram_ecc == HSACO_ATTR_OFF ? ":sramecc-"
 			  : "");
 
   const char *cpu;
@@ -5295,7 +5303,7 @@ void
 gcn_hsa_declare_function_name (FILE *file, const char *name, tree)
 {
   int sgpr, vgpr;
-  bool xnack_enabled = false;
+  bool xnack_enabled = TARGET_XNACK;
 
   fputs ("\n\n", file);
 
diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 53e846e15d1..47417ab371e 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -277,12 +277,19 @@
 
 (define_attr "gcn_version" "gcn3,gcn5" (const_string "gcn3"))
 
+(define_attr "xnack" "na,off,on" (const_string "na"))
+
 (define_attr "enabled" ""
-  (cond [(eq_attr "gcn_version" "gcn3") (const_int 1)
-	 (and (eq_attr "gcn_version" "gcn5")
-	      (ne (symbol_ref "TARGET_GCN5_PLUS") (const_int 0)))
-	   (const_int 1)]
-	(const_int 0)))
+  (cond [(and (eq_attr "gcn_version" "gcn5")
+	      (eq (symbol_ref "TARGET_GCN5_PLUS") (const_int 0)))
+	   (const_int 0)
+	 (and (eq_attr "xnack" "off")
+	      (ne (symbol_ref "TARGET_XNACK") (const_int 0)))
+	   (const_int 0)
+	 (and (eq_attr "xnack" "on")
+	      (eq (symbol_ref "TARGET_XNACK") (const_int 0)))
+	   (const_int 0)]
+	(const_int 1)))
 
 ; We need to be able to identify v_readlane and v_writelane with
 ; SGPR lane selection in order to handle "Manually Inserted Wait States".
@@ -472,9 +479,9 @@
 
 (define_insn "*movbi"
   [(set (match_operand:BI 0 "nonimmediate_operand"
-				    "=Sg,   v,Sg,cs,cV,cV,Sm,RS, v,RF, v,RM")
+			  "=Sg,   v,Sg,cs,cV,cV,Sm,&Sm,RS, v,&v,RF, v,&v,RM")
 	(match_operand:BI 1 "gcn_load_operand"
-				    "SSA,vSvA, v,SS, v,SS,RS,Sm,RF, v,RM, v"))]
+			  "SSA,vSvA, v,SS, v,SS,RS, RS,Sm,RF,RF, v,RM,RM, v"))]
   ""
   {
     /* SCC as an operand is currently not accepted by the LLVM assembler, so
@@ -516,66 +523,77 @@
 	return "s_mov_b32\tvcc_lo, %1\;"
 	       "s_mov_b32\tvcc_hi, 0";
     case 6:
-      return "s_load_dword\t%0, %A1\;s_waitcnt\tlgkmcnt(0)";
     case 7:
-      return "s_store_dword\t%1, %A0";
+      return "s_load_dword\t%0, %A1\;s_waitcnt\tlgkmcnt(0)";
     case 8:
-      return "flat_load_dword\t%0, %A1%O1%g1\;s_waitcnt\t0";
+      return "s_store_dword\t%1, %A0";
     case 9:
-      return "flat_store_dword\t%A0, %1%O0%g0";
     case 10:
-      return "global_load_dword\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)";
+      return "flat_load_dword\t%0, %A1%O1%g1\;s_waitcnt\t0";
     case 11:
+      return "flat_store_dword\t%A0, %1%O0%g0";
+    case 12:
+    case 13:
+      return "global_load_dword\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)";
+    case 14:
       return "global_store_dword\t%A0, %1%O0%g0";
     default:
       gcc_unreachable ();
     }
   }
-  [(set_attr "type" "sop1,vop1,vop3a,sopk,vopc,mult,smem,smem,flat,flat,
-		     flat,flat")
-   (set_attr "exec" "*,*,none,*,*,*,*,*,*,*,*,*")
-   (set_attr "length" "4,4,4,4,4,8,12,12,12,12,12,12")])
+  [(set_attr "type" "sop1,vop1,vop3a,sopk,vopc,mult,smem,smem,smem,flat,flat,
+		     flat,flat,flat,flat")
+   (set_attr "exec" "*,*,none,*,*,*,*,*,*,*,*,*,*,*,*")
+   (set_attr "length" "4,4,4,4,4,8,12,12,12,12,12,12,12,12,12")
+   (set_attr "xnack" "*,*,*,*,*,*,off,on,*,off,on,*,off,on,*")])
 
 ; 32bit move pattern
 
 (define_insn "*mov<mode>_insn"
   [(set (match_operand:SISF 0 "nonimmediate_operand"
-		  "=SD,SD,SD,SD,RB,Sm,RS,v,Sg, v, v,RF,v,RLRG,   v,SD, v,RM")
+     "=SD,SD,SD,SD,&SD,RB,Sm,&Sm,RS,v,Sg, v, v,&v,RF,v,RLRG,   v,SD, v,&v,RM")
 	(match_operand:SISF 1 "gcn_load_operand"
-		  "SSA, J, B,RB,Sm,RS,Sm,v, v,Sv,RF, v,B,   v,RLRG, Y,RM, v"))]
+    "SSA, J, B,RB, RB,Sm,RS, RS,Sm,v, v,Sv,RF,RF, v,B,   v,RLRG, Y,RM,RM, v"))]
   ""
   "@
   s_mov_b32\t%0, %1
   s_movk_i32\t%0, %1
   s_mov_b32\t%0, %1
   s_buffer_load%s0\t%0, s[0:3], %1\;s_waitcnt\tlgkmcnt(0)
+  s_buffer_load%s0\t%0, s[0:3], %1\;s_waitcnt\tlgkmcnt(0)
   s_buffer_store%s1\t%1, s[0:3], %0
   s_load_dword\t%0, %A1\;s_waitcnt\tlgkmcnt(0)
+  s_load_dword\t%0, %A1\;s_waitcnt\tlgkmcnt(0)
   s_store_dword\t%1, %A0
   v_mov_b32\t%0, %1
   v_readlane_b32\t%0, %1, 0
   v_writelane_b32\t%0, %1, 0
   flat_load_dword\t%0, %A1%O1%g1\;s_waitcnt\t0
+  flat_load_dword\t%0, %A1%O1%g1\;s_waitcnt\t0
   flat_store_dword\t%A0, %1%O0%g0
   v_mov_b32\t%0, %1
   ds_write_b32\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read_b32\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)
   s_mov_b32\t%0, %1
   global_load_dword\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
+  global_load_dword\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
   global_store_dword\t%A0, %1%O0%g0"
-  [(set_attr "type" "sop1,sopk,sop1,smem,smem,smem,smem,vop1,vop3a,vop3a,flat,
-		     flat,vop1,ds,ds,sop1,flat,flat")
-   (set_attr "exec" "*,*,*,*,*,*,*,*,none,none,*,*,*,*,*,*,*,*")
-   (set_attr "length" "4,4,8,12,12,12,12,4,8,8,12,12,8,12,12,8,12,12")])
+  [(set_attr "type" "sop1,sopk,sop1,smem,smem,smem,smem,smem,smem,vop1,vop3a,
+	      vop3a,flat,flat,flat,vop1,ds,ds,sop1,flat,flat,flat")
+   (set_attr "exec" "*,*,*,*,*,*,*,*,*,*,none,none,*,*,*,*,*,*,*,*,*,*")
+   (set_attr "length"
+	     "4,4,8,12,12,12,12,12,12,4,8,8,12,12,12,8,12,12,8,12,12,12")
+   (set_attr "xnack"
+	     "*,*,*,off,on,*,off,on,*,*,*,*,off,on,*,*,*,*,*,off,on,*")])
 
 ; 8/16bit move pattern
 ; TODO: implement combined load and zero_extend, but *only* for -msram-ecc=on
 
 (define_insn "*mov<mode>_insn"
   [(set (match_operand:QIHI 0 "nonimmediate_operand"
-				 "=SD,SD,SD,v,Sg, v, v,RF,v,RLRG,   v, v,RM")
+			   "=SD,SD,SD,v,Sg, v, v,&v,RF,v,RLRG,   v, v,&v,RM")
 	(match_operand:QIHI 1 "gcn_load_operand"
-				 "SSA, J, B,v, v,Sv,RF, v,B,   v,RLRG,RM, v"))]
+			   "SSA, J, B,v, v,Sv,RF,RF, v,B,   v,RLRG,RM,RM, v"))]
   "gcn_valid_move_p (<MODE>mode, operands[0], operands[1])"
   "@
   s_mov_b32\t%0, %1
@@ -585,24 +603,27 @@
   v_readlane_b32\t%0, %1, 0
   v_writelane_b32\t%0, %1, 0
   flat_load%o1\t%0, %A1%O1%g1\;s_waitcnt\t0
+  flat_load%o1\t%0, %A1%O1%g1\;s_waitcnt\t0
   flat_store%s0\t%A0, %1%O0%g0
   v_mov_b32\t%0, %1
   ds_write%b0\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read%u1\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)
   global_load%o1\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
+  global_load%o1\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
   global_store%s0\t%A0, %1%O0%g0"
-  [(set_attr "type"
-	     "sop1,sopk,sop1,vop1,vop3a,vop3a,flat,flat,vop1,ds,ds,flat,flat")
-   (set_attr "exec" "*,*,*,*,none,none,*,*,*,*,*,*,*")
-   (set_attr "length" "4,4,8,4,4,4,12,12,8,12,12,12,12")])
+  [(set_attr "type" "sop1,sopk,sop1,vop1,vop3a,vop3a,flat,flat,flat,vop1,ds,ds,
+	             flat,flat,flat")
+   (set_attr "exec" "*,*,*,*,none,none,*,*,*,*,*,*,*,*,*")
+   (set_attr "length" "4,4,8,4,4,4,12,12,12,8,12,12,12,12,12")
+   (set_attr "xnack" "*,*,*,*,*,*,off,on,*,*,*,*,off,on,*")])
 
 ; 64bit move pattern
 
 (define_insn_and_split "*mov<mode>_insn"
   [(set (match_operand:DIDF 0 "nonimmediate_operand"
-			  "=SD,SD,SD,RS,Sm,v, v,Sg, v, v,RF,RLRG,   v, v,RM")
+		"=SD,SD,SD,RS,Sm,&Sm,v, v,Sg, v, v,&v,RF,RLRG,   v, v,&v,RM")
 	(match_operand:DIDF 1 "general_operand"
-			  "SSA, C,DB,Sm,RS,v,DB, v,Sv,RF, v,   v,RLRG,RM, v"))]
+		"SSA, C,DB,Sm,RS, RS,v,DB, v,Sv,RF,RF, v,   v,RLRG,RM,RM, v"))]
   "GET_CODE(operands[1]) != SYMBOL_REF"
   "@
   s_mov_b64\t%0, %1
@@ -610,15 +631,18 @@
   #
   s_store_dwordx2\t%1, %A0
   s_load_dwordx2\t%0, %A1\;s_waitcnt\tlgkmcnt(0)
+  s_load_dwordx2\t%0, %A1\;s_waitcnt\tlgkmcnt(0)
   #
   #
   #
   #
   flat_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\t0
+  flat_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\t0
   flat_store_dwordx2\t%A0, %1%O0%g0
   ds_write_b64\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read_b64\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)
   global_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
+  global_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
   global_store_dwordx2\t%A0, %1%O0%g0"
   "reload_completed
    && ((!MEM_P (operands[0]) && !MEM_P (operands[1])
@@ -649,29 +673,33 @@
 	operands[3] = inhi;
       }
   }
-  [(set_attr "type" "sop1,sop1,mult,smem,smem,vmult,vmult,vmult,vmult,flat,
-		     flat,ds,ds,flat,flat")
-   (set_attr "length" "4,8,*,12,12,*,*,*,*,12,12,12,12,12,12")])
+  [(set_attr "type" "sop1,sop1,mult,smem,smem,smem,vmult,vmult,vmult,vmult,
+	      flat,flat,flat,ds,ds,flat,flat,flat")
+   (set_attr "length" "4,8,*,12,12,12,*,*,*,*,12,12,12,12,12,12,12,12")
+   (set_attr "xnack" "*,*,*,*,off,on,*,*,*,*,off,on,*,*,*,off,on,*")])
 
 ; 128-bit move.
 
 (define_insn_and_split "*movti_insn"
   [(set (match_operand:TI 0 "nonimmediate_operand"
-				      "=SD,RS,Sm,RF, v,v, v,SD,RM, v,RL, v")
-	(match_operand:TI 1 "general_operand"  
-				      "SSB,Sm,RS, v,RF,v,Sv, v, v,RM, v,RL"))]
+			     "=SD,RS,Sm,&Sm,RF, v,&v,v, v,SD,RM, v,&v,RL, v")
+	(match_operand:TI 1 "general_operand"
+			     "SSB,Sm,RS, RS, v,RF,RF,v,Sv, v, v,RM,RM, v,RL"))]
   ""
   "@
   #
   s_store_dwordx4\t%1, %A0
   s_load_dwordx4\t%0, %A1\;s_waitcnt\tlgkmcnt(0)
+  s_load_dwordx4\t%0, %A1\;s_waitcnt\tlgkmcnt(0)
   flat_store_dwordx4\t%A0, %1%O0%g0
   flat_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\t0
+  flat_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\t0
   #
   #
   #
   global_store_dwordx4\t%A0, %1%O0%g0
   global_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
+  global_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
   ds_write_b128\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read_b128\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)"
   "reload_completed
@@ -693,10 +721,11 @@
     operands[0] = gcn_operand_part (TImode, operands[0], 0);
     operands[1] = gcn_operand_part (TImode, operands[1], 0);
   }
-  [(set_attr "type" "mult,smem,smem,flat,flat,vmult,vmult,vmult,flat,flat,\
-		     ds,ds")
-   (set_attr "delayeduse" "*,*,yes,*,*,*,*,*,yes,*,*,*")
-   (set_attr "length" "*,12,12,12,12,*,*,*,12,12,12,12")])
+  [(set_attr "type" "mult,smem,smem,smem,flat,flat,flat,vmult,vmult,vmult,flat,
+	             flat,flat,ds,ds")
+   (set_attr "delayeduse" "*,*,yes,yes,*,*,*,*,*,*,*,yes,*,*,*")
+   (set_attr "length" "*,12,12,12,12,12,12,*,*,*,12,12,12,12,12")
+   (set_attr "xnack" "*,*,off,on,*,off,on,*,*,*,*,off,on,*,*")])
 
 ;; }}}
 ;; {{{ Prologue/Epilogue
@@ -864,6 +893,8 @@
   (clobber (reg:BI SCC_REG))]
  "GET_CODE (operands[1]) == SYMBOL_REF || GET_CODE (operands[1]) == LABEL_REF"
   {
+    /* This s_load may not be XNACK-safe on devices where the GOT may fault.
+       DGPUs are most likely fine.  */
     if (SYMBOL_REF_P (operands[1])
 	&& SYMBOL_REF_WEAK (operands[1]))
 	return "s_getpc_b64\t%0\;"
@@ -888,6 +919,8 @@
   {
     /* !!! These sequences clobber CC_SAVE_REG.  */
 
+    /* This s_load may not be XNACK-safe on devices where the GOT may fault.
+       DGPUs are most likely fine.  */
     if (SYMBOL_REF_P (operands[1])
 	&& SYMBOL_REF_WEAK (operands[1]))
 	return "; s_mov_b32\ts22, scc is not supported by the assembler.\;"
diff --git a/gcc/config/gcn/gcn.opt b/gcc/config/gcn/gcn.opt
index 9606aaf0b1a..759f7a064c9 100644
--- a/gcc/config/gcn/gcn.opt
+++ b/gcc/config/gcn/gcn.opt
@@ -81,23 +81,23 @@ Wopenacc-dims
 Target Var(warn_openacc_dims) Warning
 Warn about invalid OpenACC dimensions.
 
-mxnack
-Target Var(flag_xnack) Init(0)
-Compile for devices requiring XNACK enabled. Default off.
-
 Enum
-Name(sram_ecc_type) Type(enum sram_ecc_type)
+Name(hsaco_attr_type) Type(enum hsaco_attr_type)
 SRAM-ECC modes:
 
 EnumValue
-Enum(sram_ecc_type) String(off) Value(SRAM_ECC_OFF)
+Enum(hsaco_attr_type) String(off) Value(HSACO_ATTR_OFF)
 
 EnumValue
-Enum(sram_ecc_type) String(on) Value(SRAM_ECC_ON)
+Enum(hsaco_attr_type) String(on) Value(HSACO_ATTR_ON)
 
 EnumValue
-Enum(sram_ecc_type) String(any) Value(SRAM_ECC_ANY)
+Enum(hsaco_attr_type) String(any) Value(HSACO_ATTR_ANY)
+
+mxnack=
+Target RejectNegative Joined ToLower Enum(hsaco_attr_type) Var(flag_xnack) Init(HSACO_ATTR_ANY)
+Compile for devices requiring XNACK enabled. Default off.
 
 msram-ecc=
-Target RejectNegative Joined ToLower Enum(sram_ecc_type) Var(flag_sram_ecc) Init(SRAM_ECC_ANY)
+Target RejectNegative Joined ToLower Enum(hsaco_attr_type) Var(flag_sram_ecc) Init(HSACO_ATTR_ANY)
 Compile for devices with the SRAM ECC feature enabled, or not. Default \"any\".
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index ed93ae844e4..16c479b6f5a 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -72,10 +72,14 @@
 
 #define SET_XNACK_ON(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_XNACK_V4) \
 				 | EF_AMDGPU_FEATURE_XNACK_ON_V4)
+#define SET_XNACK_ANY(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_XNACK_V4) \
+				  | EF_AMDGPU_FEATURE_XNACK_ANY_V4)
 #define SET_XNACK_OFF(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_XNACK_V4) \
 				  | EF_AMDGPU_FEATURE_XNACK_OFF_V4)
-#define TEST_XNACK(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \
-			 == EF_AMDGPU_FEATURE_XNACK_ON_V4)
+#define TEST_XNACK_ANY(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \
+			     == EF_AMDGPU_FEATURE_XNACK_ANY_V4)
+#define TEST_XNACK_ON(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \
+			    == EF_AMDGPU_FEATURE_XNACK_ON_V4)
 
 #define SET_SRAM_ECC_ON(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_SRAMECC_V4) \
 				    | EF_AMDGPU_FEATURE_SRAMECC_ON_V4)
@@ -881,9 +885,11 @@ main (int argc, char **argv)
 	fPIC = true;
       else if (strcmp (argv[i], "-fpic") == 0)
 	fpic = true;
-      else if (strcmp (argv[i], "-mxnack") == 0)
+      else if (strcmp (argv[i], "-mxnack=on") == 0)
 	SET_XNACK_ON (elf_flags);
-      else if (strcmp (argv[i], "-mno-xnack") == 0)
+      else if (strcmp (argv[i], "-mxnack=any") == 0)
+	SET_XNACK_ANY (elf_flags);
+      else if (strcmp (argv[i], "-mxnack=off") == 0)
 	SET_XNACK_OFF (elf_flags);
       else if (strcmp (argv[i], "-msram-ecc=on") == 0)
 	SET_SRAM_ECC_ON (elf_flags);
@@ -1042,8 +1048,9 @@ main (int argc, char **argv)
       obstack_ptr_grow (&ld_argv_obstack, gcn_s2_name);
       obstack_ptr_grow (&ld_argv_obstack, "-lgomp");
       obstack_ptr_grow (&ld_argv_obstack,
-			(TEST_XNACK (elf_flags)
-			 ? "-mxnack" : "-mno-xnack"));
+			(TEST_XNACK_ON (elf_flags) ? "-mxnack=on"
+			 : TEST_XNACK_ANY (elf_flags) ? "-mxnack=any"
+			 : "-mxnack=off"));
       obstack_ptr_grow (&ld_argv_obstack,
 			(TEST_SRAM_ECC_ON (elf_flags) ? "-msram-ecc=on"
 			 : TEST_SRAM_ECC_ANY (elf_flags) ? "-msram-ecc=any"
-- 
2.25.1

-------------- next part --------------
From 687640af27af0d2a5bd92a5de44a9a7d33526052 Mon Sep 17 00:00:00 2001
From: Andrew Stubbs <ams@codesourcery.com>
Date: Fri, 17 Jun 2022 13:07:11 +0100
Subject: [PATCH 3/4] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK

The AMD GCN runtime must be set to the correct mode for Unified Shared Memory
to work, but this is not always clear at compile and link time due to the split
nature of the offload compilation pipeline.

This patch sets a new attribute on OpenMP offload functions to ensure that the
information is passed all the way to the backend.  The backend then places a
marker in the assembler code for mkoffload to find. Finally mkoffload places a
constructor function into the final program to ensure that the HSA_XNACK
environment variable passes the correct mode to the GPU.

The HSA_XNACK variable must be set before the HSA runtime is even loaded, so
it makes more sense to have this set within the constructor than at some point
later within libgomp or the GCN plugin.

gcc/ChangeLog:

	* config/gcn/gcn.c (unified_shared_memory_enabled): New variable.
	(gcn_init_cumulative_args): Handle attribute "omp unified memory".
	(gcn_hsa_declare_function_name): Emit "MKOFFLOAD OPTIONS: USM+".
	* config/gcn/mkoffload.c (TEST_XNACK_OFF): New macro.
	(process_asm): Detect "MKOFFLOAD OPTIONS: USM+".
	Emit configure_xnack constructor, as required.
	* omp-low.c (create_omp_child_function): Add attribute "omp unified
	memory".
---
 gcc/ChangeLog.omp           | 11 +++++++++++
 gcc/config/gcn/gcn.cc       | 28 +++++++++++++++++++++++++++-
 gcc/config/gcn/mkoffload.cc | 37 ++++++++++++++++++++++++++++++++++++-
 gcc/omp-low.cc              |  4 ++++
 4 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index 7af24604841..6e059579a81 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,3 +1,14 @@
+2022-06-17  Andrew Stubbs  <ams@codesourcery.com>
+
+	* config/gcn/gcn.c (unified_shared_memory_enabled): New variable.
+	(gcn_init_cumulative_args): Handle attribute "omp unified memory".
+	(gcn_hsa_declare_function_name): Emit "MKOFFLOAD OPTIONS: USM+".
+	* config/gcn/mkoffload.c (TEST_XNACK_OFF): New macro.
+	(process_asm): Detect "MKOFFLOAD OPTIONS: USM+".
+	Emit configure_xnack constructor, as required.
+	* omp-low.c (create_omp_child_function): Add attribute "omp unified
+	memory".
+
 2022-06-10  Andrew Stubbs  <ams@codesourcery.com>
 
 	* config/gcn/gcn-hsa.h (XNACKOPT): New macro.
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 15aa8c138ee..98e56305436 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -68,6 +68,11 @@ static bool ext_gcn_constants_init = 0;
 
 enum gcn_isa gcn_isa = ISA_GCN3;	/* Default to GCN3.  */
 
+/* Record whether the host compiler added "omp unifed memory" attributes to
+   any functions.  We can then pass this on to mkoffload to ensure xnack is
+   compatible there too.  */
+static bool unified_shared_memory_enabled = false;
+
 /* Reserve this much space for LDS (for propagating variables from
    worker-single mode to worker-partitioned mode), per workgroup.  Global
    analysis could calculate an exact bound, but we don't do that yet.
@@ -2542,6 +2547,25 @@ gcn_init_cumulative_args (CUMULATIVE_ARGS *cum /* Argument info to init */ ,
   if (!caller && cfun->machine->normal_function)
     gcn_detect_incoming_pointer_arg (fndecl);
 
+  if (fndecl && lookup_attribute ("omp unified memory",
+				  DECL_ATTRIBUTES (fndecl)))
+    {
+      unified_shared_memory_enabled = true;
+
+      switch (gcn_arch)
+	{
+	case PROCESSOR_FIJI:
+	case PROCESSOR_VEGA10:
+	case PROCESSOR_VEGA20:
+	  error ("GPU architecture does not support Unified Shared Memory");
+	default:
+	  ;
+	}
+
+      if (flag_xnack == HSACO_ATTR_OFF)
+	error ("Unified Shared Memory is enabled, but XNACK is disabled");
+    }
+
   reinit_regs ();
 }
 
@@ -5455,12 +5479,14 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, tree)
   assemble_name (file, name);
   fputs (":\n", file);
 
-  /* This comment is read by mkoffload.  */
+  /* These comments are read by mkoffload.  */
   if (flag_openacc)
     fprintf (file, "\t;; OPENACC-DIMS: %d, %d, %d : %s\n",
 	     oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_GANG),
 	     oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_WORKER),
 	     oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_VECTOR), name);
+  if (unified_shared_memory_enabled)
+    fprintf (asm_out_file, "\t;; MKOFFLOAD OPTIONS: USM+\n");
 }
 
 /* Implement TARGET_ASM_SELECT_SECTION.
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 16c479b6f5a..c430c4e8cee 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -80,6 +80,8 @@
 			     == EF_AMDGPU_FEATURE_XNACK_ANY_V4)
 #define TEST_XNACK_ON(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \
 			    == EF_AMDGPU_FEATURE_XNACK_ON_V4)
+#define TEST_XNACK_OFF(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \
+			     == EF_AMDGPU_FEATURE_XNACK_OFF_V4)
 
 #define SET_SRAM_ECC_ON(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_SRAMECC_V4) \
 				    | EF_AMDGPU_FEATURE_SRAMECC_ON_V4)
@@ -474,6 +476,7 @@ static void
 process_asm (FILE *in, FILE *out, FILE *cfile)
 {
   int fn_count = 0, var_count = 0, dims_count = 0, regcount_count = 0;
+  bool unified_shared_memory_enabled = false;
   struct obstack fns_os, dims_os, regcounts_os;
   obstack_init (&fns_os);
   obstack_init (&dims_os);
@@ -498,6 +501,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
   fn_count += 2;
 
   char buf[1000];
+  char dummy;
   enum
     { IN_CODE,
       IN_METADATA,
@@ -517,6 +521,9 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 		dims_count++;
 	      }
 
+	    if (sscanf (buf, " ;; MKOFFLOAD OPTIONS: USM+%c", &dummy) > 0)
+	      unified_shared_memory_enabled = true;
+
 	    break;
 	  }
 	case IN_METADATA:
@@ -565,7 +572,6 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 	  }
 	}
 
-      char dummy;
       if (sscanf (buf, " .section .gnu.offload_vars%c", &dummy) > 0)
 	{
 	  state = IN_VARS;
@@ -616,6 +622,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 
   fprintf (cfile, "#include <stdlib.h>\n");
   fprintf (cfile, "#include <stdbool.h>\n\n");
+  fprintf (cfile, "#include <stdio.h>\n\n");
 
   fprintf (cfile, "static const int gcn_num_vars = %d;\n\n", var_count);
 
@@ -656,6 +663,34 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
     }
   fprintf (cfile, "\n};\n\n");
 
+  /* Emit a constructor function to set the HSA_XNACK environment variable.
+     This must be done before the ROCr runtime library is loaded.
+     We never override a user value (exit empty string), but we do emit a
+     useful diagnostic in the wrong mode (the ROCr message is not good.  */
+  if (TEST_XNACK_OFF (elf_flags) && unified_shared_memory_enabled)
+    fatal_error (input_location,
+		 "conflicting settings; XNACK is forced off but Unified "
+		 "Shared Memory is on");
+  if (!TEST_XNACK_ANY (elf_flags) || unified_shared_memory_enabled)
+    fprintf (cfile,
+	     "static __attribute__((constructor))\n"
+	     "void configure_xnack (void)\n"
+	     "{\n"
+	     "  const char *val = getenv (\"HSA_XNACK\");\n"
+	     "  if (!val || val[0] == '\\0')\n"
+	     "    setenv (\"HSA_XNACK\", \"%d\", true);\n"
+	     "  else if (%s)\n"
+	     "    {\n"
+	     "      fprintf (stderr, \"error: HSA_XNACK=%%s is incompatible; "
+			    "please unset\\n\", val);\n"
+	     "      exit (1);\n"
+	     "    }\n"
+	     "}\n\n",
+	     unified_shared_memory_enabled || TEST_XNACK_ON (elf_flags),
+	     (unified_shared_memory_enabled || TEST_XNACK_ON (elf_flags)
+	      ? "val[0] != '1' || val[1] != '\\0'"
+	      : "val[0] == '1' && val[1] == '\\0'"));
+
   obstack_free (&fns_os, NULL);
   for (i = 0; i < dims_count; i++)
     free (dims[i].name);
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index dc53373cac7..04b315d47b6 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -2413,6 +2413,10 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
 	DECL_ATTRIBUTES (decl)
 	  = tree_cons (get_identifier (target_attr),
 		       NULL_TREE, DECL_ATTRIBUTES (decl));
+      if (flag_offload_memory == OFFLOAD_MEMORY_UNIFIED)
+	DECL_ATTRIBUTES (decl)
+	  = tree_cons (get_identifier ("omp unified memory"),
+		       NULL_TREE, DECL_ATTRIBUTES (decl));
     }
 
   t = build_decl (DECL_SOURCE_LOCATION (decl),
-- 
2.25.1

-------------- next part --------------
From 12d14a9a255c1cc10e4506935327aabd9766967d Mon Sep 17 00:00:00 2001
From: Andrew Stubbs <ams@codesourcery.com>
Date: Mon, 20 Jun 2022 15:51:15 +0100
Subject: [PATCH 4/4] amdgcn: libgomp plugin USM implementation

Implement the Unified Shared Memory API calls in the GCN plugin.

The allocate and free are pretty straight-forward because all "target" memory
allocations are compatible with USM, on the right hardware.  However, there's
no known way to check what memory region was used, after the fact, so we use a
splay tree to record allocations so we can answer "is_usm_ptr" later.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (struct usm_splay_tree_key_s): New.
	(usm_splay_compare): New.
	(splay_tree_prefix): New.
	(GOMP_OFFLOAD_usm_alloc): New.
	(GOMP_OFFLOAD_usm_free): New.
	(GOMP_OFFLOAD_is_usm_ptr): New.
	(GOMP_OFFLOAD_supported_features): Move into the OpenMP API fold.
	Add GOMP_REQUIRES_UNIFIED_ADDRESS and
	GOMP_REQUIRES_UNIFIED_SHARED_MEMORY.
	(gomp_fatal): New.
	(splay_tree_c): New.
	* testsuite/lib/libgomp.exp (check_effective_target_omp_usm): New.
	* testsuite/libgomp.c++/usm-1.C: Use dg-require-effective-target.
	* testsuite/libgomp.c-c++-common/requires-1.c: Likewise.
	* testsuite/libgomp.c/usm-1.c: Likewise.
	* testsuite/libgomp.c/usm-2.c: Likewise.
	* testsuite/libgomp.c/usm-3.c: Likewise.
	* testsuite/libgomp.c/usm-4.c: Likewise.
	* testsuite/libgomp.c/usm-5.c: Likewise.
	* testsuite/libgomp.c/usm-6.c: Likewise.
---
 libgomp/ChangeLog.omp                         |  23 ++++
 libgomp/plugin/plugin-gcn.c                   | 104 ++++++++++++++++--
 libgomp/testsuite/lib/libgomp.exp             |  22 ++++
 libgomp/testsuite/libgomp.c++/usm-1.C         |   2 +-
 .../libgomp.c-c++-common/requires-1.c         |   1 +
 libgomp/testsuite/libgomp.c/usm-1.c           |   1 +
 libgomp/testsuite/libgomp.c/usm-2.c           |   1 +
 libgomp/testsuite/libgomp.c/usm-3.c           |   1 +
 libgomp/testsuite/libgomp.c/usm-4.c           |   1 +
 libgomp/testsuite/libgomp.c/usm-5.c           |   2 +-
 libgomp/testsuite/libgomp.c/usm-6.c           |   2 +-
 11 files changed, 150 insertions(+), 10 deletions(-)

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 0dbc13381b6..6d3c4772e86 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,3 +1,26 @@
+2022-06-20  Andrew Stubbs  <ams@codesourcery.com>
+
+	* plugin/plugin-gcn.c (struct usm_splay_tree_key_s): New.
+	(usm_splay_compare): New.
+	(splay_tree_prefix): New.
+	(GOMP_OFFLOAD_usm_alloc): New.
+	(GOMP_OFFLOAD_usm_free): New.
+	(GOMP_OFFLOAD_is_usm_ptr): New.
+	(GOMP_OFFLOAD_supported_features): Move into the OpenMP API fold.
+	Add GOMP_REQUIRES_UNIFIED_ADDRESS and
+	GOMP_REQUIRES_UNIFIED_SHARED_MEMORY.
+	(gomp_fatal): New.
+	(splay_tree_c): New.
+	* testsuite/lib/libgomp.exp (check_effective_target_omp_usm): New.
+	* testsuite/libgomp.c++/usm-1.C: Use dg-require-effective-target.
+	* testsuite/libgomp.c-c++-common/requires-1.c: Likewise.
+	* testsuite/libgomp.c/usm-1.c: Likewise.
+	* testsuite/libgomp.c/usm-2.c: Likewise.
+	* testsuite/libgomp.c/usm-3.c: Likewise.
+	* testsuite/libgomp.c/usm-4.c: Likewise.
+	* testsuite/libgomp.c/usm-5.c: Likewise.
+	* testsuite/libgomp.c/usm-6.c: Likewise.
+
 2022-05-30  Kwok Cheung Yeung  <kcy@codesourcery.com>
 
 	* testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index d246351397e..ac40d6e7e2e 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3820,6 +3820,89 @@ GOMP_OFFLOAD_evaluate_device (int device_num, const char *kind,
   return !isa || isa_code (isa) == agent->device_isa;
 }
 
+/* Use a splay tree to track USM allocations.  */
+
+typedef struct usm_splay_tree_node_s *usm_splay_tree_node;
+typedef struct usm_splay_tree_s *usm_splay_tree;
+typedef struct usm_splay_tree_key_s *usm_splay_tree_key;
+
+struct usm_splay_tree_key_s {
+  void *addr;
+  size_t size;
+};
+
+static inline int
+usm_splay_compare (usm_splay_tree_key x, usm_splay_tree_key y)
+{
+  if ((x->addr <= y->addr && x->addr + x->size > y->addr)
+      || (y->addr <= x->addr && y->addr + y->size > x->addr))
+    return 0;
+
+  return (x->addr > y->addr ? 1 : -1);
+}
+
+#define splay_tree_prefix usm
+#include "../splay-tree.h"
+
+static struct usm_splay_tree_s usm_map = { NULL };
+
+/* Allocate memory suitable for Unified Shared Memory.
+
+   In fact, AMD memory need only be "coarse grained", which target
+   allocations already are.  We do need to track allocations so that
+   GOMP_OFFLOAD_is_usm_ptr can look them up.  */
+
+void *
+GOMP_OFFLOAD_usm_alloc (int device, size_t size)
+{
+  void *ptr = GOMP_OFFLOAD_alloc (device, size);
+
+  usm_splay_tree_node node = malloc (sizeof (struct usm_splay_tree_node_s));
+  node->key.addr = ptr;
+  node->key.size = size;
+  node->left = NULL;
+  node->right = NULL;
+  usm_splay_tree_insert (&usm_map, node);
+
+  return ptr;
+}
+
+/* Free memory allocated via GOMP_OFFLOAD_usm_alloc.  */
+
+bool
+GOMP_OFFLOAD_usm_free (int device, void *ptr)
+{
+  struct usm_splay_tree_key_s key = { ptr, 1 };
+  usm_splay_tree_key node = usm_splay_tree_lookup (&usm_map, &key);
+  if (node)
+    {
+      usm_splay_tree_remove (&usm_map, &key);
+      free (node);
+    }
+
+  return GOMP_OFFLOAD_free (device, ptr);
+}
+
+/* True if the memory was allocated via GOMP_OFFLOAD_usm_alloc.  */
+
+bool
+GOMP_OFFLOAD_is_usm_ptr (void *ptr)
+{
+  struct usm_splay_tree_key_s key = { ptr, 1 };
+  return usm_splay_tree_lookup (&usm_map, &key);
+}
+
+/* Indicate which GOMP_REQUIRES_* features are supported.  */
+
+bool
+GOMP_OFFLOAD_supported_features (unsigned int *mask)
+{
+  *mask &= ~(GOMP_REQUIRES_UNIFIED_ADDRESS
+             | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY);
+
+  return (*mask == 0);
+}
+
 /* }}} */
 /* {{{ OpenACC Plugin API  */
 
@@ -4110,12 +4193,19 @@ GOMP_OFFLOAD_openacc_destroy_thread_data (void *data)
   free (data);
 }
 
-/* Indicate which GOMP_REQUIRES_* features are supported, currently none.  */
+/* }}} */
+/* {{{ USM splay tree */
 
-bool
-GOMP_OFFLOAD_supported_features (unsigned int *mask)
-{
-  return (*mask == 0);
-}
+/* Include this now so that splay-tree.c doesn't include it later.  This
+   avoids a conflict with splay_tree_prefix.  */
+#include "libgomp.h"
 
-/* }}} */
+/* This allows splay-tree.c to call gomp_fatal in this context.  The splay
+   tree code doesn't use the variadic arguments right now.  */
+#define gomp_fatal(MSG, ...) GOMP_PLUGIN_fatal (MSG)
+
+/* Include the splay tree code inline, with the prefixes added.  */
+#define splay_tree_prefix usm
+#define splay_tree_c
+#include "../splay-tree.h"
+/* }}}  */
diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 0aaa58f19c5..9bc3f8acd1c 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -548,3 +548,25 @@ int main() {
     return 0;
 } } "-lcuda -lcudart" ]
 }
+
+# return 1 if OpenMP Unified Share Memory is supported
+
+proc check_effective_target_omp_usm { } {
+    if { [libgomp_check_effective_target_offload_target "nvptx"] } {
+	return 1
+    }
+
+    if { [libgomp_check_effective_target_offload_target "amdgcn"] } {
+	return [check_no_compiler_messages omp_usm executable {
+           #pragma omp requires unified_shared_memory
+	   int main () {
+	     #pragma omp target
+	       ;
+	     return 0;
+	   }
+	}]
+    }
+
+    return 0
+}
+
diff --git a/libgomp/testsuite/libgomp.c++/usm-1.C b/libgomp/testsuite/libgomp.c++/usm-1.C
index fea25e5f10b..6e88f90d61f 100644
--- a/libgomp/testsuite/libgomp.c++/usm-1.C
+++ b/libgomp/testsuite/libgomp.c++/usm-1.C
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-skip-if "Only valid for nvptx" { ! offload_target_nvptx } } */
+/* { dg-require-effective-target omp_usm } */
 #include <stdint.h>
 
 #pragma omp requires unified_shared_memory
diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-1.c b/libgomp/testsuite/libgomp.c-c++-common/requires-1.c
index 02585adfb6f..0dd40bc0f59 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/requires-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/requires-1.c
@@ -1,4 +1,5 @@
 /* { dg-additional-sources requires-1-aux.c } */
+/* { dg-require-effective-target omp_usm } */
 
 #pragma omp requires unified_shared_memory
 
diff --git a/libgomp/testsuite/libgomp.c/usm-1.c b/libgomp/testsuite/libgomp.c/usm-1.c
index 1b35f19c45b..e73f1816f9a 100644
--- a/libgomp/testsuite/libgomp.c/usm-1.c
+++ b/libgomp/testsuite/libgomp.c/usm-1.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target omp_usm } */
 
 #include <omp.h>
 #include <stdint.h>
diff --git a/libgomp/testsuite/libgomp.c/usm-2.c b/libgomp/testsuite/libgomp.c/usm-2.c
index 689cee7e456..31f2bae7145 100644
--- a/libgomp/testsuite/libgomp.c/usm-2.c
+++ b/libgomp/testsuite/libgomp.c/usm-2.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target omp_usm } */
 
 #include <omp.h>
 #include <stdint.h>
diff --git a/libgomp/testsuite/libgomp.c/usm-3.c b/libgomp/testsuite/libgomp.c/usm-3.c
index 2ca66afe93f..2c78a0d8ced 100644
--- a/libgomp/testsuite/libgomp.c/usm-3.c
+++ b/libgomp/testsuite/libgomp.c/usm-3.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target omp_usm } */
 
 #include <omp.h>
 #include <stdint.h>
diff --git a/libgomp/testsuite/libgomp.c/usm-4.c b/libgomp/testsuite/libgomp.c/usm-4.c
index 753908c8440..1ac5498f73f 100644
--- a/libgomp/testsuite/libgomp.c/usm-4.c
+++ b/libgomp/testsuite/libgomp.c/usm-4.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target omp_usm } */
 
 #include <omp.h>
 #include <stdint.h>
diff --git a/libgomp/testsuite/libgomp.c/usm-5.c b/libgomp/testsuite/libgomp.c/usm-5.c
index 4d8b3cf71b1..563397f941a 100644
--- a/libgomp/testsuite/libgomp.c/usm-5.c
+++ b/libgomp/testsuite/libgomp.c/usm-5.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target offload_device } */
+/* { dg-require-effective-target omp_usm } */
 
 #include <omp.h>
 #include <stdint.h>
diff --git a/libgomp/testsuite/libgomp.c/usm-6.c b/libgomp/testsuite/libgomp.c/usm-6.c
index c207140092a..bd14f8197b3 100644
--- a/libgomp/testsuite/libgomp.c/usm-6.c
+++ b/libgomp/testsuite/libgomp.c/usm-6.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-skip-if "Only valid for nvptx" { ! offload_target_nvptx } } */
+/* { dg-require-effective-target omp_usm } */
 
 #include <stdint.h>
 #include <stdlib.h>
-- 
2.25.1



More information about the Gcc-patches mailing list