gcc/ChangeLog.dmf

   1 ==================== Branch work163-dmf, patch #106 ====================
   2
   3 PowerPC: Add support for 1,024 bit DMR registers.
   4
   5 This is a preliminary patch to add the full 1,024 bit dense math registers
   6 (DMRs) for -mcpu=future.  The MMA 512-bit accumulators map onto the top of the
   7 DMR register.
   8
   9 This patch only adds the new 1,024 bit register support.  It does not add
  10 support for any instructions that need 1,024 bit registers instead of 512 bit
  11 registers.
  12
  13 I added the new mode 'TDOmode' as the opaque mode used for 1,024 bit
  14 registers.  The 'wD' constraint added in previous patches is used for these
  15 registers.  I added support to do load and store of DMRs via the VSX registers,
  16 since there are no load/store dense math instructions.  I added the new keyword
  17 '__dmr' to create 1,024 bit types that can be loaded into DMRs.  At present, I
  18 don't have aliases for __dmr512 and __dmr1024 that we've discussed internally.
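
As a rough illustration of the data movement described above (not the actual
machine description), a DMR can be modelled as 128 bytes that are filled and
drained 512 bits at a time through 128-bit VSX-sized vectors.  All type and
function names below are hypothetical, and treating half 0 as the "upper"
half is an assumption of this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 128-bit VSX-sized vector.  */
typedef struct { uint8_t b[16]; } vec128_t;

/* Hypothetical model of a 1,024 bit DMR as two 512-bit halves.  Which half
   is "upper" is an assumption of this sketch.  */
typedef struct { uint8_t half[2][64]; } dmr_model_t;

/* Copy four 128-bit vectors (512 bits) into one half of the DMR.  */
void
dmr_insert512 (dmr_model_t *dmr, int which_half, const vec128_t v[4])
{
  memcpy (dmr->half[which_half], v, 64);
}

/* Copy one 512-bit half of the DMR out into four 128-bit vectors.  */
void
dmr_extract512 (const dmr_model_t *dmr, int which_half, vec128_t v[4])
{
  memcpy (v, dmr->half[which_half], 64);
}

/* Load a DMR from 128 bytes of memory by going through the vectors, since
   there is no direct DMR load.  */
void
dmr_load (dmr_model_t *dmr, const uint8_t mem[128])
{
  vec128_t tmp[4];
  for (int h = 0; h < 2; h++)
    {
      memcpy (tmp, mem + 64 * h, sizeof tmp);
      dmr_insert512 (dmr, h, tmp);
    }
}

/* Store a DMR to 128 bytes of memory by going through the vectors.  */
void
dmr_store (const dmr_model_t *dmr, uint8_t mem[128])
{
  vec128_t tmp[4];
  for (int h = 0; h < 2; h++)
    {
      dmr_extract512 (dmr, h, tmp);
      memcpy (mem + 64 * h, tmp, sizeof tmp);
    }
}
```

Each half needs four vectors, so a full DMR load or store in this model moves
eight vectors in total.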
  19
  20 The patch has been tested on both little and big endian systems.  Can I check
  21 it into the master branch?
  22
  23 2024-03-19   Michael Meissner  <meissner@linux.ibm.com>
  24
  25 gcc/
  26
  27         * config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec.
  28         (UNSPEC_DM_INSERT512_LOWER): Likewise.
  29         (UNSPEC_DM_EXTRACT512): Likewise.
  30         (UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise.
  31         (UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise.
  32         (movtdo): New define_expand and define_insn_and_split to implement 1,024
  33         bit DMR registers.
  34         (movtdo_insert512_upper): New insn.
  35         (movtdo_insert512_lower): Likewise.
  36         (movtdo_extract512): Likewise.
  37         (reload_dmr_from_memory): Likewise.
  38         (reload_dmr_to_memory): Likewise.
  39         * config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR
  40         support.
  41         (rs6000_init_builtins): Add support for __dmr keyword.
  42         * config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support
  43         for TDOmode.
  44         (rs6000_function_arg): Likewise.
  45         * config/rs6000/rs6000-modes.def (TDOmode): New mode.
  46         * config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add
  47         support for TDOmode.
  48         (rs6000_hard_regno_mode_ok_uncached): Likewise.
  49         (rs6000_hard_regno_mode_ok): Likewise.
  50         (rs6000_modes_tieable_p): Likewise.
  51         (rs6000_debug_reg_global): Likewise.
  52         (rs6000_setup_reg_addr_masks): Likewise.
  53         (rs6000_init_hard_regno_mode_ok): Add support for TDOmode.  Setup reload
  54         hooks for DMR mode.
  55         (reg_offset_addressing_ok_p): Add support for TDOmode.
  56         (rs6000_emit_move): Likewise.
  57         (rs6000_secondary_reload_simple_move): Likewise.
  58         (rs6000_preferred_reload_class): Likewise.
  59         (rs6000_secondary_reload_class): Likewise.
  60         (rs6000_mangle_type): Add mangling for __dmr type.
  61         (rs6000_dmr_register_move_cost): Add support for TDOmode.
  62         (rs6000_split_multireg_move): Likewise.
  63         (rs6000_invalid_conversion): Likewise.
  64         * config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
  65         (enum rs6000_builtin_type_index): Add DMR type nodes.
  66         (dmr_type_node): Likewise.
  67         (ptr_dmr_type_node): Likewise.
  68
  69 gcc/testsuite/
  70
  71         * gcc.target/powerpc/dm-1024bit.c: New test.
  72
  73 ==================== Branch work163-dmf, patch #105 ====================
  74
  75 Add dense math test for new instruction names.
  76
  77 2024-03-19   Michael Meissner  <meissner@linux.ibm.com>
  78
  79 gcc/testsuite/
  80
  81         * gcc.target/powerpc/dm-double-test.c: New test.
  82         * lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New
  83         target test.
  84
  85 ==================== Branch work163-dmf, patch #104 ====================
  86
  87 PowerPC: Switch to dense math names for all MMA operations.
  88
  89 This patch changes the assembler instruction names for MMA instructions from
  90 the original name used in power10 to the new name when used with the dense math
  91 system.  I.e. xvf64gerpp becomes dmxvf64gerpp.  The assembler will emit the
  92 same bits for either spelling.
  93
  94 For the non-prefixed MMA instructions, we add a 'dm' prefix in front of the
  95 instruction.  However, the prefixed instructions have a 'pm' prefix, and we add
  96 the 'dm' prefix afterwards.  To prevent having two sets of parallel int
  97 attributes, we remove the "pm" prefix from the instruction string in the
  98 attributes, and add it later, both in the insn name and in the output template.
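
The naming rule above can be sketched as a small helper.  This is only an
illustration of the prefix ordering; the function name and signature are
hypothetical and are not part of mma.md.

```c
#include <stdbool.h>
#include <stdio.h>

/* Build the assembler mnemonic for an MMA operation.  BASE is the name
   without any "pm" or "dm" prefix (e.g. "xvf64gerpp").  The "pm" prefix
   comes first, then "dm" when dense math is enabled.  */
void
mma_mnemonic (char *buf, size_t len, const char *base,
              bool prefixed, bool dense_math)
{
  snprintf (buf, len, "%s%s%s",
            prefixed ? "pm" : "",
            dense_math ? "dm" : "",
            base);
}
```

With this sketch, a non-prefixed xvf64gerpp becomes dmxvf64gerpp on a dense
math system, and the prefixed form gets the combined 'pmdm' prefix.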
  99
 100 2024-03-19   Michael Meissner  <meissner@linux.ibm.com>
 101
 102 gcc/
 103
 104         * config/rs6000/mma.md (vvi4i4i8): Change the instruction to not have a
 105         "pm" prefix.
 106         (avvi4i4i8): Likewise.
 107         (vvi4i4i2): Likewise.
 108         (avvi4i4i2): Likewise.
 109         (vvi4i4): Likewise.
 110         (avvi4i4): Likewise.
 111         (pvi4i2): Likewise.
 112         (apvi4i2): Likewise.
 113         (vvi4i4i4): Likewise.
 114         (avvi4i4i4): Likewise.
 115         (mma_xxsetaccz): Add support for running on DMF systems, generating the
 116         dense math instruction and using the dense math accumulators.
 117         (mma_<vv>): Likewise.
 118         (mma_<pv>): Likewise.
 119         (mma_<avv>): Likewise.
 120         (mma_<apv>): Likewise.
 121         (mma_pm<vvi4i4i8>): Add support for running on DMF systems, generating
 122         the dense math instruction and using the dense math accumulators.
 123         Rename the insn with a 'pm' prefix and add either 'pm' or 'pmdm'
 124         prefixes based on whether we have the original MMA specification or if
 125         we have dense math support.
 126         (mma_pm<avvi4i4i8>): Likewise.
 127         (mma_pm<vvi4i4i2>): Likewise.
 128         (mma_pm<avvi4i4i2>): Likewise.
 129         (mma_pm<vvi4i4>): Likewise.
 130         (mma_pm<avvi4i4>): Likewise.
 131         (mma_pm<pvi4i2>): Likewise.
 132         (mma_pm<apvi4i2>): Likewise.
 133         (mma_pm<vvi4i4i4>): Likewise.
 134         (mma_pm<avvi4i4i4>): Likewise.
 135
 136 ==================== Branch work163-dmf, patch #103 ====================
 137
 138 Add support for dense math registers.
 139
 140 The MMA subsystem added the notion of accumulator registers as an optional
 141 feature of ISA 3.1 (power10).  In ISA 3.1, these accumulators overlapped with
 142 the VSX registers 0..31, but logically the accumulator registers were separate
 143 from the FPR registers.  In ISA 3.1, it was anticipated that in future systems,
 144 the accumulator registers may not overlap with the FPR registers.  This patch
 145 adds the support for dense math registers as separate registers.
 146
 147 This particular patch does not change the MMA support to use the accumulators
 148 within the dense math registers.  This patch just adds the basic support for
 149 having separate DMRs.  The next patch will switch the MMA support to use the
 150 accumulators if -mcpu=future is used.
 151
 152 For testing purposes, I added an undocumented option '-mdense-math' to enable
 153 or disable the dense math support.
 154
 155 This patch adds a new constraint (wD).  If MMA is selected but dense math is
 156 not selected (i.e. -mcpu=power10), the wD constraint will allow access to
 157 accumulators that overlap with VSX registers 0..31.  If both MMA and dense math
 158 are selected (i.e. -mcpu=future), the wD constraint will only allow dense math
 159 registers.
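
The behavior of the wD constraint described above can be summarized as a
small decision function.  The enum and function names are hypothetical
stand-ins, and returning "no registers" when MMA is not selected is an
assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the register sets involved.  */
enum wd_class
{
  WD_NO_REGS,            /* Constraint matches nothing.  */
  WD_FPR_ACCUMULATORS,   /* Accumulators overlapping VSX registers 0..31.  */
  WD_DM_REGS             /* Separate dense math registers.  */
};

/* Return which registers the wD constraint allows.  */
enum wd_class
wd_constraint_class (bool mma, bool dense_math)
{
  if (!mma)
    return WD_NO_REGS;   /* Assumption: wD is unusable without MMA.  */
  return dense_math ? WD_DM_REGS : WD_FPR_ACCUMULATORS;
}
```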
 160
 161 This patch modifies the existing %A output modifier.  If MMA is selected but
 162 dense math is not selected, then %A output modifier converts the VSX register
 163 number to the accumulator number, by dividing it by 4.  If both MMA and dense
 164 math are selected, then %A will map the separate DMR registers into 0..7.
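
A minimal sketch of the %A mapping follows.  The function name and the DMR
base register number are hypothetical (they are not GCC's values), and
rejecting VSX register numbers that are not a multiple of 4 is an assumption
of this sketch.

```c
#include <stdbool.h>

/* Hypothetical first DMR hard register number, for this sketch only.  */
#define SKETCH_FIRST_DMR_REGNO 200

/* Return the accumulator number (0..7) that %A would print, or -1 if the
   register is not a valid accumulator.  Without dense math, REGNO is a VSX
   register number 0..31.  With dense math, REGNO is a DMR register.  */
int
print_operand_A_sketch (int regno, bool dense_math)
{
  if (dense_math)
    {
      int n = regno - SKETCH_FIRST_DMR_REGNO;
      return (n >= 0 && n < 8) ? n : -1;
    }

  if (regno < 0 || regno > 31 || regno % 4 != 0)
    return -1;

  /* Four adjacent VSX registers make up one accumulator.  */
  return regno / 4;
}
```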
 165
 166 The intention is that user code using extended asm can be modified to run on
 167 both MMA without dense math and MMA with dense math:
 168
 169     1)  If possible, don't use extended asm, but instead use the MMA built-in
 170         functions;
 171
 172     2)  If you do need to write extended asm, change the 'd' constraints
 173         targeting accumulators to 'wD';
 174
 175     3)  Only use the built-in zero, assemble and disassemble functions to
 176         move data between vector quad types and dense math accumulators.
 177         I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz instructions
 178         directly in the extended asm code.  The reason is these instructions
 179         assume there is a 1-to-1 correspondence between 4 adjacent FPR
 180         registers and an accumulator that overlaps with those registers.
 181         With accumulators now being separate registers, there no longer is a
 182         1-to-1 correspondence.
 183
 184 It is possible that the mangling for DMRs and the GDB register numbers may
 185 change in the future.
 186
 187 2024-03-19   Michael Meissner  <meissner@linux.ibm.com>
 188
 189         * config/rs6000/mma.md (movxo): Add comments about dense math registers.
 190         (movxo_nodm): Rename from movxo and restrict the usage to machines
 191         without dense math registers.
 192         (movxo_dm): New insn for movxo support for machines with dense math
 193         registers.
 194         (mma_<acc>): Restrict usage to machines without dense math registers.
 195         (mma_xxsetaccz): Make a define_expand, and add support for dense math
 196         registers.
 197         (mma_xxsetaccz_nodm): Rename from mma_xxsetaccz, and restrict to
 198         machines without dense math registers.
 199         (mma_dmsetaccz): New insn.
 200         * config/rs6000/predicates.md (dmr_operand): New predicate.
 201         (accumulator_operand): Add support for dense math registers.
 202         * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_mma_builtin): Do
 203         not de-prime accumulator when disassembling a vector quad.
 204         * config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE.
 205         (enum rs6000_reload_reg_type): Add RELOAD_REG_DMR.
 206         (LAST_RELOAD_REG_CLASS): Add support for DMR registers and the wD
 207         constraint.
 208         (reload_reg_map): Likewise.
 209         (rs6000_reg_names): Likewise.
 210         (alt_reg_names): Likewise.
 211         (rs6000_hard_regno_nregs_internal): Likewise.
 212         (rs6000_hard_regno_mode_ok_uncached): Likewise.
 213         (rs6000_debug_reg_global): Likewise.
 214         (rs6000_setup_reg_addr_masks): Likewise.
 215         (rs6000_init_hard_regno_mode_ok): Likewise.
 216         (rs6000_secondary_reload_memory): Add support for DMR registers.
 217         (rs6000_secondary_reload_simple_move): Likewise.
 218         (rs6000_preferred_reload_class): Likewise.
 219         (rs6000_secondary_reload_class): Likewise.
 220         (print_operand): Make %A handle both FPRs and DMRs.
 221         (rs6000_dmr_register_move_cost): New helper function.
 222         (rs6000_register_move_cost): Add support for DMR registers.
 223         (rs6000_memory_move_cost): Likewise.
 224         (rs6000_compute_pressure_classes): Likewise.
 225         (rs6000_debugger_regno): Likewise.
 226         (rs6000_split_multireg_move): Add support for DMRs.
 227         * config/rs6000/rs6000.h (TARGET_DENSE_MATH): New macro.
 228         (TARGET_MMA_DENSE_MATH): Likewise.
 229         (TARGET_MMA_NO_DENSE_MATH): Likewise.
 230         (UNITS_PER_DMR_WORD): Likewise.
 231         (FIRST_PSEUDO_REGISTER): Update for DMRs.
 232         (FIXED_REGISTERS): Add DMRs.
 233         (CALL_REALLY_USED_REGISTERS): Likewise.
 234         (REG_ALLOC_ORDER): Likewise.
 235         (DMR_REGNO_P): New macro.
 236         (enum reg_class): Add DM_REGS.
 237         (REG_CLASS_NAMES): Likewise.
 238         (REG_CLASS_CONTENTS): Likewise.
 239         (enum r6000_reg_class_enum): Add RS6000_CONSTRAINT_wD.
 240         (REGISTER_NAMES): Add DMR registers.
 241         (ADDITIONAL_REGISTER_NAMES): Likewise.
 242
 243 ==================== Branch work163-dmf, patch #102 ====================
 244
 245 Add wD constraint.
 246
 247 This patch adds a new constraint ('wD') that matches the accumulator registers
 248 that overlap with VSX registers 0..31 on power10.  Future patches will add the
 249 support for a separate accumulator register class that will be used when the
 250 support for dense math registers is added.
 251
 252 2024-03-19   Michael Meissner  <meissner@linux.ibm.com>
 253
 254         * config/rs6000/constraints.md (wD): New constraint.
 255         * config/rs6000/mma.md (mma_disassemble_acc): Use the 'wD' constraint.
 256         (mma_<vv>): Likewise.
 257         (mma_<avv>): Likewise.
 258         (mma_<pv>): Likewise.
 259         (mma_<apv>): Likewise.
 260         (mma_<vvi4i4i8>): Likewise.
 261         (mma_<avvi4i4i8>): Likewise.
 262         (mma_<vvi4i4i2>): Likewise.
 263         (mma_<avvi4i4i2>): Likewise.
 264         (mma_<vvi4i4>): Likewise.
 265         (mma_<avvi4i4>): Likewise.
 266         (mma_<pvi4i2>): Likewise.
 267         (mma_<apvi4i2>): Likewise.
 268         (mma_<vvi4i4i4>): Likewise.
 269         (mma_<avvi4i4i4>): Likewise.
 270         * config/rs6000/predicates.md (accumulator_operand): New predicate.
 271         * config/rs6000/rs6000.cc (rs6000_debug_reg_global): Print the register
 272         class for the 'wD' constraint.
 273         (rs6000_init_hard_regno_mode_ok): Set the 'wD' register constraint
 274         class.
 275         * config/rs6000/rs6000.h (enum r6000_reg_class_enum): Add element for
 276         the 'wD' constraint.
 277         * doc/md.texi (PowerPC constraints): Document the 'wD' constraint.
 278
 279 ==================== Branch work163-dmf, patch #101 ====================
 280
 281 Use vector pair load/store for memcpy with -mcpu=future
 282
 283 In the development for the power10 processor, GCC did not enable using the load
 284 vector pair and store vector pair instructions when optimizing things like
 285 memory copy.  This patch enables using those instructions if -mcpu=future is
 286 used.
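
As an illustration of the kind of code this affects, the fixed-size copy below
moves 32 bytes, the size of one vector pair (two 128-bit vectors).  The struct
and function names are hypothetical.  Whether the compiler actually expands
the copy with load vector pair and store vector pair instructions depends on
the options in effect and is not guaranteed by this sketch.

```c
#include <string.h>

/* A 32-byte block, the size of one vector pair.  */
struct block32
{
  unsigned char bytes[32];
};

/* Fixed-size copy that the compiler may expand inline.  */
void
copy_block32 (struct block32 *dst, const struct block32 *src)
{
  memcpy (dst, src, sizeof *dst);
}
```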
 287
 288 2024-03-18  Michael Meissner  <meissner@linux.ibm.com>
 289
 290 gcc/
 291
 292         * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Enable using
 293         load vector pair and store vector pair instructions for memory copy
 294         operations.
 295         (POWERPC_MASKS): Make the bit for enabling using load vector pair and
 296         store vector pair operations set and reset when the PowerPC processor is
 297         changed.
 298
 299 ==================== Branch work163-dmf, baseline ====================
 300
 301 Add ChangeLog.dmf and update REVISION.
 302
 303 2024-03-18  Michael Meissner  <meissner@linux.ibm.com>
 304
 305 gcc/
 306
 307         * ChangeLog.dmf: New file for branch.
 308         * REVISION: Update.
 309
 310 2024-03-18   Michael Meissner  <meissner@linux.ibm.com>
 311
 312         Clone branch