==================== Branch work163-dmf, patch #106 ====================

PowerPC: Add support for 1,024 bit DMR registers.

This patch is a preliminary patch to add the full 1,024 bit dense math registers
(DMRs) for -mcpu=future. The MMA 512-bit accumulators map onto the top of the

This patch only adds the new 1,024 bit register support. It does not add
support for any instructions that need 1,024 bit registers instead of 512 bit

I used the new mode 'TDOmode' as the opaque mode for 1,024 bit
registers. The 'wD' constraint added in previous patches is used for these
registers. I added support to load and store DMRs via the VSX registers,
since there are no load/store dense math instructions. I added the new keyword
'__dmr' to create 1,024 bit types that can be loaded into DMRs. At present, I
do not have the __dmr512 and __dmr1024 aliases that we've discussed internally.
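
Because there are no dense math load/store instructions, a DMR load or store has to pass through VSX registers. 1,024 bits is eight 128-bit VSX registers, or two 512-bit halves of four registers each. The C sketch below is only a conceptual model of that data flow, not GCC code. The types, the function names model_dmr_load and model_dmr_store, and the choice that the upper half comes from the lower memory addresses are all hypothetical. The real ordering, including endian handling, is whatever the reload_dmr_from_memory and reload_dmr_to_memory patterns implement.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model types, not GCC internals.  */
typedef struct { uint8_t b[16]; } vsx_reg_t;  /* one 128-bit VSX register */

/* A 1,024-bit DMR viewed as two 512-bit halves of four VSX-sized pieces.  */
typedef struct
{
  vsx_reg_t upper[4];
  vsx_reg_t lower[4];
} dmr_model_t;

_Static_assert (sizeof (dmr_model_t) == 128, "a DMR holds 1,024 bits");

/* Memory -> 8 VSX registers -> two 512-bit inserts.
   ASSUMPTION: the upper half comes from the first 64 bytes.  */
static void
model_dmr_load (dmr_model_t *dmr, const uint8_t mem[128])
{
  assert (dmr != NULL && mem != NULL);
  vsx_reg_t vsx[8];
  for (int i = 0; i < 8; i++)
    memcpy (vsx[i].b, mem + 16 * i, 16);          /* 8 vector loads */
  memcpy (dmr->upper, vsx, sizeof dmr->upper);     /* insert upper 512 bits */
  memcpy (dmr->lower, vsx + 4, sizeof dmr->lower); /* insert lower 512 bits */
}

/* Two 512-bit extracts -> 8 VSX registers -> memory (inverse of load).  */
static void
model_dmr_store (uint8_t mem[128], const dmr_model_t *dmr)
{
  assert (dmr != NULL && mem != NULL);
  vsx_reg_t vsx[8];
  memcpy (vsx, dmr->upper, sizeof dmr->upper);     /* extract upper half */
  memcpy (vsx + 4, dmr->lower, sizeof dmr->lower); /* extract lower half */
  for (int i = 0; i < 8; i++)
    memcpy (mem + 16 * i, vsx[i].b, 16);          /* 8 vector stores */
}
```

A round trip through model_dmr_load and model_dmr_store leaves the 128 bytes unchanged. The real load/store split must also preserve this property.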

The patches have been tested on both little and big endian systems. Can I check
it into the master branch?

2024-03-19 Michael Meissner <meissner@linux.ibm.com>

* config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec.
(UNSPEC_DM_INSERT512_LOWER): Likewise.
(UNSPEC_DM_EXTRACT512): Likewise.
(UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise.
(UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise.
(movtdo): New define_expand and define_insn_and_split to implement 1,024
(movtdo_insert512_upper): New insn.
(movtdo_insert512_lower): Likewise.
(movtdo_extract512): Likewise.
(reload_dmr_from_memory): Likewise.
(reload_dmr_to_memory): Likewise.
* config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR
(rs6000_init_builtins): Add support for __dmr keyword.
* config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support
(rs6000_function_arg): Likewise.
* config/rs6000/rs6000-modes.def (TDOmode): New mode.
* config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add
(rs6000_hard_regno_mode_ok_uncached): Likewise.
(rs6000_hard_regno_mode_ok): Likewise.
(rs6000_modes_tieable_p): Likewise.
(rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_init_hard_regno_mode_ok): Add support for TDOmode. Set up reload
(reg_offset_addressing_ok_p): Add support for TDOmode.
(rs6000_emit_move): Likewise.
(rs6000_secondary_reload_simple_move): Likewise.
(rs6000_preferred_reload_class): Likewise.
(rs6000_secondary_reload_class): Likewise.
(rs6000_mangle_type): Add mangling for __dmr type.
(rs6000_dmr_register_move_cost): Add support for TDOmode.
(rs6000_split_multireg_move): Likewise.
(rs6000_invalid_conversion): Likewise.
* config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
(enum rs6000_builtin_type_index): Add DMR type nodes.
(dmr_type_node): Likewise.
(ptr_dmr_type_node): Likewise.

* gcc.target/powerpc/dm-1024bit.c: New test.

==================== Branch work163-dmf, patch #105 ====================

Add dense math test for new instruction names.

2024-03-19 Michael Meissner <meissner@linux.ibm.com>

* gcc.target/powerpc/dm-double-test.c: New test.
* lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New

==================== Branch work163-dmf, patch #104 ====================

PowerPC: Switch to dense math names for all MMA operations.

This patch changes the assembler instruction names for MMA instructions from
the original names used in power10 to the new names used with the dense math
system. For example, xvf64gerpp becomes dmxvf64gerpp. The assembler emits the
same bits for either spelling.

For the non-prefixed MMA instructions, we add a 'dm' prefix in front of the
instruction. However, the prefixed instructions already have a 'pm' prefix, and
we add the 'dm' prefix after it (giving 'pmdm'). To avoid having two sets of
parallel int attributes, we remove the 'pm' prefix from the instruction string
in the attributes, and add it back later, both in the insn name and in the
output template.
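
The naming rule above reduces to this order: an optional 'pm', then an optional 'dm', then the base mnemonic, which the attributes store without its 'pm' prefix. The helper below is a hypothetical C sketch of that string composition. The name mma_mnemonic and its parameters are illustrative, not GCC's. In mma.md the equivalent work is done in the insn names and output templates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Compose an MMA mnemonic from BASE, which is stored without any "pm"
   prefix (e.g. "xvf64gerpp").  PREFIXED selects the prefixed form and
   DENSE_MATH selects the dense math spelling.  Returns the snprintf
   result so callers can detect truncation.  */
static int
mma_mnemonic (char *buf, size_t len, const char *base,
              bool prefixed, bool dense_math)
{
  assert (buf != NULL && base != NULL);
  return snprintf (buf, len, "%s%s%s",
                   prefixed ? "pm" : "",   /* prefixed insns keep 'pm' first */
                   dense_math ? "dm" : "", /* 'dm' follows any 'pm' */
                   base);
}
```

With both flags set the result is pmdmxvf64gerpp, which matches the 'pmdm' prefix described for mma_pm<vvi4i4i8>. With neither flag set the result is the original power10 spelling.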

2024-03-19 Michael Meissner <meissner@linux.ibm.com>

* config/rs6000/mma.md (vvi4i4i8): Change the instruction to not have a
(avvi4i4i8): Likewise.
(vvi4i4i2): Likewise.
(avvi4i4i2): Likewise.
(vvi4i4i4): Likewise.
(avvi4i4i4): Likewise.
(mma_xxsetaccz): Add support for running on DMF systems, generating the
dense math instruction and using the dense math accumulators.
(mma_<vv>): Likewise.
(mma_<pv>): Likewise.
(mma_<avv>): Likewise.
(mma_<apv>): Likewise.
(mma_pm<vvi4i4i8>): Add support for running on DMF systems, generating
the dense math instruction and using the dense math accumulators.
Rename the insn with a 'pm' prefix and add either 'pm' or 'pmdm'
prefixes based on whether we have the original MMA specification or
dense math support.
(mma_pm<avvi4i4i8>): Likewise.
(mma_pm<vvi4i4i2>): Likewise.
(mma_pm<avvi4i4i2>): Likewise.
(mma_pm<vvi4i4>): Likewise.
(mma_pm<avvi4i4>): Likewise.
(mma_pm<pvi4i2>): Likewise.
(mma_pm<apvi4i2>): Likewise.
(mma_pm<vvi4i4i4>): Likewise.
(mma_pm<avvi4i4i4>): Likewise.

==================== Branch work163-dmf, patch #103 ====================

Add support for dense math registers.

The MMA subsystem added the notion of accumulator registers as an optional
feature of ISA 3.1 (power10). In ISA 3.1, these accumulators overlapped with
the VSX registers 0..31, but logically the accumulator registers were separate
from the FPR registers. In ISA 3.1, it was anticipated that in future systems,
the accumulator registers might not overlap with the FPR registers. This patch
adds support for dense math registers as separate registers.

This particular patch does not change the MMA support to use the accumulators
within the dense math registers. It only adds the basic support for having
separate DMRs. The next patch will switch the MMA support to use the
accumulators if -mcpu=future is used.

For testing purposes, I added an undocumented option '-mdense-math' to enable
or disable the dense math support.

This patch adds a new constraint (wD). If MMA is selected but dense math is
not selected (e.g. -mcpu=power10), the wD constraint will allow access to
accumulators that overlap with VSX registers 0..31. If both MMA and dense math
are selected (e.g. -mcpu=future), the wD constraint will only allow dense math

This patch modifies the existing %A output modifier. If MMA is selected but
dense math is not selected, then the %A output modifier converts the VSX
register number to the accumulator number by dividing it by 4. If both MMA and
dense math are selected, then %A maps the separate DMR registers into 0..7.
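
The %A rule can be modeled as a small function. In this hypothetical C sketch, VSX registers in the overlap region are numbered 0..31, and the DMRs get a made-up starting number, MODEL_FIRST_DMR_REGNO. GCC's real hard register numbers differ. The multiple-of-4 check reflects that on power10 accumulator N overlaps VSX registers 4N..4N+3. Rejecting any other number is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical numbering for this sketch only (not GCC's hard regnos).  */
enum { MODEL_FIRST_DMR_REGNO = 100, MODEL_NUM_DMRS = 8 };

/* Return the accumulator number a %A-style modifier would print for
   REGNO, or -1 if REGNO is not a valid accumulator in this mode.  */
static int
model_print_acc (int regno, bool dense_math)
{
  if (!dense_math)
    {
      /* MMA without dense math: accumulator N overlaps VSX registers
         4N..4N+3 (within 0..31), so the number is REGNO / 4.  */
      if (regno >= 0 && regno < 32 && regno % 4 == 0)
        return regno / 4;
      return -1;
    }

  /* MMA with dense math: the separate DMRs map onto 0..7.  */
  if (regno >= MODEL_FIRST_DMR_REGNO
      && regno < MODEL_FIRST_DMR_REGNO + MODEL_NUM_DMRS)
    return regno - MODEL_FIRST_DMR_REGNO;
  return -1;
}
```

In this model, VSX register 8 yields accumulator 2 without dense math. With dense math it is rejected, because the accumulators no longer overlap the VSX registers.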

The intention is that user code using extended asm can be modified to run on
both MMA without dense math and MMA with dense math:

1) If possible, don't use extended asm, but instead use the MMA built-in

2) If you do need to write extended asm, change the d constraints that
   target accumulators to use wD;

3) Only use the built-in zero, assemble, and disassemble functions to create
   and move data between vector quad types and dense math accumulators.
   That is, do not use the xxmfacc, xxmtacc, and xxsetaccz instructions
   directly in the extended asm code. These instructions assume there is a
   1-to-1 correspondence between 4 adjacent FPR registers and the
   accumulator that overlaps with those registers. With accumulators
   now being separate registers, there no longer is a 1-to-1

The mangling for DMRs and the GDB register numbers may change in the future.

2024-03-19 Michael Meissner <meissner@linux.ibm.com>

* config/rs6000/mma.md (movxo): Add comments about dense math registers.
(movxo_nodm): Rename from movxo and restrict the usage to machines
without dense math registers.
(movxo_dm): New insn for movxo support for machines with dense math
(mma_<acc>): Restrict usage to machines without dense math registers.
(mma_xxsetaccz): Make a define_expand, and add support for dense math
(mma_xxsetaccz_nodm): Rename from mma_xxsetaccz, and restrict to
machines without dense math registers.
(mma_dmsetaccz): New insn.
* config/rs6000/predicates.md (dmr_operand): New predicate.
(accumulator_operand): Add support for dense math registers.
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_mma_builtin): Do
not de-prime accumulator when disassembling a vector quad.
* config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE.
(enum rs6000_reload_reg_type): Add RELOAD_REG_DMR.
(LAST_RELOAD_REG_CLASS): Add support for DMR registers and the wD
(reload_reg_map): Likewise.
(rs6000_reg_names): Likewise.
(alt_reg_names): Likewise.
(rs6000_hard_regno_nregs_internal): Likewise.
(rs6000_hard_regno_mode_ok_uncached): Likewise.
(rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_init_hard_regno_mode_ok): Likewise.
(rs6000_secondary_reload_memory): Add support for DMR registers.
(rs6000_secondary_reload_simple_move): Likewise.
(rs6000_preferred_reload_class): Likewise.
(rs6000_secondary_reload_class): Likewise.
(print_operand): Make %A handle both FPRs and DMRs.
(rs6000_dmr_register_move_cost): New helper function.
(rs6000_register_move_cost): Add support for DMR registers.
(rs6000_memory_move_cost): Likewise.
(rs6000_compute_pressure_classes): Likewise.
(rs6000_debugger_regno): Likewise.
(rs6000_split_multireg_move): Add support for DMRs.
* config/rs6000/rs6000.h (TARGET_DENSE_MATH): New macro.
(TARGET_MMA_DENSE_MATH): Likewise.
(TARGET_MMA_NO_DENSE_MATH): Likewise.
(UNITS_PER_DMR_WORD): Likewise.
(FIRST_PSEUDO_REGISTER): Update for DMRs.
(FIXED_REGISTERS): Add DMRs.
(CALL_REALLY_USED_REGISTERS): Likewise.
(REG_ALLOC_ORDER): Likewise.
(DMR_REGNO_P): New macro.
(enum reg_class): Add DM_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(enum r6000_reg_class_enum): Add RS6000_CONSTRAINT_wD.
(REGISTER_NAMES): Add DMR registers.
(ADDITIONAL_REGISTER_NAMES): Likewise.

==================== Branch work163-dmf, patch #102 ====================

This patch adds a new constraint ('wD') that matches the accumulator registers
that overlap with VSX registers 0..31 on power10. Future patches will add
support for a separate accumulator register class that will be used when
support for dense math registers is added.

2024-03-19 Michael Meissner <meissner@linux.ibm.com>

* config/rs6000/constraints.md (wD): New constraint.
* config/rs6000/mma.md (mma_disassemble_acc): Likewise.
(mma_<vv>): Likewise.
(mma_<avv>): Likewise.
(mma_<pv>): Likewise.
(mma_<apv>): Likewise.
(mma_<vvi4i4i8>): Likewise.
(mma_<avvi4i4i8>): Likewise.
(mma_<vvi4i4i2>): Likewise.
(mma_<avvi4i4i2>): Likewise.
(mma_<vvi4i4>): Likewise.
(mma_<avvi4i4>): Likewise.
(mma_<pvi4i2>): Likewise.
(mma_<apvi4i2>): Likewise.
(mma_<vvi4i4i4>): Likewise.
(mma_<avvi4i4i4>): Likewise.
* config/rs6000/predicates.md (accumulator_operand): New predicate.
* config/rs6000/rs6000.cc (rs6000_debug_reg_global): Print the register
class for the 'wD' constraint.
(rs6000_init_hard_regno_mode_ok): Set the 'wD' register constraint
* config/rs6000/rs6000.h (enum r6000_reg_class_enum): Add element for
* doc/md.texi (PowerPC constraints): Document the 'wD' constraint.

==================== Branch work163-dmf, patch #101 ====================

Use vector pair load/store for memcpy with -mcpu=future

During development for the power10 processor, GCC did not enable the load
vector pair and store vector pair instructions when optimizing operations like
memory copy. This patch enables using those instructions if -mcpu=future is

2024-03-18 Michael Meissner <meissner@linux.ibm.com>

* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Enable using
load vector pair and store vector pair instructions for memory copy
(POWERPC_MASKS): Make the bit for enabling using load vector pair and
store vector pair operations set and reset when the PowerPC processor is

==================== Branch work163-dmf, baseline ====================

Add ChangeLog.dmf and update REVISION.

2024-03-18 Michael Meissner <meissner@linux.ibm.com>

* ChangeLog.dmf: New file for branch.

2024-03-18 Michael Meissner <meissner@linux.ibm.com>