Commit | Line | Data |
---|---|---|
672c4b0e MM |
1 | ==================== Branch work163-dmf, patch #106 ==================== |
2 | ||
3 | PowerPC: Add support for 1,024 bit DMR registers. | |
4 | ||
5 | This patch is a preliminary patch to add the full 1,024 bit dense math registers | |
6 | (DMRs) for -mcpu=future. The MMA 512-bit accumulators map onto the top half of | |
7 | each DMR. | |
8 | ||
9 | This patch only adds the new 1,024 bit register support. It does not add | |
10 | support for any instructions that need 1,024 bit registers instead of 512 bit | |
11 | registers. | |
12 | ||
13 | I used the new mode 'TDOmode' as the opaque mode for 1,024 bit | |
14 | registers. The 'wD' constraint added in previous patches is used for these | |
15 | registers. I added support to do load and store of DMRs via the VSX registers, | |
16 | since there are no load/store dense math instructions. I added the new keyword | |
17 | '__dmr' to create 1,024 bit types that can be loaded into DMRs. At present, I | |
18 | don't have aliases for __dmr512 and __dmr1024 that we've discussed internally. | |
19 | ||
20 | The patches have been tested on both little and big endian systems. Can I check | |
21 | it into the master branch? | |
22 | ||
23 | 2024-03-19 Michael Meissner <meissner@linux.ibm.com> | |
24 | ||
25 | gcc/ | |
26 | ||
27 | * config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec. | |
28 | (UNSPEC_DM_INSERT512_LOWER): Likewise. | |
29 | (UNSPEC_DM_EXTRACT512): Likewise. | |
30 | (UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise. | |
31 | (UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise. | |
32 | (movtdo): New define_expand and define_insn_and_split to implement 1,024 | |
33 | bit DMR registers. | |
34 | (movtdo_insert512_upper): New insn. | |
35 | (movtdo_insert512_lower): Likewise. | |
36 | (movtdo_extract512): Likewise. | |
37 | (reload_dmr_from_memory): Likewise. | |
38 | (reload_dmr_to_memory): Likewise. | |
39 | * config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR | |
40 | support. | |
41 | (rs6000_init_builtins): Add support for __dmr keyword. | |
42 | * config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support | |
43 | for TDOmode. | |
44 | (rs6000_function_arg): Likewise. | |
45 | * config/rs6000/rs6000-modes.def (TDOmode): New mode. | |
46 | * config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add | |
47 | support for TDOmode. | |
48 | (rs6000_hard_regno_mode_ok_uncached): Likewise. | |
49 | (rs6000_hard_regno_mode_ok): Likewise. | |
50 | (rs6000_modes_tieable_p): Likewise. | |
51 | (rs6000_debug_reg_global): Likewise. | |
52 | (rs6000_setup_reg_addr_masks): Likewise. | |
53 | (rs6000_init_hard_regno_mode_ok): Add support for TDOmode. Set up reload | |
54 | hooks for DMR mode. | |
55 | (reg_offset_addressing_ok_p): Add support for TDOmode. | |
56 | (rs6000_emit_move): Likewise. | |
57 | (rs6000_secondary_reload_simple_move): Likewise. | |
58 | (rs6000_preferred_reload_class): Likewise. | |
59 | (rs6000_secondary_reload_class): Likewise. | |
60 | (rs6000_mangle_type): Add mangling for __dmr type. | |
61 | (rs6000_dmr_register_move_cost): Add support for TDOmode. | |
62 | (rs6000_split_multireg_move): Likewise. | |
63 | (rs6000_invalid_conversion): Likewise. | |
64 | * config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode. | |
65 | (enum rs6000_builtin_type_index): Add DMR type nodes. | |
66 | (dmr_type_node): Likewise. | |
67 | (ptr_dmr_type_node): Likewise. | |
68 | ||
69 | gcc/testsuite/ | |
70 | ||
71 | * gcc.target/powerpc/dm-1024bit.c: New test. | |
72 | ||
73 | ==================== Branch work163-dmf, patch #105 ==================== | |
74 | ||
75 | Add dense math test for new instruction names. | |
76 | ||
77 | 2024-03-19 Michael Meissner <meissner@linux.ibm.com> | |
78 | ||
79 | gcc/testsuite/ | |
80 | ||
81 | * gcc.target/powerpc/dm-double-test.c: New test. | |
82 | * lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New | |
83 | target test. | |
84 | ||
85 | ==================== Branch work163-dmf, patch #104 ==================== | |
86 | ||
87 | PowerPC: Switch to dense math names for all MMA operations. | |
88 | ||
89 | This patch changes the assembler instruction names for MMA instructions from | |
90 | the original name used in power10 to the new name when used with the dense math | |
91 | system. For example, xvf64gerpp becomes dmxvf64gerpp. The assembler will emit the | |
92 | same bits for either spelling. | |
93 | ||
94 | For the non-prefixed MMA instructions, we add a 'dm' prefix in front of the | |
95 | instruction. However, the prefixed instructions have a 'pm' prefix, and we add | |
96 | the 'dm' prefix afterwards. To prevent having two sets of parallel int | |
97 | attributes, we remove the "pm" prefix from the instruction string in the | |
98 | attributes, and add it later, both in the insn name and in the output template. | |
99 | ||
100 | 2024-03-19 Michael Meissner <meissner@linux.ibm.com> | |
101 | ||
102 | gcc/ | |
103 | ||
104 | * config/rs6000/mma.md (vvi4i4i8): Change the instruction to not have a | |
105 | "pm" prefix. | |
106 | (avvi4i4i8): Likewise. | |
107 | (vvi4i4i2): Likewise. | |
108 | (avvi4i4i2): Likewise. | |
109 | (vvi4i4): Likewise. | |
110 | (avvi4i4): Likewise. | |
111 | (pvi4i2): Likewise. | |
112 | (apvi4i2): Likewise. | |
113 | (vvi4i4i4): Likewise. | |
114 | (avvi4i4i4): Likewise. | |
115 | (mma_xxsetaccz): Add support for running on DMF systems, generating the | |
116 | dense math instruction and using the dense math accumulators. | |
117 | (mma_<vv>): Likewise. | |
118 | (mma_<pv>): Likewise. | |
119 | (mma_<avv>): Likewise. | |
120 | (mma_<apv>): Likewise. | |
121 | (mma_pm<vvi4i4i8>): Add support for running on DMF systems, generating | |
122 | the dense math instruction and using the dense math accumulators. | |
123 | Rename the insn with a 'pm' prefix and add either 'pm' or 'pmdm' | |
124 | prefixes based on whether we have the original MMA specification or if | |
125 | we have dense math support. | |
126 | (mma_pm<avvi4i4i8>): Likewise. | |
127 | (mma_pm<vvi4i4i2>): Likewise. | |
128 | (mma_pm<avvi4i4i2>): Likewise. | |
129 | (mma_pm<vvi4i4>): Likewise. | |
130 | (mma_pm<avvi4i4>): Likewise. | |
131 | (mma_pm<pvi4i2>): Likewise. | |
132 | (mma_pm<apvi4i2>): Likewise. | |
133 | (mma_pm<vvi4i4i4>): Likewise. | |
134 | (mma_pm<avvi4i4i4>): Likewise. | |
135 | ||
136 | ==================== Branch work163-dmf, patch #103 ==================== | |
137 | ||
138 | Add support for dense math registers. | |
139 | ||
140 | The MMA subsystem added the notion of accumulator registers as an optional | |
141 | feature of ISA 3.1 (power10). In ISA 3.1, these accumulators overlapped with | |
142 | the VSX registers 0..31, but logically the accumulator registers were separate | |
143 | from the FPR registers. In ISA 3.1, it was anticipated that in future systems, | |
144 | the accumulator registers may not overlap with the FPR registers. This patch | |
145 | adds the support for dense math registers as separate registers. | |
146 | ||
147 | This particular patch does not change the MMA support to use the accumulators | |
148 | within the dense math registers. This patch just adds the basic support for | |
149 | having separate DMRs. The next patch will switch the MMA support to use the | |
150 | accumulators if -mcpu=future is used. | |
151 | ||
152 | For testing purposes, I added an undocumented option '-mdense-math' to enable | |
153 | or disable the dense math support. | |
154 | ||
155 | This patch adds a new constraint (wD). If MMA is selected but dense math is | |
156 | not selected (i.e. -mcpu=power10), the wD constraint will allow access to | |
157 | accumulators that overlap with VSX registers 0..31. If both MMA and dense math | |
158 | are selected (i.e. -mcpu=future), the wD constraint will only allow dense math | |
159 | registers. | |
160 | ||
161 | This patch modifies the existing %A output modifier. If MMA is selected but | |
162 | dense math is not selected, then %A output modifier converts the VSX register | |
163 | number to the accumulator number, by dividing it by 4. If both MMA and dense | |
164 | math are selected, then %A will map the separate DMR registers into 0..7. | |
165 | ||
166 | The intention is that user code using extended asm can be modified to run on | |
167 | both MMA without dense math and MMA with dense math: | |
168 | ||
169 | 1) If possible, don't use extended asm, but instead use the MMA built-in | |
170 | functions; | |
171 | ||
172 | 2) If you do need to write extended asm, change the d constraints | |
173 | targeting accumulators to use wD; | |
174 | ||
175 | 3) Only use the built-in zero, assemble, and disassemble functions to | |
176 | create or move data between vector quad types and dense math accumulators. | |
177 | I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz instructions directly in | |
178 | the extended asm code. The reason is that these instructions assume a | |
179 | 1-to-1 correspondence between 4 adjacent FPR registers and the | |
180 | accumulator that overlaps with those registers. With accumulators | |
181 | now being separate registers, there is no longer a 1-to-1 | |
182 | correspondence. | |
183 | ||
184 | The mangling for DMRs and the GDB register numbers may change in the | |
185 | future. | |
186 | ||
187 | 2024-03-19 Michael Meissner <meissner@linux.ibm.com> | |
188 | ||
189 | * config/rs6000/mma.md (movxo): Add comments about dense math registers. | |
190 | (movxo_nodm): Rename from movxo and restrict the usage to machines | |
191 | without dense math registers. | |
192 | (movxo_dm): New insn for movxo support for machines with dense math | |
193 | registers. | |
194 | (mma_<acc>): Restrict usage to machines without dense math registers. | |
195 | (mma_xxsetaccz): Make a define_expand, and add support for dense math | |
196 | registers. | |
197 | (mma_xxsetaccz_nodm): Rename from mma_xxsetaccz, and restrict to | |
198 | machines without dense math registers. | |
199 | (mma_dmsetaccz): New insn. | |
200 | * config/rs6000/predicates.md (dmr_operand): New predicate. | |
201 | (accumulator_operand): Add support for dense math registers. | |
202 | * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_mma_builtin): Do | |
203 | not de-prime accumulator when disassembling a vector quad. | |
204 | * config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE. | |
205 | (enum rs6000_reload_reg_type): Add RELOAD_REG_DMR. | |
206 | (LAST_RELOAD_REG_CLASS): Add support for DMR registers and the wD | |
207 | constraint. | |
208 | (reload_reg_map): Likewise. | |
209 | (rs6000_reg_names): Likewise. | |
210 | (alt_reg_names): Likewise. | |
211 | (rs6000_hard_regno_nregs_internal): Likewise. | |
212 | (rs6000_hard_regno_mode_ok_uncached): Likewise. | |
213 | (rs6000_debug_reg_global): Likewise. | |
214 | (rs6000_setup_reg_addr_masks): Likewise. | |
215 | (rs6000_init_hard_regno_mode_ok): Likewise. | |
216 | (rs6000_secondary_reload_memory): Add support for DMR registers. | |
217 | (rs6000_secondary_reload_simple_move): Likewise. | |
218 | (rs6000_preferred_reload_class): Likewise. | |
219 | (rs6000_secondary_reload_class): Likewise. | |
220 | (print_operand): Make %A handle both FPRs and DMRs. | |
221 | (rs6000_dmr_register_move_cost): New helper function. | |
222 | (rs6000_register_move_cost): Add support for DMR registers. | |
223 | (rs6000_memory_move_cost): Likewise. | |
224 | (rs6000_compute_pressure_classes): Likewise. | |
225 | (rs6000_debugger_regno): Likewise. | |
226 | (rs6000_split_multireg_move): Add support for DMRs. | |
227 | * config/rs6000/rs6000.h (TARGET_DENSE_MATH): New macro. | |
228 | (TARGET_MMA_DENSE_MATH): Likewise. | |
229 | (TARGET_MMA_NO_DENSE_MATH): Likewise. | |
230 | (UNITS_PER_DMR_WORD): Likewise. | |
231 | (FIRST_PSEUDO_REGISTER): Update for DMRs. | |
232 | (FIXED_REGISTERS): Add DMRs. | |
233 | (CALL_REALLY_USED_REGISTERS): Likewise. | |
234 | (REG_ALLOC_ORDER): Likewise. | |
235 | (DMR_REGNO_P): New macro. | |
236 | (enum reg_class): Add DM_REGS. | |
237 | (REG_CLASS_NAMES): Likewise. | |
238 | (REG_CLASS_CONTENTS): Likewise. | |
239 | (enum r6000_reg_class_enum): Add RS6000_CONSTRAINT_wD. | |
240 | (REGISTER_NAMES): Add DMR registers. | |
241 | (ADDITIONAL_REGISTER_NAMES): Likewise. | |
242 | ||
243 | ==================== Branch work163-dmf, patch #102 ==================== | |
244 | ||
245 | Add wD constraint. | |
246 | ||
247 | This patch adds a new constraint ('wD') that matches the accumulator registers | |
248 | that overlap with VSX registers 0..31 on power10. Future patches will add the | |
249 | support for a separate accumulator register class that will be used when the | |
250 | support for dense math registers is added. | |
251 | ||
252 | 2024-03-19 Michael Meissner <meissner@linux.ibm.com> | |
253 | ||
254 | * config/rs6000/constraints.md (wD): New constraint. | |
255 | * config/rs6000/mma.md (mma_disassemble_acc): Use the 'wD' constraint. | |
256 | (mma_<vv>): Likewise. | |
257 | (mma_<avv>): Likewise. | |
258 | (mma_<pv>): Likewise. | |
259 | (mma_<apv>): Likewise. | |
260 | (mma_<vvi4i4i8>): Likewise. | |
261 | (mma_<avvi4i4i8>): Likewise. | |
262 | (mma_<vvi4i4i2>): Likewise. | |
263 | (mma_<avvi4i4i2>): Likewise. | |
264 | (mma_<vvi4i4>): Likewise. | |
265 | (mma_<avvi4i4>): Likewise. | |
266 | (mma_<pvi4i2>): Likewise. | |
267 | (mma_<apvi4i2>): Likewise. | |
268 | (mma_<vvi4i4i4>): Likewise. | |
269 | (mma_<avvi4i4i4>): Likewise. | |
270 | * config/rs6000/predicates.md (accumulator_operand): New predicate. | |
271 | * config/rs6000/rs6000.cc (rs6000_debug_reg_global): Print the register | |
272 | class for the 'wD' constraint. | |
273 | (rs6000_init_hard_regno_mode_ok): Set the 'wD' register constraint | |
274 | class. | |
275 | * config/rs6000/rs6000.h (enum r6000_reg_class_enum): Add element for | |
276 | the 'wD' constraint. | |
277 | * doc/md.texi (PowerPC constraints): Document the 'wD' constraint. | |
278 | ||
279 | ==================== Branch work163-dmf, patch #101 ==================== | |
280 | ||
281 | Use vector pair load/store for memcpy with -mcpu=future | |
282 | ||
283 | During the development of the power10 processor, GCC did not enable the load | |
284 | vector pair and store vector pair instructions when optimizing operations such | |
285 | as memory copies. This patch enables those instructions if -mcpu=future is | |
286 | used. | |
287 | ||
288 | 2024-03-18 Michael Meissner <meissner@linux.ibm.com> | |
289 | ||
290 | gcc/ | |
291 | ||
292 | * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Enable using | |
293 | load vector pair and store vector pair instructions for memory copy | |
294 | operations. | |
295 | (POWERPC_MASKS): Make the bit for enabling using load vector pair and | |
296 | store vector pair operations set and reset when the PowerPC processor is | |
297 | changed. | |
298 | ||
a275a33d MM |
299 | ==================== Branch work163-dmf, baseline ==================== |
300 | ||
672c4b0e MM |
301 | Add ChangeLog.dmf and update REVISION. |
302 | ||
303 | 2024-03-18 Michael Meissner <meissner@linux.ibm.com> | |
304 | ||
305 | gcc/ | |
306 | ||
307 | * ChangeLog.dmf: New file for branch. | |
308 | * REVISION: Update. | |
309 | ||
a275a33d MM |
310 | 2024-03-18 Michael Meissner <meissner@linux.ibm.com> |
311 | ||
312 | Clone branch |
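The %A output modifier change described in patch #103 can be modeled in a few lines of C. This is a minimal sketch under stated assumptions, not the code in rs6000.cc: the register numbering (VSX registers 0..31 numbered 0..31, DMRs starting at DMR_BASE) and the function name percent_a_accumulator are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical numbering for illustration only (not GCC's hard register
   numbers): VSX registers 0..31 are numbered 0..31, and the eight dense
   math registers are numbered DMR_BASE..DMR_BASE + 7.  */
#define VSX_ACC_LAST 31
#define DMR_BASE 100
#define NUM_ACCUMULATORS 8

/* Without dense math, 8 accumulators of 4 VSX registers each cover 0..31.  */
static_assert (NUM_ACCUMULATORS * 4 == VSX_ACC_LAST + 1,
	       "accumulators must cover VSX registers 0..31");

/* Return the accumulator number that %A would print for REGNO, or -1 if
   REGNO cannot name an accumulator in the selected mode.  */
static int
percent_a_accumulator (int regno, bool dense_math)
{
  if (dense_math)
    {
      /* Separate DMRs: map them directly onto 0..7.  */
      int acc = regno - DMR_BASE;
      return (acc >= 0 && acc < NUM_ACCUMULATORS) ? acc : -1;
    }

  /* MMA without dense math: accumulator N overlaps VSX registers
     4N..4N+3, so the first register of the group divided by 4 gives N.  */
  if (regno < 0 || regno > VSX_ACC_LAST || regno % 4 != 0)
    return -1;
  return regno / 4;
}
```

The divide-by-4 path only accepts the first register of a 4-register group, which is the 1-to-1 FPR/accumulator correspondence that no longer holds once the accumulators live in separate DMRs.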
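The mnemonic spelling rule from patch #104 (the int attributes keep the name without "pm", and the 'dm' prefix goes after 'pm' for prefixed instructions) can be sketched as a small C helper. The function mma_mnemonic is hypothetical: in mma.md the prefix is added in the insn name and the output template, not by a C function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the mnemonic for an MMA instruction.  BASE is
   the spelling kept in the int attribute, which carries no "pm" prefix
   (e.g. "xvf64gerpp").  PREFIXED selects the prefixed form and DENSE_MATH
   the dense math spelling.  Returns the length reported by snprintf.  */
static int
mma_mnemonic (char *buf, size_t size, const char *base,
	      bool prefixed, bool dense_math)
{
  /* The attribute strings must not already contain the "pm" prefix.  */
  assert (strncmp (base, "pm", 2) != 0);

  const char *prefix;
  if (prefixed)
    prefix = dense_math ? "pmdm" : "pm";	/* "dm" goes after "pm".  */
  else
    prefix = dense_math ? "dm" : "";

  return snprintf (buf, size, "%s%s", prefix, base);
}
```

Keeping a single attribute spelling and choosing the prefix at output time is what avoids the two parallel sets of int attributes mentioned in the patch description.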
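Patch #106 loads and stores DMRs through the VSX registers because there are no dense math load/store instructions. The byte-array model below sketches that data flow: 1,024 bits is eight 128-bit VSX registers, four per 512-bit half. All names are hypothetical, the placement of the "upper" half at bytes 0..63 is an illustrative assumption, and the actual reload_dmr_from_memory and movtdo_insert512_* patterns may use a different sequence or fewer scratch registers.

```c
#include <assert.h>
#include <string.h>

/* Sizes from the text: a DMR holds 1,024 bits, an accumulator 512 bits,
   and a VSX register 128 bits.  */
enum { DMR_BYTES = 1024 / 8, ACC_BYTES = 512 / 8, VSX_BYTES = 128 / 8 };
enum { VSX_PER_HALF = ACC_BYTES / VSX_BYTES };	/* 4 VSX regs per half.  */

static_assert (DMR_BYTES == 2 * ACC_BYTES, "a DMR is two 512-bit halves");

/* Hypothetical byte-array models of the registers.  */
typedef struct { unsigned char b[DMR_BYTES]; } dmr_model;
typedef struct { unsigned char b[VSX_BYTES]; } vsx_model;

/* Model of inserting four VSX registers into one 512-bit half of a DMR.
   Assumption: the upper half is bytes 0..63 of the model.  */
static void
dmr_insert512 (dmr_model *dmr, const vsx_model vsx[VSX_PER_HALF], int upper)
{
  unsigned char *half = dmr->b + (upper ? 0 : ACC_BYTES);
  for (int i = 0; i < VSX_PER_HALF; i++)
    memcpy (half + i * VSX_BYTES, vsx[i].b, VSX_BYTES);
}

/* Model of extracting one 512-bit half of a DMR into four VSX registers.  */
static void
dmr_extract512 (vsx_model vsx[VSX_PER_HALF], const dmr_model *dmr, int upper)
{
  const unsigned char *half = dmr->b + (upper ? 0 : ACC_BYTES);
  for (int i = 0; i < VSX_PER_HALF; i++)
    memcpy (vsx[i].b, half + i * VSX_BYTES, VSX_BYTES);
}

/* Model of reloading a DMR from memory: the 128 bytes are first loaded
   into eight scratch VSX registers, then each group of four is inserted
   into one half of the DMR.  */
static void
dmr_reload_from_memory (dmr_model *dmr, const unsigned char mem[DMR_BYTES])
{
  vsx_model scratch[2 * VSX_PER_HALF];
  for (int i = 0; i < 2 * VSX_PER_HALF; i++)
    memcpy (scratch[i].b, mem + i * VSX_BYTES, VSX_BYTES);
  dmr_insert512 (dmr, scratch, 1);
  dmr_insert512 (dmr, scratch + VSX_PER_HALF, 0);
}
```

Storing a DMR is the mirror image: extract each half into four VSX registers, then store those registers to memory.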