1 ==================== Branch work163-dmf, patch #106 ====================
2
3 PowerPC: Add support for 1,024 bit DMR registers.
4
5 This patch is a preliminary patch to add the full 1,024 bit dense math registers
6 (DMRs) for -mcpu=future. The MMA 512-bit accumulators map onto the top of the
7 DMR register.
8
9 This patch only adds the new 1,024 bit register support. It does not add
10 support for any instructions that need 1,024 bit registers instead of 512 bit
11 registers.
12
13 I used the new mode 'TDOmode' to be the opaque mode used for 1,024 bit
14 registers. The 'wD' constraint added in previous patches is used for these
15 registers. I added support to do load and store of DMRs via the VSX registers,
16 since there are no load/store dense math instructions. I added the new keyword
17 '__dmr' to create 1,024 bit types that can be loaded into DMRs. At present, I
18 don't have aliases for __dmr512 and __dmr1024 that we've discussed internally.
19
20 The patches have been tested on both little and big endian systems. Can I check
21 it into the master branch?
22
23 2024-03-19 Michael Meissner <meissner@linux.ibm.com>
24
25 gcc/
26
27 * config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec.
28 (UNSPEC_DM_INSERT512_LOWER): Likewise.
29 (UNSPEC_DM_EXTRACT512): Likewise.
30 (UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise.
31 (UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise.
32 (movtdo): New define_expand and define_insn_and_split to implement 1,024
33 bit DMR registers.
34 (movtdo_insert512_upper): New insn.
35 (movtdo_insert512_lower): Likewise.
36 (movtdo_extract512): Likewise.
37 (reload_dmr_from_memory): Likewise.
38 (reload_dmr_to_memory): Likewise.
39 * config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR
40 support.
41 (rs6000_init_builtins): Add support for __dmr keyword.
42 * config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support
43 for TDOmode.
44 (rs6000_function_arg): Likewise.
45 * config/rs6000/rs6000-modes.def (TDOmode): New mode.
46 * config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add
47 support for TDOmode.
48 (rs6000_hard_regno_mode_ok_uncached): Likewise.
49 (rs6000_hard_regno_mode_ok): Likewise.
50 (rs6000_modes_tieable_p): Likewise.
51 (rs6000_debug_reg_global): Likewise.
52 (rs6000_setup_reg_addr_masks): Likewise.
53 (rs6000_init_hard_regno_mode_ok): Add support for TDOmode. Setup reload
54 hooks for DMR mode.
55 (reg_offset_addressing_ok_p): Add support for TDOmode.
56 (rs6000_emit_move): Likewise.
57 (rs6000_secondary_reload_simple_move): Likewise.
58 (rs6000_preferred_reload_class): Likewise.
59 (rs6000_secondary_reload_class): Likewise.
60 (rs6000_mangle_type): Add mangling for __dmr type.
61 (rs6000_dmr_register_move_cost): Add support for TDOmode.
62 (rs6000_split_multireg_move): Likewise.
63 (rs6000_invalid_conversion): Likewise.
64 * config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
65 (enum rs6000_builtin_type_index): Add DMR type nodes.
66 (dmr_type_node): Likewise.
67 (ptr_dmr_type_node): Likewise.
68
69 gcc/testsuite/
70
71 * gcc.target/powerpc/dm-1024bit.c: New test.
72
73 ==================== Branch work163-dmf, patch #105 ====================
74
75 Add dense math test for new instruction names.
76
77 2024-03-19 Michael Meissner <meissner@linux.ibm.com>
78
79 gcc/testsuite/
80
81 * gcc.target/powerpc/dm-double-test.c: New test.
82 * lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New
83 target test.
84
85 ==================== Branch work163-dmf, patch #104 ====================
86
87 PowerPC: Switch to dense math names for all MMA operations.
88
89 This patch changes the assembler instruction names for MMA instructions from
90 the original name used in power10 to the new name when used with the dense math
91 system. I.e. xvf64gerpp becomes dmxvf64gerpp. The assembler will emit the
92 same bits for either spelling.
93
94 For the non-prefixed MMA instructions, we add a 'dm' prefix in front of the
95 instruction. However, the prefixed instructions have a 'pm' prefix, and we add
96 the 'dm' prefix afterwards. To prevent having two sets of parallel int
97 attributes, we remove the "pm" prefix from the instruction string in the
98 attributes, and add it later, both in the insn name and in the output template.
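The naming rule described above can be sketched in portable C. This is only an illustration of how the prefixes compose, not the code in mma.md: the helper name `mma_mnemonic` and its parameters are hypothetical, and the real implementation uses int attributes and output templates rather than string formatting at run time.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of GCC): build the assembler mnemonic for an
   MMA operation.  BASE is the original power10 name with any "pm" prefix
   already removed, e.g. "xvf64gerpp" or "xvi8ger4".  PREFIXED selects the
   prefixed form of the instruction, and DENSE_MATH selects the dense math
   spelling.  The "pm" prefix is emitted first and "dm" follows it, so a
   prefixed dense math instruction starts with "pmdm".  Returns the length
   of the mnemonic, as reported by snprintf.  */
static int
mma_mnemonic (char *buf, size_t len, const char *base,
              int prefixed, int dense_math)
{
  return snprintf (buf, len, "%s%s%s",
                   prefixed ? "pm" : "",
                   dense_math ? "dm" : "",
                   base);
}
```

With this sketch, "xvf64gerpp" with dense math enabled yields "dmxvf64gerpp", and a prefixed base name such as "xvi8ger4" yields "pmxvi8ger4" without dense math and "pmdmxvi8ger4" with it.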
99
100 2024-03-19 Michael Meissner <meissner@linux.ibm.com>
101
102 gcc/
103
104 * config/rs6000/mma.md (vvi4i4i8): Change the instruction to not have a
105 "pm" prefix.
106 (avvi4i4i8): Likewise.
107 (vvi4i4i2): Likewise.
108 (avvi4i4i2): Likewise.
109 (vvi4i4): Likewise.
110 (avvi4i4): Likewise.
111 (pvi4i2): Likewise.
112 (apvi4i2): Likewise.
113 (vvi4i4i4): Likewise.
114 (avvi4i4i4): Likewise.
115 (mma_xxsetaccz): Add support for running on DMF systems, generating the
116 dense math instruction and using the dense math accumulators.
117 (mma_<vv>): Likewise.
118 (mma_<pv>): Likewise.
119 (mma_<avv>): Likewise.
120 (mma_<apv>): Likewise.
121 (mma_pm<vvi4i4i8>): Add support for running on DMF systems, generating
122 the dense math instruction and using the dense math accumulators.
123 Rename the insn with a 'pm' prefix and add either 'pm' or 'pmdm'
124 prefixes based on whether we have the original MMA specification or if
125 we have dense math support.
126 (mma_pm<avvi4i4i8>): Likewise.
127 (mma_pm<vvi4i4i2>): Likewise.
128 (mma_pm<avvi4i4i2>): Likewise.
129 (mma_pm<vvi4i4>): Likewise.
130 (mma_pm<avvi4i4>): Likewise.
131 (mma_pm<pvi4i2>): Likewise.
132 (mma_pm<apvi4i2>): Likewise.
133 (mma_pm<vvi4i4i4>): Likewise.
134 (mma_pm<avvi4i4i4>): Likewise.
135
136 ==================== Branch work163-dmf, patch #103 ====================
137
138 Add support for dense math registers.
139
140 The MMA subsystem added the notion of accumulator registers as an optional
141 feature of ISA 3.1 (power10). In ISA 3.1, these accumulators overlapped with
142 the VSX registers 0..31, but logically the accumulator registers were separate
143 from the FPR registers. In ISA 3.1, it was anticipated that in future systems,
144 the accumulator registers may not overlap with the FPR registers. This patch
145 adds the support for dense math registers as separate registers.
146
147 This particular patch does not change the MMA support to use the accumulators
148 within the dense math registers. This patch just adds the basic support for
149 having separate DMRs. The next patch will switch the MMA support to use the
150 accumulators if -mcpu=future is used.
151
152 For testing purposes, I added an undocumented option '-mdense-math' to enable
153 or disable the dense math support.
154
155 This patch adds a new constraint (wD). If MMA is selected but dense math is
156 not selected (i.e. -mcpu=power10), the wD constraint will allow access to
157 accumulators that overlap with VSX registers 0..31. If both MMA and dense math
158 are selected (i.e. -mcpu=future), the wD constraint will only allow dense math
159 registers.
160
161 This patch modifies the existing %A output modifier. If MMA is selected but
162 dense math is not selected, then the %A output modifier converts the VSX register
163 number to the accumulator number, by dividing it by 4. If both MMA and dense
164 math are selected, then %A will map the separate DMR registers into 0..7.
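The two behaviours of %A can be modelled with a small C function. This is a sketch under stated assumptions, not the code in print_operand: the function name and both register-number constants are hypothetical placeholders, and rejecting a VSX register that is not a multiple of 4 is an assumption about how an invalid operand would be handled.

```c
#include <stdbool.h>

/* Assumed hard register numbers, for illustration only.  */
#define HYP_FIRST_VSX_REGNO 32   /* assumed number of VSX register 0 */
#define HYP_FIRST_DMR_REGNO 111  /* assumed number of DMR 0 */

/* Hypothetical model of the %A output modifier.  Returns the accumulator
   number (0..7) to print, or -1 if REGNO is not a valid accumulator.  */
static int
accumulator_number (int regno, bool dense_math)
{
  if (dense_math)
    {
      /* Separate DMRs: map them directly onto 0..7.  */
      int n = regno - HYP_FIRST_DMR_REGNO;
      return (n >= 0 && n < 8) ? n : -1;
    }

  /* Accumulators overlapping VSX registers 0..31: each accumulator covers
     4 adjacent registers, so divide the VSX register number by 4.  */
  int vsx = regno - HYP_FIRST_VSX_REGNO;
  if (vsx < 0 || vsx > 31 || vsx % 4 != 0)
    return -1;
  return vsx / 4;
}
```

Under these assumptions, VSX register 8 prints as accumulator 2 without dense math, while the fourth DMR prints as 3 with dense math.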
165
166 The intention is that user code using extended asm can be modified to run on
167 both MMA without dense math and MMA with dense math:
168
169 1) If possible, don't use extended asm, but instead use the MMA built-in
170 functions;
171
172 2) If you do need to write extended asm, change the d constraints
173 targeting accumulators to use wD;
174
175 3) Only use the built-in zero, assemble and disassemble functions to
176 move data between vector quad types and dense math accumulators.
177 I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz directly in the
178 extended asm code. The reason is these instructions assume there is a
179 1-to-1 correspondence between 4 adjacent FPR registers and an
180 accumulator that overlaps with those registers. With accumulators
181 now being separate registers, there no longer is a 1-to-1
182 correspondence.
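As an illustration of points 1 and 3, the following sketch zeroes an accumulator, accumulates one 4x4 single-precision outer product, and copies the result out, using only the documented MMA built-in functions. The helper names `outer_product_4x4` and `outer_accumulate_ref` are hypothetical. The MMA path is compiled only when `__MMA__` is defined; on other targets a portable loop stands in so the example still builds. The layout of the disassembled rows in memory is assumed to match the reference loop and has not been verified on hardware here.

```c
#include <string.h>

/* Portable reference: acc[i][j] += a[i] * b[j] for a 4x4 float tile.  */
static void
outer_accumulate_ref (float acc[4][4], const float a[4], const float b[4])
{
  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 4; j++)
      acc[i][j] += a[i] * b[j];
}

/* Hypothetical helper: zero a 4x4 tile, then accumulate one outer product.
   No xxsetaccz, xxmtacc, or xxmfacc appears in user code; the compiler picks
   the right instructions for accumulators with or without dense math.  */
static void
outer_product_4x4 (float out[4][4], const float a[4], const float b[4])
{
#if defined (__MMA__)
  typedef unsigned char vec_t __attribute__ ((vector_size (16)));
  __vector_quad acc;
  vec_t va, vb;

  memcpy (&va, a, sizeof va);
  memcpy (&vb, b, sizeof vb);
  __builtin_mma_xxsetaccz (&acc);             /* built-in zero */
  __builtin_mma_xvf32gerpp (&acc, va, vb);    /* acc += outer product */
  __builtin_mma_disassemble_acc (out, &acc);  /* built-in disassemble */
  (void) outer_accumulate_ref;                /* unused on this path */
#else
  memset (out, 0, sizeof (float[4][4]));
  outer_accumulate_ref (out, a, b);
#endif
}
```

Because the accumulator is only ever touched through built-ins, the same source should work whether the accumulator overlaps FPRs or lives in a separate dense math register.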
183
184 It is possible that the mangling for DMRs and the GDB register numbers may
185 change in the future.
186
187 2024-03-19 Michael Meissner <meissner@linux.ibm.com>
188
189 * config/rs6000/mma.md (movxo): Add comments about dense math registers.
190 (movxo_nodm): Rename from movxo and restrict the usage to machines
191 without dense math registers.
192 (movxo_dm): New insn for movxo support for machines with dense math
193 registers.
194 (mma_<acc>): Restrict usage to machines without dense math registers.
195 (mma_xxsetaccz): Make a define_expand, and add support for dense math
196 registers.
197 (mma_xxsetaccz_nodm): Rename from mma_xxsetaccz, and restrict to
198 machines without dense math registers.
199 (mma_dmsetaccz): New insn.
200 * config/rs6000/predicates.md (dmr_operand): New predicate.
201 (accumulator_operand): Add support for dense math registers.
202 * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_mma_builtin): Do
203 not de-prime accumulator when disassembling a vector quad.
204 * config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE.
205 (enum rs6000_reload_reg_type): Add RELOAD_REG_DMR.
206 (LAST_RELOAD_REG_CLASS): Add support for DMR registers and the wD
207 constraint.
208 (reload_reg_map): Likewise.
209 (rs6000_reg_names): Likewise.
210 (alt_reg_names): Likewise.
211 (rs6000_hard_regno_nregs_internal): Likewise.
212 (rs6000_hard_regno_mode_ok_uncached): Likewise.
213 (rs6000_debug_reg_global): Likewise.
214 (rs6000_setup_reg_addr_masks): Likewise.
215 (rs6000_init_hard_regno_mode_ok): Likewise.
216 (rs6000_secondary_reload_memory): Add support for DMR registers.
217 (rs6000_secondary_reload_simple_move): Likewise.
218 (rs6000_preferred_reload_class): Likewise.
219 (rs6000_secondary_reload_class): Likewise.
220 (print_operand): Make %A handle both FPRs and DMRs.
221 (rs6000_dmr_register_move_cost): New helper function.
222 (rs6000_register_move_cost): Add support for DMR registers.
223 (rs6000_memory_move_cost): Likewise.
224 (rs6000_compute_pressure_classes): Likewise.
225 (rs6000_debugger_regno): Likewise.
226 (rs6000_split_multireg_move): Add support for DMRs.
227 * config/rs6000/rs6000.h (TARGET_DENSE_MATH): New macro.
228 (TARGET_MMA_DENSE_MATH): Likewise.
229 (TARGET_MMA_NO_DENSE_MATH): Likewise.
230 (UNITS_PER_DMR_WORD): Likewise.
231 (FIRST_PSEUDO_REGISTER): Update for DMRs.
232 (FIXED_REGISTERS): Add DMRs.
233 (CALL_REALLY_USED_REGISTERS): Likewise.
234 (REG_ALLOC_ORDER): Likewise.
235 (DMR_REGNO_P): New macro.
236 (enum reg_class): Add DM_REGS.
237 (REG_CLASS_NAMES): Likewise.
238 (REG_CLASS_CONTENTS): Likewise.
239 (enum r6000_reg_class_enum): Add RS6000_CONSTRAINT_wD.
240 (REGISTER_NAMES): Add DMR registers.
241 (ADDITIONAL_REGISTER_NAMES): Likewise.
242
243 ==================== Branch work163-dmf, patch #102 ====================
244
245 Add wD constraint.
246
247 This patch adds a new constraint ('wD') that matches the accumulator registers
248 that overlap with VSX registers 0..31 on power10. Future patches will add the
249 support for a separate accumulator register class that will be used when the
250 support for dense math registers is added.
251
252 2024-03-19 Michael Meissner <meissner@linux.ibm.com>
253
254 * config/rs6000/constraints.md (wD): New constraint.
255 * config/rs6000/mma.md (mma_disassemble_acc): Use the wD constraint.
256 (mma_<vv>): Likewise.
257 (mma_<avv>): Likewise.
258 (mma_<pv>): Likewise.
259 (mma_<apv>): Likewise.
260 (mma_<vvi4i4i8>): Likewise.
261 (mma_<avvi4i4i8>): Likewise.
262 (mma_<vvi4i4i2>): Likewise.
263 (mma_<avvi4i4i2>): Likewise.
264 (mma_<vvi4i4>): Likewise.
265 (mma_<avvi4i4>): Likewise.
266 (mma_<pvi4i2>): Likewise.
267 (mma_<apvi4i2>): Likewise.
268 (mma_<vvi4i4i4>): Likewise.
269 (mma_<avvi4i4i4>): Likewise.
270 * config/rs6000/predicates.md (accumulator_operand): New predicate.
271 * config/rs6000/rs6000.cc (rs6000_debug_reg_global): Print the register
272 class for the 'wD' constraint.
273 (rs6000_init_hard_regno_mode_ok): Set the 'wD' register constraint
274 class.
275 * config/rs6000/rs6000.h (enum r6000_reg_class_enum): Add element for
276 the 'wD' constraint.
277 * doc/md.texi (PowerPC constraints): Document the 'wD' constraint.
278
279 ==================== Branch work163-dmf, patch #101 ====================
280
281 Use vector pair load/store for memcpy with -mcpu=future
282
283 In the development for the power10 processor, GCC did not enable using the load
284 vector pair and store vector pair instructions when optimizing things like
285 memory copy. This patch enables using those instructions if -mcpu=future is
286 used.
287
288 2024-03-18 Michael Meissner <meissner@linux.ibm.com>
289
290 gcc/
291
292 * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Enable using
293 load vector pair and store vector pair instructions for memory copy
294 operations.
295 (POWERPC_MASKS): Make the bit for enabling using load vector pair and
296 store vector pair operations set and reset when the PowerPC processor is
297 changed.
298
299 ==================== Branch work163-dmf, baseline ====================
300
301 Add ChangeLog.dmf and update REVISION.
302
303 2024-03-18 Michael Meissner <meissner@linux.ibm.com>
304
305 gcc/
306
307 * ChangeLog.dmf: New file for branch.
308 * REVISION: Update.
309
310 2024-03-18 Michael Meissner <meissner@linux.ibm.com>
311
312 Clone branch