gcc.gnu.org Git - gcc.git/log

Merge commit 'refs/users/meissner/heads/work163-dmf' of git+ssh://gcc.gnu.org/git/gcc into me/work163-dmf

Update ChangeLog.*

PowerPC: Add support for 1,024 bit DMR registers.

This patch is a preliminary patch to add the full 1,024 bit dense math registers
(DMRs) for -mcpu=future.  The MMA 512-bit accumulators map onto the top of the
DMR register.

This patch only adds the new 1,024 bit register support.  It does not add
support for any instructions that need 1,024 bit registers instead of 512 bit
registers.

I used the new mode 'TDOmode' to be the opaque mode used for 1,024 bit
registers.  The 'wD' constraint added in previous patches is used for these
registers.  I added support to do load and store of DMRs via the VSX registers,
since there are no load/store dense math instructions.  I added the new keyword
'__dmr' to create 1,024 bit types that can be loaded into DMRs.  At present, I
don't have aliases for __dmr512 and __dmr1024 that we've discussed internally.
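
As a rough illustration of the sizes involved (this is not code from the
patch): a 1,024 bit DMR holds two 512 bit halves, and each half is the size of
four 128 bit VSX registers, so moving a whole DMR through VSX registers takes
eight of them.  The struct and function names below are hypothetical, and
treating byte offset 0 as the "upper" half is an assumption made only for this
model.

```c
#include <string.h>

/* Hypothetical software model of a 1,024 bit DMR; not GCC code.  */
enum { VSX_BYTES = 16, HALF_BYTES = 64, DMR_BYTES = 128 };

struct dmr_model { unsigned char bytes[DMR_BYTES]; };

/* Number of 128 bit VSX registers needed to carry NBYTES bytes.  */
static int vsx_regs_for (int nbytes)
{
  return nbytes / VSX_BYTES;
}

/* Copy a 512 bit value into one half of the model.  HALF is 0 for the
   half this sketch calls "upper" and 1 for "lower" (an assumption).  */
static void dmr_insert512 (struct dmr_model *d, int half,
                           const unsigned char *src)
{
  memcpy (d->bytes + half * HALF_BYTES, src, HALF_BYTES);
}

/* Copy one 512 bit half of the model out to DST.  */
static void dmr_extract512 (const struct dmr_model *d, int half,
                            unsigned char *dst)
{
  memcpy (dst, d->bytes + half * HALF_BYTES, HALF_BYTES);
}
```

In this model a store of a DMR to memory would be two dmr_extract512 calls,
each followed by four 16 byte stores.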

The patches have been tested on both little and big endian systems.  Can I check
it into the master branch?

2024-03-19   Michael Meissner  <meissner@linux.ibm.com>

gcc/

* config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec.
(UNSPEC_DM_INSERT512_LOWER): Likewise.
(UNSPEC_DM_EXTRACT512): Likewise.
(UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise.
(UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise.
(movtdo): New define_expand and define_insn_and_split to implement 1,024
bit DMR registers.
(movtdo_insert512_upper): New insn.
(movtdo_insert512_lower): Likewise.
(movtdo_extract512): Likewise.
(reload_dmr_from_memory): Likewise.
(reload_dmr_to_memory): Likewise.
* config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR
support.
(rs6000_init_builtins): Add support for __dmr keyword.
* config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support
for TDOmode.
(rs6000_function_arg): Likewise.
* config/rs6000/rs6000-modes.def (TDOmode): New mode.
* config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add
support for TDOmode.
(rs6000_hard_regno_mode_ok_uncached): Likewise.
(rs6000_hard_regno_mode_ok): Likewise.
(rs6000_modes_tieable_p): Likewise.
(rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_init_hard_regno_mode_ok): Add support for TDOmode.  Setup reload
hooks for DMR mode.
(reg_offset_addressing_ok_p): Add support for TDOmode.
(rs6000_emit_move): Likewise.
(rs6000_secondary_reload_simple_move): Likewise.
(rs6000_preferred_reload_class): Likewise.
(rs6000_secondary_reload_class): Likewise.
(rs6000_mangle_type): Add mangling for __dmr type.
(rs6000_dmr_register_move_cost): Add support for TDOmode.
(rs6000_split_multireg_move): Likewise.
(rs6000_invalid_conversion): Likewise.
* config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
(enum rs6000_builtin_type_index): Add DMR type nodes.
(dmr_type_node): Likewise.
(ptr_dmr_type_node): Likewise.

gcc/testsuite/

* gcc.target/powerpc/dm-1024bit.c: New test.

Add dense math test for new instruction names.

2024-03-19 Michael Meissner <meissner@linux.ibm.com>

gcc/testsuite/

* gcc.target/powerpc/dm-double-test.c: New test.
* lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New
target test.

PowerPC: Switch to dense math names for all MMA operations.

This patch changes the assembler instruction names for MMA instructions from
the original name used in power10 to the new name when used with the dense math
system.  I.e. xvf64gerpp becomes dmxvf64gerpp.  The assembler will emit the
same bits for either spelling.

For the non-prefixed MMA instructions, we add a 'dm' prefix in front of the
instruction.  However, the prefixed instructions have a 'pm' prefix, and we add
the 'dm' prefix afterwards.  To prevent having two sets of parallel int
attributes, we remove the "pm" prefix from the instruction string in the
attributes, and add it later, both in the insn name and in the output template.
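
The renaming rule described above can be sketched as a small string helper.
This is only an illustration of the rule; dense_math_name is a hypothetical
function and is not how mma.md implements the change (the patch does it with
int attributes and output templates).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: build the dense math spelling of
   an MMA mnemonic.  Non-prefixed names gain a leading "dm"; prefixed
   names keep their "pm" and gain "dm" right after it.  */
static void dense_math_name (const char *mma_name, char *out, size_t outsz)
{
  if (strncmp (mma_name, "pm", 2) == 0)
    snprintf (out, outsz, "pmdm%s", mma_name + 2);
  else
    snprintf (out, outsz, "dm%s", mma_name);
}
```

With this helper, "xvf64gerpp" maps to "dmxvf64gerpp" and "pmxvf64gerpp" maps
to "pmdmxvf64gerpp".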

2024-03-19   Michael Meissner  <meissner@linux.ibm.com>

gcc/

* config/rs6000/mma.md (vvi4i4i8): Change the instruction to not have a
"pm" prefix.
(avvi4i4i8): Likewise.
(vvi4i4i2): Likewise.
(avvi4i4i2): Likewise.
(vvi4i4): Likewise.
(avvi4i4): Likewise.
(pvi4i2): Likewise.
(apvi4i2): Likewise.
(vvi4i4i4): Likewise.
(avvi4i4i4): Likewise.
(mma_xxsetaccz): Add support for running on DMF systems, generating the
dense math instruction and using the dense math accumulators.
(mma_<vv>): Likewise.
(mma_<pv>): Likewise.
(mma_<avv>): Likewise.
(mma_<apv>): Likewise.
(mma_pm<vvi4i4i8>): Add support for running on DMF systems, generating
the dense math instruction and using the dense math accumulators.
Rename the insn with a 'pm' prefix and add either 'pm' or 'pmdm'
prefixes based on whether we have the original MMA specification or if
we have dense math support.
(mma_pm<avvi4i4i8>): Likewise.
(mma_pm<vvi4i4i2>): Likewise.
(mma_pm<avvi4i4i2>): Likewise.
(mma_pm<vvi4i4>): Likewise.
(mma_pm<avvi4i4>): Likewise.
(mma_pm<pvi4i2>): Likewise.
(mma_pm<apvi4i2>): Likewise.
(mma_pm<vvi4i4i4>): Likewise.
(mma_pm<avvi4i4i4>): Likewise.

Add support for dense math registers.

The MMA subsystem added the notion of accumulator registers as an optional
feature of ISA 3.1 (power10).  In ISA 3.1, these accumulators overlapped with
the VSX registers 0..31, but logically the accumulator registers were separate
from the FPR registers.  In ISA 3.1, it was anticipated that in future systems,
the accumulator registers may not overlap with the FPR registers.  This patch
adds the support for dense math registers as separate registers.

This particular patch does not change the MMA support to use the accumulators
within the dense math registers.  This patch just adds the basic support for
having separate DMRs.  The next patch will switch the MMA support to use the
accumulators if -mcpu=future is used.

For testing purposes, I added an undocumented option '-mdense-math' to enable
or disable the dense math support.

This patch adds a new constraint (wD).  If MMA is selected but dense math is
not selected (i.e. -mcpu=power10), the wD constraint will allow access to
accumulators that overlap with VSX registers 0..31.  If both MMA and dense math
are selected (i.e. -mcpu=future), the wD constraint will only allow dense math
registers.
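
The selection rule for wD can be modeled as a two-flag decision.  The enum and
function names below are illustrative only and do not appear in GCC; the
WD_NONE case for targets without MMA is an assumption, since the text above
only covers the two MMA configurations.

```c
/* Hypothetical model of the registers the wD constraint allows;
   the enum and function names are illustrative, not GCC's.  */
enum wd_class { WD_NONE, WD_FPR_ACCUMULATORS, WD_DENSE_MATH_REGS };

static enum wd_class wd_constraint_class (int have_mma, int have_dense_math)
{
  if (!have_mma)
    return WD_NONE;               /* Assumption: no accumulators at all.  */
  if (have_dense_math)
    return WD_DENSE_MATH_REGS;    /* E.g. -mcpu=future.  */
  return WD_FPR_ACCUMULATORS;     /* E.g. -mcpu=power10.  */
}
```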

This patch modifies the existing %A output modifier.  If MMA is selected but
dense math is not selected, then %A output modifier converts the VSX register
number to the accumulator number, by dividing it by 4.  If both MMA and dense
math are selected, then %A will map the separate DMR registers into 0..7.
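
The arithmetic behind %A can be sketched as follows.  This is a model of the
numbers printed, not the print_operand code; the function names are
hypothetical and FIRST_DMR_MODEL is a made-up base register number chosen only
for the sketch.

```c
/* Hypothetical model of the number printed by %A; not GCC code.
   FIRST_DMR_MODEL is a made-up base register number for the sketch.  */
enum { FIRST_DMR_MODEL = 100 };

/* MMA without dense math: VSX register 0..31 -> accumulator 0..7.  */
static int acc_from_vsx (int vsx_regno)
{
  return vsx_regno / 4;
}

/* MMA with dense math: separate DMR register -> 0..7.  */
static int acc_from_dmr (int dmr_regno)
{
  return dmr_regno - FIRST_DMR_MODEL;
}
```

For example, VSX register 28 prints as accumulator 7 in the first case, and
the eighth DMR also prints as 7 in the second.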

The intention is that user code using extended asm can be modified to run on
both MMA without dense math and MMA with dense math:

    1) If possible, don't use extended asm, but instead use the MMA built-in
functions;

    2) If you do need to write extended asm, change the d constraints
targeting accumulators to wD;

    3) Only use the built-in zero, assemble and disassemble functions to create
and move data between vector quad types and dense math accumulators.
I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz instructions directly in
the extended asm code.  The reason is these instructions assume there is a
1-to-1 correspondence between 4 adjacent FPR registers and an
accumulator that overlaps with those registers.  With accumulators
now being separate registers, there no longer is a 1-to-1
correspondence.
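
A minimal sketch of guidelines 1 and 3, assuming a compiler that provides the
existing MMA built-ins __builtin_mma_xxsetaccz and
__builtin_mma_disassemble_acc and defines __MMA__ when MMA is enabled.  The
function name zero_accumulator and the memset fallback for non-MMA builds are
additions for illustration only.  The point is that the accumulator is zeroed
and read back without any xxsetaccz or xxmfacc in user-written asm, so the
compiler is free to pick the right instructions for either kind of system.

```c
#include <string.h>

/* Zero a 512 bit accumulator and copy its contents to OUT, using only
   the built-in functions rather than extended asm.  */
static void zero_accumulator (float out[4][4])
{
#if defined (__MMA__)
  __vector_quad acc;

  __builtin_mma_xxsetaccz (&acc);            /* Built-in zero.  */
  __builtin_mma_disassemble_acc (out, &acc); /* Built-in disassemble.  */
#else
  /* Illustrative fallback so the sketch builds without MMA.  */
  memset (out, 0, 16 * sizeof (float));
#endif
}
```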

It is possible that the mangling for DMRs and the GDB register numbers may
produce other changes in the future.

2024-03-19   Michael Meissner  <meissner@linux.ibm.com>

* config/rs6000/mma.md (movxo): Add comments about dense math registers.
(movxo_nodm): Rename from movxo and restrict the usage to machines
without dense math registers.
(movxo_dm): New insn for movxo support for machines with dense math
registers.
(mma_<acc>): Restrict usage to machines without dense math registers.
(mma_xxsetaccz): Make a define_expand, and add support for dense math
registers.
(mma_xxsetaccz_nodm): Rename from mma_xxsetaccz, and restrict to
machines without dense math registers.
(mma_dmsetaccz): New insn.
* config/rs6000/predicates.md (dmr_operand): New predicate.
(accumulator_operand): Add support for dense math registers.
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_mma_builtin): Do
not de-prime accumulator when disassembling a vector quad.
* config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE.
(enum rs6000_reload_reg_type): Add RELOAD_REG_DMR.
(LAST_RELOAD_REG_CLASS): Add support for DMR registers and the wD
constraint.
(reload_reg_map): Likewise.
(rs6000_reg_names): Likewise.
(alt_reg_names): Likewise.
(rs6000_hard_regno_nregs_internal): Likewise.
(rs6000_hard_regno_mode_ok_uncached): Likewise.
(rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_init_hard_regno_mode_ok): Likewise.
(rs6000_secondary_reload_memory): Add support for DMR registers.
(rs6000_secondary_reload_simple_move): Likewise.
(rs6000_preferred_reload_class): Likewise.
(rs6000_secondary_reload_class): Likewise.
(print_operand): Make %A handle both FPRs and DMRs.
(rs6000_dmr_register_move_cost): New helper function.
(rs6000_register_move_cost): Add support for DMR registers.
(rs6000_memory_move_cost): Likewise.
(rs6000_compute_pressure_classes): Likewise.
(rs6000_debugger_regno): Likewise.
(rs6000_split_multireg_move): Add support for DMRs.
* config/rs6000/rs6000.h (TARGET_DENSE_MATH): New macro.
(TARGET_MMA_DENSE_MATH): Likewise.
(TARGET_MMA_NO_DENSE_MATH): Likewise.
(UNITS_PER_DMR_WORD): Likewise.
(FIRST_PSEUDO_REGISTER): Update for DMRs.
(FIXED_REGISTERS): Add DMRs.
(CALL_REALLY_USED_REGISTERS): Likewise.
(REG_ALLOC_ORDER): Likewise.
(DMR_REGNO_P): New macro.
(enum reg_class): Add DM_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(enum r6000_reg_class_enum): Add RS6000_CONSTRAINT_wD.
(REGISTER_NAMES): Add DMR registers.
(ADDITIONAL_REGISTER_NAMES): Likewise.

Add wD constraint.

This patch adds a new constraint ('wD') that matches the accumulator registers
that overlap with VSX registers 0..31 on power10. Future patches will add the
support for a separate accumulator register class that will be used when the
support for dense math registers is added.

2024-03-19 Michael Meissner <meissner@linux.ibm.com>

* config/rs6000/constraints.md (wD): New constraint.
* config/rs6000/mma.md (mma_disassemble_acc): Likewise.
(mma_<vv>): Likewise.
(mma_<avv>): Likewise.
(mma_<pv>): Likewise.
(mma_<apv>): Likewise.
(mma_<vvi4i4i8>): Likewise.
(mma_<avvi4i4i8>): Likewise.
(mma_<vvi4i4i2>): Likewise.
(mma_<avvi4i4i2>): Likewise.
(mma_<vvi4i4>): Likewise.
(mma_<avvi4i4>): Likewise.
(mma_<pvi4i2>): Likewise.
(mma_<apvi4i2>): Likewise.
(mma_<vvi4i4i4>): Likewise.
(mma_<avvi4i4i4>): Likewise.
* config/rs6000/predicates.md (accumulator_operand): New predicate.
* config/rs6000/rs6000.cc (rs6000_debug_reg_global): Print the register
class for the 'wD' constraint.
(rs6000_init_hard_regno_mode_ok): Set the 'wD' register constraint
class.
* config/rs6000/rs6000.h (enum r6000_reg_class_enum): Add element for
the 'wD' constraint.
* doc/md.texi (PowerPC constraints): Document the 'wD' constraint.

Use vector pair load/store for memcpy with -mcpu=future

In the development for the power10 processor, GCC did not enable using the load
vector pair and store vector pair instructions when optimizing things like
memory copy. This patch enables using those instructions if -mcpu=future is
used.
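
For reference, a fixed-size copy like the one below is the kind of code this
change is aimed at.  The struct and function names are hypothetical.  The 32
byte size matches one vector pair, but whether the compiler actually emits a
load vector pair and store vector pair here depends on the compiler version
and options, and has not been checked for this sketch.

```c
#include <string.h>

/* Hypothetical 32 byte block, the size of one vector pair.  */
struct block32 { unsigned char bytes[32]; };

/* A fixed-size copy that the compiler may expand inline.  */
static void copy_block32 (struct block32 *dst, const struct block32 *src)
{
  memcpy (dst, src, sizeof *dst);
}
```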

2024-03-18 Michael Meissner <meissner@linux.ibm.com>

gcc/

* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Enable using
load vector pair and store vector pair instructions for memory copy
operations.
(POWERPC_MASKS): Make the bit for enabling using load vector pair and
store vector pair operations set and reset when the PowerPC processor is
changed.

Add ChangeLog.dmf and update REVISION.

2024-03-18 Michael Meissner <meissner@linux.ibm.com>

gcc/

* ChangeLog.dmf: New file for branch.
* REVISION: Update.

Add -mcpu=future tuning support.

This patch makes -mtune=future use the same tuning decision as -mtune=power11.

2024-03-18 Michael Meissner <meissner@linux.ibm.com>

gcc/

* config/rs6000/power10.md (all reservations): Add future as an
alternative to power10 and power11.

Add -mcpu=future support.

This patch adds the future option to the -mcpu= and -mtune= switches.

This patch treats the future like a power11 in terms of costs and reassociation
width.

This patch issues a ".machine future" to the assembly file if you use
-mcpu=future.

This patch defines _ARCH_PWR_FUTURE if the user uses -mcpu=future.

This patch allows GCC to be configured with the --with-cpu=future and
--with-tune=future options.

This patch passes -mfuture to the assembler if the user uses -mcpu=future.
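
User code can test the new macro in the usual way.  The function name below is
hypothetical; only _ARCH_PWR_FUTURE comes from this patch.

```c
/* Return 1 if this translation unit was compiled with _ARCH_PWR_FUTURE
   defined (i.e. with -mcpu=future), otherwise 0.  */
static int built_for_future (void)
{
#ifdef _ARCH_PWR_FUTURE
  return 1;
#else
  return 0;
#endif
}
```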

2024-03-18 Michael Meissner <meissner@linux.ibm.com>

gcc/

* config.gcc (rs6000*-*-*, powerpc*-*-*): Add support for future.
* config/rs6000/aix71.h (ASM_CPU_SPEC): Add support for -mcpu=future.
* config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/aix73.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/driver-rs6000.cc (asm_names): Likewise.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
_ARCH_PWR_FUTURE if -mcpu=future.
* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): New define.
(POWERPC_MASKS): Add future isa bit.
(power11 cpu): Add future definition.
* config/rs6000/rs6000-opts.h (PROCESSOR_FUTURE): Add future processor.
* config/rs6000/rs6000-string.cc (expand_compare_loop): Likewise.
* config/rs6000/rs6000-tables.opt: Regenerate.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Add future
support.
(rs6000_machine_from_flags): Likewise.
(rs6000_reassociation_width): Likewise.
(rs6000_adjust_cost): Likewise.
(rs6000_issue_rate): Likewise.
(rs6000_sched_reorder): Likewise.
(rs6000_sched_reorder2): Likewise.
(rs6000_register_move_cost): Likewise.
(rs6000_opt_masks): Likewise.
* config/rs6000/rs6000.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/rs6000.md (cpu attribute): Add future.
* config/rs6000/rs6000.opt (-mpower11): Add internal future ISA flag.
* doc/invoke.texi (RS/6000 and PowerPC Options): Document -mcpu=future.

Revert all changes