Compiling module_domain.fppized.f90 from 521.wrf with -Ofast -march=haswell reveals machine dep reorg : 89.07 ( 95%) 0.02 ( 18%) 89.10 ( 95%) 54 kB ( 0%) TOTAL : 95.13 0.13 95.32 103803 kB perf reports (checking is enabled but -fno-checking supplied): 83.74% f951 f951 [.] (anonymous namespace)::scalar_chain::analyze_register_chain 5.80% f951 f951 [.] bitmap_bit_p 5.13% f951 f951 [.] (anonymous namespace)::general_scalar_chain::mark_dual_mode_def which is exactly what I noticed when reviewing the pass functionality: /* ??? The following is quadratic since analyze_register_chain iterates over all refs to look for dual-mode regs. Instead this should be done separately for all regs mentioned in the chain once. */
I will have a look.
So in particular for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref)) if (!HARD_REGISTER_P (DF_REF_REG (ref))) for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref)); def; def = DF_REF_NEXT_REG (def)) analyze_register_chain (candidates, ref); looks odd since that iterates over all defs in insn_uid, then over all defs of that register everywhere else in the function and analyze_register_chain then iterating over the corresponding def->use chain. I'd say the iteration over all defs everywhere else is not necessary and it should be simply Index: config/i386/i386-features.c =================================================================== --- config/i386/i386-features.c (revision 274764) +++ config/i386/i386-features.c (working copy) @@ -419,10 +419,7 @@ scalar_chain::add_insn (bitmap candidate df_ref def; for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref)) if (!HARD_REGISTER_P (DF_REF_REG (ref))) - for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref)); - def; - def = DF_REF_NEXT_REG (def)) - analyze_register_chain (candidates, def); + analyze_register_chain (candidates, ref); for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref)) if (!DF_REF_REG_MEM_P (ref)) analyze_register_chain (candidates, ref); which fixes the slowness. I'm going to test that.
Uh, all other DF_REG_REG_CHAIN uses need to be updated as well I guess, how we convert defs and uses seems to be a slight mess :/ I'm going to try to rewrite this part to for insn in insns for def in insn if def in defs_conv replace def in insn with subreg of new pseudo from new defs-map emit copy from new def to original scalar def else replace def with subreg for use in insn if use in defs_conv & ~defs replace use in insn with subreg of new pseudo from new defs-map emit copy from old scalar def to new def else replace use with subreg [of new pseudo from new defs-map] that should also support chains where not all defs of a pseudo are part of the chain. Since we have ud and du chains we can use those more.
Author: rguenth Date: Mon Aug 26 10:35:59 2019 New Revision: 274926 URL: https://gcc.gnu.org/viewcvs?rev=274926&root=gcc&view=rev Log: 2019-08-26 Richard Biener <rguenther@suse.de> PR target/91522 PR target/91527 * config/i386/i386-features.h (general_scalar_chain::defs_map): New member. (general_scalar_chain::replace_with_subreg): Remove. (general_scalar_chain::replace_with_subreg_in_insn): Likewise. (general_scalar_chain::convert_reg): Adjust signature. * config/i386/i386-features.c (scalar_chain::add_insn): Do not iterate over all defs of a reg. (general_scalar_chain::replace_with_subreg): Remove. (general_scalar_chain::replace_with_subreg_in_insn): Likewise. (general_scalar_chain::make_vector_copies): Populate defs_map, place copy only after defs that are used as vectors in the chain. (general_scalar_chain::convert_reg): Emit a copy for a specific def in a specific instruction. (general_scalar_chain::convert_op): All reg uses are converted here. (general_scalar_chain::convert_insn): Emit copies for scalar uses of defs here. Replace uses with the copies we created. Replace and convert the def. Adjust REG_DEAD notes, remove REG_EQUIV/EQUAL notes. (general_scalar_chain::convert_registers): Only handle copies into the chain here. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386-features.c trunk/gcc/config/i386/i386-features.h
Fixed.