Bug 91522 - [10 Regression] STV is slow
Summary: [10 Regression] STV is slow
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 10.0
: P3 normal
Target Milestone: 10.0
Assignee: Richard Biener
URL:
Keywords: compile-time-hog
Depends on:
Blocks:
 
Reported: 2019-08-22 12:41 UTC by Richard Biener
Modified: 2019-08-26 10:40 UTC (History)
1 user (show)

See Also:
Host:
Target: x86_64-*-* i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2019-08-22 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Richard Biener 2019-08-22 12:41:29 UTC
Compiling module_domain.fppized.f90 from 521.wrf with -Ofast -march=haswell
reveals

 machine dep reorg                  :  89.07 ( 95%)   0.02 ( 18%)  89.10 ( 95%)      54 kB (  0%)
 TOTAL                              :  95.13          0.13         95.32         103803 kB

perf reports (checking is enabled but -fno-checking supplied):

  83.74%  f951      f951                                     [.] (anonymous namespace)::scalar_chain::analyze_register_chain
   5.80%  f951      f951                                     [.] bitmap_bit_p                                                   
   5.13%  f951      f951                                     [.] (anonymous namespace)::general_scalar_chain::mark_dual_mode_def

which is exactly what I noticed when reviewing the pass functionality:

  /* ???  The following is quadratic since analyze_register_chain
     iterates over all refs to look for dual-mode regs.  Instead this
     should be done separately for all regs mentioned in the chain once.  */
Comment 1 Richard Biener 2019-08-22 12:41:57 UTC
I will have a look.
Comment 2 Richard Biener 2019-08-22 13:31:47 UTC
So in particular

  for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
    if (!HARD_REGISTER_P (DF_REF_REG (ref)))
      for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref));
           def;
           def = DF_REF_NEXT_REG (def))
        analyze_register_chain (candidates, ref);

looks odd since that iterates over all defs in insn_uid, then over
all defs of that register everywhere else in the function and
analyze_register_chain then iterating over the corresponding def->use
chain.  I'd say the iteration over all defs everywhere else is
not necessary and it should be simply

Index: config/i386/i386-features.c
===================================================================
--- config/i386/i386-features.c (revision 274764)
+++ config/i386/i386-features.c (working copy)
@@ -419,10 +419,7 @@ scalar_chain::add_insn (bitmap candidate
   df_ref def;
   for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
     if (!HARD_REGISTER_P (DF_REF_REG (ref)))
-      for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref));
-          def;
-          def = DF_REF_NEXT_REG (def))
-       analyze_register_chain (candidates, def);
+      analyze_register_chain (candidates, ref);
   for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
     if (!DF_REF_REG_MEM_P (ref))
       analyze_register_chain (candidates, ref);

which fixes the slowness.  I'm going to test that.
Comment 3 Richard Biener 2019-08-22 16:20:54 UTC
Uh, all other DF_REG_REG_CHAIN uses need to be updated as well I guess, how
we convert defs and uses seems to be a slight mess :/  I'm going to try
to rewrite this part to

 for insn in insns
   for def in insn
     if def in defs_conv
       replace def in insn with subreg of new pseudo from new defs-map
       emit copy from new def to original scalar def
     else
       replace def with subreg
   for use in insn
     if use in defs_conv & ~defs
       replace use in insn with subreg of new pseudo from new defs-map
       emit copy from old scalar def to new def
     else
       replace use with subreg [of new pseudo from new defs-map]

that should also support chains where not all defs of a pseudo are part
of the chain.  Since we have ud and du chains we can use those more.
Comment 4 Richard Biener 2019-08-26 10:36:30 UTC
Author: rguenth
Date: Mon Aug 26 10:35:59 2019
New Revision: 274926

URL: https://gcc.gnu.org/viewcvs?rev=274926&root=gcc&view=rev
Log:
2019-08-26  Richard Biener  <rguenther@suse.de>

	PR target/91522
	PR target/91527
	* config/i386/i386-features.h (general_scalar_chain::defs_map):
	New member.
	(general_scalar_chain::replace_with_subreg): Remove.
	(general_scalar_chain::replace_with_subreg_in_insn): Likewise.
	(general_scalar_chain::convert_reg): Adjust signature.
	* config/i386/i386-features.c (scalar_chain::add_insn): Do not
	iterate over all defs of a reg.
	(general_scalar_chain::replace_with_subreg): Remove.
	(general_scalar_chain::replace_with_subreg_in_insn): Likewise.
	(general_scalar_chain::make_vector_copies): Populate defs_map,
	place copy only after defs that are used as vectors in the chain.
	(general_scalar_chain::convert_reg): Emit a copy for a specific
	def in a specific instruction.
	(general_scalar_chain::convert_op): All reg uses are converted here.
	(general_scalar_chain::convert_insn): Emit copies for scalar
	uses of defs here.  Replace uses with the copies we created.
	Replace and convert the def.  Adjust REG_DEAD notes, remove
	REG_EQUIV/EQUAL notes.
	(general_scalar_chain::convert_registers): Only handle copies
	into the chain here.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386-features.c
    trunk/gcc/config/i386/i386-features.h
Comment 5 Richard Biener 2019-08-26 10:40:21 UTC
Fixed.