This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



CFG merge part 7 - superblock/trace scheduling


Hi,
This patch adds an option to enable superblock scheduling (aka the EBB
scheduler) for non-IA-64 targets, as well as "trace scheduling" (aka the EBB
scheduler preceded by trace formation).
My experience with this feature is a bit mixed: by default it causes a 2%
regression on Athlon.  By significantly improving the MD description I was able
to get about a 0.3%/0.8% speedup on SPECint/SPECfp, which is about one third of
the overall effect of scheduling on this target.  I hope I will have time to
clean up and send that part soon as well.  The improvements are noticeable
in the local scheduler too, but since the basic blocks are small they are much
less visible.
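
To make the two modes concrete, here is a minimal standalone sketch of how the
two flags select what runs in the second scheduling pass.  It is only a model:
the struct, the enums and sched2_plan_for are hypothetical names for
illustration and are not part of the patch.  The real code calls
reorder_basic_blocks, tracer, schedule_ebbs and schedule_insns directly.

```c
#include <assert.h>

/* Hypothetical model of the two new flags (global ints in toplev.c).  */
struct sched2_flags
{
  int use_superblocks;
  int use_traces;
};

/* What forms the regions, and which scheduler entry point then runs.  */
enum region_former { FORM_NONE, FORM_REORDER_BLOCKS, FORM_TRACER };
enum scheduler { SCHED_INSNS, SCHED_EBBS };

struct sched2_plan
{
  enum region_former former;
  enum scheduler scheduler;
};

/* Choose the steps for the post-reload scheduling pass.  Trace formation
   takes precedence when both flags are set.  */
static struct sched2_plan
sched2_plan_for (struct sched2_flags f)
{
  struct sched2_plan p = { FORM_NONE, SCHED_INSNS };

  if (f.use_superblocks || f.use_traces)
    {
      p.former = f.use_traces ? FORM_TRACER : FORM_REORDER_BLOCKS;
      p.scheduler = SCHED_EBBS;
    }

  /* The EBB scheduler is used exactly when a region former runs.  */
  assert ((p.former == FORM_NONE) == (p.scheduler == SCHED_INSNS));
  return p;
}
```

With neither flag set the plan stays at the ordinary scheduler, so the default
behaviour is unchanged.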

I also did limited testing on ultrasparc2 with the byte benchmark; the effect
was about the same for the "FP" part of the benchmark and nothing measurable
for the integer part.

I would expect the benefits to be larger on in-order, wide-issue CPUs, but I
have none I can test on.  I think we can add it as an experimental option and
see what experience we get.  The patch is small enough.  If we decide to
enable it by default, we should probably find a better way of updating
liveness after scheduling (I think we can just propagate locally over each
scheduled trace, ORing in the live registers on outgoing edges), but at the
moment I think it is not critical.
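
One possible reading of that local update, as a standalone sketch and not the
code GCC would use: walk the blocks of one trace backwards, OR in the registers
live on each block's outgoing edges, then apply the block's own uses and
definitions.  The regmask type, struct trace_block and
propagate_trace_liveness are hypothetical names; real code would work on
regsets and per-insn information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per hard register; a hypothetical stand-in for a regset.  */
typedef uint64_t regmask;

/* Simplified summary of one basic block in a scheduled trace.  */
struct trace_block
{
  regmask use;        /* registers read before being written in the block */
  regmask def;        /* registers written in the block */
  regmask exit_live;  /* OR of live-in sets of blocks reached by edges
                         that leave the trace from this block */
  regmask live_out;   /* computed: registers live at the block end */
};

/* Recompute liveness for the N blocks of one trace, last block first.
   Returns the set of registers live at the trace entry.  */
static regmask
propagate_trace_liveness (struct trace_block *blocks, size_t n)
{
  regmask live = 0;

  assert (blocks != NULL || n == 0);
  for (size_t i = n; i-- > 0;)
    {
      /* Live at the block end: whatever the rest of the trace needs,
         plus whatever the outgoing edges need.  */
      blocks[i].live_out = live | blocks[i].exit_live;
      /* Standard backward transfer: uses, plus live-out minus defs.  */
      live = blocks[i].use | (blocks[i].live_out & ~blocks[i].def);
    }
  return live;
}
```

Because the walk touches only the blocks of the trace just scheduled, it would
avoid the whole-function life_analysis call the patch currently makes.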

It should also be mentioned that trace scheduling can be
significantly strengthened by adding support for compensation code, but
I did not intend to do that, as I hope we will get a more sophisticated
algorithm at some point in the future.  My primary interest has been to get
the CFG part working.

I have also had bad experience with enabling this pre-reload.  The pass
increases register lifetimes, so even machines with 32 registers start to
have problems.

Bootstrapped/regtested with -fsched2-use-traces on i386.  OK?

Honza
Fri Feb  7 12:43:56 CET 2003  Jan Hubicka  <jh@suse.cz>
	* toplev.c (flag_sched2_use_superblocks, flag_sched2_use_traces):  New global variables.
	(lang_independent_options):  Add -fsched2-use-superblocks, -fsched2-use-traces.
	(rest_of_compilation): Run reorder_basic_blocks or tracer, then schedule_ebbs, when they are set.
	* invoke.texi (-fsched2-use-traces, -fsched2-use-superblocks):  Document.
Index: toplev.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/toplev.c,v
retrieving revision 1.705
diff -c -3 -p -r1.705 toplev.c
*** toplev.c	6 Feb 2003 01:47:55 -0000	1.705
--- toplev.c	7 Feb 2003 11:43:35 -0000
*************** int flag_pedantic_errors = 0;
*** 739,744 ****
--- 745,757 ----
  int flag_schedule_insns = 0;
  int flag_schedule_insns_after_reload = 0;
  
+ /* When flag_schedule_insns_after_reload is set, use EBB scheduler.  */
+ int flag_sched2_use_superblocks = 0;
+ 
+ /* When flag_schedule_insns_after_reload is set, construct traces and use
+    the EBB scheduler.  */
+ int flag_sched2_use_traces = 0;
+ 
  /* The following flags have effect only for scheduling before register
     allocation:
  
*************** static const lang_independent_options f_
*** 1065,1070 ****
--- 1080,1089 ----
     N_("Allow speculative motion of some loads") },
    {"sched-spec-load-dangerous",&flag_schedule_speculative_load_dangerous, 1,
     N_("Allow speculative motion of more loads") },
+   {"sched2-use-superblocks", &flag_sched2_use_superblocks, 1,
+    N_("If scheduling post reload, do superblock scheduling") },
+   {"sched2-use-traces", &flag_sched2_use_traces, 1,
+    N_("If scheduling post reload, do trace scheduling") },
    {"branch-count-reg",&flag_branch_on_count_reg, 1,
     N_("Replace add,compare,branch with branch on count reg") },
    {"pic", &flag_pic, 1,
*************** rest_of_compilation (decl)
*** 3463,3469 ****
  
        split_all_insns (1);
  
!       schedule_insns (rtl_dump_file);
  
        close_dump_file (DFI_sched2, print_rtl_with_bb, insns);
        timevar_pop (TV_SCHED2);
--- 3495,3516 ----
  
        split_all_insns (1);
  
!       if (flag_sched2_use_superblocks || flag_sched2_use_traces)
! 	{
!  	  if (!flag_sched2_use_traces)
! 	    reorder_basic_blocks ();
! 	  else
! 	    tracer ();
! 	  cleanup_cfg (CLEANUP_EXPENSIVE | CLEANUP_UPDATE_LIFE);
! 	  schedule_ebbs (rtl_dump_file);
! 	  /* No liveness updating code yet, but it should be easy to do  */
! 	  count_or_remove_death_notes (NULL, 1);
! 	  life_analysis (get_insns (), rtl_dump_file, PROP_DEATH_NOTES);
! 	  cleanup_cfg (CLEANUP_EXPENSIVE | CLEANUP_UPDATE_LIFE
! 		       | (flag_crossjumping ? CLEANUP_CROSSJUMP : 0));
! 	}
!       else
!         schedule_insns (rtl_dump_file);
  
        close_dump_file (DFI_sched2, print_rtl_with_bb, insns);
        timevar_pop (TV_SCHED2);
Index: doc/invoke.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/invoke.texi,v
retrieving revision 1.234
diff -c -3 -p -r1.234 invoke.texi
*** doc/invoke.texi	4 Feb 2003 01:27:46 -0000	1.234
--- doc/invoke.texi	7 Feb 2003 11:43:38 -0000
*************** in the following sections.
*** 287,296 ****
  -frerun-cse-after-loop  -frerun-loop-opt @gol
  -fschedule-insns  -fschedule-insns2 @gol
  -fno-sched-interblock  -fno-sched-spec  -fsched-spec-load @gol
! -fsched-spec-load-dangerous  -fsignaling-nans @gol
  -fsingle-precision-constant  -fssa -fssa-ccp -fssa-dce @gol
  -fstrength-reduce  -fstrict-aliasing  -ftracer -fthread-jumps @gol
  -funroll-all-loops  -funroll-loops  @gol
  --param @var{name}=@var{value}
  -O  -O0  -O1  -O2  -O3  -Os}
  
--- 287,297 ----
  -frerun-cse-after-loop  -frerun-loop-opt @gol
  -fschedule-insns  -fschedule-insns2 @gol
  -fno-sched-interblock  -fno-sched-spec  -fsched-spec-load @gol
! -fsched-spec-load-dangerous  -fsched2-use-superblocks @gol
! -fsched2-use-traces  -fsignaling-nans @gol
  -fsingle-precision-constant  -fssa -fssa-ccp -fssa-dce @gol
  -fstrength-reduce  -fstrict-aliasing  -ftracer -fthread-jumps @gol
  -funroll-all-loops  -funroll-loops  @gol
  --param @var{name}=@var{value}
  -O  -O0  -O1  -O2  -O3  -Os}
  
*************** Allow speculative motion of more load in
*** 3890,3895 ****
--- 3894,3923 ----
  sense when scheduling before register allocation, i.e.@: with
  @option{-fschedule-insns} or at @option{-O2} or higher.
  
+ @item -fsched2-use-superblocks
+ @opindex fsched2-use-superblocks
+ When scheduling after register allocation, use the superblock scheduling
+ algorithm.  Superblock scheduling allows motion across basic block boundaries,
+ resulting in faster schedules.  This option is experimental, as not all machine
+ descriptions used by GCC model the CPU closely enough to avoid unreliable
+ results from the algorithm.
+ 
+ This only makes sense when scheduling after register allocation, i.e.@: with
+ @option{-fschedule-insns2} or at @option{-O2} or higher.
+ 
+ @item -fsched2-use-traces
+ @opindex fsched2-use-traces
+ Use the @option{-fsched2-use-superblocks} algorithm when scheduling after
+ register allocation and additionally perform code duplication in order to
+ increase the size of superblocks using the tracer pass.  See @option{-ftracer}
+ for details on trace formation.
+ 
+ This mode should produce faster but significantly longer programs.  Also,
+ without @option{-fbranch-probabilities} the traces constructed may not match
+ reality and may hurt performance.  This only makes
+ sense when scheduling after register allocation, i.e.@: with
+ @option{-fschedule-insns2} or at @option{-O2} or higher.
+ 
  @item -fcaller-saves
  @opindex fcaller-saves
  Enable values to be allocated in registers that will be clobbered by

