This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Speedup genautomata.c


Hi,

besides from genattrtab (on which I'm planning to send an updated speedup
patch soon) also genautomata is a major serialization point in a parallel
build, on ia64 (and only there).  Basically all the time there is spend in
minimizing the resulting DFAs, like so:

 transformation: 0.276958, building NDFA: 7.479862, NDFA -> DFA: 14.161846
 DFA minimization: 184.686922, making insn equivalence: 0.009999
 all automaton generation: 206.529603, output: 2.788576

Ouch.  Some gprofing and thinking reveals that the huge number of calls to
state_is_differed and it's relative slowness are the reason for this.  
The insight to fix this is to see that the presence set for the first
cycle for each state (which is one part of the equivalence relation which
the process tries to work out) does not change during splitting the
equivalence classes.  Hence it can be cached before starting to build the 
equivalence classes (the new presence_signature member).

Initially I was just going to build a hash value from that presence set in
order to just speedup the non-equalness case in state_is_differed, 
which was already cool:

 transformation: 0.285956, building NDFA: 7.449868, NDFA -> DFA: 14.145851
 DFA minimization: 10.024475, making insn equivalence: 0.009998
 all automaton generation: 31.821163, output: 2.779577

but then I saw that in fact the set is not that large and can be just
memoized completely in the state.  That created the opportunity to do even
better, namely to sort all states beforehand on that presence set and
create a better initial approximation of the equivalence classes (only
those with the same presence set can be in one eq-class at all, which can
be found out in linear time when going over a sorted set).  As splitting
of eq-classes is quadratic in the number of elements at worst this speeds
up the worst case quite some more, like so:

 transformation: 0.280956, building NDFA: 7.459866, NDFA -> DFA: 14.270830
 DFA minimization: 3.393484, making insn equivalence: 0.012999
 all automaton generation: 25.329150, output: 2.802574

Also the number of out arcs (another criteria for the equivalence) doesn't 
change, or better said, can be trivially tracked in the add_ and 
remove_arc code, so that's another sort criteria.

The patch below implements the above.  It was bootstrapped and regtested 
on i686,x86_64,ppc,ppc64,s390,s390x,ia64 .  Furthermore I built 
cross-compilers on
  alpha, arm, hppa, i386, ia64, mips, ppc, s390, sh, sparc, x86_64
to look at the automatas actually built, and see for changes in compiling 
a big code test, and the kdecore library (with -O2).  No changes in 
assembler occured.  Also the automates itself didn't change when the 
sorting step is not done.  When that is active only the automates for ia64 
(slightly) and for ppc (a bit more) changed at all.

Okay for trunk?


Ciao,
Michael.

	* genautomata.c (<struct state>, num_out_arcs, presence_signature):
	New members.
	(remove_arc, add_arc): Update num_out_arcs member.
	(set_out_arc_insns_equiv_num): Returns instead of number of out arcs.
	(cache_presence): New function.
	(compare_states_for_equiv): New function.
	(state_is_differed): Don't take number of outargs, adjust callers.
	Use new invariant for speeding up.
	(init_equiv_class): Create initial classes based on sorted
	input.
	(partition_equiv_class): Don't track out_arcs_num.
	(evaluate_equiv_classes): Call cache_presence on all states and
	sort them.

Index: genautomata.c
===================================================================
*** genautomata.c	(revision 111989)
--- genautomata.c	(working copy)
*************** struct state
*** 677,682 ****
--- 677,683 ----
    /* The following field value is the first arc output from given
       state.  */
    arc_t first_out_arc;
+   unsigned int num_out_arcs;
    /* The following field is used to form NDFA.  */
    char it_was_placed_in_stack_for_NDFA_forming;
    /* The following field is used to form DFA.  */
*************** struct state
*** 700,705 ****
--- 701,707 ----
       automaton.  The field value is state corresponding to equivalence
       class to which given state belongs.  */
    state_t equiv_class_state;
+   unsigned int *presence_signature;
    /* The following field value is the order number of given state.
       The states in final DFA is enumerated with the aid of the
       following field.  */
*************** remove_arc (state_t from_state, arc_t ar
*** 3862,3867 ****
--- 3864,3870 ----
      from_state->first_out_arc = arc->next_out_arc;
    else
      prev_arc->next_out_arc = arc->next_out_arc;
+   from_state->num_out_arcs--;
    free_arc (arc);
  }
  
*************** add_arc (state_t from_state, state_t to_
*** 3908,3913 ****
--- 3911,3917 ----
    ainsn->arc_exists_p = 1;
    new_arc->next_out_arc = from_state->first_out_arc;
    from_state->first_out_arc = new_arc;
+   from_state->num_out_arcs++;
    new_arc->next_arc_marked_by_insn = NULL;
    return new_arc;
  }
*************** add_achieved_state (state_t state)
*** 5614,5637 ****
     out arcs of STATE by equiv_class_num_1 (if ODD_ITERATION_FLAG has
     nonzero value) or by equiv_class_num_2 of the destination state.
     The function returns number of out arcs of STATE.  */
! static int
  set_out_arc_insns_equiv_num (state_t state, int odd_iteration_flag)
  {
-   int state_out_arcs_num;
    arc_t arc;
  
-   state_out_arcs_num = 0;
    for (arc = first_out_arc (state); arc != NULL; arc = next_out_arc (arc))
      {
        gcc_assert (!arc->insn->insn_reserv_decl->equiv_class_num);
-       state_out_arcs_num++;
        arc->insn->insn_reserv_decl->equiv_class_num
  	= (odd_iteration_flag
             ? arc->to_state->equiv_class_num_1
  	   : arc->to_state->equiv_class_num_2);
        gcc_assert (arc->insn->insn_reserv_decl->equiv_class_num);
      }
-   return state_out_arcs_num;
  }
  
  /* The function clears equivalence numbers and alt_states in all insns
--- 5618,5637 ----
     out arcs of STATE by equiv_class_num_1 (if ODD_ITERATION_FLAG has
     nonzero value) or by equiv_class_num_2 of the destination state.
     The function returns number of out arcs of STATE.  */
! static void
  set_out_arc_insns_equiv_num (state_t state, int odd_iteration_flag)
  {
    arc_t arc;
  
    for (arc = first_out_arc (state); arc != NULL; arc = next_out_arc (arc))
      {
        gcc_assert (!arc->insn->insn_reserv_decl->equiv_class_num);
        arc->insn->insn_reserv_decl->equiv_class_num
  	= (odd_iteration_flag
             ? arc->to_state->equiv_class_num_1
  	   : arc->to_state->equiv_class_num_2);
        gcc_assert (arc->insn->insn_reserv_decl->equiv_class_num);
      }
  }
  
  /* The function clears equivalence numbers and alt_states in all insns
*************** first_cycle_unit_presence (state_t state
*** 5666,5671 ****
--- 5666,5691 ----
    return false;
  }
  
+ /* This fills in the presence_signature[] member of STATE.  */
+ static void
+ cache_presence (state_t state)
+ {
+   int i, num = 0;
+   unsigned int sz;
+   sz = (description->query_units_num + sizeof (int) * CHAR_BIT - 1)
+         / (sizeof (int) * CHAR_BIT);
+   
+   state->presence_signature = create_node (sz * sizeof (int));
+   for (i = 0; i < description->units_num; i++)
+     if (units_array [i]->query_p)
+       {
+ 	int presence1_p = first_cycle_unit_presence (state, i);
+ 	state->presence_signature[num / (sizeof (int) * CHAR_BIT)]
+ 	  |= (!!presence1_p) << (num % (sizeof (int) * CHAR_BIT));
+ 	num++;
+       }
+ }
+ 
  /* The function returns nonzero value if STATE is not equivalent to
     ANOTHER_STATE from the same current partition on equivalence
     classes.  Another state has ANOTHER_STATE_OUT_ARCS_NUM number of
*************** first_cycle_unit_presence (state_t state
*** 5673,5724 ****
     by ODD_ITERATION_FLAG.  */
  static int
  state_is_differed (state_t state, state_t another_state,
! 		   int another_state_out_arcs_num, int odd_iteration_flag)
  {
    arc_t arc;
!   int state_out_arcs_num;
!   int i, presence1_p, presence2_p;
  
-   state_out_arcs_num = 0;
    for (arc = first_out_arc (state); arc != NULL; arc = next_out_arc (arc))
      {
-       state_out_arcs_num++;
        if ((odd_iteration_flag
             ? arc->to_state->equiv_class_num_1
  	   : arc->to_state->equiv_class_num_2)
            != arc->insn->insn_reserv_decl->equiv_class_num)
          return 1;
      }
!   if (state_out_arcs_num != another_state_out_arcs_num)
      return 1;
!   /* Now we are looking at the states with the point of view of query
!      units.  */
!   for (i = 0; i < description->units_num; i++)
!     if (units_array [i]->query_p)
!       {
! 	presence1_p = first_cycle_unit_presence (state, i);
! 	presence2_p = first_cycle_unit_presence (another_state, i);
! 	if ((presence1_p && !presence2_p) || (!presence1_p && presence2_p))
! 	  return 1;
!       }
    return 0;
  }
  
  /* The function makes initial partition of STATES on equivalent
!    classes.  */
! static state_t
! init_equiv_class (VEC(state_t,heap) *states)
  {
    size_t i;
    state_t prev = 0;
  
    for (i = 0; i < VEC_length (state_t, states); i++)
      {
!       VEC_index (state_t, states, i)->equiv_class_num_1 = 1;
!       VEC_index (state_t, states, i)->next_equiv_class_state = prev;
!       prev = VEC_index (state_t, states, i);
!     }
!   return prev;
  }
  
  /* The function copies pointers to equivalent states from vla FROM
--- 5693,5781 ----
     by ODD_ITERATION_FLAG.  */
  static int
  state_is_differed (state_t state, state_t another_state,
! 		   int odd_iteration_flag)
  {
    arc_t arc;
!   unsigned int sz, si;
! 
!   gcc_assert (state->num_out_arcs == another_state->num_out_arcs);
! 
!   sz = (description->query_units_num + sizeof (int) * CHAR_BIT - 1)
! 	/ (sizeof (int) * CHAR_BIT);
! 
!   for (si = 0; si < sz; si++)
!     gcc_assert (state->presence_signature[si]
! 		== another_state->presence_signature[si]);
  
    for (arc = first_out_arc (state); arc != NULL; arc = next_out_arc (arc))
      {
        if ((odd_iteration_flag
             ? arc->to_state->equiv_class_num_1
  	   : arc->to_state->equiv_class_num_2)
            != arc->insn->insn_reserv_decl->equiv_class_num)
          return 1;
      }
! 
!   return 0;
! }
! 
! /* Compares two states pointed to by STATE_PTR_1 and STATE_PTR_2
!    and return -1, 0 or 1.  This function can be used as predicate for
!    qsort().  It requires the member presence_signature[] of both
!    states be filled.  */
! static int
! compare_states_for_equiv (const void *state_ptr_1,
! 			  const void *state_ptr_2)
! {
!   state_t s1 = *(state_t *)state_ptr_1;
!   state_t s2 = *(state_t *)state_ptr_2;
!   unsigned int sz, si;
!   if (s1->num_out_arcs < s2->num_out_arcs)
!     return -1;
!   else if (s1->num_out_arcs > s2->num_out_arcs)
      return 1;
! 
!   sz = (description->query_units_num + sizeof (int) * CHAR_BIT - 1)
! 	/ (sizeof (int) * CHAR_BIT);
! 
!   for (si = 0; si < sz; si++)
!     if (s1->presence_signature[si] < s2->presence_signature[si])
!       return -1;
!     else if (s1->presence_signature[si] > s2->presence_signature[si])
!       return 1;
    return 0;
  }
  
  /* The function makes initial partition of STATES on equivalent
!    classes and saves it into *CLASSES.  This function requires the input
!    to be sorted via compare_states_for_equiv().  */
! static int
! init_equiv_class (VEC(state_t,heap) *states, VEC (state_t,heap) **classes)
  {
    size_t i;
    state_t prev = 0;
+   int class_num = 1;
  
+   *classes = VEC_alloc (state_t,heap, 150);
    for (i = 0; i < VEC_length (state_t, states); i++)
      {
!       state_t state = VEC_index (state_t, states, i);
!       if (prev)
!         {
! 	  if (compare_states_for_equiv (&prev, &state) != 0)
! 	    {
! 	      VEC_safe_push (state_t,heap, *classes, prev);
! 	      class_num++;
! 	      prev = NULL;
! 	    }
!         }
!       state->equiv_class_num_1 = class_num;
!       state->next_equiv_class_state = prev;
!       prev = state;
!     }
!   if (prev)
!     VEC_safe_push (state_t,heap, *classes, prev);
!   return class_num;
  }
  
  /* The function copies pointers to equivalent states from vla FROM
*************** copy_equiv_class (VEC(state_t,heap) **to
*** 5729,5734 ****
--- 5786,5792 ----
    VEC_free (state_t,heap, *to);
    *to = VEC_copy (state_t,heap, from);
  }
+ 
  /* The function processes equivalence class given by its first state,
     FIRST_STATE, on odd iteration if ODD_ITERATION_FLAG.  If there
     are not equivalent states, the function partitions the class
*************** partition_equiv_class (state_t first_sta
*** 5746,5752 ****
    state_t curr_state;
    state_t prev_state;
    state_t next_state;
-   int out_arcs_num;
  
    partition_p = 0;
  
--- 5804,5809 ----
*************** partition_equiv_class (state_t first_sta
*** 5756,5770 ****
        if (first_state->next_equiv_class_state != NULL)
  	{
  	  /* There are more one states in the class equivalence.  */
! 	  out_arcs_num = set_out_arc_insns_equiv_num (first_state,
! 						      odd_iteration_flag);
  	  for (prev_state = first_state,
  		 curr_state = first_state->next_equiv_class_state;
  	       curr_state != NULL;
  	       curr_state = next_state)
  	    {
  	      next_state = curr_state->next_equiv_class_state;
! 	      if (state_is_differed (curr_state, first_state, out_arcs_num,
  				     odd_iteration_flag))
  		{
  		  /* Remove curr state from the class equivalence.  */
--- 5813,5826 ----
        if (first_state->next_equiv_class_state != NULL)
  	{
  	  /* There are more one states in the class equivalence.  */
! 	  set_out_arc_insns_equiv_num (first_state, odd_iteration_flag);
  	  for (prev_state = first_state,
  		 curr_state = first_state->next_equiv_class_state;
  	       curr_state != NULL;
  	       curr_state = next_state)
  	    {
  	      next_state = curr_state->next_equiv_class_state;
! 	      if (state_is_differed (curr_state, first_state, 
  				     odd_iteration_flag))
  		{
  		  /* Remove curr state from the class equivalence.  */
*************** static void
*** 5797,5803 ****
  evaluate_equiv_classes (automaton_t automaton,
  			VEC(state_t,heap) **equiv_classes)
  {
-   state_t new_equiv_class;
    int new_equiv_class_num;
    int odd_iteration_flag;
    int finish_flag;
--- 5853,5858 ----
*************** evaluate_equiv_classes (automaton_t auto
*** 5806,5817 ****
  
    all_achieved_states = VEC_alloc (state_t,heap, 1500);
    pass_states (automaton, add_achieved_state);
!   new_equiv_class = init_equiv_class (all_achieved_states);
!   odd_iteration_flag = 0;
!   new_equiv_class_num = 1;
  
!   next_iteration_classes = VEC_alloc (state_t,heap, 150);
!   VEC_quick_push (state_t, next_iteration_classes, new_equiv_class);
  
    do
      {
--- 5861,5874 ----
  
    all_achieved_states = VEC_alloc (state_t,heap, 1500);
    pass_states (automaton, add_achieved_state);
!   pass_states (automaton, cache_presence);
!   qsort (VEC_address (state_t, all_achieved_states),
! 	 VEC_length (state_t, all_achieved_states),
!          sizeof (state_t), compare_states_for_equiv);
  
!   odd_iteration_flag = 0;
!   new_equiv_class_num = init_equiv_class (all_achieved_states,
!   					  &next_iteration_classes);
  
    do
      {


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]