This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [cfg-branch] improved Software Trace Cache


> > Hi,
> > 
> > I have improved STC. Now, when it selects the best successor of duplicated BB it ignores the edges to the BBs that are already in the trace. So it creates better traces and duplicates less BBs.
> 
> If you ignore successors already covered by same trace you mean that you prohibit loop unrolling?

The first version would stop the trace on the last BB of the loop, this version continues with the trace after the loop. Neither version does loop unrolling.

> What the code will do on the cfg with loop:
> 
> 1->3
> 2->3
> 3->4
> 3->2
> 
> Frequencies:
> 1 [100], 2 [900], 3 [1000], 4 [100]

It will create this trace: 1, 3, 2, 3', 4

> > The duplication in the last round (there are 4 rounds with different thresholds) is done only if the code does not grow. This eliminates the majority of duplications.
> > I have found out that this version duplicates about 5% of BBs on gcc sources (the first version duplicated about 24%). The slowdown on large binaries (gcc, perl) should be none or lower than the first version of STC.
> > 
> > I added 2 elements ('duplicated' and 'original') to struct reorder_block_def. duplicated is a flag whether the BB was duplicated (it is set in the original BB), and original is the original BB (it is set in the child BB).
> > I guess Honza <jh@suse.cz> will not be happy about this, but I think this is the best solution because I need this information for each BB and the number of BBs grows.
> 
> Hmm, you are quite right that I don't like that particulary.
> The original block pointer makes sense in reorder_block_def, as we may use
> it to improve debug output of cfg_layout_finalize.
> (right now it claims about duplicated blocks to be compensation code).
> For now lets keep the duplicated in the structure too until we find better way
> how to get anotations to blocks.

OK

> > *** gcc-old/gcc/Makefile.in	Fri Nov 30 21:57:40 2001
> > !       do
> > !         {
> > ! 	  /* If the probability of an edge is between prob_lower and
> > ! 	     prob_higher the edge is considered to be as probable as the
> > ! 	     temporary best edge.  Similar for frequencies of successors.  */
> > !           int prob_lower = -INT_MAX, prob_higher = -INT_MAX;
> > !           int freq_lower = -INT_MAX, freq_higher = -INT_MAX;
> > ! 	  int scale, prob;
> 
> You do have some formating problems here :)
> Even tabs should be Coding Standard compliant :(

I'll check the tabs.

> > !   for (e = bb->pred; e; e = e->pred_next)
> > !     if (!(e->flags & EDGE_DFS_BACK) && !RBI (e->src)->visited)
> > !       return -bb->frequency;
> > ! 
> > !   /* All edges to predecessors of BB are DFS back edges or the predecessors
> > !      of BB are visited.  I want such basic blocks first.  */
> > !   return -100 * BB_FREQ_MAX - bb->frequency;
> > ! }
> 
> What is exactly purpose for this?

When there are more than one BBs in the heap I want BBs, which have all their predecessors visited or the predecessors have DFS_BACK edge to this BB, first. This should make better traces.

For example:
1->2	(90%)                       1
2->3	(90%)                      / \
3->4	(100%)                    2   5
1->5	(10%)                    / \ /  
5->6  	(100%)                  3   6    
2->6	(10%)                    \ /    
6->4	(100%)                    4      

Frequencies:
1 [1000], 2 [900], 3 [800], 4 [1000], 5 [100], 6 [200]

If there were just return -bb->frequency; in the function, it would make 3 traces:
1 -> 2 -> 3 -> 4
6 (and maybe 4')
5

With this function it makes 2 traces, because it selects BB 5 first (all its predecessors are already visited):
1 -> 2 -> 3 -> 4
5 -> 6 (and maybe 4')

> > !   while (size < 8 * uncond_jump_length)
> > !     {
> > !       edge e, best_edge = NULL;
> > !       int prob_lower = -INT_MAX, prob_higher = -INT_MAX;
> > !       int freq_lower = -INT_MAX, freq_higher = -INT_MAX;
> > ! 
> > !       if (RBI (bb)->original && RBI (RBI (bb)->original)->duplicated == trace)
> > !         return false;
> > !       if (!cfg_layout_can_duplicate_bb_p (bb))
> > !         return false;
> > !       for (insn = bb->head; insn != NEXT_INSN (bb->end);
> > ! 	   insn = NEXT_INSN (insn))
> >   	{
> > ! 	  if (INSN_P (insn))
> > ! 	    size += get_attr_length (insn);
> >   	}
> 
> Looking at the code, the size < 8 * uncond_jump_length needs to be placed
> somewhere here to avoid accepting of arbitary huge blocks.

I thought we could duplicate large BB too if it is worth. It seems not to be a good idea because the code could grow too much.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]