
[RFC] speed up compute_store_table


While compute_store_table isn't the slowest function in the compiler, 
analysis with valgrind's cachegrind skin does show that when compiling 
combine.c we execute more instructions in that function than in any 
other, by a factor of nearly two (the only reason this doesn't blow the 
compiler away completely seems to be that most of those instructions hit 
the cache).  However, it turns out that most of that work can be 
eliminated:

The biggest sink is the scan over the last_set_in array to kill off the 
marks that say that the current insn contains the last update of a 
particular register.  This array is of size max_gcse_regno, and we scan 
it after processing every insn in the function.  Assuming that the 
number of insns in a function is roughly proportional to the number of 
pseudos it uses (in practice it's likely to be some small factor larger), 
that means we have a loop here which is O(N^2) in the number of pseudos.  
Yikes!  Fortunately, we don't have to scan the table at all: we can use 
note_stores again to clear the marks, in exactly the same way as we used 
it to set them up in the first place.  In the patch below, 
reg_clear_last_set is the function that does this.
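
To make that concrete, here is a minimal sketch of the callback 
(placeholder names only -- regvec and current_insn stand for whatever 
file-static state gcse.c uses to hand the array and the current insn to 
the note_stores callbacks; the real code is in the attached patch):

    static void
    reg_clear_last_set (rtx dest, rtx setter ATTRIBUTE_UNUSED,
                        void *data ATTRIBUTE_UNUSED)
    {
      /* A store through a SUBREG still sets the underlying register.  */
      if (GET_CODE (dest) == SUBREG)
        dest = SUBREG_REG (dest);

      /* Undo the mark only if it was this insn that left it; nothing
         else in the array is touched, so there is no scan over all
         registers.  */
      if (GET_CODE (dest) == REG
          && regvec[REGNO (dest)] == INSN_UID (current_insn))
        regvec[REGNO (dest)] = 0;
    }

So instead of a per-insn loop over the whole array, each insn just gets 
a second note_stores walk over the registers it actually sets.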

The second biggest sink is the pass over last_set_in to transfer the 
information we've just extracted for the current BB into the 
reg_set_in_block bitmap.  We only do this once per basic block, but it's 
still quite expensive.  We can fold this into the original scan: we now 
pass reg_set_in_block[bb->index] as the data parameter to reg_set_info, 
which sets the appropriate bit whenever that pointer is non-null.
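
Again in sketch form (same placeholder names as above, and assuming the 
data pointer arrives as an sbitmap -- the actual change is in the patch), 
reg_set_info simply grows an optional bitmap argument:

    static void
    reg_set_info (rtx dest, rtx setter ATTRIBUTE_UNUSED, void *data)
    {
      sbitmap bb_regs = (sbitmap) data;

      if (GET_CODE (dest) == SUBREG)
        dest = SUBREG_REG (dest);

      if (GET_CODE (dest) == REG)
        {
          /* Record this insn as the last setter, as before ...  */
          regvec[REGNO (dest)] = INSN_UID (current_insn);
          /* ... and, if the caller supplied a bitmap, also note that
             the register is set somewhere in the current block.  */
          if (bb_regs)
            SET_BIT (bb_regs, REGNO (dest));
        }
    }

The caller passes reg_set_in_block[bb->index] as data on the pass that 
computes last_set_in, and NULL wherever the bitmap isn't wanted, so the 
separate per-block copy loop disappears.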

That finally leaves us with the memset of last_set_in at the start of 
processing each BB.  A few moments' thought shows that this clear is 
redundant after the first BB: the function does not alter the insn 
chain, so if we clear out every setter recorded in this array as we scan 
the block's insns a second time, the array must be all zero again once 
we've finished the block.  It is therefore safe to allocate last_set_in 
with xcalloc for the first pass and never explicitly clear it again 
afterwards.  For safety's sake, I've put in a check (wrapped in 
ENABLE_CHECKING) that asserts this invariant.
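
In other words, the allocation and the check come down to something like 
this (a sketch, with max_gcse_regno standing for the bound gcse.c already 
uses to size per-register arrays):

    /* Zeroed once up front; never memset per block after that.  */
    last_set_in = (int *) xcalloc (max_gcse_regno, sizeof (int));

    /* ... per-block processing: one note_stores pass to set the marks,
       a second pass to clear them again ...  */

    /* After the second walk over the block's insns has cleared our own
       marks, every entry must be zero again.  */
    #ifdef ENABLE_CHECKING
      for (regno = 0; regno < max_gcse_regno; regno++)
        if (last_set_in[regno] != 0)
          abort ();
    #endif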

Anyway, the result of all this is that valgrind shows we now execute 96% 
fewer instructions in this function than before.  It's not quite as 
simple as that, since there is a small increase in cost for reg_set_info, 
reg_clear_last_set and note_stores, but the overall effect is still a 
reduction of about 432M instructions from a total run of about 8G 
instructions.

In terms of the overall impact on the compiler, it isn't huge, about 1-2% 
off the bootstrap time, but it's not insignificant.

valgrind --skin=cachegrind access counts for cc1 compiling combine.i for 
arm-elf with -O2.  Ir, Dr and Dw are instruction reads, data reads and 
data writes; the 1m and 2m rows are the corresponding L1 and L2 cache 
misses.

           old              new           saving

Ir    7,993,874,638   7,561,371,780        5.4%
Ir1m     88,369,620      86,038,290        2.6%
Ir2m      1,397,911       1,352,053        3.3%
Dr    2,837,027,610   2,661,966,294        6.1%
Dr1m     78,507,180      74,645,586        4.9%
Dr2m     11,586,479      11,524,025        0.5%
Dw    1,564,534,494   1,556,445,913        0.5%
Dw1m     17,705,886      16,313,848        7.9%
Dw2m      2,600,731       2,589,320        0.4%

Bootstrapped and reg-tested on i686-linux.  An earlier version of the 
patch was also bootstrapped on arm-netbsdelf.

Jeff, have I missed anything fundamental in my analysis of this routine?

<date>  Richard Earnshaw  <rearnsha@arm.com>

	* gcse.c (reg_clear_last_set): New function.
	(reg_set_info): If data is non-null, treat it as an sbitmap of
	registers and set the bit for the register being set.
	(compute_store_table): Allocate last_set_in with xcalloc.  Do not
	memset this array on each iteration.  Pass reg_set_in_block[bb->index]
	to note_stores while computing last_set_in instead of scanning
	last_set_in after the first pass through the insns.
	Clear last_set_in using reg_clear_last_set instead of explicitly
	rescanning after each insn.  If checking is enabled, assert that
	last_set_in is completely zeroed after each bb has been processed.


Attachment: comp-store.patch
Description: comp-store.patch

