RFA: patch for coalescing stack slots

Tue Apr 13 19:49:00 GMT 2004

  Currently, GCC's original global register allocator and reload have
only primitive code for sharing stack slots among spilled registers.

  This patch coalesces stack slots, both those of pseudo-registers that
did not obtain hard registers and those used for spilled registers.
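  The general idea can be illustrated with a small C sketch.  It is not
the code from the patch: the names (struct slot, slot_conflicts_p,
coalesce_slots, the conflicts matrix) are hypothetical, alignment is
ignored, and conflict information is assumed to be given as a
symmetric boolean matrix.  Each pseudo-register that needs memory goes
into the first existing slot none of whose occupants conflict with it
(first fit), and a new slot is opened only when no such slot exists.

```c
#include <stdbool.h>

/* Hypothetical limit; the caller must pass n <= MAX_PSEUDOS.  */
#define MAX_PSEUDOS 16

/* One stack slot shared by pseudos whose live ranges do not conflict.  */
struct slot
{
  int size;                  /* largest size of any member, in bytes */
  int offset;                /* frame offset, set after assignment */
  int members[MAX_PSEUDOS];  /* pseudos assigned to this slot */
  int n_members;
};

/* True if pseudo P conflicts with any pseudo already in slot S.
   conflicts[i][j] is assumed symmetric: true when the live ranges
   of pseudos i and j overlap.  */
static bool
slot_conflicts_p (const struct slot *s, int p,
                  bool conflicts[][MAX_PSEUDOS])
{
  for (int i = 0; i < s->n_members; i++)
    if (conflicts[p][s->members[i]])
      return true;
  return false;
}

/* First-fit coalescing: put each pseudo into the first slot with no
   conflicting member, opening a new slot only when none fits.  Slots
   are then laid out in creation order (alignment is ignored).  Returns
   the number of slots; the frame size is stored in *FRAME_SIZE.  */
static int
coalesce_slots (int n, const int size[], bool conflicts[][MAX_PSEUDOS],
                struct slot slots[], int slot_of[], int *frame_size)
{
  int n_slots = 0;

  for (int p = 0; p < n; p++)
    {
      int s = 0;
      while (s < n_slots && slot_conflicts_p (&slots[s], p, conflicts))
        s++;
      if (s == n_slots)
        {
          /* No compatible slot: open a new, empty one.  */
          slots[s].size = 0;
          slots[s].n_members = 0;
          n_slots++;
        }
      slots[s].members[slots[s].n_members++] = p;
      if (size[p] > slots[s].size)
        slots[s].size = size[p];
      slot_of[p] = s;
    }

  /* Lay the slots out contiguously, earliest-created first.  */
  int offset = 0;
  for (int s = 0; s < n_slots; s++)
    {
      slots[s].offset = offset;
      offset += slots[s].size;
    }
  *frame_size = offset;
  return n_slots;
}
```

For four pseudos of sizes 4, 8, 4 and 4 bytes where only pseudos 0/1
and 2/3 conflict, this yields two slots (pseudos 0 and 2 share a 4-byte
slot, pseudos 1 and 3 share an 8-byte slot) and a 12-byte frame instead
of the 20 bytes needed without sharing.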

  The patch decreases the size of the stack frames allocated for
functions.  For example, here is the stack frame allocation for the
Linpack benchmark before and after the patch:
        before                          after
        subl    $12, %esp               subl    $12, %esp
!       subl    $80, %esp               subl    $68, %esp
        subl    $56, %esp               subl    $56, %esp
        subl    $4, %esp                subl    $4, %esp
!       subl    $56, %esp               subl    $48, %esp
!       subl    $52, %esp               subl    $44, %esp
!       subl    $1388, %esp             subl    $1356, %esp
        subl    $16, %esp               subl    $16, %esp

  The patch also decreases code size on some platforms, such as x86,
because smaller displacements can be used in many cases (slots are
assigned with a first-fit approach).
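  A rough illustration of the displacement effect on x86 (a sketch, not
code from the patch; the helper names are hypothetical): a
base+displacement memory operand encodes the displacement in 1 byte
(disp8) if it fits in a signed 8-bit value and in 4 bytes (disp32)
otherwise, so keeping frequently used slots at small offsets saves 3
bytes per access.

```c
#include <stdint.h>

/* Bytes needed for the displacement field of an x86 base+displacement
   memory operand: ModRM mod=01 gives disp8, mod=10 gives disp32.
   Simplification: the disp0 form (mod=00), which can omit the field
   when the displacement is zero for most base registers, is ignored.  */
static int
x86_disp_bytes (int32_t disp)
{
  return (disp >= -128 && disp <= 127) ? 1 : 4;
}

/* Total displacement bytes for N stack-slot accesses at OFFSETS.  */
static int
total_disp_bytes (const int32_t offsets[], int n)
{
  int total = 0;
  for (int i = 0; i < n; i++)
    total += x86_disp_bytes (offsets[i]);
  return total;
}
```

A slot accessed at offset 124 costs 1 displacement byte per access; the
same slot pushed to offset 132 by a larger frame costs 4.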

----------------CFP2000-----------------
change          before          after benchmark
-0.904%           7078           7014 171.swim
 0.000%          17026          17026 183.equake
-0.319%          10023           9991 172.mgrid
 0.000%          12011          12011 179.art
-1.910%          25128          24648 168.wupwise
-1.041%         443993         439369 177.mesa
-3.481%         844452         815060 200.sixtrack
-2.799%         106317         103341 301.apsi
-0.490%          58741          58453 173.applu
Average = -0.643836%

----------------CINT2000-----------------
change          before          after benchmark
-0.187%          85436          85276 197.parser
-0.261%         128917         128581 175.vpr
-0.003%         568904         568888 255.vortex
-0.853%          28133          27893 256.bzip2
-0.044%         473971         473763 253.perlbmk
-0.110%         480338         479810 252.eon
-1.467%         181055         178399 300.twolf
-0.102%         421816         421384 254.gap
-0.492%    1.24172e+06     1.2356e+06 176.gcc
-1.586%         204846         201598 186.crafty
 0.000%          32668          32668 164.gzip
 0.000%           9686           9686 181.mcf
Average = -0.34032%
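  The change column in the tables above is consistent with the relative
difference (after - before) / before * 100; for 171.swim,
(7014 - 7078) / 7078 is about -0.904%.  A minimal C helper (the name
percent_change is hypothetical) for checking individual rows:

```c
/* Relative change, in percent, from BEFORE to AFTER.
   Assumes BEFORE is nonzero.  */
static double
percent_change (double before, double after)
{
  return (after - before) / before * 100.0;
}
```

Applied to the estimated SPECfp2000 scores reported below (535 and
548), the same formula gives about +2.43%, consistent with the "about
2.4%" improvement mentioned below.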

  The patch improves code and data locality, so GCC itself becomes
slightly faster: user time for an x86 bootstrap decreased from
14m0.150s to 13m58.890s.  The better locality also improves SPECfp2000
results (by about 2.4%).

   Benchmark           Before            After
   ===============================================
   168.wupwise            890    *         887    *
   171.swim               604    *         609    *
   172.mgrid                     X                X
   173.applu              624    *         627    *
   177.mesa               629    *         639    *
   178.galgel                    X                X
   179.art                244    *         248    *
   183.equake             964    *         963    *
   187.facerec                   X                X
   188.ammp                      X                X
   189.lucas                     X                X
   191.fma3d                     X                X
   200.sixtrack           337    *         385    *
   301.apsi               401    *         407    *
   Est. SPECfp_base2000   535    
   Est. SPECfp2000                         548    

  There is no improvement for SPECint2000.

   Benchmark                Before             After
   ================================================
   164.gzip                   728*              719*
   175.vpr                    511*              514*
   176.gcc                    851*              864*
   181.mcf                    530*              530*
   186.crafty                 775*              780*
   197.parser                 635*              635*
   252.eon                       X                 X
   253.perlbmk               1014*             1012*
   254.gap                    807*              811*
   255.vortex                 941*              943*
   256.bzip2                  615*              615*
   300.twolf                  648*              623*
   Est. SPECint_base2000      716            
   Est. SPECint2000                             715

  I expect the patch to matter more with tree-ssa and with
pseudo-register renaming (-fweb), because both the SSA approach and
pseudo-register renaming generate more pseudo-registers and live
ranges.  However, I have not had time to check this.

  The patch was bootstrapped on x86, x86_64, Itanium and PPC, and the
regression testsuite was run on the same platforms.  I found no new
regressions.

  Is the patch OK to commit to mainline?  Should this code be enabled
by default (as the patch currently does)?

Vlad

2004-04-08  Vladimir Makarov  <vmakarov@redhat.com>

	* rtl.h (pseudo_reg_conflict_p): New prototype.

	* global.c (pseudo_reg_conflict_p): New function.

	* reload.h (called_from_global_p): New external variable.
	(find_reloads): Remove one parameter in the prototype.

	* reload.c (hard_regs_live_known): Use bool as the type.
	(find_reloads): Remove parameter live_known.  Use called_from_global_p.

	* reload1.c (called_from_global_p): New external variable.
	(spilled_reg_stack_slot): New structure.
	(spilled_reg_stack_slots_num, spilled_reg_stack_slots): New global
	variables.
	(calculate_needs_all_insns, finish_spills): Remove the parameter.
	(alter_reg): Change parameter.  Rewrite code for coalescing
	stack slots.
	(reload): Initialize/finalize the variables.  Use new
	parameter value for alter_reg.
	(finish_spills, delete_output_reload): Use new parameter value
	for alter_reg.

-------------- next part --------------
Name: stack_slots_coalescing.patch
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20040413/362dd3c0/attachment.ksh>