This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [tree-ssa PATCH] Pick memory consumption low hanging fruit


On Mon, 2003-11-17 at 22:34, Dan Nicolaescu wrote:
> Andrew MacLeod <amacleod@redhat.com> writes:

> g++ -O2 PR8361.ii -fmem-report                            g++ -O2 -fdisable-tree-ssa PR8361.ii  -fmem-report
> 
> Me    10M        126k			
> Total        878M        137M       8081k		  Total        188M        124M       1866k			
>              ^^^^^                                                     ^^^^
> 
> Total Memory allocated during the compilation process:    Total Memory allocated during the compilation process:
> Total Overhead:                         156404958	  Total Overhead:                          44849046		
> Total Allocated:                        918957116	  Total Allocated:                        507511156
>                                         ^^^^^^^^^                                                 ^^^^^^^^^

> Hmmmm, I just tried a compilation using 
> -O2 -fno-tree-pre -fmem-report 
> The lines corresponding to the ones marked with "^^^^^^^" above are:
> 
> Total        232M        191M       2229k
>             ^^^^^^ 
> Total Overhead:                          88276642
> Total Allocated:                        764615468
>                                         ^^^^^^^^^
> 
> So it seems that tree-pre is doing something that keeps a lot of
> memory from being gc-ed. 

I just did some runs with a build from last night. I also watched the
size of resident memory for the process cc1plus through the system
monitor.

(Excuse the length; I'll summarize right after.)

-O2
---
Size     Allocated        Used    Overhead
8            2212k        317k         49k
16           3552k        690k         52k
32             25M       3026k        283k
64             14M       9121k        127k
128           8192        1024          64
256          7824k       6531k         53k
512          5412k       4453k         36k
1024         9176k       8692k         62k
2048           12k        6144          84
4096          8192        8192          56
8192           24k         24k          84
16384          16k         16k          28
32768         320k        320k         280
65536          64k         64k          28
131072        384k        384k          84
262144        256k        256k          28
116            48M         29M        388k
24             53M       7193k        643k
12           5916k        525k         98k
40             13M       4077k        135k
Total         190M         74M       1932k

String pool
entries         60769
identifiers     60769 (100.00%)
slots           131072
bytes           1160k (88k overhead)
table size      512k
coll/search     0.7232
ins/search      0.2669
avg. entry      19.55 bytes (+/- 50.06)
longest entry   2315

-O2 -fno-tree-pre
-----------------
Size     Allocated        Used    Overhead
8            1720k        317k         38k
16           2720k        690k         39k
32             25M       3026k        283k
64             10M       9023k         97k
128           8192        1024          64
256          7820k       6531k         53k
512          5412k       4453k         36k
1024         9176k       8692k         62k
2048           12k        6144          84
4096          8192        8192          56
8192           24k         24k          84
16384          16k         16k          28
32768         320k        320k         280
131072        512k        512k         112
262144        256k        256k          28
116            37M         29M        302k
24             37M       7194k        447k
12           4476k        526k         74k
40             12M       4077k        126k
Total         156M         74M       1563k

String pool
entries         59191
identifiers     59191 (100.00%)
slots           131072
bytes           1141k (87k overhead)
table size      512k
coll/search     0.7202
ins/search      0.2619
avg. entry      19.75 bytes (+/- 50.71)
longest entry   2315

-O2 -fdisable-tree-ssa
----------------------
Size     Allocated        Used    Overhead
8            2508k        657k         56k
16           3988k       1390k         58k
32             25M       3026k        283k
64             11M       8916k        100k
128           8192        1024          64
256          7820k       6531k         53k
512          5412k       4453k         36k
1024         9176k       8692k         62k
2048           12k        6144          84
4096          8192        8192          56
8192           24k         24k          84
16384          16k         16k          28
32768         320k        320k         280
131072        512k        512k         112
524288        512k        512k          28
116            40M         29M        323k
24             43M       7304k        523k
12           5884k        524k         97k
40             11M       4024k        114k
Total         167M         75M       1712k

String pool
entries         57476
identifiers     57476 (100.00%)
slots           131072
bytes           1131k (85k overhead)
table size      512k
coll/search     0.7351
ins/search      0.2559
avg. entry      20.15 bytes (+/- 51.41)
longest entry   2315

Mainline -O2
------------
Size     Allocated        Used    Overhead
8            1500k        128k         33k
16           3428k        414k         50k
32           4328k       1986k         46k
64           3788k       2386k         33k
128            12k        1024          96
256          7828k       6539k         53k
512          5408k       4458k         36k
1024         9176k       8703k         62k
2048           12k        6144          84
4096          8192        8192          56
8192           40k         40k         140
16384          16k         16k          28
32768         320k        320k         280
65536          64k         64k          28
131072        384k        384k          84
262144        256k        256k          28
108            30M         28M        243k
20             39M         15M        515k
24             14M       2557k        169k
12           2540k         96k         42k
40           6668k       4602k         65k
Total         128M         76M       1353k

String pool
entries         14310
identifiers     14310 (100.00%)
slots           32768
bytes           748k (38k overhead)
table size      128k
coll/search     0.4301
ins/search      0.0637
avg. entry      53.58 bytes (+/- 95.49)
longest entry   2315


Basically the allocated sizes I got were:
-O2 		190MB
-O2 no PRE	156MB
-O2 no SSA	167MB
mainline -O2	128MB
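
To put the four totals on a common scale, here is a small Python sketch that expresses each configuration's allocated size as a percentage increase over mainline. The figures are the rounded "Total / Allocated" values listed above; the function name `overhead_vs_baseline` is made up for this illustration.

```python
# Total GC-allocated sizes in MB, as listed in the summary above.
allocated_mb = {
    "-O2": 190,
    "-O2 no PRE": 156,
    "-O2 no SSA": 167,
    "mainline -O2": 128,
}


def overhead_vs_baseline(sizes, baseline):
    """Return the percentage increase of each entry over the baseline entry."""
    base = sizes[baseline]
    return {
        name: round(100.0 * (mb - base) / base, 1)
        for name, mb in sizes.items()
        if name != baseline
    }


overhead = overhead_vs_baseline(allocated_mb, "mainline -O2")
for name, pct in overhead.items():
    print(f"{name}: +{pct}% allocated vs mainline")
```

With these inputs the increases come out at roughly 48% for -O2, 22% for -O2 without PRE, and 30% for -O2 without SSA.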

I can't explain the SSA-no-PRE versus no-SSA comparison, but it repeats
consistently.

What is truly interesting is what the system monitor showed. The
bouncing I refer to here is typically garbage collections happening and
dropping the memory-in-use figure.

mainline climbed to about 165MB, and bounced around between 160MB and
165MB, until the very end, where it climbed to 191MB and then finished.

tree-ssa disabled climbed to about 215MB, and bounced between there and
183MB until it climbed to 280MB and finished.

tree-ssa enabled, PRE disabled climbed to 225MB, and bounced between
there and 179MB (there was more bouncing, so more GCs). Interestingly,
it didn't have the same spike at the end as the others; perhaps a timely
GC occurred.

tree-ssa enabled, PRE enabled started off similarly: it climbed to
219MB, dropped to 189MB, but then climbed to 430MB and bounced between
453MB and 369MB for most of the compilation, before dropping off to
235MB, climbing back to 276MB, dropping to 218MB, then finishing. Since
the GC data doesn't vary that much for PRE, I assume there is a lot of
other, non-GC memory in there somewhere. As you might expect, there were
more GCs with PRE enabled than in the other cases.

I don't know how accurately the system monitor reflects things, but I can
run everything again and again, and see pretty much identical results.
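
For anyone wanting to reproduce this kind of trace without a graphical monitor, the following is a minimal Python sketch that polls a process's resident set size while it runs. It assumes a Linux system exposing `VmRSS` in `/proc/<pid>/status`; the helper names `read_rss_kb` and `sample_rss` are hypothetical, and this is not necessarily how the system monitor used above obtains its numbers. Note also that the `g++` driver spawns `cc1plus` as a child, so sampling the driver's PID would miss the compiler proper; the sketch only illustrates the sampling loop for a directly launched process.

```python
import subprocess
import sys
import time


def read_rss_kb(pid):
    """Return the resident set size in kB from /proc/<pid>/status, or None."""
    try:
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except (FileNotFoundError, ProcessLookupError):
        pass  # The process exited between the poll and the read.
    return None


def sample_rss(cmd, interval=0.05):
    """Run cmd and return the RSS samples (kB) collected while it was alive."""
    proc = subprocess.Popen(cmd)
    samples = []
    while proc.poll() is None:
        rss = read_rss_kb(proc.pid)
        if rss is not None:
            samples.append(rss)
        time.sleep(interval)
    return samples


if __name__ == "__main__":
    # Stand-in workload: a short-lived Python child instead of a compiler run.
    trace = sample_rss([sys.executable, "-c", "import time; time.sleep(0.5)"])
    if trace:
        print(f"{len(trace)} samples, peak {max(trace)} kB, low {min(trace)} kB")
```

Plotting or printing the samples over time would show the same climb-and-drop pattern described above, with each drop typically corresponding to a garbage collection.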


User compile time between mainline and tree-ssa DISABLED was pretty much
identical (63 seconds), which is good... we just used more memory.

Adding tree-ssa, no PRE, the compile time was about 10 seconds higher
(user time), so 73 seconds. When I looked at it last month, both
mainline and ssa were doing it in about 82, so there have clearly been
improvements in mainline. SSA is no longer picking up the gains in the
RTL phases as it was back then, but elimination of some RTL phases would
get us that time pretty much back.
Of course, when we leave user time and go to overall time to account for
the swapping, mainline was overall 83 seconds, ssa disabled was 92
seconds, tree-ssa was 95 seconds, and with PRE I got about 130 seconds.
PRE itself took 9 seconds of user time, and the rest was all swapping
time.
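
The arithmetic behind that last claim can be spelled out in a short Python sketch. It uses only the approximate timings quoted above and treats "overall minus user" as a rough proxy for time lost to swapping; that proxy is an assumption of this sketch, since system and I/O time are also in there.

```python
# Approximate timings quoted above, in seconds.
user_time = {"mainline": 63, "tree-ssa disabled": 63, "tree-ssa, no PRE": 73}
overall_time = {
    "mainline": 83,
    "tree-ssa disabled": 92,
    "tree-ssa, no PRE": 95,
    "tree-ssa, PRE": 130,
}
pre_user_time = 9

# Overall minus user time: a rough proxy for time spent waiting (mostly swapping).
non_user = {name: overall_time[name] - user_time[name] for name in user_time}

# How much of the extra overall time with PRE is not PRE's own user time.
extra_overall_with_pre = overall_time["tree-ssa, PRE"] - overall_time["tree-ssa, no PRE"]
extra_non_user_with_pre = extra_overall_with_pre - pre_user_time

for name, secs in non_user.items():
    print(f"{name}: about {secs}s outside user time")
print(
    f"PRE adds about {extra_overall_with_pre}s overall, "
    f"of which about {extra_non_user_with_pre}s is not PRE user time"
)
```

So enabling PRE costs about 35 seconds overall, of which only 9 are PRE's own user time, leaving roughly 26 seconds attributable to swapping.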

I can drop the overall time a bit by clearing everything out of my
machine, but the timing trend for swapping appears to back up the memory
usage shown by the system monitor.

Andrew


