This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
CHUD tool [Was: Faster compilation speed: cache behavior]
- From: "Timothy J. Wood" <tjw at omnigroup dot com>
- To: Matt Austern <austern at apple dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Tue, 20 Aug 2002 23:29:16 -0700
- Subject: CHUD tool [Was: Faster compilation speed: cache behavior]
On Tuesday, August 20, 2002, at 02:25 PM, Matt Austern wrote:
> FYI, here are the results of a fairly crude test that I did
> using one of the Apple performance tools.
I've written a little library that can be used to help gather this
sort of information with CHUD (which is what Matt is using).
http://www.omnigroup.com/~bungi/CHUDChassis-20020820.tar.gz
This project has a dylib target in it that has module load/unload
routines that invoke the CHUD remote client API to start the CHUD
sampling as soon as the app starts and shut it down when the app is
about to exit. It isn't perfect (the disconnect call seems to hang if
the server has stopped listening because its sample buffer filled up).
Hopefully you'll find it of use.
First, put Shikari into remote listening mode (shift-cmd-r, under the
main app menu), then do something like:
    OAKeepAllocationStatistics=1 \
    DYLD_INSERT_LIBRARIES=/Users/Shared/bungi/Build/libCHUDChassis.dylib \
    ./cc1 -quiet reload1.i
(The 'OAKeepAllocationStatistics' goo is a hack to cause
CoreFoundation to do a symbol lookup that provokes the module load
routine in the library -- a wrapper script would be easy to write here).
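A minimal sketch of such a wrapper, written as a shell function. The function name run_with_chud and the CHUD_CHASSIS_LIB override variable are made up for illustration and are not part of the tarball; the default library path is simply the one from the command above.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with the CHUDChassis library inserted.
# CHUD_CHASSIS_LIB is an assumed override variable, not something the
# CHUDChassis project defines.

run_with_chud() {
    if [ "$#" -eq 0 ]; then
        echo "usage: run_with_chud command [args...]" >&2
        return 2
    fi

    lib="${CHUD_CHASSIS_LIB:-/Users/Shared/bungi/Build/libCHUDChassis.dylib}"
    if [ ! -f "$lib" ]; then
        echo "run_with_chud: library not found: $lib" >&2
        return 1
    fi

    # Both variables are set only in the environment of the child process.
    # OAKeepAllocationStatistics=1 is the same hack as above: it provokes
    # the symbol lookup that triggers the library's module load routine.
    OAKeepAllocationStatistics=1 DYLD_INSERT_LIBRARIES="$lib" "$@"
}
```

With that defined, the run above becomes: run_with_chud ./cc1 -quiet reload1.i (Shikari still has to be in remote listening mode first).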
I'm not sure what settings Matt was using, so the numbers below may
not be directly comparable to his. I
configured Shikari to the preset "Data Cache Misses (7450)" which
increments a counter every 1000 dL3 cache misses.
(For those that haven't used CHUD, the toolkit will measure
performance across the entire system, hence the entries for the system
library and mach_kernel -- the latter presumably being zero-fill page
faults). Not sure what the __floatdidf is...
I'm running this on a 2x800 Quicksilver w/1.5GB, from the head of the
mainline as of a couple hours ago.
./cc1 -quiet reload1.i:
15.8%  gt_ggc_mx_lang_tree_node  cc1
12.7%  memset                    libSystem.B.dylib
 6.9%  __floatdidf               cc1
 5.9%  poison_pages              cc1
 4.6%  ggc_alloc                 cc1
 3.7%  bzero                     libSystem.B.dylib
 3.5%  .L_phys_zero_loop         mach_kernel
 3.3%  ggc_mark_rtx_children_1   cc1
 3.3%  gt_ggc_mx_emit_status     cc1
 3.2%  ggc_mark_rtx_children     cc1
 2.7%  ggc_set_mark              cc1
 1.3%  gt_ggc_mx_function        cc1
 1.2%  bitmap_initialize         cc1
 0.9%  build_insn_chain          cc1
 0.8%  bitmap_operation          cc1
 0.8%  gt_ggc_mx_varasm_status   cc1
 0.7%  reload                    cc1
 0.7%  verify_flow_info          cc1
 0.6%  find_reg_note             cc1
 0.6%  ggc_collect               cc1
 0.6%  scan_one_insn             cc1
 0.6%  purge_hard_subreg_sets    cc1
 0.6%  yyparse                   cc1
./cc1 -O2 -quiet reload1.i:
10.4%  memset                    libSystem.B.dylib
 9.9%  gt_ggc_mx_lang_tree_node  cc1
 7.3%  .L_phys_zero_loop         mach_kernel
 7.0%  __floatdidf               cc1
 6.9%  ggc_alloc                 cc1
 6.5%  poison_pages              cc1
 5.8%  bzero                     libSystem.B.dylib
 2.4%  ggc_pop_context           cc1
 1.6%  ggc_set_mark              cc1
 1.2%  ggc_mark_rtx_children_1   cc1
 1.0%  ggc_mark_rtx_children     cc1
 1.0%  bitmap_operation          cc1
 0.9%  vm_page_lookup            mach_kernel
 0.9%  find_reg_note             cc1
 0.9%  allocate_reg_life_data    cc1
 0.8%  init_alias_analysis       cc1
 0.6%  vm_map_enter              mach_kernel
 0.6%  verify_flow_info          cc1
(I wasn't able to get cc1plus to compile my reload1.i for some reason
-- I'm sure someone at Apple can fiddle with that :)
-tim