This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Faster compilation speed


	I have IBM's hpmcount tool installed on a Power4 AIX 5.1 system;
it uses PMAPI to read the hardware performance counters on the chip.
I would be happy to provide additional data for comparison with the
x86 cache statistics that have been mentioned.

	So that we're all on the same page: which source file is being
compiled, and with which GCC options?

	For example, this is what I get for cc1 -O2 hello.c:

  PM_DTLB_MISS (Data TLB misses)               :            5538
  PM_ITLB_MISS (Instruction TLB misses)        :             819
  PM_LD_MISS_L1 (L1 D cache load misses)       :           43074
  PM_ST_MISS_L1 (L1 D cache store misses)      :          349240
  PM_ST_REF_L1 (L1 D cache store references)   :         1958037
  PM_LD_REF_L1 (L1 D cache load references)    :         3113549

  Utilization rate                           :          29.438 %
  % TLB misses per cycle                     :           0.038 %
  Avg number of loads per TLB miss           :         562.215
  Load and store operations                  :           5.072 M
  Instructions per load/store                :           2.899
  Avg number of loads per load miss          :          72.284
  Avg number of stores per store miss        :           5.607
  Avg number of load/stores per D1 miss      :          12.927
  L1 cache hit rate                          :          92.264 %
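As a cross-check, the derived L1 figures above can be recomputed from the
raw counters. The Python sketch below is not hpmcount's own code; it simply
assumes the obvious ratios (references per miss, hit rate = 1 - misses /
references), and with the counts quoted above it reproduces the printed
values to three decimals. The names `counters` and `l1_derived` are
hypothetical.

```python
# Raw PMAPI counter values reported by hpmcount for cc1 -O2 hello.c (copied from above).
counters = {
    "PM_DTLB_MISS": 5538,
    "PM_LD_MISS_L1": 43074,
    "PM_ST_MISS_L1": 349240,
    "PM_ST_REF_L1": 1958037,
    "PM_LD_REF_L1": 3113549,
}


def l1_derived(c):
    """Recompute hpmcount-style L1 ratios (formulas assumed, not taken from IBM docs)."""
    loads = c["PM_LD_REF_L1"]
    stores = c["PM_ST_REF_L1"]
    load_misses = c["PM_LD_MISS_L1"]
    store_misses = c["PM_ST_MISS_L1"]
    refs = loads + stores                  # all L1 D cache references
    misses = load_misses + store_misses    # all L1 D cache misses
    return {
        "loads_per_tlb_miss": loads / c["PM_DTLB_MISS"],
        "load_store_ops_M": refs / 1e6,
        "loads_per_load_miss": loads / load_misses,
        "stores_per_store_miss": stores / store_misses,
        "refs_per_d1_miss": refs / misses,
        "l1_hit_rate_pct": 100.0 * (1.0 - misses / refs),
    }


for name, value in l1_derived(counters).items():
    print(f"{name:24s}: {value:10.3f}")
```

The low store hit rate (about 5.6 stores per store miss versus about 72
loads per load miss) is what drags the combined L1 hit rate down to about 92%.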


  PM_DATA_FROM_L3 (Data loaded from L3)                   :            1420
  PM_DATA_FROM_MEM (Data loaded from memory)              :             144
  PM_DATA_FROM_L35 (Data loaded from L3.5)                :              19
  PM_DATA_FROM_L2 (Data loaded from L2)                   :           36410
  PM_DATA_FROM_L25_SHR (Data loaded from L2.5 shared)     :               0
  PM_DATA_FROM_L275_SHR (Data loaded from L2.75 shared)   :               0
  PM_DATA_FROM_L275_MOD (Data loaded from L2.75 modified) :               0
  PM_DATA_FROM_L25_MOD (Data loaded from L2.5 modified)   :               0

  Memory traffic                             :           0.074 MBytes
  Memory bandwidth                           :           1.589 MBytes/sec
  Total loads from L3                        :           0.001 M
  L3 traffic                                 :           0.184 MBytes
  L3 bandwidth                               :           3.970 MBytes/sec
  L3 Load miss rate                          :           9.097 %
  Total loads from L2                        :           0.036 M
  L2 traffic                                 :           4.660 MBytes
  L2 bandwidth                               :         100.446 MBytes/sec
  L2 Load miss rate                          :           4.167 %
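The traffic and miss-rate lines in the second group can be checked the same
way. This sketch rests on inferred assumptions, not on IBM documentation:
each L2, L3 or L3.5 reload moves 128 bytes, each reload from memory moves
512 bytes, L3.5 reloads are counted together with L3, and "MBytes" means
10^6 bytes. Under those assumptions it matches the printed traffic and miss
rates; the names `sources` and `hierarchy_derived` are hypothetical.

```python
# Data-source counters from the second hpmcount group (copied from above).
sources = {
    "PM_DATA_FROM_L2": 36410,
    "PM_DATA_FROM_L3": 1420,
    "PM_DATA_FROM_L35": 19,
    "PM_DATA_FROM_MEM": 144,
}

# Assumed bytes per reload; inferred because they reproduce the printed traffic.
BYTES_PER_CACHE_RELOAD = 128   # L2, L3 and L3.5
BYTES_PER_MEM_RELOAD = 512     # main memory


def hierarchy_derived(s):
    """Recompute traffic (MB, 10**6 bytes) and miss rates beyond L1 (assumed formulas)."""
    from_l3 = s["PM_DATA_FROM_L3"] + s["PM_DATA_FROM_L35"]   # L3 plus L3.5
    from_mem = s["PM_DATA_FROM_MEM"]
    beyond_l2 = from_l3 + from_mem                           # reloads that missed L2
    return {
        "memory_traffic_MB": from_mem * BYTES_PER_MEM_RELOAD / 1e6,
        "l3_traffic_MB": from_l3 * BYTES_PER_CACHE_RELOAD / 1e6,
        "l2_traffic_MB": s["PM_DATA_FROM_L2"] * BYTES_PER_CACHE_RELOAD / 1e6,
        "l3_load_miss_pct": 100.0 * from_mem / beyond_l2,
        "l2_load_miss_pct": 100.0 * beyond_l2 / (s["PM_DATA_FROM_L2"] + beyond_l2),
    }


for name, value in hierarchy_derived(sources).items():
    print(f"{name:20s}: {value:10.3f}")
```

Dividing each traffic figure by its bandwidth gives the same run time (about
0.046 s) for all three levels, which suggests the bandwidth lines are simply
traffic over elapsed time.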


David

