Drop frequencies from basic blocks

Markus Trippelsdorf markus@trippelsdorf.de
Sun Nov 5 09:53:00 GMT 2017


On 2017.11.03 at 16:48 +0100, Jan Hubicka wrote:
> this is the updated patch, which I have committed after profiledbootstrapping x86-64

Unfortunately, compiling tramp3d-v4.cpp is about 7-8% slower after this patch
(elapsed time is up roughly 6.7% on x86_64 and 8.0% on PPC64le in the runs below).
This happens with an LTO/PGO bootstrapped gcc configured with --enable-checking=release.

On X86_64:

Before:
 Performance counter stats for 'g++ -w -Ofast tramp3d-v4.cpp' (4 runs):

      25040.360183      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.25% )
               650      context-switches          #    0.026 K/sec                    ( +- 76.87% )
                 2      cpu-migrations            #    0.000 K/sec                    ( +- 28.87% )
           268,141      page-faults               #    0.011 M/sec                    ( +-  0.01% )
    80,210,085,167      cycles                    #    3.203 GHz                      ( +-  0.26% )  (66.67%)
    21,061,765,388      stalled-cycles-frontend   #   26.26% frontend cycles idle     ( +-  0.37% )  (66.67%)
    24,699,976,439      stalled-cycles-backend    #   30.79% backend cycles idle      ( +-  0.57% )  (66.68%)
    69,167,169,243      instructions              #    0.86  insn per cycle
                                                  #    0.36  stalled cycles per insn  ( +-  0.05% )  (66.68%)
    15,230,229,662      branches                  #  608.227 M/sec                    ( +-  0.06% )  (66.68%)
       986,612,296      branch-misses             #    6.48% of all branches          ( +-  0.07% )  (66.68%)

      25.046439011 seconds time elapsed                                          ( +-  0.25% )

After:
 Performance counter stats for 'g++ -w -Ofast tramp3d-v4.cpp' (4 runs):

      26710.577065      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.27% )
               199      context-switches          #    0.007 K/sec                    ( +- 21.12% )
                 2      cpu-migrations            #    0.000 K/sec                    ( +- 14.29% )
           267,676      page-faults               #    0.010 M/sec                    ( +-  0.01% )
    85,561,962,974      cycles                    #    3.203 GHz                      ( +-  0.26% )  (66.66%)
    19,581,827,643      stalled-cycles-frontend   #   22.89% frontend cycles idle     ( +-  0.30% )  (66.66%)
    26,056,535,726      stalled-cycles-backend    #   30.45% backend cycles idle      ( +-  0.65% )  (66.68%)
    77,222,167,966      instructions              #    0.90  insn per cycle
                                                  #    0.34  stalled cycles per insn  ( +-  0.04% )  (66.68%)
    17,471,652,187      branches                  #  654.110 M/sec                    ( +-  0.05% )  (66.69%)
     1,082,141,013      branch-misses             #    6.19% of all branches          ( +-  0.04% )  (66.69%)

      26.713823720 seconds time elapsed                                          ( +-  0.27% )

==================================================================================================================

On PPC64le:

Before:
 Performance counter stats for 'g++ -w -Ofast tramp3d-v4.cpp' (4 runs):

      24281.894597      task-clock (msec)         #    0.989 CPUs utilized            ( +-  1.85% )
               166      context-switches          #    0.007 K/sec                    ( +-  2.46% )
                 5      cpu-migrations            #    0.000 K/sec                    ( +- 18.03% )
            52,908      page-faults               #    0.002 M/sec                    ( +- 11.61% )
    84,939,354,171      cycles                    #    3.498 GHz                      ( +-  1.82% )  (66.71%)
     4,680,693,343      stalled-cycles-frontend   #    5.51% frontend cycles idle     ( +-  8.75% )  (49.98%)
    46,697,372,688      stalled-cycles-backend    #   54.98% backend cycles idle      ( +-  2.06% )  (50.05%)
    94,990,460,746      instructions              #    1.12  insn per cycle
                                                  #    0.49  stalled cycles per insn  ( +-  0.10% )  (66.72%)
    19,562,344,992      branches                  #  805.635 M/sec                    ( +-  0.07% )  (50.06%)
       807,701,262      branch-misses             #    4.13% of all branches          ( +-  0.45% )  (50.05%)

      24.550558669 seconds time elapsed                                          ( +-  1.83% )

After:
 Performance counter stats for 'g++ -w -Ofast tramp3d-v4.cpp' (4 runs):

      26383.472582      task-clock (msec)         #    0.995 CPUs utilized            ( +-  1.83% )
               202      context-switches          #    0.008 K/sec                    ( +-  1.68% )
                 5      cpu-migrations            #    0.000 K/sec                    ( +- 14.29% )
            53,114      page-faults               #    0.002 M/sec                    ( +- 17.86% )
    92,099,443,793      cycles                    #    3.491 GHz                      ( +-  0.96% )  (66.68%)
     3,706,147,243      stalled-cycles-frontend   #    4.02% frontend cycles idle     ( +-  8.31% )  (50.00%)
    51,376,299,749      stalled-cycles-backend    #   55.78% backend cycles idle      ( +-  0.83% )  (50.05%)
   105,872,124,981      instructions              #    1.15  insn per cycle
                                                  #    0.49  stalled cycles per insn  ( +-  0.05% )  (66.74%)
    22,348,839,937      branches                  #  847.077 M/sec                    ( +-  0.16% )  (50.04%)
       847,288,219      branch-misses             #    3.79% of all branches          ( +-  0.06% )  (50.02%)

      26.511790685 seconds time elapsed                                          ( +-  1.84% )
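
For anyone who wants to check the slowdown figures, here is a minimal sketch
that computes the relative change between the "Before" and "After" counters
quoted above. The numbers are copied from the perf output in this message; the
names pct_change, RUNS and report are made up for illustration and are not part
of perf or gcc.

```python
# Relative change between the "Before" and "After" perf counter values
# quoted in this message. The helper names and table layout are
# illustrative assumptions, not part of any tool.

def pct_change(before, after):
    """Return the change from before to after as a percentage of before."""
    return (after - before) / before * 100.0

# (before, after) pairs per platform and counter, copied from the perf output.
RUNS = {
    "x86_64": {
        "elapsed (s)": (25.046439011, 26.713823720),
        "cycles": (80_210_085_167, 85_561_962_974),
        "instructions": (69_167_169_243, 77_222_167_966),
        "branches": (15_230_229_662, 17_471_652_187),
    },
    "PPC64le": {
        "elapsed (s)": (24.550558669, 26.511790685),
        "cycles": (84_939_354_171, 92_099_443_793),
        "instructions": (94_990_460_746, 105_872_124_981),
        "branches": (19_562_344_992, 22_348_839_937),
    },
}

def report(runs=RUNS):
    """Build one formatted line per platform/counter pair."""
    lines = []
    for platform, counters in runs.items():
        for name, (before, after) in counters.items():
            change = pct_change(before, after)
            lines.append(f"{platform:8s} {name:13s} {change:+6.2f}%")
    return lines

if __name__ == "__main__":
    print("\n".join(report()))
```

On these numbers the instruction and branch counts each rise by more than ten
percent on both platforms, which is more than the increase in elapsed time.
Keep in mind that the PPC64le timings carry a run-to-run variation of almost 2%.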

--
Markus


