This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Benchmarks of v2 (was Re: [PATCH 0/5] RFC: Overhaul of diagnostics (v2))


On Thu, 2015-09-24 at 10:15 +0200, Richard Biener wrote:
> On Thu, Sep 24, 2015 at 2:25 AM, David Malcolm <dmalcolm@redhat.com> wrote:
> > On Wed, 2015-09-23 at 15:36 +0200, Richard Biener wrote:
> >> On Wed, Sep 23, 2015 at 3:19 PM, Michael Matz <matz@suse.de> wrote:
> >> > Hi,
> >> >
> >> > On Tue, 22 Sep 2015, David Malcolm wrote:
> >> >
> >> >> The drawback is that it could bloat the ad-hoc table.  Can the ad-hoc
> >> >> table ever get smaller, or does it only ever get inserted into?
> >> >
> >> > It only ever grows.
> >> >
> >> >> An idea I had is that we could stash short ranges directly into the 32
> >> >> bits of location_t, by offsetting the per-column-bits somewhat.
> >> >
> >> > It's certainly worth an experiment: let's say you restrict yourself to
> >> > tokens less than 8 characters, you need an additional 3 bits (using one
> >> > value, e.g. zero, as the escape value).  That leaves 20 bits for the line
> >> > numbers (for the normal 8 bit columns), which might be enough for most
> >> > single-file compilations.  For LTO compilation this often won't be enough.
> >> >
> >> >> My plan is to investigate the impact these patches have on the time and
> >> >> memory consumption of the compiler,
> >> >
> >> > When you do so, make sure you're also measuring an LTO compilation with
> >> > debug info of something big (firefox).  I know that we already had issues
> >> > with the size of the linemap data in the past for these cases (probably
> >> > when we added columns).
> >>
> >> The issue we have with LTO is that the linemap gets populated in quite
> >> random order and thus we repeatedly switch files (we've mitigated this
> >> somewhat for GCC 5).  We also considered dropping column info
> >> (and would drop range info) as diagnostics are from optimizers only
> >> with LTO and we keep locations merely for debug info.
> >
> > Thanks.  Presumably the mitigation you're referring to is the
> > lto_location_cache class in lto-streamer-in.c?
> >
> > Am I right in thinking that, right now, the LTO code doesn't support
> > ad-hoc locations? (presumably the block pointers only need to exist
> > during optimization, which happens after the serialization)
> 
> LTO code does support ad-hoc locations but they are "restored" only
> when reading function bodies and stmts (by means of COMBINE_LOCATION_DATA).
> 
> > The obvious simplification would be, as you suggest, to not bother
> > storing range information with LTO, falling back to just the existing
> > representation.  Then there's no need to extend LTO to serialize ad-hoc
> > data; simply store the underlying locus into the bit stream.  I think
> > that this happens already: lto-streamer-out.c calls expand_location and
> > stores the result, so presumably any ad-hoc location_t values made by
> > the v2 patches would have dropped their range data there when I ran the
> > test suite.
> 
> Yep.  We only preserve BLOCKs, so if you don't add extra code to
> preserve ranges they'll be "dropped".
> 
> > If it's acceptable to not bother with ranges for LTO, one way to do the
> > "stashing short ranges into the location_t" idea might be for the
> > bits-per-range of location_t values to be a property of the line_table
> > (or possibly the line map), set up when the struct line_maps is created.
> > For non-LTO it could be some tuned value (maybe from a param?); for LTO
> > it could be zero, so that we have as many bits as before for line/column
> > data.
> 
> That could be a possibility (likewise for column info?)
> 
> Richard.
> 
> > Hope this sounds sane
> > Dave
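
As an aside, here is a rough standalone sketch of what the "stash short
ranges into location_t" idea discussed above could look like.  The field
widths, layout and helper names are purely illustrative and do not match
GCC's actual linemap encoding (a real location_t is an offset into the
line maps, not raw line/column fields):

#include <assert.h>

typedef unsigned int location_t;   /* 32 bits */

/* RANGE_BITS could in principle be a property of the line_table:
   a tuned value normally, zero for LTO, as suggested above.  */
#define RANGE_BITS   3             /* lengths 1..7; 0 = escape to the ad-hoc table */
#define COLUMN_BITS  8
#define LINE_BITS    (32 - COLUMN_BITS - RANGE_BITS)

/* Pack line/column plus a short range length; longer tokens would use
   the escape value and fall back to an ad-hoc lookaside entry.  */
static location_t
pack_short_range (unsigned line, unsigned col, unsigned len)
{
  unsigned range_field = (len < (1u << RANGE_BITS)) ? len : 0;
  return (location_t) ((line << (COLUMN_BITS + RANGE_BITS))
                       | (col << RANGE_BITS)
                       | range_field);
}

static unsigned
range_length (location_t loc)
{
  return loc & ((1u << RANGE_BITS) - 1);   /* 0 means "consult the ad-hoc table" */
}

int
main (void)
{
  location_t loc = pack_short_range (1234, 17, 5);
  assert (range_length (loc) == 5);
  return 0;
}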

I did some crude benchmarking of the patchkit, using these scripts:
  https://github.com/davidmalcolm/gcc-benchmarking
(specifically, bb0222b455df8cefb53bfc1246eb0a8038256f30),
with the "big-code.c" and "kdecore.cc" files Michael posted in:
  https://gcc.gnu.org/ml/gcc-patches/2013-09/msg00062.html
and with "influence.i", a preprocessed version of SPEC2006's 445.gobmk
engine/influence.c (as an example of a moderate-sized pure C source
file).

This doesn't yet cover very large autogenerated C files, and the .cc
file is only being measured to see the effect on the ad-hoc table (and
tokenization).

"control" was r227977.
"experiment" was the same revision with the v2 patchkit applied.

Recall that this patchkit captures ranges for tokens as an extra field
on the tokens within libcpp and the C FE, adds ranges to the ad-hoc
location lookaside (storing them for all tree nodes within the C FE that
have a location_t), and passes them around within c_expr for all C
expressions (including those that don't have a location_t).
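
For reference, here is a rough, self-contained sketch of the shape of
that approach; these struct and function names are illustrative rather
than the actual patchkit API, and just show "ranges on tokens and
c_expr, plus a grow-only lookaside keyed by location":

#include <stddef.h>

typedef unsigned int location_t;

struct source_range
{
  location_t start;
  location_t finish;
};

/* Tokens carry an explicit range alongside their existing location.  */
struct sketch_token
{
  location_t src_loc;
  struct source_range range;
};

/* c_expr-style values carry the range even when there is no location_t.  */
struct sketch_c_expr
{
  void *value;                   /* the underlying tree in the real FE */
  struct source_range src_range;
};

/* Minimal append-only lookaside keyed by location, standing in for the
   ad-hoc location table (which likewise only ever grows).  */
struct range_entry
{
  location_t loc;
  struct source_range range;
};

static struct range_entry lookaside[1024];
static size_t n_entries;

static void
record_range (location_t loc, struct source_range range)
{
  if (n_entries < sizeof lookaside / sizeof lookaside[0])
    {
      lookaside[n_entries].loc = loc;
      lookaside[n_entries].range = range;
      n_entries++;
    }
}

int
main (void)
{
  struct source_range r = { 100, 105 };
  record_range (100, r);
  return 0;
}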

Both control and experiment were built with
  --enable-checking=release \
  --disable-bootstrap \
  --disable-multilib \
  --enable-languages=c,ada,c++,fortran,go,java,lto,objc,obj-c++

The script measures:

(a) wallclock time for "xgcc -S", so it's measuring the driver, parsing,
optimization, etc., rather than attempting to directly measure parsing.
This is without -ftime-report, since Mikhail indicated in this post that
it's expensive enough to skew timings:
  https://gcc.gnu.org/ml/gcc/2015-07/msg00165.html

(b) memory usage: measured by performing a separate build with
-ftime-report and extracting the "TOTAL" ggc value (actually 3 builds,
but the value is the same each time).

Is this a fair way to measure things?  It could be argued that by
measuring totals I'm hiding the extra parsing cost within the overall
compilation cost.

Full logs can be seen at:
  https://dmalcolm.fedorapeople.org/gcc/2015-09-25/bmark-v2.txt
(v2 of the patchkit)

I also investigated a version of the patchkit with the token tracking
rewritten to build ad-hoc ranges for *every token*, without attempting
any kind of optimization (e.g. for short ranges).
A log of this can be seen at:
  https://dmalcolm.fedorapeople.org/gcc/2015-09-25/bmark-v2-plus-adhoc-ranges-for-tokens.txt
(v2 of the patchkit, with token tracking rewritten to build ad-hoc
ranges for *every token*).
The nice thing about this approach is that lots of token-related
diagnostics gain underlining of the relevant token "for free" simply
from the location_t, without having to individually patch them.  Without
any optimization, the memory consumed by this approach is clearly
larger.

A summary comparing the two logs:

Minimal wallclock time (s) over 10 iterations
                          Control -> v2                                 Control -> v2 + ad-hoc ranges for every token
kdecore.cc -g -O0          10.306548 -> 10.268712: 1.00x faster          10.247160 -> 10.444528: 1.02x slower
kdecore.cc -g -O1          27.026285 -> 27.220654: 1.01x slower          27.280681 -> 27.622676: 1.01x slower
kdecore.cc -g -O2          43.791668 -> 44.020270: 1.01x slower          43.904934 -> 44.248477: 1.01x slower
kdecore.cc -g -O3          47.471836 -> 47.651101: 1.00x slower          47.645985 -> 48.005495: 1.01x slower
kdecore.cc -g -Os          31.678652 -> 31.802829: 1.00x slower          31.741484 -> 32.033478: 1.01x slower
   empty.c -g -O0            0.012662 -> 0.011932: 1.06x faster            0.012888 -> 0.013143: 1.02x slower
   empty.c -g -O1            0.012685 -> 0.012558: 1.01x faster            0.013164 -> 0.012790: 1.03x faster
   empty.c -g -O2            0.012694 -> 0.012846: 1.01x slower            0.012912 -> 0.013175: 1.02x slower
   empty.c -g -O3            0.012654 -> 0.012699: 1.00x slower            0.012596 -> 0.012792: 1.02x slower
   empty.c -g -Os            0.013057 -> 0.012766: 1.02x faster            0.012691 -> 0.012885: 1.02x slower
big-code.c -g -O0            3.292680 -> 3.325748: 1.01x slower            3.292948 -> 3.303049: 1.00x slower
big-code.c -g -O1          15.701810 -> 15.765014: 1.00x slower          15.714116 -> 15.759254: 1.00x slower
big-code.c -g -O2          22.575615 -> 22.620187: 1.00x slower          22.567406 -> 22.605435: 1.00x slower
big-code.c -g -O3          52.423586 -> 52.590075: 1.00x slower          52.421460 -> 52.703835: 1.01x slower
big-code.c -g -Os          21.153980 -> 21.253598: 1.00x slower          21.146266 -> 21.260138: 1.01x slower
influence.i -g -O0            0.148229 -> 0.149518: 1.01x slower            0.148672 -> 0.156262: 1.05x slower
influence.i -g -O1            0.387397 -> 0.389930: 1.01x slower            0.387734 -> 0.396655: 1.02x slower
influence.i -g -O2            0.587514 -> 0.589604: 1.00x slower            0.588064 -> 0.596510: 1.01x slower
influence.i -g -O3            1.273561 -> 1.280514: 1.01x slower            1.274599 -> 1.287596: 1.01x slower
influence.i -g -Os            0.526045 -> 0.527579: 1.00x slower            0.526827 -> 0.535635: 1.02x slower


Maximal ggc memory (kB)
                     Control -> v2                                 Control -> v2 + ad-hoc ranges for every token
kdecore.cc -g -O0      650337.000 -> 654435.000: 1.0063x larger      650337.000 -> 711775.000: 1.0945x larger
kdecore.cc -g -O1      931966.000 -> 940144.000: 1.0088x larger      931951.000 -> 989384.000: 1.0616x larger
kdecore.cc -g -O2    1125325.000 -> 1133514.000: 1.0073x larger    1125318.000 -> 1182384.000: 1.0507x larger
kdecore.cc -g -O3    1221408.000 -> 1229596.000: 1.0067x larger    1221410.000 -> 1278658.000: 1.0469x larger
kdecore.cc -g -Os      867140.000 -> 871235.000: 1.0047x larger      867141.000 -> 928700.000: 1.0710x larger
   empty.c -g -O0          1189.000 -> 1192.000: 1.0025x larger          1189.000 -> 1193.000: 1.0034x larger
   empty.c -g -O1          1189.000 -> 1192.000: 1.0025x larger          1189.000 -> 1193.000: 1.0034x larger
   empty.c -g -O2          1189.000 -> 1192.000: 1.0025x larger          1189.000 -> 1193.000: 1.0034x larger
   empty.c -g -O3          1189.000 -> 1192.000: 1.0025x larger          1189.000 -> 1193.000: 1.0034x larger
   empty.c -g -Os          1189.000 -> 1192.000: 1.0025x larger          1189.000 -> 1193.000: 1.0034x larger
big-code.c -g -O0      166584.000 -> 172731.000: 1.0369x larger      166584.000 -> 172726.000: 1.0369x larger
big-code.c -g -O1      279793.000 -> 285940.000: 1.0220x larger      279793.000 -> 285935.000: 1.0220x larger
big-code.c -g -O2      400058.000 -> 406194.000: 1.0153x larger      400058.000 -> 406189.000: 1.0153x larger
big-code.c -g -O3      903648.000 -> 909750.000: 1.0068x larger      903906.000 -> 910001.000: 1.0067x larger
big-code.c -g -Os      357060.000 -> 363010.000: 1.0167x larger      357060.000 -> 363005.000: 1.0166x larger
influence.i -g -O0          9273.000 -> 9719.000: 1.0481x larger         9273.000 -> 13303.000: 1.4346x larger
influence.i -g -O1        12968.000 -> 13414.000: 1.0344x larger        12968.000 -> 16998.000: 1.3108x larger
influence.i -g -O2        16386.000 -> 16768.000: 1.0233x larger        16386.000 -> 20352.000: 1.2420x larger
influence.i -g -O3        35508.000 -> 35763.000: 1.0072x larger        35508.000 -> 39346.000: 1.1081x larger
influence.i -g -Os        14287.000 -> 14669.000: 1.0267x larger        14287.000 -> 18253.000: 1.2776x larger

Thoughts?
Dave


