This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Benchmarks of v2 (was Re: [PATCH 0/5] RFC: Overhaul of diagnostics (v2))


On Tue, Oct 13, 2015 at 5:32 PM, David Malcolm <dmalcolm@redhat.com> wrote:
> On Thu, 2015-09-24 at 10:15 +0200, Richard Biener wrote:
>> On Thu, Sep 24, 2015 at 2:25 AM, David Malcolm <dmalcolm@redhat.com> wrote:
>> > On Wed, 2015-09-23 at 15:36 +0200, Richard Biener wrote:
>> >> On Wed, Sep 23, 2015 at 3:19 PM, Michael Matz <matz@suse.de> wrote:
>> >> > Hi,
>> >> >
>> >> > On Tue, 22 Sep 2015, David Malcolm wrote:
>> >> >
>> >> >> The drawback is that it could bloat the ad-hoc table.  Can the ad-hoc
>> >> >> table ever get smaller, or does it only ever get inserted into?
>> >> >
>> >> > It only ever grows.
>> >> >
>> >> >> An idea I had is that we could stash short ranges directly into the 32
>> >> >> bits of location_t, by offsetting the per-column-bits somewhat.
>> >> >
>> >> > It's certainly worth an experiment: let's say you restrict yourself to
>> >> > tokens less than 8 characters, you need an additional 3 bits (using one
>> >> > value, e.g. zero, as the escape value).  That leaves 20 bits for the line
>> >> > numbers (for the normal 8 bit columns), which might be enough for most
>> >> > single-file compilations.  For LTO compilation this often won't be enough.
>> >> >
>> >> >> My plan is to investigate the impact these patches have on the time and
>> >> >> memory consumption of the compiler,
>> >> >
>> >> > When you do so, make sure you're also measuring an LTO compilation with
>> >> > debug info of something big (firefox).  I know that we already had issues
>> >> > with the size of the linemap data in the past for these cases (probably
>> >> > when we added columns).
>> >>
>> >> The issue we have with LTO is that the linemap gets populated in quite
>> >> random order and thus we repeatedly switch files (we've mitigated this
>> >> somewhat for GCC 5).  We also considered dropping column info
>> >> (and would drop range info) as diagnostics are from optimizers only
>> >> with LTO and we keep locations merely for debug info.
>> >
>> > Thanks.  Presumably the mitigation you're referring to is the
>> > lto_location_cache class in lto-streamer-in.c?
>> >
>> > Am I right in thinking that, right now, the LTO code doesn't support
>> > ad-hoc locations? (presumably the block pointers only need to exist
>> > during optimization, which happens after the serialization)
>>
>> LTO code does support ad-hoc locations but they are "restored" only
>> when reading function bodies and stmts (by means of COMBINE_LOCATION_DATA).
>>
>> > The obvious simplification would be, as you suggest, to not bother
>> > storing range information with LTO, falling back to just the existing
>> > representation.  Then there's no need to extend LTO to serialize ad-hoc
>> > data; simply store the underlying locus into the bit stream.  I think
>> > that this happens already: lto-streamer-out.c calls expand_location and
>> > stores the result, so presumably any ad-hoc location_t values made by
>> > the v2 patches would have dropped their range data there when I ran the
>> > test suite.
>>
>> Yep.  We only preserve BLOCKs, so if you don't add extra code to
>> preserve ranges they'll be "dropped".
>>
>> > If it's acceptable to not bother with ranges for LTO, one way to do the
>> > "stashing short ranges into the location_t" idea might be for the
>> > bits-per-range of location_t values to be a property of the line_table
>> > (or possibly the line map), set up when the struct line_maps is created.
>> > For non-LTO it could be some tuned value (maybe from a param?); for LTO
>> > it could be zero, so that we have as many bits as before for line/column
>> > data.
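[The per-table knob proposed above might look roughly like this; a sketch only, where `range_bits` and the helper are hypothetical and not part of the real struct line_maps:]

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t location_t;

/* Hypothetical per-table parameter: how many low bits of a location_t
   carry an inline range length.  Zero (the LTO setting) leaves all 32
   bits for line/column data, exactly as today.  */
struct table_params
{
  unsigned range_bits;   /* 0 for LTO; a tuned value (e.g. 3) otherwise */
};

/* Attach LEN to an already-packed line/column value; low bits of zero
   are the escape meaning "range not stored inline".  */
static location_t
attach_len (const struct table_params *t, location_t line_col, uint32_t len)
{
  if (t->range_bits == 0 || len >= (1u << t->range_bits))
    return line_col << t->range_bits;            /* escape */
  return (line_col << t->range_bits) | len;
}
```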
>>
>> That could be a possibility (likewise for column info?)
>>
>> Richard.
>>
>> > Hope this sounds sane
>> > Dave
>
> I did some crude benchmarking of the patchkit, using these scripts:
>   https://github.com/davidmalcolm/gcc-benchmarking
> (specifically, bb0222b455df8cefb53bfc1246eb0a8038256f30),
> using the "big-code.c" and "kdecore.cc" files Michael posted as:
>   https://gcc.gnu.org/ml/gcc-patches/2013-09/msg00062.html
> and "influence.i", a preprocessed version of SPEC2006's 445.gobmk
> engine/influence.c (as an example of a moderate-sized pure C source
> file).
>
> This doesn't yet cover very large autogenerated C files, and the .cc
> file is only being measured to see the effect on the ad-hoc table (and
> tokenization).
>
> "control" was r227977.
> "experiment" was the same revision with the v2 patchkit applied.
>
> Recall that this patchkit captures ranges for tokens as an extra field
> within tokens within libcpp and the C FE, and adds ranges to the ad-hoc
> location lookaside, storing them for all tree nodes within the C FE that
> have a location_t, and passing them around within c_expr for all C
> expressions (including those that don't have a location_t).
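[Conceptually, the ad-hoc lookaside entry described here grows from a (locus, block) pair to also carry the range. A simplified picture, with field names that are hypothetical rather than the actual libcpp struct:]

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t location_t;

/* Simplified model of one ad-hoc table entry: consumers hold a 32-bit
   location_t that is really an index into a growing array of these
   (and, as noted above, the table only ever grows).  The v2 patches
   add the start/finish pair.  */
struct adhoc_entry
{
  location_t locus;          /* the caret location, as before  */
  void *block;               /* TREE_BLOCK pointer, or NULL    */
  location_t start, finish;  /* new: the token/expression range */
};
```

[Every extra field is paid for by every entry, which is why the variant that builds an ad-hoc range for every token shows the larger memory numbers in the tables that follow.]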
>
> Both control and experiment were built with
>   --enable-checking=release \
>   --disable-bootstrap \
>   --disable-multilib \
>   --enable-languages=c,ada,c++,fortran,go,java,lto,objc,obj-c++
>
> The script measures:
>
> (a) wallclock time for "xgcc -S", so it's measuring the driver, parsing,
> optimization, etc., rather than attempting to directly measure parsing.
> This is without -ftime-report, since Mikhail indicated it's sufficiently
> expensive to skew timings in this post:
>   https://gcc.gnu.org/ml/gcc/2015-07/msg00165.html
>
> (b) memory usage: by performing a separate build with -ftime-report,
> extracting the "TOTAL" ggc value (actually 3 builds, but it's the same
> each time).
>
> Is this a fair way to measure things?  It could be argued that by
> measuring totals I'm hiding the extra parsing cost in the overall cost.

Overall cost is what matters.  Time to build the libstdc++ PCHs would
be interesting as well ;) (and their size)

One could have argued you should have used -fsyntax-only.

> Full logs can be seen at:
>   https://dmalcolm.fedorapeople.org/gcc/2015-09-25/bmark-v2.txt
> (v2 of the patchkit)
>
> I also investigated a version of the patchkit with the token tracking
> rewritten to build ad-hoc ranges for *every token*, without attempting
> any kind of optimization (e.g. for short ranges).
> A log of this can be seen at:
> https://dmalcolm.fedorapeople.org/gcc/2015-09-25/bmark-v2-plus-adhoc-ranges-for-tokens.txt
> (v2 of the patchkit, with token tracking rewritten to build ad-hoc
> ranges for *every token*).
> The nice thing about this approach is that lots of token-related
> diagnostics gain underlining of the relevant token "for free" simply
> from the location_t, without having to individually patch them.  Without
> any optimization, the memory consumed by this approach is clearly
> larger.
>
> A summary comparing the two logs:
>
> Minimal wallclock time (s) over 10 iterations
>                           Control -> v2                                 Control -> v2 + adhoc loc at every token
> kdecore.cc -g -O0          10.306548 -> 10.268712: 1.00x faster          10.247160 -> 10.444528: 1.02x slower
> kdecore.cc -g -O1          27.026285 -> 27.220654: 1.01x slower          27.280681 -> 27.622676: 1.01x slower
> kdecore.cc -g -O2          43.791668 -> 44.020270: 1.01x slower          43.904934 -> 44.248477: 1.01x slower
> kdecore.cc -g -O3          47.471836 -> 47.651101: 1.00x slower          47.645985 -> 48.005495: 1.01x slower
> kdecore.cc -g -Os          31.678652 -> 31.802829: 1.00x slower          31.741484 -> 32.033478: 1.01x slower
>    empty.c -g -O0            0.012662 -> 0.011932: 1.06x faster            0.012888 -> 0.013143: 1.02x slower
>    empty.c -g -O1            0.012685 -> 0.012558: 1.01x faster            0.013164 -> 0.012790: 1.03x faster
>    empty.c -g -O2            0.012694 -> 0.012846: 1.01x slower            0.012912 -> 0.013175: 1.02x slower
>    empty.c -g -O3            0.012654 -> 0.012699: 1.00x slower            0.012596 -> 0.012792: 1.02x slower
>    empty.c -g -Os            0.013057 -> 0.012766: 1.02x faster            0.012691 -> 0.012885: 1.02x slower
> big-code.c -g -O0            3.292680 -> 3.325748: 1.01x slower            3.292948 -> 3.303049: 1.00x slower
> big-code.c -g -O1          15.701810 -> 15.765014: 1.00x slower          15.714116 -> 15.759254: 1.00x slower
> big-code.c -g -O2          22.575615 -> 22.620187: 1.00x slower          22.567406 -> 22.605435: 1.00x slower
> big-code.c -g -O3          52.423586 -> 52.590075: 1.00x slower          52.421460 -> 52.703835: 1.01x slower
> big-code.c -g -Os          21.153980 -> 21.253598: 1.00x slower          21.146266 -> 21.260138: 1.01x slower
> influence.i -g -O0            0.148229 -> 0.149518: 1.01x slower            0.148672 -> 0.156262: 1.05x slower
> influence.i -g -O1            0.387397 -> 0.389930: 1.01x slower            0.387734 -> 0.396655: 1.02x slower
> influence.i -g -O2            0.587514 -> 0.589604: 1.00x slower            0.588064 -> 0.596510: 1.01x slower
> influence.i -g -O3            1.273561 -> 1.280514: 1.01x slower            1.274599 -> 1.287596: 1.01x slower
> influence.i -g -Os            0.526045 -> 0.527579: 1.00x slower            0.526827 -> 0.535635: 1.02x slower
>
>
> Maximal ggc memory (kb)
>                      Control -> v2                                 Control -> v2 + adhoc loc at every token
> kdecore.cc -g -O0      650337.000 -> 654435.000: 1.0063x larger      650337.000 -> 711775.000: 1.0945x larger
> kdecore.cc -g -O1      931966.000 -> 940144.000: 1.0088x larger      931951.000 -> 989384.000: 1.0616x larger
> kdecore.cc -g -O2    1125325.000 -> 1133514.000: 1.0073x larger    1125318.000 -> 1182384.000: 1.0507x larger
> kdecore.cc -g -O3    1221408.000 -> 1229596.000: 1.0067x larger    1221410.000 -> 1278658.000: 1.0469x larger
> kdecore.cc -g -Os      867140.000 -> 871235.000: 1.0047x larger      867141.000 -> 928700.000: 1.0710x larger
>    empty.c -g -O0          1189.000 -> 1192.000: 1.0025x larger          1189.000 -> 1193.000: 1.0034x larger
>    empty.c -g -O1          1189.000 -> 1192.000: 1.0025x larger          1189.000 -> 1193.000: 1.0034x larger
>    empty.c -g -O2          1189.000 -> 1192.000: 1.0025x larger          1189.000 -> 1193.000: 1.0034x larger
>    empty.c -g -O3          1189.000 -> 1192.000: 1.0025x larger          1189.000 -> 1193.000: 1.0034x larger
>    empty.c -g -Os          1189.000 -> 1192.000: 1.0025x larger          1189.000 -> 1193.000: 1.0034x larger
> big-code.c -g -O0      166584.000 -> 172731.000: 1.0369x larger      166584.000 -> 172726.000: 1.0369x larger
> big-code.c -g -O1      279793.000 -> 285940.000: 1.0220x larger      279793.000 -> 285935.000: 1.0220x larger
> big-code.c -g -O2      400058.000 -> 406194.000: 1.0153x larger      400058.000 -> 406189.000: 1.0153x larger
> big-code.c -g -O3      903648.000 -> 909750.000: 1.0068x larger      903906.000 -> 910001.000: 1.0067x larger
> big-code.c -g -Os      357060.000 -> 363010.000: 1.0167x larger      357060.000 -> 363005.000: 1.0166x larger
> influence.i -g -O0          9273.000 -> 9719.000: 1.0481x larger         9273.000 -> 13303.000: 1.4346x larger
> influence.i -g -O1        12968.000 -> 13414.000: 1.0344x larger        12968.000 -> 16998.000: 1.3108x larger
> influence.i -g -O2        16386.000 -> 16768.000: 1.0233x larger        16386.000 -> 20352.000: 1.2420x larger
> influence.i -g -O3        35508.000 -> 35763.000: 1.0072x larger        35508.000 -> 39346.000: 1.1081x larger
> influence.i -g -Os        14287.000 -> 14669.000: 1.0267x larger        14287.000 -> 18253.000: 1.2776x larger
>
> Thoughts?

The compile-time and memory-usage impact of the adhoc-loc-at-every-token
patchkit is quite big.  Remember that gaining 1% in compile-time is hard,
and a 20-40% memory increase for influence.i looks like too much.

I also wonder why the memory-usage change differs across -O levels.  I
think we should have a pretty "static" line table after parsing?  Thus,
rather than percentages, I'd like to see absolute changes (which I'd
expect to be the same for all -O levels).

Richard.

> Dave
>
>

