This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Preprocessor performance analysis


I've investigated CPP performance, in terms of execution time and
peak memory usage, for 4 versions of GCC I have on my machine: the
2.95.4 and 3.0.1 Debian i386 binaries, and 3.1 mainline i586 binaries
from 12-Aug-01 and 30-Sep-01.  In quick tests on the machine I used for
testing (AMD K6 II / 350 MHz), the i586 binaries tend to be roughly 4%
faster than the i386 ones.

I measured the time taken to create preprocessed output for mainline's
cppmain.c, c-decl.c and insn-recog.c, and for a testcase of deeply
nested function-like macro expansion taken from GLIBC's macro
definition of stpcpy() [attached to this email for the curious].  This
was an attempt to test a reasonable variety of inputs.

For execution times, the stpcpy figure is the time taken to run a
shell script that runs the command 10 times, divided by 10.  For the
other three files, it is the total time taken to run the shell script,
not divided by 10.  I used NJAMD to measure peak memory usage.

I chose the August 12th snapshot because it is the day after I found
the bug in the 3.x series that meant memory wasn't being freed as I
thought it was.  I committed the 2-line fix on August 11th.

The Sep 30th snapshot is the current snapshot, which incorporates my
complete rework of memory management in cpplib, and the conversion of
the macro expansion algorithm and the lexer to return pointers to
tokens [i.e. cpp_get_token ()].  It also incorporates Zack's
improvements to the routines that output token spellings, and his
speed-up of identifier lexing.

Here are the results:

Test                    2.95.4    3.0.1      3.1 (Aug-12)  3.1 (Sep-30)
-----------------------------------------------------------------------
cppmain.c      time     1.94s     2.84s      2.80s         2.58s
               memory   808K      1.08MB     1.22MB        1010K

c-decl.c       time     4.23s     6.94s      6.65s         5.54s
               memory   2.14MB    3.59MB     2.32MB        1.55MB

insn-recog.c   time     8.50s     17.62s     17.56s        12.81s
               memory   4.83MB    9.65MB     1.65MB        1.43MB

stpcpy-test.c  time     1.89s     4.77s      4.82s         2.40s
               memory   8.62MB    114.22MB   28.37MB       948K
                                  ^^!!!^^

("time" is execution time as described above; "memory" is peak memory
usage as reported by NJAMD.)

I'd summarize the above results as follows.  3.0.1 is worse than
2.95.4 on both counts; I would say disastrously so.  Memory usage is
high, and ridiculously so when there is a lot of macro expansion.  [The
memory usage of 3.0.2 will be more or less like 3.1 (Aug-12) above,
since both incorporate the bug fix.]

3.1 as of August 12 has much better memory usage than 3.0.1,
comparable to 2.95.x except for heavily nested macro expansion.  It
retains the slowness of 3.0.1.

3.1-CURRENT generally has by far the smallest memory footprint of any
version of CPP so far; in some cases memory consumption is a mere
fraction of what it used to be.  [I'd say that for normal files, of
which stpcpy-test.c is not one, memory usage is a (small) constant plus
the memory used to store macro definitions.]  It also has the best
performance of the 3.x series by a reasonable margin, placing it
roughly half-way between 3.0.x and 2.95.x.

When considering the above, note that the 3.x preprocessor has greater
functionality than 2.95.x: for example _Pragma (a small but non-trivial
bottleneck, since every identifier token has to be checked to see
whether it is _Pragma), better diagnostics, and fixes for numerous
minor bugs in 2.95.x.  Also, producing preprocessed output is an extra
step for 3.0.x, one that isn't performed when compiling with the
integrated CPP, and on this machine it takes roughly 25-40% of
execution time, depending on the file.  For 2.95.x, in contrast, it is
a natural by-product of preprocessing.

3.0.x also takes a lot of care to produce "pretty" preprocessed
output, in particular spacing tokens correctly (especially during
macro expansion) with no redundant horizontal or vertical whitespace.
2.95.x made no effort to do this, and would get spacing wrong for
macros, which matters for stringification.  Fixing the spacing issues
in 2.95.x would cost it a non-trivial amount of time.

I think I can squeeze a little more performance out of function-like
macro expansion, but we're getting close to the limits of what I can
find.  There are small opportunities for cutting memory usage further.
Considering the source files in question, I think the table above
indicates that the lexer now has the most room for improvement; we
currently penalize it unnecessarily by not allowing it to step back in
the input stream.  We should be able to improve the above figures
(apart from stpcpy-test.c) by reworking the lexer, and possibly by
tweaking the output routines that use stdio.

Enjoy!

Neil.
stpcpy-test.c (attachment):

#define __extension__

#define __stpcpy(dest, src) (__extension__ (__builtin_constant_p (src) ? (__string2_1bptr_p (src) && strlen (src) + 1 <= 8 ? __stpcpy_small (dest, __stpcpy_args (src), strlen (src) + 1) : ((char *) __mempcpy (dest, src, strlen (src) + 1) - 1)) : __stpcpy (dest, src)))
#define stpcpy(dest, src) __stpcpy (dest, src)
#define __stpcpy_args(src) __extension__ __STRING2_SMALL_GET16 (src, 0), __extension__ __STRING2_SMALL_GET16 (src, 4), __extension__ __STRING2_SMALL_GET32 (src, 0), __extension__ __STRING2_SMALL_GET32 (src, 4)

#define __mempcpy(dest, src, n) (__extension__ (__builtin_constant_p (src) && __builtin_constant_p (n) && __string2_1bptr_p (src) && n <= 8 ? __mempcpy_small (dest, __mempcpy_args (src), n) : __mempcpy (dest, src, n)))
#define mempcpy(dest, src, n) __mempcpy (dest, src, n)
#define __mempcpy_args(src) ((char *) (src))[0], ((char *) (src))[2], ((char *) (src))[4], ((char *) (src))[6], __extension__ __STRING2_SMALL_GET16 (src, 0), __extension__ __STRING2_SMALL_GET16 (src, 4), __extension__ __STRING2_SMALL_GET32 (src, 0), __extension__ __STRING2_SMALL_GET32 (src, 4)

#define __STRING2_SMALL_GET16(src, idx) (((__const unsigned char *) (__const char *) (src))[idx + 1] << 8 | ((__const unsigned char *) (__const char *) (src))[idx])

#define __STRING2_SMALL_GET32(src, idx) (((((__const unsigned char *) (__const char *) (src))[idx + 3] << 8 | ((__const unsigned char *) (__const char *) (src))[idx + 2]) << 8 | ((__const unsigned char *) (__const char *) (src))[idx + 1]) << 8 | ((__const unsigned char *) (__const char *) (src))[idx])

stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
stpcpy (stpcpy (stpcpy (stpcpy (a, b), c), d), e)
