RFC: fp printing speedup patch

Jerry Quinn jlquinn@optonline.net
Thu Nov 13 03:20:00 GMT 2003


Paolo Carlini writes:
 > Jerry Quinn wrote:
 > 
 > >We *REALLY* don't want to have a widen loop in the middle of
 > >insert_int or insert_float.  That will kiss the performance gains from
 > >caching goodbye.
 > >
 > Unfortunately, due to that item in the standard we *MUST*... I have
 > carried out some preliminary tests: the speed of _M_insert_float is not
 > affected, since we are already doing the widen(array) call, which
 > involves a memcopy and we trade that for a series of essentially nop
 > widen(char) (this for plain chars, of course). The performance of
 > _M_insert_int is not really killed, but it is affected a bit, indeed...
 > something like 10-15% slower wrt the current very fast version...

I agree that a loop must be used instead of an array call.  My point
was that we don't want to be doing widen() at all during _M_insert_int
or _M_insert_float.  At the moment for int, the widen call happens when
__numpunct_cache is constructed.

 > ><rant-query> Why does the standard specify this less efficient
 > >approach to widening the output?  It requires N virtual calls instead of
 > >one call.  Ick!  Anyone know if the intent was for the array version
 > >and char version to give the same results?</rant-query>
 > >
 > Perhaps because of what I wrote above? If you have to properly convert
 > all the chars anyway, in the most common case (plain chars no user
 > provided do_widen) many virtual function calls are still faster than
 > memcopy.

Why?  A memcopy is a single call, and many virtual function calls are
going to cost much more by comparison.  The work happening in these
functions is small, which is why it's an issue.  However, the numpunct
cache should amortize the cost to nothing for a real program.
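
To make the two call shapes concrete, here is a minimal sketch using only
the public std::ctype interface.  The helper names widen_array and
widen_loop are made up for illustration and are not library functions.
The first makes one call to the array overload; the second makes one
widen(char) call, and so one virtual do_widen call, per character.

```cpp
#include <cstddef>
#include <locale>
#include <string>

// Widen a narrow buffer with ONE call to the array overload of widen().
// (Hypothetical helper, not part of the library.)
std::wstring widen_array(const std::string& in, const std::locale& loc)
{
    const std::ctype<wchar_t>& ct = std::use_facet<std::ctype<wchar_t> >(loc);
    std::wstring out(in.size(), L'\0');
    if (!in.empty())
        ct.widen(in.data(), in.data() + in.size(), &out[0]);
    return out;
}

// Widen the same buffer with one widen(char) call per character,
// which is the form the standard describes for numeric output.
// (Hypothetical helper, not part of the library.)
std::wstring widen_loop(const std::string& in, const std::locale& loc)
{
    const std::ctype<wchar_t>& ct = std::use_facet<std::ctype<wchar_t> >(loc);
    std::wstring out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out.push_back(ct.widen(in[i]));
    return out;
}
```

With the stock ctype facet both helpers produce the same string; they can
only differ if a user-provided facet overrides the two do_widen overloads
inconsistently.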

 > >Second, and probably more important, the cache doesn't get rebuilt
 > >when the new locale is created.  The numpunct<char> cache is created
 > >for the C locale.  When the new locale is created, the numpunct<char>
 > >cache from the C locale is referenced, rather than constructing a new
 > >one using the new ctype<char>.
 > >
 > Here, I lost you... I hope to return to this later today: in _M_insert_float
 > we already have the machinery, only we have to call widen(char) instead
 > of widen(array). Do you mean we have a latent bug unrelated to my PR of
 > yesterday? Which cache? It seems to me that we don't use a cache at all
 > for widen?!?!

__numpunct_cache is used to pay the cost of widening the appropriate
characters once, instead of on every call to _M_insert_int.  When it is
created, it widens the characters that will be used to render integer
numbers.  Future calls to _M_insert_int use the prewidened array.
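
As a rough illustration of the idea (not the actual __numpunct_cache
code), the sketch below widens the output characters once in a
constructor and then formats integers from the stored table.  The class
name DigitCache and its members are hypothetical.

```cpp
#include <locale>
#include <string>

// Hypothetical stand-in for the caching idea: widen the output atoms
// once, then reuse the widened table for every number formatted.
template <typename CharT>
class DigitCache
{
public:
    explicit DigitCache(const std::locale& loc)
    {
        static const char atoms[] = "-0123456789";
        const std::ctype<CharT>& ct = std::use_facet<std::ctype<CharT> >(loc);
        // The only widen calls: one per atom, paid once per cache.
        for (int i = 0; i < 11; ++i)
            widened_[i] = ct.widen(atoms[i]);
    }

    // Format a value using only the prewidened table (no widen calls).
    std::basic_string<CharT> format(long value) const
    {
        std::basic_string<CharT> out;
        // Work on the magnitude as unsigned so LONG_MIN does not overflow.
        unsigned long mag = value < 0
            ? 0UL - static_cast<unsigned long>(value)
            : static_cast<unsigned long>(value);
        do {
            out.insert(out.begin(), widened_[1 + mag % 10]);
            mag /= 10;
        } while (mag != 0);
        if (value < 0)
            out.insert(out.begin(), widened_[0]);
        return out;
    }

private:
    CharT widened_[11];  // '-' followed by the ten digits, already widened
};
```

A DigitCache built for one locale keeps whatever its ctype facet produced
at construction time, so it has to be rebuilt whenever the facet it was
built from is replaced.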

The bug is that when the new locale is created, the array in
__numpunct_cache doesn't get updated by redoing the widen calls on
this array with the new ctype facet.  As a result, we get integers
rendered using the standard ctype instead of the modified one.
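
A sketch of the kind of test that exercises this follows.  LetterDigits,
render and make_modified_locale are names invented for the example; the
facet simply widens '0'..'9' to 'a'..'j' (assuming an ASCII-compatible
character set) so that any use of the modified ctype is visible in the
output.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: widens the digits '0'..'9' to
// 'a'..'j' and leaves every other character alone.
class LetterDigits : public std::ctype<char>
{
protected:
    char do_widen(char c) const
    {
        return (c >= '0' && c <= '9')
            ? static_cast<char>('a' + (c - '0'))
            : c;
    }

    // Keep the array overload consistent with the single-char one.
    const char* do_widen(const char* lo, const char* hi, char* to) const
    {
        for (; lo != hi; ++lo, ++to)
            *to = LetterDigits::do_widen(*lo);
        return hi;
    }
};

// Format an int through a stream imbued with the given locale.
std::string render(int value, const std::locale& loc)
{
    std::ostringstream os;
    os.imbue(loc);
    os << value;
    return os.str();
}

// Build a locale that is the C locale plus the modified ctype<char>.
std::locale make_modified_locale()
{
    return std::locale(std::locale::classic(), new LetterDigits);
}
```

Calling render(42, std::locale::classic()) and then
render(42, make_modified_locale()) is the sequence in question: with the
stale cache the second call gives the same text as the first, whereas a
library that rewidens for the new locale would be expected to show the
remapped digits.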

 
 > Paolo.
 > 
 > P.S. What about those numbers for vanilla 3.4?
 > 

I went back and figured out that I had run 3.4 on the unaltered
library, i.e. I forgot to install the modified library.

So, here are the updated numbers.  I'm actually very surprised!  I
didn't think we could do better than 2.95 since 2.95 didn't do a
number of the things required by the standard:

2.95

real    0m4.383s
user    0m4.380s
sys     0m0.000s

3.3.2

real    0m14.521s
user    0m14.310s
sys     0m0.010s

3.4

real    0m7.026s
user    0m7.010s
sys     0m0.000s

3.4+fppatch

real    0m2.884s
user    0m2.880s
sys     0m0.000s

Jerry


