Bug 19642 - streaming doubles is very slow compared to sprintf
Summary: streaming doubles is very slow compared to sprintf
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 3.4.2
: P2 normal
Target Milestone: 3.4.4
Assignee: Paolo Carlini
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-01-26 14:26 UTC by Jörg Richter
Modified: 2005-02-02 16:48 UTC (History)
1 user (show)

See Also:
Host: powerpc-ibm-aix5.2.0.0
Target: powerpc-ibm-aix5.2.0.0
Build: powerpc-ibm-aix5.2.0.0
Known to work:
Known to fail:
Last reconfirmed: 2005-01-26 17:34:48


Attachments
To be tested on AIX (441 bytes, patch)
2005-01-27 12:25 UTC, Paolo Carlini
Details | Diff
Should help even more on AIX, regtested Ok on linux (--enable-clocale=generic) (462 bytes, patch)
2005-01-27 13:37 UTC, Paolo Carlini
Details | Diff
Wrong attachment (meaningless mix of LC_ALL and LC_NUMERIC), this one is Ok (470 bytes, patch)
2005-01-27 14:03 UTC, Paolo Carlini
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jörg Richter 2005-01-26 14:26:51 UTC
$ g++ -v
Configured with: ../gcc-3.4.2/configure --prefix=/tools/pkg/gcc/3.4.2 --enable-
languages=c,c++ --with-ld=/bin/ld --with-as=/bin/as --disable-threads
Thread model: aix
gcc version 3.4.2

/// t.cc

#include <iostream>
#include <cstdio>

using namespace std;

int main( int argc, char** argv )
{
  std::ios_base::sync_with_stdio( false );

  int count = 100000;
  if( argc > 1 )
    for( int i = 0; i < count; ++i )
      std::cout << (double)i;
  else
    for( int i = 0; i < count; ++i )
    {
      char buf[50];
      sprintf( buf, "%g", (double)i );
      std::cout << buf;
    }
  return 0;
}

///

$ g++ -o t t.cc

$ time t > /dev/null
real    0m0.07s
user    0m0.05s
sys     0m0.00s

$ time t cout > /dev/null
real    0m4.22s
user    0m2.60s
sys     0m1.60s

The std::ios_base::sync_with_stdio( false ); seems to have no effect.

Note: On i686-pc-linux-gnu, both cout and sprintf/<<(char*) take the same time.

Using the AIX system g++ 2.9-aix51-020209, there is also no difference (both 
are fast).
Comment 1 Andrew Pinski 2005-01-26 14:34:25 UTC

*** This bug has been marked as a duplicate of 17140 ***
Comment 2 Paolo Carlini 2005-01-26 14:47:07 UTC
Wait a minute... Ok, we have got 17140, but here, if the issue is confirmed,
we are talking about 2 orders of magnitude!?!?!

Indeed, on my P4-2400 (gcc3.4.3, actually, similar bevavior for mainline)
I cannot confirm the disaster:

paolo:~/Work> g++ -o t t.cc
paolo:~/Work> time t > /dev/null
0.150u 0.000s 0:00.15 100.0%    0+0k 0+0io 219pf+0w
paolo:~/Work> time t cout > /dev/null
0.170u 0.000s 0:00.17 100.0%    0+0k 0+0io 223pf+0w

Either something AIX specific or I don't know, right now... Can you
possibly profile?
Comment 3 Jörg Richter 2005-01-26 15:00:18 UTC
I took some backtraces during a longer run of the program and most of the time 
it looked like this: (I removed some templates to make it readable)

__issetuid
load_locale
setlocale
int std::__convert_from_v<double>(...)
std::ostreambuf_iterator<> std::num_put<>::_M_insert_float<double>(...) const
std::num_put<>::do_put(std::ostreambuf_iterator<>, std::ios_base&, char, 
double) const
std::basic_ostream<>::operator<<(double)
main

Comment 4 Paolo Carlini 2005-01-26 15:10:29 UTC
Honestly, this doesn't help much, the percentages are important: suppose, for
some reason, setlocale is very slow on AIX... Is _GLIBCXX_USE_C99 defined, on
AIX? I'm asking because internally __convert_from_v uses snprintf in that case, 
not sprintf, and we want to be sure about that detail.
Comment 5 Jörg Richter 2005-01-26 15:18:40 UTC
Ok, lets say I took 100 backtraces and got 99 times this one. I hope this is 
enough percentage. 

_GLIBCXX_USE_C99 is defined
Comment 6 Paolo Carlini 2005-01-26 15:22:01 UTC
> Ok, lets say I took 100 backtraces and got 99 times this one. I hope this is 
> enough percentage.

;) No, I mean *relative* percentages: setlocal in your profile is on top, higher
than __convert_from_v and everything else. I'm asking: is consuming 1%, 10%, or
99% of the total runtime?

> _GLIBCXX_USE_C99 is defined

In this case, can you possibly check whether, for some reason, snprintf is
slower than sprintf, on AIX?

Thanks

Comment 7 Paolo Carlini 2005-01-26 15:37:31 UTC
For reference, on linux, a typical profile (count = 10000000) begins as
follows, and:
1- The various snprintf are on top.
2- uselocale (the thread safe equivalent of setlocale) is way below
Therefore, we can easily guess that the performance are not so different
from those of snprintf alone.

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 28.65      5.71     5.71                             __printf_fp
 13.50      8.40     2.69                             __mpn_divrem
  7.28      9.85     1.45                             memcpy
  6.62     11.17     1.32                             buffered_vfprintf
  5.67     12.30     1.13                             vfprintf
  5.22     13.34     1.04                             __mpn_mul_1
  2.51     13.84     0.50                             mempcpy
  2.31     14.30     0.46                            
std::ostream::operator<<(double)
  2.21     14.74     0.44                             __mpn_rshift
  2.01     15.14     0.40                            
std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char,
std::ostreambuf_iterator<char, std::char_traits<char> >
>::_M_insert_float<double>(std::ostreambuf_iterator<char, std::char_traits<char>
>, std::ios_base&, char, char, double) const
  2.01     15.54     0.40                             isinf
  1.81     15.90     0.36                             isnan
  1.46     16.19     0.29                             std::basic_filebuf<char,
std::char_traits<char> >::xsputn(char const*, int)
  1.30     16.45     0.26                             __mpn_extract_double
  1.30     16.71     0.26                             uselocale
Comment 8 Jörg Richter 2005-01-26 15:41:06 UTC
> ;) No, I mean *relative* percentages: setlocal in your profile is on top, 
higher
> than __convert_from_v and everything else. I'm asking: is consuming 1%, 10%, 
or
> 99% of the total runtime?

Doesn't a function that is on 99% of all call stacks automatically consume 99% 
of the runtime? Ok, the time can also be spend in the childs like here in 
__issetuid, but this is an implementation detail of setlocale, isn't it?

> In this case, can you possibly check whether, for some reason, snprintf is
> slower than sprintf, on AIX?

I rewrote the original program to use snprintf and got the same time as with 
sprintf.
Comment 9 Paolo Carlini 2005-01-26 15:45:30 UTC
> Doesn't a function that is on 99% of all call stacks automatically consume 99% 
> of the runtime? Ok, the time can also be spend in the childs like here in 
> __issetuid, but this is an implementation detail of setlocale, isn't it?

% in *time*, see the linux example, nothing special.

Ok about snprintf, it's not that.
Comment 10 Jörg Richter 2005-01-26 16:00:48 UTC
If it helps you, here the first lines of prof. But in practice some backtraces 
help me a lot more than a a [g]prof output. Its very fast in finding big 
bottlenecks, like setlocale in this PR.

count = 1000000

Name                 %Time     Seconds     Cumsecs  #Calls   msec/call
.strncpy              31.5        5.36        5.36
.getenv               19.6        3.34        8.70 5000000      0.0007
.mystrncpy             6.2        1.06        9.76 7000000      0.0002
.getgidx.GL            5.5        0.94       10.70
.__mcount              5.3        0.90       11.60
._doprnt               5.2        0.88       12.48 2000001      0.0004
.setlocale             4.6        0.79       13.27 3000000      0.0003
.strchr                4.2        0.72       13.99
.__issetuid            3.4        0.58       14.57 6000000      0.0001
.getuidx.GL            3.4        0.57       15.14
._ZNSolsEd             1.1        0.18       15.32
.main                  0.8        0.13       15.45       1    130.0
.malloc_y_start        0.8        0.13       15.58
.text                  0.7        0.12       15.70
._ZNKSt7num_putIcSt1   0.6        0.11       15.81
.malloc_y              0.6        0.11       15.92 1000007      0.0001
.strcmp                0.6        0.10       16.02
.free_y                0.6        0.10       16.12 1000002      0.0001
.malloc_y_heap         0.5        0.08       16.20 1000007      0.0001
.strcpy                0.4        0.07       16.27
.text                  0.3        0.05       16.32
.sprintf               0.3        0.05       16.37 1000000      0.0001
.__nl_langinfo_std     0.3        0.05       16.42 2000000      0.0000
.mf2x2                 0.2        0.04       16.46  999999      0.0000
.strcmp                0.2        0.04       16.50
._ZNKSt7num_putIcSt1   0.2        0.04       16.54
._ZdaPv.GL             0.2        0.04       16.58
.setlocale.GL          0.2        0.03       16.61
Comment 11 Paolo Carlini 2005-01-26 16:17:24 UTC
> If it helps you, here the first lines of prof. But in practice some backtraces 
> help me a lot more than a a [g]prof output. Its very fast in finding big 
> bottlenecks, like setlocale in this PR.

Ok, thank you very much. Of course, you are very welcome to provide additional
useful info, but I think we have already a rather good picture of the issue.
Indeed, isn't really the same of 17140, which is more about speeding-up the
task currently performed by snprintf, and only secondarily avoiding setlocale
(what we are using for that, on gnu configs like linux, uselocale that is,
is pretty fast). All in all, this means, that whatever we prepare for 17140
had better avoiding *completely* setlocale & co, otherwise, on AIX we are
going to gain *nothing*. I cannot see alternate short term fixes... Maybe
the AIX maintainers could help with an AIX config exploiting any fast
thread-safe uselocale-equivalent available there and then AIX would use
that specific locale model instead of the generic one. Other ideas?
Comment 12 Paolo Carlini 2005-01-26 17:34:31 UTC
FWIW, the issue doesn't affect linux even if configured --enable-clocale=generic
(thus using setlocale). Therefore, on *some* targets setlocale is slow and causes
it, not everywhere.
Comment 13 David Edelsohn 2005-01-26 19:31:05 UTC
AIX does not yet implement uselocale().  I am not aware of any faster 
interface.  Does "C" locale help?
Comment 14 Jörg Richter 2005-01-27 08:08:44 UTC
std::cout.imbue( std::locale( "C" ) );
  nor
std::cout.imbue( std::locale::classic() );
  nor
export LANG=C

does change anything

   Joerg
Comment 15 Paolo Carlini 2005-01-27 11:11:59 UTC
Hi everyone. No, switching by hand to "C" locale cannot help, because, given
the current generic locale model, the setlocale calls are issued *anyway*. Indeed,
we could envisage improving on this...
Comment 16 Paolo Carlini 2005-01-27 12:25:02 UTC
Can you test the attached? Should help, and we can apply it in 3_4-branch, maybe
in mainline too depending on the res of 17140.
Comment 17 Paolo Carlini 2005-01-27 12:25:55 UTC
Created attachment 8081 [details]
To be tested on AIX
Comment 18 Paolo Carlini 2005-01-27 12:31:10 UTC
Notice, however, that as-is, the patch is not perfect from the correctness point
of view: we should also check LC_NUMERIC...
Comment 19 Jörg Richter 2005-01-27 12:39:57 UTC
A quick test shows, that std::setlocale(LC_ALL, NULL) returns "C C C C C C".
But std::setlocale(LC_NUMERIC, NULL) returns "C"

I think it is enough to test LC_NUMERIC.
I'll try to test a modified patch. But this may take a while.
Comment 20 Paolo Carlini 2005-01-27 12:46:51 UTC
Thanks! For the correctness issue, notice that LC_ALL overrides any LC_*,
therefore checking only LC_NUMERIC seems weaker, still, probably effective in
most circumstances...
Comment 21 Jörg Richter 2005-01-27 12:54:00 UTC
I think this runtimes of the different setlocale() calls might be interesting:

(This is for 1000000 calls)

 category   locale  time
 LC_ALL        0    0.70s
 LC_NUMERIC    0    0.29s
 LC_ALL       "C"   1.63s
 LC_NUMERIC   "C"   8.98s
Comment 22 Paolo Carlini 2005-01-27 12:57:53 UTC
Eh! The runtimes are indeed **very** interesting, thanks! And, also, it looks
like I was wrong in the last comment: in case LC_ALL is set, this is automatically
seen in setlocale(LC_NUMERIC, 0), therefore your suggestion of changing my patch
to just use LC_NUMERIC should work well!
Comment 23 Paolo Carlini 2005-01-27 13:37:42 UTC
Created attachment 8082 [details]
Should help even more on AIX, regtested Ok on linux (--enable-clocale=generic)
Comment 24 Paolo Carlini 2005-01-27 14:03:56 UTC
Created attachment 8083 [details]
Wrong attachment (meaningless mix of LC_ALL and LC_NUMERIC), this one is Ok
Comment 25 Jörg Richter 2005-01-27 15:05:39 UTC
With patch draft_19642_3 and count=1000000

$ time t > /dev/null
real   0,64
user   0,60
sys    0,01

$ time t cout > /dev/null
real   1,14
user   1,12
sys    0,00

Now its only 2 times slower than sprintf. Much better than the ~75 times before.

Also note, that when an other locale than "C" is in place the times also get 
better:

without patch: (count=100000)
real   28,27
user   6,34
sys    21,87

with:
real   5,66
user   1,89
sys    3,78

(sprintf:0,07s)

This case is still horrible slow, but 4-5 times better than before.

Good work so far, but I think the final solution is to avoid any frequent 
setlocale() calls.
Comment 26 Paolo Carlini 2005-01-27 15:59:16 UTC
Excellent. Can you please check whether on AIX this is ok ("de_DE" is just
an example, any actually installed named locale should do):

#include <locale.h>
#include <string.h>
#include <assert.h>

int main()
{
  char* save = setlocale(LC_ALL, "de_DE");
  assert( !save || !strcmp(setlocale(LC_NUMERIC, NULL), "de_DE"));

  return 0;
}

???

In short, we should make sure that any change to LC_ALL is reflected to
LC_NUMERIC. I have checked that this is the case with glibc and also
FreeBSD but we should check all the supported OSes. Thanks again!
Comment 27 Benjamin Kosnik 2005-01-27 16:38:05 UTC
Paolo, just a note. This issue has been raised before on the mailing list. (Or
bugzilla, sorry so vague.) If you search stable/AIX/setlocale you might find
something.

There's known problems with setlocale not being "stable" on AIX. 

You're right, the correct long term fix is something like Jerry's work in 17140.

-benjamin
Comment 28 Paolo Carlini 2005-01-27 16:56:14 UTC
> There's known problems with setlocale not being "stable" on AIX.

Thanks for the pointer. I can find only some rather vague hints, around,
however... "(un)stable"?!? Or 'just' plain slow?!? ;) ;)

> You're right, the correct long term fix is something like Jerry's work in
> 17140.

Beyond doubt. Avoiding completely *any* sort of setlocale/uselocale would fix
those issues completely. However, short and mid-term, the latest attached
patch (*_3) does a good job and seems safe (already checked on glibc (generic)
+ FreeBSD + Solaris). 
Comment 29 Jörg Richter 2005-01-28 07:54:31 UTC
(In reply to comment #26)
> Excellent. Can you please check whether on AIX this is ok ("de_DE" is just
> an example, any actually installed named locale should do):
> [code]
> In short, we should make sure that any change to LC_ALL is reflected to
> LC_NUMERIC. I have checked that this is the case with glibc and also
> FreeBSD but we should check all the supported OSes. Thanks again!

I've checked it with all installed locales I've found. It works for all.
Comment 30 Paolo Carlini 2005-01-28 09:58:43 UTC
Thanks. We are now reasonably confident that the patch is safe, besides
providing a good performance improvement. Most probably we shall apply
it soon, I'll take care of the next steps. Thanks again.
Comment 31 GCC Commits 2005-01-30 14:10:18 UTC
Subject: Bug 19642

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	paolo@gcc.gnu.org	2005-01-30 14:09:59

Modified files:
	libstdc++-v3   : ChangeLog 
	libstdc++-v3/config/locale/generic: c_locale.h 

Log message:
	2005-01-30  Paolo Carlini  <pcarlini@suse.de>
	
	PR libstdc++/19642
	* config/locale/generic/c_locale.h (__convert_from_v): Switch only
	LC_NUMERIC, and only when actually != "C".

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/ChangeLog.diff?cvsroot=gcc&r1=1.2880&r2=1.2881
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/config/locale/generic/c_locale.h.diff?cvsroot=gcc&r1=1.13&r2=1.14

Comment 32 GCC Commits 2005-02-02 16:47:00 UTC
Subject: Bug 19642

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	gcc-3_4-branch
Changes by:	paolo@gcc.gnu.org	2005-02-02 16:46:40

Modified files:
	libstdc++-v3   : ChangeLog 
	libstdc++-v3/config/locale/generic: c_locale.h 

Log message:
	2005-02-02  Paolo Carlini  <pcarlini@suse.de>
	
	PR libstdc++/19642
	* config/locale/generic/c_locale.h (__convert_from_v): Switch only
	LC_NUMERIC, and only when actually != "C".

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.2224.2.213&r2=1.2224.2.214
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/config/locale/generic/c_locale.h.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.11.10.2&r2=1.11.10.3

Comment 33 Paolo Carlini 2005-02-02 16:48:12 UTC
This specific issue is fixed for 4.0.0 and 3.4.4.