Bug 14563 - new/delete much slower than malloc/free because of sjlj exceptions
Summary: new/delete much slower than malloc/free because of sjlj exceptions
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 3.3.3
Importance: P2 normal
Target Milestone: 4.3.0
Assignee: Not yet assigned to anyone
URL:
Keywords: EH, missed-optimization, sjlj-eh
Duplicates: 18414
Depends on:
Blocks:
 
Reported: 2004-03-12 23:36 UTC by Paul Thomas
Modified: 2021-02-21 17:41 UTC (History)
CC List: 5 users

See Also:
Host:
Target: sjlj targets
Build:
Known to work: 3.2
Known to fail: 3.3.3, 3.4.2
Last reconfirmed: 2005-12-07 04:58:14


Description Paul Thomas 2004-03-12 23:36:06 UTC
I am sorry if this report is imprecise and probably unhelpful but several
contributors to octave lists have failed to identify where the problem lies.

The problem is simply stated and is unique to builds of octave-2.1.xx under
the latest version of Cygwin: interpreted octave programmes run 6-7 times slower
when compiled and linked with gcc-3.3.3 and its libraries, compared with 3.2.3.

Note that (i) this problem is not present in Linux and (ii) it is not dependent
on which continent one is in!  I.e., this problem is reproducible.

Beyond this, a variety of optimisation levels, architecture settings and so on
were tried, to no avail.
Comment 1 Andrew Pinski 2004-03-13 07:24:16 UTC
Can you provide a testcase where 3.3.3 is slower than 3.2.3?
Comment 2 Paul Thomas 2004-03-13 08:06:19 UTC
Subject: Re:  octave built under Cygwin very slow

I am not in a position to do so right now; the system that has Cygwin 
loaded on it is at work - I am 100% linux at home.  Unfortunately, I am 
out of town next week, so cannot get back to this until a week Monday. 
 I have cc'd this to Ben Diedrich in the hope that he can help you.

An obvious question is: what form would you like the test cases in?  The 
output from octave benchmarks or the octave binaries?  If it is the 
former, you will be able to get started on the octave mailing list:

http://www.octave.org/mailing-lists/help-octave/2004/325
http://www.octave.org/mailing-lists/help-octave/2004/337
http://www.octave.org/mailing-lists/help-octave/2004/339    <--- contains configuration details
http://www.octave.org/mailing-lists/help-octave/2004/389    <--- ditto

If you want the binaries, that will take a little longer....  It might 
be more practical, if you have the time, to build octave yourself.  We 
used octave-2.1.50 for our attempts to pin-point the problem because the 
"good" binary from octave-forge used this version.  However, 2.1.53 and 
2.1.56 both were slow when built with gcc-3.3.3

Ben, did you keep the defective binaries?

Paul Thomas



Comment 3 Wolfgang Bangerth 2004-03-14 20:33:01 UTC
One of the things you could do is to compile the octave version you 
use with both 3.2.3 and 3.3.3, in each case with profiling information 
(i.e. with -pg) and check the output to see whether there is one single 
function that now takes significantly longer (and thus moved up in 
the output of gprof, in the list of functions sorted by execution time). 
 
If there should be one such function, it might be worth compiling just this 
one function (or file) with the other compiler, to verify that it indeed 
is the problem. This way we could at least localize the problem a little 
better. This would also be of great value to us, since we have no clue 
about the octave functions, and it is hard for us to look at profiling 
output without knowing what all these functions do etc. 
 
If you have identified the function that has the slowdown (assuming that it 
is a single function), then one would rip it out of the program and place 
it (and whatever else it needs) into a small file where main() simply calls 
this function a number of times with dummy arguments. This way we would 
have a simpler and much smaller testcase. 
 
W. 
Comment 4 Paul Thomas 2004-03-24 09:52:06 UTC
Subject: Re:  octave built under Cygwin very slow

Ben Diedrich and I have done as you suggested.  It has to be said that 
neither of us has tried profiling anything, let alone a code as large 
and as complicated as octave.  I did the gcc-3.2 and Ben did the 
gcc-3.3. Our machines are nearly identical, both with Windows 2000 and 
with the same version of Cygwin.

We ran a small test programme in octave that shows a factor of 
approximately 6 difference in execution time, when octave is built with 
gcc-3.2 (~1.8-2s) or gcc-3.3 (10-12s).  When compiled and linked with 
-pg set, the execution time increased to 9s with gcc-3.2 (Ben, what is 
the corresponding figure for gcc-3.3?).  Switching off the -O2 flags 
increased the time by a further 3s but resulted in a much more detailed 
profile.  In the enclosed, we increased the number of loops by a factor 
of 10, in order to get a reasonable resolution on the less frequently 
visited functions.  So remember that the corresponding wall-clock time 
is 90s for cycle 2.

I have added the most pertinent part of the graphical output.  If you 
want to see the entirety of the gprof output, I can easily forward it to 
you for both builds.

To first order, there is an enormous amount of unaccounted time but both 
builds are more or less identical in their time in octave functions. 
 The only significant difference that I can see is the appearance in the 
gcc-3.3 version of 

                                                 <spontaneous>
[7]     22.3    5.62    0.00                 _Unwind_SjLj_Register [7]

                                                 <spontaneous>
[9]     12.4    3.11    0.00                 _Unwind_SjLj_Unregister [9]

which together take a significant amount of time.  What are these calls 
and could they be the culprits?

I will be spending a bit of time over the next few days to get a better 
feel for profiling and its relationship to the execution times without 
-pg set.  Whilst the time spent in various routines does change in 
correct proportion to the content of the octave code, the absolute 
magnitudes are way out.

I hope that this helps to give you a clue.  Any advice that you can 
offer on the profiling would be gratefully received.

Paul Thomas
Profiles for octave-2.1.50 built using (i) gcc-3.2 and (ii) gcc-3.3

Paul Thomas and Ben Diedrich   03/23/04

Both run with test programme, entered from octave command line:

a = cputime ; tot = 0 ; x = [1:1e6] ; for i = 1:1e6 ; tot = tot + x(i) ; end ; disp(cputime-a)

The individual lines in the graphical output can be identified with features in the test programme and the times correspond
well, in proportion, with timing tests done in octave.  The absolute times are way out. 

granularity: each sample hit covers 4 byte(s) for 0.04% of 25.18 seconds

index % time    self  children    called     name

>>>>>Built with gcc-3.2 
		
[3]     70.7    2.33    6.14       4+16000183 <cycle 2 as a whole> [3]
                0.27    2.15 1000017             tree_index_expression::rvalue(int) <cycle 2> [8]
                0.29    1.06 3000033             tree_identifier::rvalue(int) <cycle 2> [12]
                0.24    0.58 1000015             tree_simple_assignment::rvalue() <cycle 2> [17]
                0.22    0.51 1000022             tree_statement::eval(bool, int, bool) <cycle 2> [19]
                0.19    0.47 1000009+4           tree_binary_expression::rvalue() <cycle 2> [20]
                0.28    0.24 1000003             tree_argument_list::convert_to_const_vector(octave_value const*) <cycle 2> [23]
                0.21    0.26 3000032             tree_identifier::rvalue() <cycle 2> [27]
                0.14    0.23 1000015             tree_simple_assignment::rvalue(int) <cycle 2> [32]
                0.17    0.13 1000006             tree_statement_list::eval(bool, int) <cycle 2> [36]
                0.09    0.17 1000005             make_value_list(tree_argument_list*, string_vector const&, octave_value const*) <cycle 2> [40]
                0.15    0.09 1000014             tree_index_expression::rvalue() <cycle 2> [45]
                0.02    0.19 1000000             tree_simple_for_command::do_for_loop_once(octave_lvalue&, octave_value const&, bool&) <cycle 2> [47]
                0.06    0.06       1             tree_simple_for_command::eval() <cycle 2> [64]
                0.00    0.00       2             octave_user_function::do_multi_index_op(int, octave_value_list const&) <cycle 2> [184]
                0.00    0.00       3             octave_value::do_multi_index_op(int, octave_value_list const&) <cycle 2> [208]
                0.00    0.00       2             tree_parameter_list::convert_to_const_vector(tree_va_return_list*) <cycle 2> [242]
                0.00    0.00       2             tree_if_command::eval() <cycle 2> [1079]
                0.00    0.00       2             tree_if_command_list::eval() <cycle 2> [1084]
                0.00    0.00       2             tree_if_clause::eval() <cycle 2> [1077]
---

>>>>>Built with gcc-3.3

[3]     42.7    3.07    7.68       3+16000180 <cycle 2 as a whole> [3]
                0.36    2.66 1000017             tree_index_expression::rvalue(int) <cycle 2> [10]
                0.36    1.27 3000032             tree_identifier::rvalue(int) <cycle 2> [14]
                0.41    0.88 1000015             tree_simple_assignment::rvalue() <cycle 2> [17]
                0.25    0.66 1000021             tree_statement::eval(bool, int, bool) <cycle 2> [21]
                0.13    0.64 1000009+4           tree_binary_expression::rvalue() <cycle 2> [25]
                0.26    0.31 1000015             tree_simple_assignment::rvalue(int) <cycle 2> [29]
                0.29    0.28 1000003             tree_argument_list::convert_to_const_vector(octave_value const*) <cycle 2> [30]
                0.27    0.29 3000032             tree_identifier::rvalue() <cycle 2> [31]
                0.30    0.13 1000005             tree_statement_list::eval(bool, int) <cycle 2> [35]
                0.17    0.17 1000005             make_value_list(tree_argument_list*, string_vector const&, octave_value const*) <cycle 2> [44]
                0.05    0.24 1000000             tree_simple_for_command::do_for_loop_once(octave_lvalue&, octave_value const&, bool&) <cycle 2> [49]
                0.10    0.10 1000014             tree_index_expression::rvalue() <cycle 2> [63]
                0.12    0.06       1             tree_simple_for_command::eval() <cycle 2> [69]
                0.00    0.00       2             octave_user_function::do_multi_index_op(int, octave_value_list const&) <cycle 2> [186]
                0.00    0.00       2             tree_parameter_list::convert_to_const_vector(tree_va_return_list*) <cycle 2> [233]
                0.00    0.00       2             octave_value::do_multi_index_op(int, octave_value_list const&) <cycle 2> [1058]
                0.00    0.00       2             tree_if_command::eval() <cycle 2> [1068]
                0.00    0.00       2             tree_if_command_list::eval() <cycle 2> [1074]
                0.00    0.00       2             tree_if_clause::eval() <cycle 2> [1066]


>>>>Lines that do not appear in the version built with gcc-3.2

                                                 <spontaneous>
[7]     22.3    5.62    0.00                 _Unwind_SjLj_Register [7]

                                                 <spontaneous>
[9]     12.4    3.11    0.00                 _Unwind_SjLj_Unregister [9]

                                                 <spontaneous>
[11]    10.2    2.56    0.00                 operator new(unsigned int) [11]
Comment 5 Wolfgang Bangerth 2004-03-24 15:57:43 UTC
First, thanks for your efforts! 
 
The Unwind_SjLj_* functions have to do with exceptions. Danny, I CC: 
you because here's a cygwin question: is sjlj the default on windows, 
and are you aware of any significant changes in this area that could 
affect this? 
 
Paul&Ben: even if the problem is in this function, am I correct with 
my math that these functions only account for at most about 1/4 of 
the run-time? If that is the case, then they can't make up for a 
six-fold increase in run-time... 
 
W. 
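
A back-of-the-envelope check of that estimate, taking the gprof self times for _Unwind_SjLj_Register (5.62 s) and _Unwind_SjLj_Unregister (3.11 s) out of the 25.18 s profile total at face value. That the profile is representative is an assumption. Here f is the fraction of run-time spent in the two functions and S_max is the largest speed-up possible if they cost nothing at all:

```latex
f = \frac{5.62 + 3.11}{25.18} \approx 0.347,
\qquad
S_{\max} = \frac{1}{1 - f} \approx \frac{1}{0.653} \approx 1.53
```

On these numbers, removing both calls entirely would buy a factor of about 1.5, nowhere near a factor of six, so either the profile is missing time or something else is involved as well.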
Comment 6 Paul Thomas 2004-03-24 16:38:43 UTC
Subject: Re:  octave built under Cygwin very slow

This is why I am questioning the calibration of the profiling - none of 
the times add up.  Is it reasonable that the faster of the two builds be 
bumped up from 20 to 90s runtime with profiling?  If so, should I expect 
to see the total profile time add up to the original 20s?  In fact, the 
total that I can find in the profiling is about 12seconds, including 
octave start-up.  Even if the latter is negligible, I am missing 40% of 
the unprofiled execution time and 80% or so of the wall-clock time.

Am I right in thinking that setting -fno-exceptions will suppress sjlj 
if it is the default?  Perhaps we should try that as an experiment?

Paul T

Comment 7 Wolfgang Bangerth 2004-03-24 17:03:36 UTC
sjlj exceptions are one way to implement exception handling. On most 
systems, we have moved to more efficient ways, such as dwarf2 
unwinding. However, if you don't use exceptions, trying things out 
with -fno-exceptions may be an interesting experiment anyway. 
 
Regarding the times not adding up: I think this is usual for gprof. In 
fact, gprof is not a very good tool anyway, but it is the one that is 
most widely available. There are more accurate ones, which I have never 
used myself, though, so I can't say anything about them. I think I 
remember people being quite fond of oprof. It may also be the case 
that valgrind can produce some sort of information, but I don't know about 
that exactly. 
 
W. 
Comment 8 Ben.Diedrich@noaa.gov 2004-03-24 21:07:41 UTC
Subject: Re:  octave built under Cygwin very slow

Paul Thomas wrote:

> We ran a small test programme in octave that shows a factor of
> approximately 6 difference in execution time, when octave is built with
> gcc-3.2 (~1.8-2s) or gcc-3.3 (10-12s).  When compiled and linked with
> -pg set, the execution time increased to 9s with gcc-3.2 (Ben, what is
> the corresponding figure for gcc-3.3?).  Switching off the -O2 flags,
> increased the time by a further 3s but resulted in a much more detailed
> profile.  In the enclosed, we increased the number of loops by a factor
> of 10, in order to get a reasonable resolution on the less frequently
> visited functions.  So remember that the corresponding wall-clock time
> is 90s for cycle 2.
>

The execution time of Octave 2.1.50 with profiling turned on and GCC 3.3.1 was ~280 seconds. That is with the test command:
a = cputime ; tot = 0 ; x = [1:1e6] ; for i = 1:1e6 ; tot = tot + x(i) ; end ; disp(cputime-a)

The time is ~28 seconds with 1e5 for loop steps.

Ben

Comment 9 Danny Smith 2004-03-24 22:58:11 UTC
I am a bit unclear what version of gcc was used for the "fast" precompiled 
octave.  Was it really gcc-3.2.3 or gcc-3.2-3 (the third cygwin update of 
gcc-3.2.0)?

What does gcc -v say for the gcc that built the "fast" octave?

The cygwin gcc-3.2 distros (dated about August 2002) had a local patch that 
enabled Dwarf2 exceptions.  This worked fine except when functions throwing 
exceptions were used as callbacks by win32api functions. So the experiment 
was terminated and the EH model was reverted to sjlj in later binary distros 
of gcc.

If this is really a difference between sjlj and Dwarf2, I think it is time to 
revisit Dwarf2 support on windows targets.

Danny
Comment 10 Paul Thomas 2004-03-25 06:40:31 UTC
Subject: Re:  octave built under Cygwin very slow

Danny,

I am away from base right now and do not have access to any of the 
installations - Ben Diedrich can supply you with the version number for 
the "good" gcc; ie. that of the octave-forge binary distribution.  For 
your information, Ben tells me  that the slow build, with profiling, 
runs slightly more than three times more slowly than the fast build with 
profiling.  Since the only significant difference in the profiles is the 
presence in the slow build of sjlj calls,..... j'accuse!  Otherwise, the 
problem must lie in something outside the scope of the profiling.

If "W" does not automatically get this, could you ensure that it is 
forwarded to him, please?

Paul Thomas

PS Thank you both for your rapid responses to this problem; it is 
something that has been perplexing us a lot.



Comment 11 Wolfgang Bangerth 2004-03-25 13:43:35 UTC
Danny, this was exactly the feedback I was hoping for from you! :-)  
Let's wait and see what we find out about the version string and whether  
-fno-exceptions changes something. Is there a way to change the  
exception model short of recompiling everything?  
  
Thanks  
  W. (= Wolfgang, but too tired of writing this out every time; and 
        besides, everyone seems to know me on this list anyway :-)  
Comment 12 Ben.Diedrich@noaa.gov 2004-03-25 14:15:42 UTC
Subject: Re:  octave built under Cygwin very slow


Here are the results of 'gcc -v' for the compiler that results in a fast octave. Note that it includes the
option '--disable-sjlj-exceptions'.

$ gcc -v
Reading specs from /usr/lib/gcc-lib/i686-pc-cygwin/3.2/specs
Configured with: /netrel/src/gcc-3.2-3/configure --enable-languages=c,c++,f77,java --enable-libgcj
--enable-threads=posix --with-system-zlib --enable-nls --without-included-gettext --enable-interpreter
--disable-sjlj-exceptions --disable-version-specific-runtime-libs --enable-shared --build=i686-pc-linux
--host=i686-pc-cygwin --target=i686-pc-cygwin --enable-haifa --prefix=/usr --exec-prefix=/usr
--sysconfdir=/etc --libdir=/usr/lib --includedir=/nonexistent/include --libexecdir=/usr/sbin
Thread model: posix
gcc version 3.2 20020927 (prerelease)

Here are the results for the compiler that gives a slower octave. I noticed that this one has the option
'--enable-sjlj-exceptions':

$ gcc -v
Reading specs from /usr/lib/gcc-lib/i686-pc-cygwin/3.3.1/specs
Configured with: /GCC/gcc-3.3.1-3/configure --with-gcc --with-gnu-ld --with-gnu-as --prefix=/usr
--exec-prefix=/usr --sysconfdir=/etc --libdir=/usr/lib --libexecdir=/usr/sbin --mandir=/usr/share/man
--infodir=/usr/share/info --enable-languages=c,ada,c++,f77,pascal,java,objc --enable-libgcj
--enable-threads=posix --with-system-zlib --enable-nls --without-included-gettext --enable-interpreter
--enable-sjlj-exceptions --disable-version-specific-runtime-libs --enable-shared --disable-win32-registry
--enable-java-gc=boehm --disable-hash-synchronization --verbose --target=i686-pc-cygwin
--host=i686-pc-cygwin --build=i686-pc-cygwin
Thread model: posix
gcc version 3.3.1 (cygming special)

Ben



Comment 13 Paul Thomas 2004-03-25 14:17:06 UTC
Subject: Re:  octave built under Cygwin very slow

Danny and Wolfgang,

Sorry but neither Ben nor I are frequentees of your list so we did not 
know.  Hello Wolfgang!

I hope that Ben is in today, to get the version string to you.

As for the -fno-exceptions, what will happen if we do this, when...

"Octave is now using exceptions for error handling rather than 
setjmp-longjmp.   See libcruft/misc/quit.h for details. "  (from Paul 
Kienzle)?   You can take a look at quit.h by going to 
http://pareto.uab.es/mcreel/OctaveClassReference/html/index.html and 
looking it up in the file list.

If the build is going to fail,  I do not think anybody should waste 
their time trying since it is quite a long process.

Paul T




Comment 14 Paul Thomas 2004-03-25 14:25:50 UTC
Subject: Re:  octave built under Cygwin very slow

Danny and Wolfgang,

It seems that I was a bit too quick off the mark just now - thanks Ben!

So, the obvious question is how to do a build with sjlj disabled that is 
consistent with the way in which octave handles exceptions.  Does 
--disable-sjlj-exceptions do it without throwing up unrequited references 
during the linking?

Just to satisfy my curiosity and I would think, Ben's, just what is sjlj 
(baby-talk, please) and is this yet another thing that I should know and 
care about?

Paul T



Comment 15 Wolfgang Bangerth 2004-03-25 14:36:54 UTC
SJLJ stands for "setjmp/longjmp". I'm not an expert in this field 
(as I know virtually nothing about the gcc interiors anyway, I'm 
just the bug database dude), but here's the idea: when you call 
a function that may or may not throw an exception, and the calling 
function needs to run destructors of local objects in case an exception 
is thrown, you need to put down the address of the cleanup code somewhere. 
One way to do this is to set this address via setjmp, and throwing an 
exception then transfers control to this place via longjmp. This is expensive 
since you have to call setjmp every time a cleanup is necessary. 
 
The other possibility is to use lookup tables that the compiler generates 
statically, so this is cheap at run-time, but incurs some code overhead. If 
you generate an exception, you have to somehow look up where to transfer 
execution. Don't ask me how exactly this works, but it is to my best 
knowledge how dwarf2 exception unwinding works. Corrections on this topic 
by more knowledgeable people are certainly welcome. 
 
Now back to the question how we can figure out what the problem is: if 
using -fno-exceptions doesn't work, is there a possibility you repeat 
your experiments with an octave version prior to the introduction of 
exceptions? 
 
W. 
Comment 16 Ben.Diedrich@noaa.gov 2004-03-25 15:37:06 UTC
Subject: Re:  octave built under Cygwin very slow

I can confirm that compiling octave 2.1.50 under gcc-3.3.1 with the g++ flag '-fno-exceptions' fails to compile
quit.cc with the following error:

g++ -c -pg -O2 -I. -I../.. -I../../liboctave -I../../src -I../../libcruft/misc  -I../../glob -I../../glob
-DHAVE_CONFIG_H -mieee-fp -pg -O2 -fno-exceptions quit.cc -o quit.o
quit.cc: In function `void octave_throw_interrupt_exception()':
quit.cc:105: error: exception handling disabled, use -fexceptions to enable

Ben


Comment 17 Paul Thomas 2004-03-25 16:41:07 UTC
Subject: Re:  octave built under Cygwin very slow

Hey come on!  For the "bug database dude" your grown-up talk is 
impressive enough to a toddler like me. Anyway,  I thank you for the 
explanation of setjmp/longjmp - it has been on my list of things to gen 
up on for a while.  It crosses my mind in discussing this that it is a 
bit aberrant to keep doing this within a loop of interpreted code, isn't it?

Anyhow, you will have seen Ben's message about trying a build with 
-fno-exceptions; as I suspected it did not get past quit.h.  I will put 
the question about the timing of the introduction of exceptions onto the 
list.  This perhaps will be the best route to settle the question as to 
whether or not sjlj is the root cause of the slow-down.

Thanks again

Paul Thomas



Comment 18 Paul Thomas 2004-03-28 21:18:59 UTC
Subject: Re: [Fwd:  octave built under Cygwin very slow]

Well, we seem to have got rid of the smoked fish (sorry, red herring) 
and now have a smoking howitzer......

Paul,

It strikes me that not only is new/delete slow for cygwin331 but that 
malloc/free must also take most of the execution time for the octave 
tests.  These seem to be totally excluded from the profiling.

I have added the Intel, Visual C and gcc331 times for Windows XP on an 
Athlon 1700

Paul T

PS I would have added the exit but I was going to bash ctrl-c if 
anything went wrong with the allocation.

Paul Kienzle wrote:

> Tests of malloc and new [] for cygwin and mingw 3.2 and 3.3 and linux 
> gcc 3.3.
> Someone please fill in numbers for 'native' windows compilers, such as
> visual C and Intel.
>
> === Times, running under msys on a Windows 2000 PII-300 system
>
> System        real      user     sys
> mingw333      17.936    0.030    0.040
> cygwin331     72.394    0.020    0.060
> Cmingw333     12.277    0.010    0.060
> Ccygwin331    24.355    0.030    0.050
>
> System        real      user     sys
> mingw323      18.837    0.020    0.040
> mingw32       14.160    0.010    0.060
> cygwin32      15.933    0.020    0.050
> Cmingw32      12.668    0.030    0.040
> Ccygwin32     14.410    0.010    0.080

Paul Thomas adds...

   === Elapsed times running under Windows XP on an Athlon 1700
   (measured in octave with: tic;system('./malloctest.exe');toc)

  System        execution time (s)
  intel          2.19
  VC             2.17
  cygwin331     19.86
  Cintel         2.58
  CVC            2.37
  Ccygwin331     4.34


>
> === Times, running under bash on a Debian PII-400 system
>
> System        real     user     sys
> linux332      4.808    4.800    0.010
> Clinux332     3.162    3.160    0.000
>
> === Versions
>
> mingw32       3.2 (mingw special 20020817-1)
> mingw323      3.2.3 (mingw special 20030504-1)
> mingw333      3.3.3 (mingw special)
> cygwin32      3.2 (20020927 prerelease), linked against stdc++.dll
> cygwin331     3.3.1-3 (cygming special)
> linux332      3.3.2 20030908 (Debian prerelease)
>
> === C++ Compiled with g++ -O2.  Run under msys.
> // Author Paul Thomas
> #include <iostream>
> using namespace std;
>
> int main()
> {
>   for (int iloop = 0; iloop < 10000000; iloop++)
>   {
>     double *myarray;
>     if ((myarray = new double [1]) == NULL)
>         cout << "unable to allocate my array at iloop=" << iloop << endl;
>     delete [] myarray;
>   }
>   cout << "done looping" << endl;
>   return 0;
> }
>
> === C Compiled with gcc -O2.  Run under msys.
> /* modified from C++ by Paul Kienzle */
> #include <stdio.h>
> #include <stdlib.h>   /* malloc, free, exit */
> int main()
> {
>   int iloop;
>   for (iloop = 0; iloop < 10000000; iloop++)
>   {
>     double *myarray = (double *)malloc(sizeof(double));
>     if (myarray== NULL) { printf("alloc failed\n"); exit(1); }
>     else free (myarray);
>   }
>   return 0;
> }
>
>


Comment 19 pkienzle@users.sf.net 2004-03-28 22:28:11 UTC
Subject: Re: [Fwd:  octave built under Cygwin very slow]

I'm putting my executable bundle on:

	http://myfilelocker.comcast.net/pkienzle/new.tar.gz

It is easier to compare times if they are on the same machine.

There are two subdirectories: new 32 and 33, each with their
own cygwin1.dll.

 From msys, so long as cygwin is not running, you should
be able to say:

	time 32/cygwin32
	time 32/Ccygwin32
	time 33/cygwin331
	time 33/Ccygwin331
	time 32/mingw32.exe
	time 32/Cmingw32.exe
	time 33/mingw333.exe
	time 33/Cmingw333.exe
	time 33/mingw323.exe

I tried alloc.c with lcc, and it was slower than mingw32 so
I didn't bother recording the time.

Paul Kienzle
pkienzle@users.sf.net


Comment 20 Paul Thomas 2004-03-31 00:21:19 UTC
Subject: Re:  octave built under Cygwin very slow

I realise from the silence that it cannot have been apparent from the 
last forward that we have understood where the problem lies with 
gcc-3.3.1 (cygming special).  It has nothing to do with sjlj, in spite of 
the profiling.  A significant difference has crept in between new and 
malloc.  Normally, on just about every system that we have tested, new 
is about 50% slower than malloc.  In gcc 3.3.1-3 (cygming special) it is 
about 6-10 times slower. We have no idea why.

The following scrap of code demonstrates it (have used -O2 for compilation):

#include <iostream>
#include <stdio.h>
#include <stdlib.h>   // malloc, free, exit
#include <time.h>
#include <vector>

using namespace std;

int main()
{
  long t1 = clock();
  for (int iloop = 0; iloop < 10000000; iloop++)
  {
     double *myarray;
     if ((myarray = new double [1]) == NULL)
     {
        cout << "unable to allocate my array at iloop=" << iloop <<  endl;
        exit(1);
     }
     delete [] myarray;
 }
 long t2 = clock();
 double delt1 = (double)( t2 - t1 )/ (double)(CLOCKS_PER_SEC);
 cout << "done looping time 1=" << delt1 << endl;
 long t3 = clock();

 for (int iloop = 0; iloop < 10000000; iloop++)
 {
   double *myarray = (double *)malloc(sizeof(double));
   if (myarray== NULL) { printf("alloc failed\n"); exit(1); }
   else free (myarray);
 }
 long t4 = clock();
 double delt2 = (double)( t4 - t3 )/ (double)(CLOCKS_PER_SEC);
 cout << "done looping time 2=" << delt2 << endl;

return 0;
}

Best regards

Paul Thomas
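
For anyone re-running this comparison with a current compiler, here is a hedged variant of the timing loops above. Two things differ from the original and are my assumptions, not part of the reported testcase: it uses std::chrono::steady_clock (elapsed wall time) where the original uses clock() (processor time), and it stores each pointer into a volatile variable, because newer optimizers are permitted to remove a matched new/delete pair entirely, which would make the loop measure nothing. The names seconds_for, g_sink, new_delete_once and malloc_free_once are invented for this sketch.

```cpp
#include <chrono>
#include <cstdlib>

// Runs body() n times and returns the elapsed wall-clock time in seconds.
template <typename F>
double seconds_for(unsigned long n, F body) {
    const auto start = std::chrono::steady_clock::now();
    for (unsigned long i = 0; i < n; ++i)
        body();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Writing the pointer to a volatile object forces the allocation to really happen.
static void* volatile g_sink;

inline void new_delete_once() {
    double* p = new double[1];
    g_sink = p;
    delete[] p;
}

inline void malloc_free_once() {
    void* p = std::malloc(sizeof(double));
    if (!p)
        std::abort();
    g_sink = p;
    std::free(p);
}
```

Calling seconds_for(10000000, new_delete_once) and seconds_for(10000000, malloc_free_once) gives two figures comparable to "time 1" and "time 2" above, bearing in mind that wall time and processor time can differ on a loaded machine.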





Comment 21 Paul Thomas 2004-04-02 17:43:17 UTC
Subject: Re:  octave built under Cygwin very slow

Wolfgang and Danny,

Did you get our recent correspondence on this?  We cracked the source of 
the problem and have provided sample demo code.  Would you like me to 
send it again?

Paul Thomas

Comment 22 Wolfgang Bangerth 2004-04-02 19:55:47 UTC
I can confirm this on linux, too. Here is what I get: 
--------------- 
g/x> /home/bangerth/bin/gcc-3.2.3/bin/c++ -O2 x.cc 
g/x> ./a.out  
done looping time 1=0.98 
done looping time 2=0.64 
g/x> /home/bangerth/bin/gcc-3.3.4-pre/bin/c++ -O2 x.cc 
g/x> ./a.out  
done looping time 1=0.98 
done looping time 2=0.64 
g/x> /home/bangerth/bin/gcc-3.4-pre/bin/c++ -O2 x.cc 
g/x> ./a.out  
done looping time 1=0.99 
done looping time 2=0.67 
g/x> /home/bangerth/bin/gcc-3.5-pre/bin/c++ -O2 x.cc 
g/x> ./a.out  
done looping time 1=0.97 
done looping time 2=0.67 
g/x> icc -O2 x.cc 
g/x> ./a.out  
done looping time 1=0.97 
done looping time 2=0.7 
----------------------- 
 
I find this very startling. 
 
This PR has gone quite a distance from its original problem -- would you 
mind closing this one, opening another one with title "new is 50% slower 
than malloc" or something similar, in component "libstdc++", and post  
your testcase and if you want my results above there? This way we would 
have a clean slate again, and would know what to focus on. Leave a mark 
in the new PR that this came out of PR 14563. 
 
Thanks 
 Wolfgang 
Comment 23 Danny Smith 2004-04-02 20:35:01 UTC
This is what I get on mingw32, 3.4.0 20040327 (prerelease)

Average of 12 runs, which gave very consistent results.

Built with -enable-sjlj
malloc:  2.1885 
new:     2.8319


Built with -disable-sjlj (startup code modified to allow 
Dwarf2 EH to work)
malloc:  2.2017
new:     2.3318

FWIW, the DW2 built exe (260kb) was also smaller than the sjlj exe (290kb).
This is with static libgcc and libstdc++.

Danny 
Comment 24 Paolo Carlini 2004-04-02 20:41:05 UTC
Just curious: what happens for the scalar version

  myarray = new double;
  ..
  ..
  delete myarray;

(which seems more appropriate for a single double)??
Also, please remove that 'if ((myarray = new double [1]) == NULL)',
I really can't bear it ;)
Comment 25 Wolfgang Bangerth 2004-04-02 20:44:31 UTC
Re: 'if ((myarray = new double [1]) == NULL)' 
Yes, me too :-) You would need a rather old compiler (or libstdc++) 
for execution to ever go into the if-branch... 
 
W. 
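
To spell out the point being made here: a plain new-expression reports failure by throwing std::bad_alloc, so comparing its result with NULL tests nothing. Only the std::nothrow form returns a null pointer on failure. A small sketch of the two forms (the function names try_alloc and alloc_throws are invented for illustration):

```cpp
#include <cstddef>
#include <new>

// nothrow form: returns a null pointer on failure, so a NULL check is meaningful.
double* try_alloc(std::size_t n) {
    return new (std::nothrow) double[n];
}

// Plain form: failure arrives as std::bad_alloc, never as a null pointer.
// Returns true if the allocation failed, false if it succeeded.
bool alloc_throws(std::size_t n) {
    try {
        double* p = new double[n];
        delete[] p;
        return false;
    } catch (const std::bad_alloc&) {
        return true;
    }
}
```

A caller that wants the NULL-check style of the testcase would use try_alloc and release the result with delete[].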
Comment 26 Paul Thomas 2004-04-03 09:07:16 UTC
Subject: Re:  octave built under Cygwin very slow

This is a multi-part message in MIME format.
Comment 27 Timo Keller 2004-04-03 17:18:28 UTC
I inlined all allocation operators and they improved from 2.393s to 1.922s (C
allocation style: 2.013s). Note that I also changed the test program to allocate
an array of 100 unsigned ints.
The problem with inlining them is that this can only work if <new> is included,
so please don't take this as a patch, but as an idea/explanation of why new
is slower than malloc.
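
The shape of the idea, separated from libstdc++ itself, is a fast path that can be inlined at the call site plus an out-of-line slow path that is only reached when malloc fails. The sketch below is a standalone illustration under that assumption; fast_alloc and slow_path_alloc are hypothetical names, it does not replace the global operators, and it uses std::get_new_handler (C++11) where the library code reads its internal handler variable.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Out-of-line slow path: only reached after malloc has already failed once.
void* slow_path_alloc(std::size_t sz);

// Fast path: small enough to inline, no exception machinery on the common route.
inline void* fast_alloc(std::size_t sz) {
    if (sz == 0)
        sz = 1;                      // malloc(0) is unpredictable; avoid it
    void* p = std::malloc(sz);
    if (!p)
        p = slow_path_alloc(sz);
    return p;
}

void* slow_path_alloc(std::size_t sz) {
    for (;;) {
        std::new_handler handler = std::get_new_handler();
        if (!handler)
            throw std::bad_alloc();  // no handler installed: give up
        handler();                   // let the handler try to free some memory
        if (void* p = std::malloc(sz))
            return p;
    }
}
```

Memory obtained from fast_alloc is released with std::free, matching the malloc call inside it.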

Reading specs from /usr/local/lib/gcc/i686-pc-cygwin/3.5-tree-ssa/specs
Configured with: ./configure --disable-libmudflap --without-libbanshee
--disable-checking --enable-languages=c,c++ --disable-threads : (reconfigured) 
: (reconfigured) ./configure --disable-libmudflap
 --without-libbanshee --disable-checking --enable-languages=c,c++
--disable-threads : (reconfigured)
  : (reconfigured) ./configure --disable-libmudflap --without-libbanshee
--disable-checking --enable-languages=c,c++ --disable-threads : (reconfigured) 
: (reconfigured) ./configure --disable-libmudflap --without-libbanshee
--disable-checking --enable-languages=c,c++ --disable-threads
Thread model: single
gcc version 3.5-tree-ssa 20040403 (merged 20040331)

#include <iostream>
#include <stdio.h>
#include <stdlib.h>   // malloc, free, exit
#include <time.h>
using namespace std;

int main()
{
	const size_t array_size = 100;
	const unsigned loop_count = 1000000;
	long t1 = clock();
	for (unsigned iloop = 0; iloop < loop_count; iloop++)
	{
		unsigned *myarray = new unsigned [array_size];
		delete [] myarray;
	}
	long t2 = clock();
	double delt1 = (double)( t2 - t1 )/ (double)(CLOCKS_PER_SEC);
	cout << "done looping time 1=" << delt1 << endl;
	long t3 = clock();

	for (unsigned iloop = 0; iloop < loop_count; iloop++)
	{
		unsigned *myarray = (unsigned *)malloc(array_size * sizeof(unsigned));
		if (myarray== NULL) { printf("alloc failed\n"); exit(1); }
		else free (myarray);
	}
	long t4 = clock();
	double delt2 = (double)( t4 - t3 )/ (double)(CLOCKS_PER_SEC);
	cout << "done looping time 2=" << delt2 << endl;

	return 0;
}



Index: gcc/libstdc++-v3/libsupc++/del_op.cc
===================================================================
RCS file: /cvsroot/gcc/gcc/libstdc++-v3/libsupc++/del_op.cc,v
retrieving revision 1.2.22.1
diff -u -r1.2.22.1 del_op.cc
--- gcc/libstdc++-v3/libsupc++/del_op.cc	3 Jun 2003 16:53:00 -0000	1.2.22.1
+++ gcc/libstdc++-v3/libsupc++/del_op.cc	3 Apr 2004 17:11:53 -0000
@@ -30,11 +30,3 @@
 
 #include "new"
 
-extern "C" void free (void *);
-
-void
-operator delete (void *ptr) throw ()
-{
-  if (ptr)
-    free (ptr);
-}
Index: gcc/libstdc++-v3/libsupc++/del_opnt.cc
===================================================================
RCS file: /cvsroot/gcc/gcc/libstdc++-v3/libsupc++/del_opnt.cc,v
retrieving revision 1.2.22.1
diff -u -r1.2.22.1 del_opnt.cc
--- gcc/libstdc++-v3/libsupc++/del_opnt.cc	3 Jun 2003 16:53:00 -0000	1.2.22.1
+++ gcc/libstdc++-v3/libsupc++/del_opnt.cc	3 Apr 2004 17:11:53 -0000
@@ -30,11 +30,3 @@
 
 #include "new"
 
-extern "C" void free (void *);
-
-void
-operator delete (void *ptr, const std::nothrow_t&) throw ()
-{
-  if (ptr)
-    free (ptr);
-}
Index: gcc/libstdc++-v3/libsupc++/del_opv.cc
===================================================================
RCS file: /cvsroot/gcc/gcc/libstdc++-v3/libsupc++/del_opv.cc,v
retrieving revision 1.2.22.1
diff -u -r1.2.22.1 del_opv.cc
--- gcc/libstdc++-v3/libsupc++/del_opv.cc	3 Jun 2003 16:53:00 -0000	1.2.22.1
+++ gcc/libstdc++-v3/libsupc++/del_opv.cc	3 Apr 2004 17:11:53 -0000
@@ -29,9 +29,3 @@
 // the GNU General Public License.
 
 #include "new"
-
-void
-operator delete[] (void *ptr) throw ()
-{
-  ::operator delete (ptr);
-}
Index: gcc/libstdc++-v3/libsupc++/del_opvnt.cc
===================================================================
RCS file: /cvsroot/gcc/gcc/libstdc++-v3/libsupc++/del_opvnt.cc,v
retrieving revision 1.2.22.1
diff -u -r1.2.22.1 del_opvnt.cc
--- gcc/libstdc++-v3/libsupc++/del_opvnt.cc	3 Jun 2003 16:53:00 -0000	1.2.22.1
+++ gcc/libstdc++-v3/libsupc++/del_opvnt.cc	3 Apr 2004 17:11:53 -0000
@@ -29,9 +29,3 @@
 // the GNU General Public License.
 
 #include "new"
-
-void
-operator delete[] (void *ptr, const std::nothrow_t&) throw ()
-{
-  ::operator delete (ptr);
-}
Index: gcc/libstdc++-v3/libsupc++/new
===================================================================
RCS file: /cvsroot/gcc/gcc/libstdc++-v3/libsupc++/new,v
retrieving revision 1.10.2.5
diff -u -r1.10.2.5 new
--- gcc/libstdc++-v3/libsupc++/new	21 Jul 2003 13:54:08 -0000	1.10.2.5
+++ gcc/libstdc++-v3/libsupc++/new	3 Apr 2004 17:11:53 -0000
@@ -39,6 +39,7 @@
 #define _NEW
 
 #include <cstddef>
+#include <cstdlib>
 #include <exception>
 
 extern "C++" {
@@ -68,6 +69,10 @@
   new_handler set_new_handler(new_handler) throw();
 } // namespace std
 
+
+void* __operator_new(std::size_t) throw (std::bad_alloc);
+void* __operator_new_nothrow(std::size_t) throw ();
+
 //@{
 /** These are replaceable signatures:
  *  - normal single new and delete (no arguments, throw @c bad_alloc on error)
@@ -79,14 +84,55 @@
  *  Placement new and delete signatures (take a memory address argument,
  *  does nothing) may not be replaced by a user's program.
 */
-void* operator new(std::size_t) throw (std::bad_alloc);
-void* operator new[](std::size_t) throw (std::bad_alloc);
-void operator delete(void*) throw();
-void operator delete[](void*) throw();
-void* operator new(std::size_t, const std::nothrow_t&) throw();
-void* operator new[](std::size_t, const std::nothrow_t&) throw();
-void operator delete(void*, const std::nothrow_t&) throw();
-void operator delete[](void*, const std::nothrow_t&) throw();
+inline void* operator new(std::size_t sz) throw (std::bad_alloc)
+{
+  /* malloc (0) is unpredictable; avoid it.  */
+  if (sz == 0)
+    sz = 1;
+  void *p = std::malloc (sz);
+  if (!p)
+    p = __operator_new(sz);
+
+  return p;
+}
+inline void* operator new[] (std::size_t sz) throw (std::bad_alloc)
+{
+  return ::operator new(sz);
+}
+inline void operator delete (void *ptr) throw ()
+{
+  if (ptr)
+    std::free (ptr);
+}
+inline void operator delete[] (void *ptr) throw ()
+{
+  ::operator delete (ptr);
+}
+
+inline void* operator new (std::size_t sz, const std::nothrow_t&) throw()
+{
+  /* malloc (0) is unpredictable; avoid it.  */
+  if (sz == 0)
+    sz = 1;
+  void *p = std::malloc (sz);
+  if (!p)
+    p = __operator_new_nothrow(sz);
+
+  return p;
+}
+inline void* operator new[] (std::size_t sz, const std::nothrow_t& nothrow) throw()
+{
+  return ::operator new(sz, nothrow);
+}
+inline void operator delete (void *ptr, const std::nothrow_t&) throw ()
+{
+  if (ptr)
+    std::free (ptr);
+}
+inline void operator delete[] (void *ptr, const std::nothrow_t&) throw ()
+{
+  ::operator delete (ptr);
+}
 
 // Default placement versions of operator new.
 inline void* operator new(std::size_t, void* __p) throw() { return __p; }
Index: gcc/libstdc++-v3/libsupc++/new_op.cc
===================================================================
RCS file: /cvsroot/gcc/gcc/libstdc++-v3/libsupc++/new_op.cc,v
retrieving revision 1.5.2.1
diff -u -r1.5.2.1 new_op.cc
--- gcc/libstdc++-v3/libsupc++/new_op.cc	3 Jun 2003 16:53:00 -0000	1.5.2.1
+++ gcc/libstdc++-v3/libsupc++/new_op.cc	3 Apr 2004 17:11:53 -0000
@@ -37,27 +37,22 @@
 
 extern new_handler __new_handler;
 
-void *
-operator new (std::size_t sz) throw (std::bad_alloc)
+void* __operator_new(std::size_t sz) throw (std::bad_alloc)
 {
   void *p;
-
-  /* malloc (0) is unpredictable; avoid it.  */
-  if (sz == 0)
-    sz = 1;
-  p = (void *) malloc (sz);
-  while (p == 0)
+  do
     {
-      new_handler handler = __new_handler;
+      std::new_handler handler = __new_handler;
       if (! handler)
 #ifdef __EXCEPTIONS
-	throw bad_alloc();
+        throw std::bad_alloc();
 #else
         std::abort();
 #endif
       handler ();
-      p = (void *) malloc (sz);
+      p = std::malloc (sz);
     }
+  while (!p);
 
   return p;
 }
Index: gcc/libstdc++-v3/libsupc++/new_opnt.cc
===================================================================
RCS file: /cvsroot/gcc/gcc/libstdc++-v3/libsupc++/new_opnt.cc,v
retrieving revision 1.3.22.1
diff -u -r1.3.22.1 new_opnt.cc
--- gcc/libstdc++-v3/libsupc++/new_opnt.cc	3 Jun 2003 16:53:00 -0000	1.3.22.1
+++ gcc/libstdc++-v3/libsupc++/new_opnt.cc	3 Apr 2004 17:11:53 -0000
@@ -36,31 +36,26 @@
 extern "C" void *malloc (std::size_t);
 extern new_handler __new_handler;
 
-void *
-operator new (std::size_t sz, const std::nothrow_t&) throw()
+void* __operator_new_nothrow(std::size_t sz) throw ()
 {
   void *p;
-
-  /* malloc (0) is unpredictable; avoid it.  */
-  if (sz == 0)
-    sz = 1;
-  p = (void *) malloc (sz);
-  while (p == 0)
+  do
     {
-      new_handler handler = __new_handler;
+      std::new_handler handler = __new_handler;
       if (! handler)
-	return 0;
+	    return 0;
       try
-	{
-	  handler ();
-	}
-      catch (bad_alloc &)
-	{
-	  return 0;
-	}
+	    {
+	      handler ();
+	    }
+      catch (std::bad_alloc &)
+	    {
+	      return 0;
+	    }
 
-      p = (void *) malloc (sz);
+      p = std::malloc (sz);
     }
+  while (!p);
 
   return p;
 }
Index: gcc/libstdc++-v3/libsupc++/new_opv.cc
===================================================================
RCS file: /cvsroot/gcc/gcc/libstdc++-v3/libsupc++/new_opv.cc,v
retrieving revision 1.3.22.1
diff -u -r1.3.22.1 new_opv.cc
--- gcc/libstdc++-v3/libsupc++/new_opv.cc	3 Jun 2003 16:53:00 -0000	1.3.22.1
+++ gcc/libstdc++-v3/libsupc++/new_opv.cc	3 Apr 2004 17:11:53 -0000
@@ -30,8 +30,3 @@
 
 #include "new"
 
-void *
-operator new[] (std::size_t sz) throw (std::bad_alloc)
-{
-  return ::operator new(sz);
-}
Index: gcc/libstdc++-v3/libsupc++/new_opvnt.cc
===================================================================
RCS file: /cvsroot/gcc/gcc/libstdc++-v3/libsupc++/new_opvnt.cc,v
retrieving revision 1.3.22.1
diff -u -r1.3.22.1 new_opvnt.cc
--- gcc/libstdc++-v3/libsupc++/new_opvnt.cc	3 Jun 2003 16:53:00 -0000	1.3.22.1
+++ gcc/libstdc++-v3/libsupc++/new_opvnt.cc	3 Apr 2004 17:11:53 -0000
@@ -30,8 +30,3 @@
 
 #include "new"
 
-void *
-operator new[] (std::size_t sz, const std::nothrow_t& nothrow) throw()
-{
-  return ::operator new(sz, nothrow);
-}
Comment 28 Paul Thomas 2004-04-03 17:54:26 UTC
Subject: Re:  octave built under Cygwin very slow

Hi,

It's good to know that you are engaging with this one.  I sent a 
message this morning that somehow got deleted in the process of being 
sent; its main content was not to forget that the most extreme 
manifestation of the difference is with g++ 3.3.1 (cygming special) for 
which

new/delete takes       1900ns/loop
malloc/free             400ns/loop

This is how we detected this in the first place.

For reference, on the same Athlon 1700, 3.2.2 20030222 (RH 3.2.2-5)

Gives

new/delete              140ns/loop
malloc/free             100ns/loop

Unfortunately, I just this morning deleted the g++ 3.3.1 and replaced it 
with 3.2 (which does not show such aberrant behaviour, by the way) , so 
I cannot test your patch!

Paul Thomas



Comment 29 Timo Keller 2004-04-03 18:00:27 UTC
(In reply to comment #28)
> Subject: Re:  octave built under Cygwin very slow
> 
> Hi,
> 
> It's good to know that you are engaging with this one.  I sent a 
> message this morning that somehow got deleted in the process of being 
> sent; its main content was not to forget that the most extreme 
> manifestation of the difference is with g++ 3.3.1 (cygming special) for 
> which
> 
> new/delete takes 1900ns/loop
> malloc/erase          400ns/loop  
> 
> This is how we detected this in the first place.
> 
> For reference, on the same Athlon 1700, 3.2.2 20030222 (RH 3.2.2-5)
> 
> Gives
> 
> new/delete              140ns/loop
> malloc/erase            100ns/loop
> 
> Unfortunately, I just this morning deleted the g++ 3.3.1 and replaced it 
> with 3.2 (which does not show such aberrant behaviour, by the way) , so 
> I cannot test your patch!
But you can still test it with 3.2 (I'm using 3.5-tree-ssa 20040403).
Comment 30 Paul Thomas 2004-04-03 18:24:26 UTC
Subject: Re:  octave built under Cygwin very slow

Sorry, yes, you are right, I can test it, but not to see its effect on 
that gruesome cygming special.

Paul



Comment 31 Andrew Pinski 2004-07-12 14:50:01 UTC
No feedback in 3 months
Comment 32 Paul Thomas 2004-07-12 19:21:03 UTC
Subject: Re:  new/delete much slower than malloc/free

Dear All,

I do apologise; I missed the need to give feedback and was supposing that
the necessary fixes would feed through to the next gcc release bundled in
with Cygwin.  As far as I was concerned, the problem was sufficiently
"fixed" by reverting to 3.2.  Obviously, I will take a look at the patches
and see how they can be applied to octave.  If any blinding flashes of
inspiration occur, I will report back to you asap.

One thing that I get no sense of from the thread is why the Mingw/Cygwin
gcc-3.3 is so very bad, even in comparison with what you guys were finding.
Perhaps, I should try the tests with -sjlj disabled first, since the
profiling at first misled us (well, me, at least) into believing that the
problem lay entirely there?  Anyway, I will find a "victim" machine onto
which to install Cygwin and gcc-3.3 (I might as well make the problem as bad
as possible!).

Best regards and thanks for your efforts.

Paul Thomas

----- Original Message ----- 
From: "pinskia at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org>
To: <paulthomas2@wanadoo.fr>
Sent: Monday, July 12, 2004 4:50 PM
Subject: [Bug libstdc++/14563] new/delete much slower than malloc/free


>
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2004-07-12 14:50 -------
> No feedback in 3 months
>
> -- 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|WAITING                     |RESOLVED
>          Resolution|                            |INVALID

Comment 33 Wolfgang Bangerth 2004-07-12 20:55:15 UTC
This doesn't seem to be resolved at all, so let's not close it. 
Comment 34 Paul Thomas 2004-07-13 04:17:17 UTC
Subject: Re:  new/delete much slower than malloc/free

'tis what I thought!  What would you like me to do?

Paul T

----- Original Message ----- 
From: "bangerth at dealii dot org" <gcc-bugzilla@gcc.gnu.org>
To: <paulthomas2@wanadoo.fr>
Sent: Monday, July 12, 2004 10:55 PM
Subject: [Bug libstdc++/14563] new/delete much slower than malloc/free


>
> ------- Additional Comments From bangerth at dealii dot org  2004-07-12 20:55 -------
> This doesn't seem to be resolved at all, so let's not close it.
>
> -- 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|RESOLVED                    |UNCONFIRMED
>          Resolution|INVALID                     |

Comment 35 ron hylton 2004-07-28 02:50:17 UTC
I've been experiencing a severe performance problem on Cygwin with gcc 3.4.2 in 
which programs compiled with enable-sjlj-exceptions are 6 times slower than 
disable-sjlj-exceptions.  After reading the thread here I experimented with the 
new/malloc test case posted previously but modified so that the whole thing is 
wrapped in a try block.  Here are the results, all run on the same machine (2 GHz 
P4) with array_size = 100, loop_count = 3000000.

Intel Windows C++
done looping time 1=1.14
done looping time 2=1.125

VMWare SUSE Linux 9.1 gcc 3.4.2 (default sjlj-exceptions, presumably disable-)
done looping time 1=0.75
done looping time 2=0.62

Cygwin gcc 3.4.2 disable-sjlj-exceptions
done looping time 1=5.516
done looping time 2=5.265

Cygwin gcc 3.4.2 enable-sjlj-exceptions
done looping time 1=16.953
done looping time 2=5.328

Cygwin distribution gcc 3.3.1 (enable-sjlj-exceptions)
done looping time 1=17.016
done looping time 2=5.328
 
There seem to be 2 problems with Cygwin.  First, both new & malloc are 5 or 6 
times slower than on Linux or using Intel.  Second, enable-sjlj-exceptions slows 
down new by another factor of 3 on top of this.
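
The factors quoted above can be rechecked from the posted timings. The following is only a small sketch; the helper names `slowdown` and `ns_per_iteration` are made up for illustration, and the inputs are the "done looping" figures listed in this comment (loop_count = 3000000).

```cpp
#include <cassert>

// Hypothetical helper: ratio of two wall-clock timings in seconds,
// assuming both runs used the same loop count.
double slowdown(double slow_seconds, double fast_seconds)
{
    assert(fast_seconds > 0.0);
    return slow_seconds / fast_seconds;
}

// Hypothetical helper: average cost of one loop iteration in nanoseconds.
double ns_per_iteration(double total_seconds, long iterations)
{
    assert(iterations > 0);
    return total_seconds * 1e9 / static_cast<double>(iterations);
}
```

With the numbers above, `slowdown(16.953, 5.516)` is about 3.1 (sjlj enabled versus disabled for new/delete on Cygwin), `slowdown(5.516, 1.14)` is about 4.8 (Cygwin versus the Intel compiler), and `ns_per_iteration(16.953, 3000000)` is about 5651 ns per new/delete pair. The Linux figures were taken under VMware, so that comparison is only indicative.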

The full configuration for the Cygwin enable case is 

Configured with: ../gcc/configure --prefix=/gcc-3.4 --with-gcc --with-gnu-ld
--with-gnu-as --enable-languages=c,c++,f77 --enable-libgcj --enable-threads=posix
--with-system-zlib --enable-nls --without-included-gettext --enable-interpreter
--enable-sjlj-exceptions --disable-version-specific-runtime-libs --enable-shared
--disable-win32-registry --enable-java-gc=boehm --disable-hash-synchronization
--verbose --target=i686-pc-cygwin --host=i686-pc-cygwin --build=i686-pc-cygwin
Thread model: posix
gcc version 3.4.2 20040720 (prerelease)

My own applications are very array-intensive so it's not immediately obvious 
that new is the culprit in my case, but there's probably a connection on the 
sjlj problem.

Ron Hylton
Comment 36 ron hylton 2004-07-28 03:57:10 UTC
One more case:

Cygwin distribution gcc 3.3.1 -mno-cygwin
done looping time 1=1.766
done looping time 2=1.406
Comment 37 Andrew Pinski 2004-07-28 06:03:30 UTC
I think basically you are messed up until Cygwin switches to dwarf2 exceptions.
Comment 38 ron hylton 2004-07-29 04:23:39 UTC
Andrew,  I was afraid of that.  Thanks for confirming it.
Comment 39 Paul Thomas 2004-08-08 09:24:07 UTC
Subject: Re:  new/delete much slower than malloc/free

Ron,

I am just back from California and found this in my in-tray.  I had 
great difficulty with my French ISP, whilst I was there, and had a 
number of weird replies like this one.  Is there some comment on this 
problem from "Andrew" that I missed or were you replying to my comment 
about the problem being absent in gcc-3.2?

I will give your -mno-cygwin a go with gcc-3.2, as soon as I have dealt 
with an overloaded in-tray.

Paul



Comment 40 Giovanni Bajo 2004-11-10 08:21:19 UTC
Ron, can you please attach your testcase that shows the problem to this PR?

This PR is a regression on cygwin because the speed is back with 3.2.
Comment 41 Giovanni Bajo 2004-11-10 08:21:36 UTC
*** Bug 18414 has been marked as a duplicate of this bug. ***
Comment 42 Danny Smith 2004-11-10 09:10:33 UTC
No, it's not a regression.  GCC-3.2 built with sjlj shows the same problem.  
The "fast" version of GCC-3.2 that the OP referenced was a "cygming-special" that 
had Dwarf-2 EH enabled.  As I indicated earlier, this experiment was dropped 
because of problems with Win32 API callbacks and DW-2 EH.

Danny
Comment 43 ron hylton 2004-11-10 16:20:09 UTC
(In reply to comment #40)
> Ron, can you please attach your testcase that shows the problem to this PR?
> 
> This PR is a regression on cygwin because the speed is back with 3.2.

This is the test case I was using:

#include <iostream>
#include <stdio.h>
#include <stdlib.h>   // malloc, free, exit
#include <time.h>
#include <string>
using namespace std;

int main()
{
	int array_size = 100;
	int loop_count = 3000000;
	try
	{
		long t1 = clock();
		for (int iloop = 0; iloop < loop_count; iloop++)
		{
			int *myarray = new int [array_size];
			delete [] myarray;
		}
		long t2 = clock();
		double delt1 = (double)( t2 - t1 )/ (double)(CLOCKS_PER_SEC);
		cout << "done looping time 1=" << delt1 << endl;
		long t3 = clock();

		for (int jloop = 0; jloop < loop_count; jloop++)
		{
			int *myarray = (int *)malloc(array_size * sizeof(int));
			if (myarray== NULL) { printf("alloc failed\n"); exit(1); }
			else free (myarray);
		}
		long t4 = clock();
		double delt2 = (double)( t4 - t3 )/ (double)(CLOCKS_PER_SEC);
		cout << "done looping time 2=" << delt2 << endl;
	}
	catch (...)
	{
		cout << "exception" << std::endl;
		return 1;
	}

	return 0;
}
Comment 44 Kenneth Duda 2004-11-10 17:05:19 UTC
(In reply to comment #40)
> Ron, can you please attach your testcase that shows the problem to this PR?
> This PR is a regression on cygwin because the speed is back with 3.2.

Here's a test case for you...
   -Ken

-------------------------------------------------------


// Uncomment one of these defines.
// With the first define uncommented, I get 3.293 usec per "operator new" use.
// With the second define uncommented, I get 1.019 usec per "operator new" use.
// A high price to pay for having one's exceptions properly declared!

//#define THROW throw (std::bad_alloc)
#define THROW


// These definitions are taken straight from libstdc++.


#include "new"
#include <exception_defines.h>

using std::new_handler;
using std::bad_alloc;

extern "C" void *malloc (std::size_t);
extern new_handler __new_handler;

void *
operator new (std::size_t sz) THROW
{
  void *p;

  /* malloc (0) is unpredictable; avoid it.  */
  if (sz == 0)
    sz = 1;
  p = (void *) malloc (sz);
  while (p == 0)
    {
      new_handler handler = __new_handler;
      if (! handler)
#ifdef __EXCEPTIONS
	throw bad_alloc();
#else
        std::abort();
#endif
      handler ();
      p = (void *) malloc (sz);
    }

  return p;
}

void *
operator new[] (std::size_t sz) THROW
{
  return ::operator new(sz);
}




#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <assert.h>

typedef unsigned long long u64;
typedef u64 Usec;

#ifdef WIN32

#include <Windows.h>

inline Usec Now()
{
   DWORD ticks = GetTickCount();
   return ((Usec) ticks) * 1000;
}

#else

#include <sys/types.h>
#include <sys/time.h>

inline Usec Now()
{
   struct timeval tv;
   if( gettimeofday( &tv, 0 ) ) {
      perror( "gettimeofday" );
      exit( 1 );
   }
   return ((Usec) tv.tv_sec) * 1000000 + tv.tv_usec;
}

#endif

using namespace std;



int main()
{
  int sizeMin = 4;
  int sizeMax = 100;
  int allocsOutstanding = 1000;
  int reps = 1000;
  int allocsPerRep = 1000;

  int sizeRange = sizeMax - sizeMin;
  char ** ptrs = (char **) malloc( sizeof( char * ) * allocsOutstanding );
  memset( ptrs, 0, sizeof( char * ) * allocsOutstanding );

  Usec start = Now();
  
  int m = reps;
  while( m-- ) {
    int n = allocsPerRep;
    while( n-- ) {
      int r = rand();
      int index = r % allocsOutstanding;
      char * p = ptrs[index];
      delete[] p;
      //      free( p );
      int size = (r % sizeRange) + sizeMin;
      p = new char[ size ];
      //      p = (char *) malloc( size );
      ptrs[index] = p;
    }
  }

  Usec stop = Now();
  double t = ((double) stop - start) / ((double) allocsPerRep * reps);
  printf( "cost of new + delete is about %0.3f usec\n", t );
  fflush( stdout );
}
Comment 45 Paul Thomas 2004-11-13 11:02:44 UTC
Subject: Re:  new/delete much slower than malloc/free because of sjlj exceptions

> Here's a test case for you...
>    -Ken

That's interesting....

Using your test case:
(i) gcc 3.2 20020927 ( prerelease) both versions take 0.62micro-sec/new
(ii) gcc 3.1.1 (cygming special) I get 2.1 and 0.66micro-sec/new
(iii) gcc 4.0.0 20041010 (experimental) I get 0.62 and 0.59micro-sec/new

This latter was a tad unexpected - I built it from a snapshot on one of the
German mirror sites.  Does this imply that I have picked up Dwarf2 as a
default?

Going back to the beginning of this rather long thread, you will note that
it was building octave that first exposed this problem.  I think that octave
is calling new too many times anyway, for certain types of code, and had
started hanging counters on an overloaded new operator.  It would not be a
big deal to substitute your version and to compare the performance with
THROW defined or not.
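
For readers unfamiliar with the counter approach mentioned above, here is a minimal sketch of a replacement global `operator new` that counts its calls. It is not octave's code; the names `g_new_calls` and `new_call_count` are invented for illustration, the new-handler loop of the libstdc++ version is deliberately left out, and it assumes a C++11 or later compiler (for `noexcept`).

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter, incremented on every call to the replaced operator new.
static unsigned long g_new_calls = 0;

// Hypothetical accessor so callers can read the counter.
unsigned long new_call_count()
{
    return g_new_calls;
}

// Replacement for the global allocation function. The default operator new[]
// forwards here, so array allocations are counted as well.
void* operator new(std::size_t sz)
{
    ++g_new_calls;
    if (sz == 0)
        sz = 1;                      // malloc (0) is unpredictable; avoid it
    void* p = std::malloc(sz);
    if (!p)
        throw std::bad_alloc();      // simplified: no new-handler retry loop
    return p;
}

// Matching replacement for the global deallocation function.
void operator delete(void* p) noexcept
{
    std::free(p);                    // free of a null pointer is a no-op
}
```

Reading `new_call_count()` before and after a block of code gives the number of allocations that block made, which is one way to tell whether a slowdown comes from the number of calls rather than the cost of each call.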

Give me a few days, the build takes a few hours under Cygwin and I have some
concreting to do this weekend.... *sigh*

Regards

Paul Thomas

Comment 46 ken.duda@gmail.com 2004-11-14 17:03:48 UTC
Subject: Re:  new/delete much slower than malloc/free because of sjlj exceptions

Thanks, Paul.  Let me know if I can help in any way.  I appended the
output of "gcc -v".

   -Ken

===============================================

Reading specs from /usr/lib/gcc-lib/i686-pc-cygwin/3.3.3/specs
Configured with: /gcc/gcc-3.3.3-3/configure --verbose --prefix=/usr
--exec-prefix=/usr --sysconfdir=/etc --libdir=/usr/lib
--libexecdir=/usr/lib --mandir=/usr/share/man
--infodir=/usr/share/info
--enable-languages=c,ada,c++,d,f77,java,objc,pascal --enable-nls
--without-included-gettext --enable-libgcj --with-system-zlib
--enable-interpreter --enable-threads=posix --enable-java-gc=boehm
--enable-sjlj-exceptions --disable-version-specific-runtime-libs
--disable-win32-registry
Thread model: posix
gcc version 3.3.3 (cygwin special)

====================================================

Comment 47 Paul Thomas 2004-11-14 18:04:06 UTC
Subject: Re:  new/delete much slower than malloc/free because
 of sjlj exceptions


Ken,

Did you miss the question?

Paul

>>(iii) gcc 4.0.0 20041010 (experimental) I get 0.62 and 0.59micro-sec/new
>>
>>This latter was a tad unexpected - I built it from a snapshot on one of the
>>German mirror sites.  Does this imply that I have picked up Dwarf2 as a
>>default?



Comment 48 ken.duda@gmail.com 2004-11-14 22:40:50 UTC
Subject: Re:  new/delete much slower than malloc/free because of sjlj exceptions

> Did you miss the question?

Umm, apparently I did.. the only thing I see in the bug log that looks
like a question is this:

> Does this imply that I have picked up Dwarf2 as a default?

I don't know the answer.  The only thing I can say that might be
related is that there are assembly statements in my output like "call
__Unwind_SjLj_Register"; that (with the --enable-sjlj-exceptions) has
led me to believe I'm using SjLj exceptions.

Again, let me know if there's anything I can help with.

   -Ken
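
Besides looking for `__Unwind_SjLj_Register` in the assembly output, the exception model can also be checked from the preprocessor. The sketch below relies on GCC's predefined macro `__USING_SJLJ_EXCEPTIONS__`, which the GCC preprocessor documentation describes as defined when setjmp/longjmp exception handling is in use; whether a particular old snapshot defines it is an assumption worth verifying with `g++ -dM -E`. The function name `eh_model` is made up for illustration.

```cpp
// Returns a short label for the exception-handling model this
// translation unit was compiled with (hypothetical helper).
const char* eh_model()
{
#if defined(__USING_SJLJ_EXCEPTIONS__)
    return "sjlj";
#elif defined(__EXCEPTIONS)
    return "non-sjlj (table-based, e.g. DWARF-2)";
#else
    return "exceptions disabled";
#endif
}
```

Printing `eh_model()` from a test program built with each compiler would show directly whether a given build picked up sjlj or DWARF-2 unwinding.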



Comment 49 Andrew Pinski 2005-05-12 14:54:07 UTC
If you used the non throw new, it would become faster.
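
A minimal sketch of what using the non-throwing form looks like at the call site follows. The helper name `churn_nothrow` is hypothetical, and whether this actually avoids the sjlj cost depends on how the library was built; no measurement is claimed here.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical helper: allocate and release an int array `reps` times using
// the non-throwing form of new. Returns the number of successful allocations.
std::size_t churn_nothrow(std::size_t count, std::size_t reps)
{
    assert(count > 0);
    std::size_t ok = 0;
    for (std::size_t i = 0; i < reps; ++i) {
        // new (std::nothrow) yields a null pointer on failure instead of
        // throwing std::bad_alloc, so the result must be checked.
        int* p = new (std::nothrow) int[count];
        if (p != 0) {
            p[0] = static_cast<int>(i);
            ++ok;
        }
        delete[] p;  // deleting a null pointer is a no-op
    }
    return ok;
}
```

Dropping `churn_nothrow(100, 3000000)` into the timing loop of the earlier test case would allow a direct comparison against the plain `new int [array_size]` version.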
Comment 50 Danny Smith 2007-06-14 03:21:32 UTC
(In reply to comment #37)
> I think basically you are messed up until Cygwin switches to dwarf2
> exceptions.
> 
This is now (=gcc 4.3) possible by adding --disable-sjlj-exceptions to configure.
Can we close with milestone gcc-4.3.0?
Danny
Comment 51 Steven Bosscher 2012-07-30 23:29:36 UTC
As per Danny's suggestion in comment #50 (impressive...)