I am sorry if this report is imprecise and probably unhelpful, but several contributors to the octave lists have failed to identify where the problem lies. The problem is simply stated and is unique to builds of octave-2.1.xx under the latest version of Cygwin: interpreted octave programmes run 6-7 times slower when octave is compiled and linked with gcc-3.3.3 and its libraries, compared with 3.2.3. Note that (i) this problem is not present in Linux and (ii) it is not dependent on which continent one is in, i.e. the problem is reproducible. Beyond this, a variety of optimisation levels, architecture settings and so on were tried, to no avail.
Can you provide a testcase where 3.3.3 is slower than 3.2.3?
Subject: Re: octave built under Cygwin very slow
I am not in a position to do so right now; the system that has Cygwin loaded on it is at work - I am 100% linux at home. Unfortunately, I am out of town next week, so cannot get back to this until a week Monday. I have cc'd this to Ben Diedrich in the hope that he can help you. An obvious question is what form you would like the test cases in: the output from octave benchmarks or the octave binaries? If it is the former, you will be able to get started on the octave mailing list:
http://www.octave.org/mailing-lists/help-octave/2004/325
http://www.octave.org/mailing-lists/help-octave/2004/337
http://www.octave.org/mailing-lists/help-octave/2004/339 <--- contains configuration details
http://www.octave.org/mailing-lists/help-octave/2004/389 <--- ditto
If you want the binaries, that will take a little longer.... It might be more practical, if you have the time, to build octave yourself. We used octave-2.1.50 for our attempts to pin-point the problem because the "good" binary from octave-forge used this version. However, 2.1.53 and 2.1.56 both were slow when built with gcc-3.3.3. Ben, did you keep the defective binaries?
Paul Thomas
One of the things you could do is to compile the octave version you use with both 3.2.3 and 3.3.3, in each case with profiling information (i.e. with -pg) and check the output to see whether there is one single function that now takes significantly longer (and thus moved up in the output of gprof, in the list of functions sorted by execution time). If there should be one such function, it might be worth compiling just this one function (or file) with the other compiler, to verify that it indeed is the problem. This way we could at least localize the problem a little better. This would also be of great value to us, since we have no clue about the octave functions, and it is hard for us to look at profiling output without knowing what all these functions do etc. If you have identified the function that has the slowdown (assuming that it is a single function), then one would rip it out of the program and place it (and whatever else it needs) into a small file where main() simply calls this function a number of times with dummy arguments. This way we would have a simpler and much smaller testcase. W.
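A minimal sketch of the kind of reduced testcase described above, assuming the slowdown could be isolated to a single function. The name suspect_function is a hypothetical stand-in for whatever would be ripped out of Octave, and time_calls is an illustrative helper, not anything taken from Octave or GCC.

```cpp
#include <cstddef>
#include <ctime>

// Hypothetical stand-in for the function suspected of causing the
// slowdown; a real testcase would paste in the code extracted from Octave.
double suspect_function(double x)
{
  return x + 1.0;
}

// Call `fn` `calls` times with a dummy argument and return the CPU
// seconds used. Each result feeds the next call and the final value is
// stored through `sink`, so the optimizer cannot discard the loop.
double time_calls(double (*fn)(double), std::size_t calls, double *sink)
{
  std::clock_t start = std::clock();
  double acc = 0.0;
  for (std::size_t i = 0; i < calls; ++i)
    acc = fn(acc);
  *sink = acc;
  return static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC;
}
```

A main() that calls time_calls(suspect_function, N, &sink) and prints the result could then be compiled with each compiler in turn, so that the two timings are directly comparable.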
Subject: Re: octave built under Cygwin very slow
Ben Diedrich and I have done as you suggested. It has to be said that neither of us has tried profiling anything before, let alone a code as large and as complicated as octave. I did the gcc-3.2 build and Ben did the gcc-3.3 build. Our machines are nearly identical, both with Windows 2000 and with the same version of Cygwin. We ran a small test programme in octave that shows a factor of approximately 6 difference in execution time, when octave is built with gcc-3.2 (~1.8-2s) or gcc-3.3 (10-12s). When compiled and linked with -pg set, the execution time increased to 9s with gcc-3.2 (Ben, what is the corresponding figure for gcc-3.3?). Switching off the -O2 flags increased the time by a further 3s but resulted in a much more detailed profile. In the enclosed, we increased the number of loops by a factor of 10, in order to get a reasonable resolution on the less frequently visited functions. So remember that the corresponding wall-clock time is 90s for cycle 2. I have added the most pertinent part of the graphical output. If you want to see the entirety of the gprof output, I can easily forward it to you for both builds. To first order, there is an enormous amount of unaccounted time but both builds are more or less identical in their time in octave functions. The only significant difference that I can see is the appearance in the gcc-3.3 version of
                                                 <spontaneous>
[7]     22.3    5.62    0.00                 _Unwind_SjLj_Register [7]
                                                 <spontaneous>
[9]     12.4    3.11    0.00                 _Unwind_SjLj_Unregister [9]
which together take a significant amount of time. What are these calls and could they be the culprits? I will be spending a bit of time over the next few days to get a better feel for profiling and its relationship to the execution times without -pg set. Whilst the time spent in various routines does change in correct proportion to the content of the octave code, the absolute magnitudes are way out. I hope that this helps to give you a clue. Any advice that you can offer on the profiling would be gratefully received.
Paul Thomas
Profiles for octave-2.1.50 built using (i) gcc-3.2 and (ii) gcc-3.3
Paul Thomas and Ben Diedrich 03/23/04
Both run with test programme, entered from octave command line:
a = cputime ; tot = 0 ; x = [1:1e6] ; for i = 1:1e6 ; tot = tot + x(i) ; end ; disp(cputime-a)
The individual lines in the graphical output can be identified with features in the test programme and the times correspond well, in proportion, with timing tests done in octave. The absolute times are way out.
granularity: each sample hit covers 4 byte(s) for 0.04% of 25.18 seconds

index % time    self  children    called          name

>>>>>Built with gcc-3.2
[3]     70.7    2.33    6.14       4+16000183 <cycle 2 as a whole> [3]
                0.27    2.15 1000017              tree_index_expression::rvalue(int) <cycle 2> [8]
                0.29    1.06 3000033              tree_identifier::rvalue(int) <cycle 2> [12]
                0.24    0.58 1000015              tree_simple_assignment::rvalue() <cycle 2> [17]
                0.22    0.51 1000022              tree_statement::eval(bool, int, bool) <cycle 2> [19]
                0.19    0.47 1000009+4            tree_binary_expression::rvalue() <cycle 2> [20]
                0.28    0.24 1000003              tree_argument_list::convert_to_const_vector(octave_value const*) <cycle 2> [23]
                0.21    0.26 3000032              tree_identifier::rvalue() <cycle 2> [27]
                0.14    0.23 1000015              tree_simple_assignment::rvalue(int) <cycle 2> [32]
                0.17    0.13 1000006              tree_statement_list::eval(bool, int) <cycle 2> [36]
                0.09    0.17 1000005              make_value_list(tree_argument_list*, string_vector const&, octave_value const*) <cycle 2> [40]
                0.15    0.09 1000014              tree_index_expression::rvalue() <cycle 2> [45]
                0.02    0.19 1000000              tree_simple_for_command::do_for_loop_once(octave_lvalue&, octave_value const&, bool&) <cycle 2> [47]
                0.06    0.06       1              tree_simple_for_command::eval() <cycle 2> [64]
                0.00    0.00       2              octave_user_function::do_multi_index_op(int, octave_value_list const&) <cycle 2> [184]
                0.00    0.00       3              octave_value::do_multi_index_op(int, octave_value_list const&) <cycle 2> [208]
                0.00    0.00       2              tree_parameter_list::convert_to_const_vector(tree_va_return_list*) <cycle 2> [242]
                0.00    0.00       2              tree_if_command::eval() <cycle 2> [1079]
                0.00    0.00       2              tree_if_command_list::eval() <cycle 2> [1084]
                0.00    0.00       2              tree_if_clause::eval() <cycle 2> [1077]
---
>>>>>Built with gcc-3.3
[3]     42.7    3.07    7.68       3+16000180 <cycle 2 as a whole> [3]
                0.36    2.66 1000017              tree_index_expression::rvalue(int) <cycle 2> [10]
                0.36    1.27 3000032              tree_identifier::rvalue(int) <cycle 2> [14]
                0.41    0.88 1000015              tree_simple_assignment::rvalue() <cycle 2> [17]
                0.25    0.66 1000021              tree_statement::eval(bool, int, bool) <cycle 2> [21]
                0.13    0.64 1000009+4            tree_binary_expression::rvalue() <cycle 2> [25]
                0.26    0.31 1000015              tree_simple_assignment::rvalue(int) <cycle 2> [29]
                0.29    0.28 1000003              tree_argument_list::convert_to_const_vector(octave_value const*) <cycle 2> [30]
                0.27    0.29 3000032              tree_identifier::rvalue() <cycle 2> [31]
                0.30    0.13 1000005              tree_statement_list::eval(bool, int) <cycle 2> [35]
                0.17    0.17 1000005              make_value_list(tree_argument_list*, string_vector const&, octave_value const*) <cycle 2> [44]
                0.05    0.24 1000000              tree_simple_for_command::do_for_loop_once(octave_lvalue&, octave_value const&, bool&) <cycle 2> [49]
                0.10    0.10 1000014              tree_index_expression::rvalue() <cycle 2> [63]
                0.12    0.06       1              tree_simple_for_command::eval() <cycle 2> [69]
                0.00    0.00       2              octave_user_function::do_multi_index_op(int, octave_value_list const&) <cycle 2> [186]
                0.00    0.00       2              tree_parameter_list::convert_to_const_vector(tree_va_return_list*) <cycle 2> [233]
                0.00    0.00       2              octave_value::do_multi_index_op(int, octave_value_list const&) <cycle 2> [1058]
                0.00    0.00       2              tree_if_command::eval() <cycle 2> [1068]
                0.00    0.00       2              tree_if_command_list::eval() <cycle 2> [1074]
                0.00    0.00       2              tree_if_clause::eval() <cycle 2> [1066]

>>>>Lines that do not appear in the version built with gcc-3.2
                                                  <spontaneous>
[7]     22.3    5.62    0.00                  _Unwind_SjLj_Register [7]
                                                  <spontaneous>
[9]     12.4    3.11    0.00                  _Unwind_SjLj_Unregister [9]
                                                  <spontaneous>
[11]    10.2    2.56    0.00                  operator new(unsigned int) [11]
First, thanks for your efforts! The Unwind_SjLj_* functions have to do with exceptions. Danny, I CC: you because here's a cygwin question: is sjlj the default on windows, and are you aware of any significant changes in this area that could affect this? Paul&Ben: even if the problem is in this function, am I correct with my math that these functions only account for at most about 1/4 of the run-time? If that is the case, then they can't make up for a six-fold increase in run-time... W.
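The arithmetic behind this objection can be sketched as follows. Taking the two _Unwind_SjLj_* percentages from the gcc-3.3 profile at face value (22.3% + 12.4% = 34.7%, so nearer a third than a quarter), removing those calls entirely could speed the profiled run up by a factor of about 1.5 at most, whereas a six-fold speed-up would need roughly 83% of the run time to disappear. The function names are illustrative, and the caveat raised elsewhere in the thread applies: the gprof percentages are relative to profiled time, which may not reflect the unprofiled build.

```cpp
// Upper bound on the overall speed-up obtained by eliminating code that
// accounts for `fraction` (0 <= fraction < 1) of the measured run time.
double max_speedup(double fraction)
{
  return 1.0 / (1.0 - fraction);
}

// Fraction of the run time that would have to be eliminated in order to
// obtain the given overall speed-up (speedup >= 1).
double required_fraction(double speedup)
{
  return 1.0 - 1.0 / speedup;
}
```

With these definitions, max_speedup(0.347) is about 1.53 and required_fraction(6.0) is about 0.83.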
Subject: Re: octave built under Cygwin very slow This is why I am questioning the calibration of the profiling - none of the times add up. Is it reasonable that the faster of the two builds be bumped up from 20 to 90s runtime with profiling? If so, should I expect to see the total profile time add up to the original 20s? In fact, the total that I can find in the profiling is about 12 seconds, including octave start-up. Even if the latter is negligible, I am missing 40% of the unprofiled execution time and 80% or so of the wall-clock time. Am I right in thinking that setting -fno-exceptions will suppress sjlj if it is the default? Perhaps we should try that as an experiment? Paul T
sjlj exceptions are one way to implement exception handling. On most systems, we have moved to more efficient ways, such as dwarf2 unwinding. However, if you don't use exceptions, trying things out with -fno-exceptions may be an interesting experiment anyway. Regarding the times not adding up: I think this is usual for gprof. In fact, gprof is not a very good tool anyway, but it is the one that is most widely available. There are more accurate ones, which I have never used myself, though, so I can't say anything about them. I think I remember people being quite fond of oprofile. It may also be the case that valgrind can produce some sort of information, but I don't know about that exactly. W.
Subject: Re: octave built under Cygwin very slow Paul Thomas wrote: > We ran a small test programme in octave that shows a factor of > approximately 6 difference in execution time, when octave is built with > gcc-3.2 (~1.8-2s) or gcc-3.3 (10-12s). When compiled and linked with > -pg set, the execution time increased to 9s with gcc-3.2 (Ben, what is > the corresponding figure for gcc-3.3?). Switching off the -O2 flags, > increased the time by a further 3s but resulted in a much more detailed > profile. In the enclosed, we increased the number of loops by a factor > of 10, in order to get a reasonable resolution on the less frequently > visited functions. So remember that the corresponding wall-clock time > is 90s for cycle 2. > The execution time of Octave 2.1.50 with profiling turned on and GCC 3.3.1 was ~280 seconds. That is with the test command: a = cputime ; tot = 0 ; x = [1:1e6] ; for i = 1:1e6 ; tot = tot + x(i) ; end ; disp(cputime-a) The time is ~28 seconds with 1e5 for loop steps. Ben
I am a bit unclear what version of gcc was used for the "fast" precompiled octave. Was it really gcc-3.2.3 or gcc-3.2-3 (the third cygwin update of gcc-3.2.0)? What does gcc -v say for the gcc that built the "fast" octave? The cygwin gcc-3.2 distros (dated about August 2002) had a local patch that enabled Dwarf2 exceptions. This worked fine except when functions throwing exceptions were used as callbacks by win32api functions. So the experiment was terminated and the EH model was reverted to sjlj in later binary distros of gcc. If this is really a difference between sjlj and Dwarf2, I think it is time to revisit Dwarf2 support on windows targets. Danny
Subject: Re: octave built under Cygwin very slow Danny, I am away from base right now and do not have access to any of the installations - Ben Diedrich can supply you with the version number for the "good" gcc, i.e. that of the octave-forge binary distribution. For your information, Ben tells me that the slow build, with profiling, runs slightly more than three times more slowly than the fast build with profiling. Since the only significant difference in the profiles is the presence in the slow build of sjlj calls,..... j'accuse! Otherwise, the problem must lie in something outside the scope of the profiling. If "W" does not automatically get this, could you ensure that it is forwarded to him, please? Paul Thomas PS Thank you both for your rapid responses to this problem; it is something that has been perplexing us a lot.
Danny, this was exactly the feedback I was hoping for from you! :-) Let's wait and see what we find out about the version string and whether -fno-exceptions changes something. Is there a way to change the exception model short of recompiling everything? Thanks W. (= Wolfgang, but too tired of writing this out every time; and besides, everyone seems to know me on this list anyway :-)
Subject: Re: octave built under Cygwin very slow
Here are the results of 'gcc -v' for the compiler that results in a fast octave. Note that it includes the option '--disable-sjlj-exceptions'.
$ gcc -v
Reading specs from /usr/lib/gcc-lib/i686-pc-cygwin/3.2/specs
Configured with: /netrel/src/gcc-3.2-3/configure --enable-languages=c,c++,f77,java --enable-libgcj --enable-threads=posix --with-system-zlib --enable-nls --without-included-gettext --enable-interpreter --disable-sjlj-exceptions --disable-version-specific-runtime-libs --enable-shared --build=i686-pc-linux --host=i686-pc-cygwin --target=i686-pc-cygwin --enable-haifa --prefix=/usr --exec-prefix=/usr --sysconfdir=/etc --libdir=/usr/lib --includedir=/nonexistent/include --libexecdir=/usr/sbin
Thread model: posix
gcc version 3.2 20020927 (prerelease)
Here are the results for the compiler that gives a slower octave. I noticed that this one has the option '--enable-sjlj-exceptions':
$ gcc -v
Reading specs from /usr/lib/gcc-lib/i686-pc-cygwin/3.3.1/specs
Configured with: /GCC/gcc-3.3.1-3/configure --with-gcc --with-gnu-ld --with-gnu-as --prefix=/usr --exec-prefix=/usr --sysconfdir=/etc --libdir=/usr/lib --libexecdir=/usr/sbin --mandir=/usr/share/man --infodir=/usr/share/info --enable-languages=c,ada,c++,f77,pascal,java,objc --enable-libgcj --enable-threads=posix --with-system-zlib --enable-nls --without-included-gettext --enable-interpreter --enable-sjlj-exceptions --disable-version-specific-runtime-libs --enable-shared --disable-win32-registry --enable-java-gc=boehm --disable-hash-synchronization --verbose --target=i686-pc-cygwin --host=i686-pc-cygwin --build=i686-pc-cygwin
Thread model: posix
gcc version 3.3.1 (cygming special)
Ben
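Besides reading the configure line from 'gcc -v', the exception model can also be queried from inside a translation unit. The sketch below relies on the predefined macro __USING_SJLJ_EXCEPTIONS__, which the GCC preprocessor documentation describes as defined when the setjmp/longjmp mechanism is in use; whether the two Cygwin compilers in this thread define it has not been checked here, and exception_model is a hypothetical helper name.

```cpp
// Report which exception-handling model the compiler used for this
// translation unit, as far as the predefined macro reveals it.
// Assumption: a GCC-compatible compiler; other compilers never define
// the macro and therefore always take the second branch.
const char *exception_model()
{
#if defined(__USING_SJLJ_EXCEPTIONS__)
  return "sjlj";
#else
  return "not sjlj";
#endif
}
```

Printing the result from a one-line main() compiled with each compiler would give a quick cross-check of the '--enable-sjlj-exceptions' and '--disable-sjlj-exceptions' settings shown above.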
Subject: Re: octave built under Cygwin very slow Danny and Wolfgang, Sorry, but neither Ben nor I are frequenters of your list, so we did not know. Hello Wolfgang! I hope that Ben is in today, to get the version string to you. As for -fno-exceptions, what will happen if we do this, when... "Octave is now using exceptions for error handling rather than setjmp-longjmp. See libcruft/misc/quit.h for details." (from Paul Kienzle)? You can take a look at quit.h by going to http://pareto.uab.es/mcreel/OctaveClassReference/html/index.html and looking it up in the file list. If the build is going to fail, I do not think anybody should waste their time trying since it is quite a long process. Paul T
Subject: Re: octave built under Cygwin very slow Danny and Wolfgang, It seems that I was a bit too quick off the mark just now - thanks Ben! So, the obvious question is how to do a build with sjlj disabled that is consistent with the way in which octave handles exceptions. Does --disable-sjlj-exceptions do it without throwing up unrequited references during the linking? Just to satisfy my curiosity and, I would think, Ben's, just what is sjlj (baby-talk, please) and is this yet another thing that I should know and care about? Paul T
SJLJ stands for "setjmp/longjmp". I'm not an expert in this field (as I know virtually nothing about the gcc internals anyway, I'm just the bug database dude), but here's the idea: when you call a function that may or may not throw an exception, and the calling function needs to run destructors of local objects in case an exception is thrown, you need to put down the address of the cleanup code somewhere. One way to do this is to set this address via setjmp, and throwing an exception then transfers control to this place via longjmp. This is expensive since you have to call setjmp every time a cleanup is necessary. The other possibility is to use lookup tables that the compiler generates statically, so this is cheap at run-time, but incurs some code overhead. If you generate an exception, you have to somehow look up where to transfer execution. Don't ask me how exactly this works, but it is to my best knowledge how dwarf2 exception unwinding works. Corrections on this topic by more knowledgeable people are certainly welcome. Now back to the question how we can figure out what the problem is: if using -fno-exceptions doesn't work, is there a possibility you repeat your experiments with an octave version prior to the introduction of exceptions? W.
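As a rough illustration of the first scheme, the sketch below hand-rolls the idea with setjmp/longjmp. It is a conceptual model only and not GCC's actual implementation: CleanupContext, context_chain, raise_error and guarded_call are hypothetical names, and the "register" and "unregister" steps merely stand in for what _Unwind_SjLj_Register and _Unwind_SjLj_Unregister appear to do. The point it tries to make is that the registration cost is paid on every call, even when nothing is ever thrown.

```cpp
#include <csetjmp>

// One entry in the chain of active cleanup contexts (hypothetical model).
struct CleanupContext
{
  std::jmp_buf landing_pad;   // where control resumes if an error is raised
  CleanupContext *previous;   // enclosing context, if any
};

CleanupContext *context_chain = nullptr;  // innermost active context
int cleanups_run = 0;                     // counts simulated cleanups

// Stand-in for `throw`: jump to the innermost registered landing pad.
void raise_error()
{
  std::longjmp(context_chain->landing_pad, 1);
}

// A function that needs cleanup if an error passes through it.
// Returns 0 on normal completion and 1 if an error was raised.
int guarded_call(bool fail)
{
  CleanupContext ctx;
  ctx.previous = context_chain;       // "register": paid on every call
  context_chain = &ctx;
  if (setjmp(ctx.landing_pad) == 0)
    {
      if (fail)
        raise_error();
      context_chain = ctx.previous;   // "unregister": also paid on every call
      return 0;
    }
  // Reached only via longjmp: run the cleanup, then pop the context.
  context_chain = ctx.previous;
  ++cleanups_run;
  return 1;
}
```

In the table-based scheme the two bookkeeping steps disappear from the normal path, which is why a loop making millions of such calls would be expected to feel the difference.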
Subject: Re: octave built under Cygwin very slow
I can confirm that compiling octave 2.1.50 under gcc-3.3.1 with the g++ flag '-fno-exceptions' fails to compile quit.cc with the following error:
g++ -c -pg -O2 -I. -I../.. -I../../liboctave -I../../src -I../../libcruft/misc -I../../glob -I../../glob -DHAVE_CONFIG_H -mieee-fp -pg -O2 -fno-exceptions quit.cc -o quit.o
quit.cc: In function `void octave_throw_interrupt_exception()':
quit.cc:105: error: exception handling disabled, use -fexceptions to enable
Ben
Subject: Re: octave built under Cygwin very slow Hey come on! For the "bug database dude" your grown-up talk is impressive enough to a toddler like me. Anyway, I thank you for the explanation of setjmp/longjmp - it has been on my list of things to gen up on for a while. It crosses my mind in discussing this that it is a bit aberrant to keep doing this within a loop of interpreted code, isn't it? Anyhow, you will have seen Ben's message about trying a build with -fno-exceptions; as I suspected it did not get past quit.h. I will put the question about the timing of the introduction of exceptions onto the list. This perhaps will be the best route to settle the question as to whether or not sjlj is the root cause of the slow-down. Thanks again Paul Thomas
Subject: Re: [Fwd: octave built under Cygwin very slow]
Well, we seem to have got rid of the smoked fish (sorry, red herring) and now have a smoking howitzer......
Paul, It strikes me that not only is new/delete slow for cygwin331 but that malloc/free must also take most of the execution time for the octave tests. These seem to be totally excluded from the profiling. I have added the Intel, Visual C and gcc331 times for Windows XP on an Athlon 1700.
Paul T
PS I would have added the exit but I was going to bash ctrl-c if anything went wrong with the allocation.
Paul Kienzle wrote:
> Tests of malloc and new [] for cygwin and mingw 3.2 and 3.3 and linux gcc 3.3.
> Someone please fill in numbers for 'native' windows compilers, such as visual C and Intel.
>
> === Times, running under msys on a Windows 2000 PII-300 system
>
> System        real    user    sys
> mingw333    17.936   0.030  0.040
> cygwin331   72.394   0.020  0.060
> Cmingw333   12.277   0.010  0.060
> Ccygwin331  24.355   0.030  0.050
>
> System        real    user    sys
> mingw323    18.837   0.020  0.040
> mingw32     14.160   0.010  0.060
> cygwin32    15.933   0.020  0.050
> Cmingw32    12.668   0.030  0.040
> Ccygwin32   14.410   0.010  0.080
Paul Thomas adds...
=== Elapsed times running under Windows XP on an Athlon 1700
System       execution time (octave> tic;system('./malloctest.exe');toc)
intel         2.19
VC            2.17
cygwin331    19.86
Cintel        2.58
CVC           2.37
Ccygwin331    4.34
>
> === Times, running under bash on a Debian PII-400 system
>
> System        real    user    sys
> linux332     4.808   4.800  0.010
> Clinux332    3.162   3.160  0.000
>
> === Versions
>
> mingw32    3.2 (mingw special 20020817-1)
> mingw323   3.2.3 (mingw special 20030504-1)
> mingw333   3.3.3 (mingw special)
> cygwin32   3.2 (20020927 prerelease), linked against stdc++.dll
> cygwin331  3.3.1-3 (cygming special)
> linux332   3.3.2 20030908 (Debian prerelease)
>
> === C++ Compiled with g++ -O2. Run under msys.
> // Author Paul Thomas
> #include <iostream>
> using namespace std;
>
> int main()
> {
>   for (int iloop = 0; iloop < 10000000; iloop++)
>     {
>       double *myarray;
>       if ((myarray = new double [1]) == NULL)
>         cout << "unable to allocate my array at iloop=" << iloop << endl;
>       delete [] myarray;
>     }
>   cout << "done looping" << endl;
>   return 0;
> }
>
> === C Compiled with gcc -O2. Run under msys.
> /* modified from C++ by Paul Kienzle */
> #include <stdio.h>
> #include <stdlib.h>  /* malloc, free, exit */
> int main()
> {
>   int iloop;
>   for (iloop = 0; iloop < 10000000; iloop++)
>     {
>       double *myarray = (double *)malloc(sizeof(double));
>       if (myarray == NULL) { printf("alloc failed\n"); exit(1); }
>       else free (myarray);
>     }
>   return 0;
> }
Subject: Re: [Fwd: octave built under Cygwin very slow]
I'm putting my executable bundle on: http://myfilelocker.comcast.net/pkienzle/new.tar.gz
It is easier to compare times if they are on the same machine. There are two subdirectories: new 32 and 33, each with their own cygwin1.dll. From msys, so long as cygwin is not running, you should be able to say:
time 32/cygwin32
time 32/Ccygwin32
time 33/cygwin331
time 33/Ccygwin331
time 32/mingw32.exe
time 32/Cmingw32.exe
time 33/mingw333.exe
time 33/Cmingw333.exe
time 33/mingw323.exe
I tried alloc.c with lcc, and it was slower than mingw32 so I didn't bother recording the time.
Paul Kienzle pkienzle@users.sf.net
Subject: Re: octave built under Cygwin very slow

I realise from the silence that it cannot have been apparent from the last forward that we have understood where the problem lies with gcc-3.3.1 (cygming special). It has nothing to do with sjlj, in spite of the profiling.

A significant difference has crept in between new and malloc. Normally, on just about every system that we have tested, new is about 50% slower than malloc. In gcc 3.3.1-3 (cygming special) it is about 6-10 times slower. We have no idea why.

The following scrap of code demonstrates it (have used -O2 for compilation):

#include <iostream>
#include <stdio.h>
#include <stdlib.h>  // malloc, free, exit
#include <time.h>
#include <vector>
using namespace std;

int main()
{
  // Loop 1: new[]/delete[]
  long t1 = clock();
  for (int iloop = 0; iloop < 10000000; iloop++)
  {
    double *myarray;
    if ((myarray = new double [1]) == NULL)
    {
      cout << "unable to allocate my array at iloop=" << iloop << endl;
      exit(1);
    }
    delete [] myarray;
  }
  long t2 = clock();
  double delt1 = (double)( t2 - t1 )/ (double)(CLOCKS_PER_SEC);
  cout << "done looping time 1=" << delt1 << endl;

  // Loop 2: malloc/free
  long t3 = clock();
  for (int iloop = 0; iloop < 10000000; iloop++)
  {
    double *myarray = (double *)malloc(sizeof(double));
    if (myarray == NULL) { printf("alloc failed\n"); exit(1); }
    else free (myarray);
  }
  long t4 = clock();
  double delt2 = (double)( t4 - t3 )/ (double)(CLOCKS_PER_SEC);
  cout << "done looping time 2=" << delt2 << endl;

  return 0;
}

Best regards

Paul Thomas

bangerth at dealii dot org wrote:

>------- Additional Comments From bangerth at dealii dot org 2004-03-25 14:36 -------
>SJLJ stands for "setjmp/longjmp". I'm not an expert in this field
>(as I know virtually nothing about the gcc interiors anyway, I'm
>just the bug database dude), but here's the idea: when you call
>a function that may or may not throw an exception, and the calling
>function needs to run destructors of local objects in case an exception
>is thrown, you need to put down the address of the cleanup code somewhere.
>One way to do this is to set this address via setjmp, and throwing an
>exception then transfers control to this place via longjmp. This is expensive
>since you have to call setjmp every time a cleanup is necessary.
>
>The other possibility is to use lookup tables that the compiler generates
>statically, so this is cheap at run-time, but incurs some code overhead. If
>you generate an exception, you have to somehow look up where to transfer
>execution. Don't ask me how exactly this works, but it is to my best
>knowledge how dwarf2 exception unwinding works. Corrections on this topic
>by more knowledgeable people are certainly welcome.
>
>Now back to the question how we can figure out what the problem is: if
>using -fno-exceptions doesn't work, is there a possibility you repeat
>your experiments with an octave version prior to the introduction of
>exceptions?
>
>W.
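The situation Wolfgang describes, a caller that owns a local object whose destructor must run if a callee throws, can be sketched as below. This is only an illustration: Guard, live_guards, may_throw and caller are hypothetical names, not code from octave or libstdc++. With an SJLJ scheme the cleanup for g has to be registered every time caller is entered, even when nothing is thrown; with a table-driven (DWARF-2 style) scheme the non-throwing path should carry no such run-time registration.

```cpp
#include <stdexcept>

// Counts Guard objects that are currently alive (hypothetical bookkeeping).
static int live_guards = 0;

// A local object with a non-trivial destructor: the "cleanup" that the
// exception-handling machinery must be able to run during unwinding.
struct Guard {
    Guard() { ++live_guards; }
    ~Guard() { --live_guards; }
};

// A callee that may or may not throw.
void may_throw(bool do_throw) {
    if (do_throw)
        throw std::runtime_error("boom");
}

// The caller must destroy g on normal return and when may_throw throws.
int caller(bool do_throw) {
    Guard g;
    may_throw(do_throw);
    return live_guards;  // g is still alive here, so this is at least 1
}
```

Calling caller(false) returns normally and caller(true) propagates the exception; in both cases live_guards is back to 0 afterwards, which is the guarantee the cleanup registration exists to provide.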
Subject: Re: octave built under Cygwin very slow Wolfgang and Danny, Did you get our recent correspondence on this? We cracked the source of the problem and have provided sample demo code. Would you like me to send it again? Paul Thomas
I can confirm this on linux, too. Here is what I get:
---------------
g/x> /home/bangerth/bin/gcc-3.2.3/bin/c++ -O2 x.cc
g/x> ./a.out
done looping time 1=0.98
done looping time 2=0.64
g/x> /home/bangerth/bin/gcc-3.3.4-pre/bin/c++ -O2 x.cc
g/x> ./a.out
done looping time 1=0.98
done looping time 2=0.64
g/x> /home/bangerth/bin/gcc-3.4-pre/bin/c++ -O2 x.cc
g/x> ./a.out
done looping time 1=0.99
done looping time 2=0.67
g/x> /home/bangerth/bin/gcc-3.5-pre/bin/c++ -O2 x.cc
g/x> ./a.out
done looping time 1=0.97
done looping time 2=0.67
g/x> icc -O2 x.cc
g/x> ./a.out
done looping time 1=0.97
done looping time 2=0.7
-----------------------

I find this very startling.

This PR has gone quite a distance from its original problem -- would you mind closing this one, opening another one with title "new is 50% slower than malloc" or something similar, in component "libstdc++", and post your testcase and if you want my results above there? This way we would have a clean slate again, and would know what to focus on. Leave a mark in the new PR that this came out of PR 14563.

Thanks
Wolfgang
This is what I get on mingw32, 3.4.0 20040327 (prerelease)

Average of 12 runs, which gave very consistent results.

Built with -enable-sjlj
  malloc: 2.1885
  new:    2.8319

Built with -disable-sjlj (startup code modified to allow Dwarf2 EH to work)
  malloc: 2.2017
  new:    2.3318

FWIW, the DW2 built exe (260kb) was also smaller than the sjlj exe (290kb). This is with static libgcc and libstdc++.

Danny
Just curious: what happens for the scalar version myarray = new double; .. .. delete myarray; (which seems more appropriate for a single double)?? Also, please remove that 'if(myarray = new double [1]) == NULL)', I really can't bear it ;)
Re: 'if(myarray = new double [1]) == NULL)' Yes, me too :-) You would need a rather old compiler (or libstdc++) for execution ever to go into the if-branch... W.
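The reason the if-branch is dead with a conforming library is that a plain new-expression reports failure by throwing std::bad_alloc, not by returning a null pointer. A minimal sketch of the two ways to check for failure follows; the function names try_alloc_nothrow and try_alloc_throwing are made up for this illustration, and nullptr assumes C++11 or later.

```cpp
#include <cstddef>
#include <new>

// Null-pointer check: only meaningful with the nothrow form of new.
bool try_alloc_nothrow(std::size_t n) {
    double* p = new (std::nothrow) double[n];
    if (p == nullptr)
        return false;  // allocation failed, no exception was thrown
    delete[] p;
    return true;
}

// Plain new: failure arrives as an exception, so a NULL test never fires.
bool try_alloc_throwing(std::size_t n) {
    try {
        double* p = new double[n];
        delete[] p;
        return true;
    } catch (const std::bad_alloc&) {
        return false;
    }
}
```

Either form could replace the NULL test in the benchmark; which one is cheaper on a given toolchain is exactly the kind of thing this thread is trying to measure, so no claim is made about that here.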
I inlined all allocation operators and they improved from 2.393s to 1.922s (C allocation style: 2.013s). Note that I also changed the test program to allocate an array of 100 unsigned ints.
The problem with inlining them is that this can only work if <new> is included, so please don't take this as a patch, but as an idea/explanation of why new is slower than malloc.

Reading specs from /usr/local/lib/gcc/i686-pc-cygwin/3.5-tree-ssa/specs
Configured with: ./configure --disable-libmudflap --without-libbanshee --disable-checking --enable-languages=c,c++ --disable-threads : (reconfigured) : (reconfigured) ./configure --disable-libmudflap --without-libbanshee --disable-checking --enable-languages=c,c++ --disable-threads : (reconfigured) : (reconfigured) ./configure --disable-libmudflap --without-libbanshee --disable-checking --enable-languages=c,c++ --disable-threads : (reconfigured) : (reconfigured) ./configure --disable-libmudflap --without-libbanshee --disable-checking --enable-languages=c,c++ --disable-threads
Thread model: single
gcc version 3.5-tree-ssa 20040403 (merged 20040331)

#include <iostream>
#include <stdio.h>
#include <stdlib.h>  // malloc, free, exit
#include <time.h>
using namespace std;

int main()
{
  const size_t array_size = 100;
  const unsigned loop_count = 1000000;

  // Loop 1: new[]/delete[]
  long t1 = clock();
  for (unsigned iloop = 0; iloop < loop_count; iloop++)
  {
    unsigned *myarray = new unsigned [array_size];
    delete [] myarray;
  }
  long t2 = clock();
  double delt1 = (double)( t2 - t1 )/ (double)(CLOCKS_PER_SEC);
  cout << "done looping time 1=" << delt1 << endl;

  // Loop 2: malloc/free
  long t3 = clock();
  for (unsigned iloop = 0; iloop < loop_count; iloop++)
  {
    unsigned *myarray = (unsigned *)malloc(array_size * sizeof(unsigned));
    if (myarray == NULL) { printf("alloc failed\n"); exit(1); }
    else free (myarray);
  }
  long t4 = clock();
  double delt2 = (double)( t4 - t3 )/ (double)(CLOCKS_PER_SEC);
  cout << "done looping time 2=" << delt2 << endl;

  return 0;
}

Index: gcc/libstdc++-v3/libsupc++/del_op.cc
=================================================================== RCS file: /cvsroot/gcc/gcc/libstdc++-v3/libsupc++/del_op.cc,v retrieving revision 1.2.22.1 diff -u -r1.2.22.1 del_op.cc --- gcc/libstdc++-v3/libsupc++/del_op.cc 3 Jun 2003 16:53:00 -0000 1.2.22.1 +++ gcc/libstdc++-v3/libsupc++/del_op.cc 3 Apr 2004 17:11:53 -0000 @@ -30,11 +30,3 @@ #include "new" -extern "C" void free (void *); - -void -operator delete (void *ptr) throw () -{ - if (ptr) - free (ptr); -} Index: gcc/libstdc++-v3/libsupc++/del_opnt.cc =================================================================== RCS file: /cvsroot/gcc/gcc/libstdc++-v3/libsupc++/del_opnt.cc,v retrieving revision 1.2.22.1 diff -u -r1.2.22.1 del_opnt.cc --- gcc/libstdc++-v3/libsupc++/del_opnt.cc 3 Jun 2003 16:53:00 -0000 1.2.22.1 +++ gcc/libstdc++-v3/libsupc++/del_opnt.cc 3 Apr 2004 17:11:53 -0000 @@ -30,11 +30,3 @@ #include "new" -extern "C" void free (void *); - -void -operator delete (void *ptr, const std::nothrow_t&) throw () -{ - if (ptr) - free (ptr); -} Index: gcc/libstdc++-v3/libsupc++/del_opv.cc =================================================================== RCS file: /cvsroot/gcc/gcc/libstdc++-v3/libsupc++/del_opv.cc,v retrieving revision 1.2.22.1 diff -u -r1.2.22.1 del_opv.cc --- gcc/libstdc++-v3/libsupc++/del_opv.cc 3 Jun 2003 16:53:00 -0000 1.2.22.1 +++ gcc/libstdc++-v3/libsupc++/del_opv.cc 3 Apr 2004 17:11:53 -0000 @@ -29,9 +29,3 @@ // the GNU General Public License. #include "new" - -void -operator delete[] (void *ptr) throw () -{ - ::operator delete (ptr); -} Index: gcc/libstdc++-v3/libsupc++/del_opvnt.cc =================================================================== RCS file: /cvsroot/gcc/gcc/libstdc++-v3/libsupc++/del_opvnt.cc,v retrieving revision 1.2.22.1 diff -u -r1.2.22.1 del_opvnt.cc --- gcc/libstdc++-v3/libsupc++/del_opvnt.cc 3 Jun 2003 16:53:00 -0000 1.2.22.1 +++ gcc/libstdc++-v3/libsupc++/del_opvnt.cc 3 Apr 2004 17:11:53 -0000 @@ -29,9 +29,3 @@ // the GNU General Public License. 
#include "new" - -void -operator delete[] (void *ptr, const std::nothrow_t&) throw () -{ - ::operator delete (ptr); -} Index: gcc/libstdc++-v3/libsupc++/new =================================================================== RCS file: /cvsroot/gcc/gcc/libstdc++-v3/libsupc++/new,v retrieving revision 1.10.2.5 diff -u -r1.10.2.5 new --- gcc/libstdc++-v3/libsupc++/new 21 Jul 2003 13:54:08 -0000 1.10.2.5 +++ gcc/libstdc++-v3/libsupc++/new 3 Apr 2004 17:11:53 -0000 @@ -39,6 +39,7 @@ #define _NEW #include <cstddef> +#include <cstdlib> #include <exception> extern "C++" { @@ -68,6 +69,10 @@ new_handler set_new_handler(new_handler) throw(); } // namespace std + +void* __operator_new(std::size_t) throw (std::bad_alloc); +void* __operator_new_nothrow(std::size_t) throw (); + //@{ /** These are replaceable signatures: * - normal single new and delete (no arguments, throw @c bad_alloc on error) @@ -79,14 +84,55 @@ * Placement new and delete signatures (take a memory address argument, * does nothing) may not be replaced by a user's program. */ -void* operator new(std::size_t) throw (std::bad_alloc); -void* operator new[](std::size_t) throw (std::bad_alloc); -void operator delete(void*) throw(); -void operator delete[](void*) throw(); -void* operator new(std::size_t, const std::nothrow_t&) throw(); -void* operator new[](std::size_t, const std::nothrow_t&) throw(); -void operator delete(void*, const std::nothrow_t&) throw(); -void operator delete[](void*, const std::nothrow_t&) throw(); +inline void* operator new(std::size_t sz) throw (std::bad_alloc) +{ + /* malloc (0) is unpredictable; avoid it. 
*/ + if (sz == 0) + sz = 1; + void *p = std::malloc (sz); + if (!p) + p = __operator_new(sz); + + return p; +} +inline void* operator new[] (std::size_t sz) throw (std::bad_alloc) +{ + return ::operator new(sz); +} +inline void operator delete (void *ptr) throw () +{ + if (ptr) + std::free (ptr); +} +inline void operator delete[] (void *ptr) throw () +{ + ::operator delete (ptr); +} + +inline void* operator new (std::size_t sz, const std::nothrow_t&) throw() +{ + /* malloc (0) is unpredictable; avoid it. */ + if (sz == 0) + sz = 1; + void *p = std::malloc (sz); + if (!p) + p = __operator_new_nothrow(sz); + + return p; +} +inline void* operator new[] (std::size_t sz, const std::nothrow_t& nothrow) throw() +{ + return ::operator new(sz, nothrow); +} +inline void operator delete (void *ptr, const std::nothrow_t&) throw () +{ + if (ptr) + std::free (ptr); +} +inline void operator delete[] (void *ptr, const std::nothrow_t&) throw () +{ + ::operator delete (ptr); +} // Default placement versions of operator new. inline void* operator new(std::size_t, void* __p) throw() { return __p; } Index: gcc/libstdc++-v3/libsupc++/new_op.cc =================================================================== RCS file: /cvsroot/gcc/gcc/libstdc++-v3/libsupc++/new_op.cc,v retrieving revision 1.5.2.1 diff -u -r1.5.2.1 new_op.cc --- gcc/libstdc++-v3/libsupc++/new_op.cc 3 Jun 2003 16:53:00 -0000 1.5.2.1 +++ gcc/libstdc++-v3/libsupc++/new_op.cc 3 Apr 2004 17:11:53 -0000 @@ -37,27 +37,22 @@ extern new_handler __new_handler; -void * -operator new (std::size_t sz) throw (std::bad_alloc) +void* __operator_new(std::size_t sz) throw (std::bad_alloc) { void *p; - - /* malloc (0) is unpredictable; avoid it. */ - if (sz == 0) - sz = 1; - p = (void *) malloc (sz); - while (p == 0) + do { - new_handler handler = __new_handler; + std::new_handler handler = __new_handler; if (! 
handler) #ifdef __EXCEPTIONS - throw bad_alloc(); + throw std::bad_alloc(); #else std::abort(); #endif handler (); - p = (void *) malloc (sz); + p = std::malloc (sz); } + while (!p); return p; } Index: gcc/libstdc++-v3/libsupc++/new_opnt.cc =================================================================== RCS file: /cvsroot/gcc/gcc/libstdc++-v3/libsupc++/new_opnt.cc,v retrieving revision 1.3.22.1 diff -u -r1.3.22.1 new_opnt.cc --- gcc/libstdc++-v3/libsupc++/new_opnt.cc 3 Jun 2003 16:53:00 -0000 1.3.22.1 +++ gcc/libstdc++-v3/libsupc++/new_opnt.cc 3 Apr 2004 17:11:53 -0000 @@ -36,31 +36,26 @@ extern "C" void *malloc (std::size_t); extern new_handler __new_handler; -void * -operator new (std::size_t sz, const std::nothrow_t&) throw() +void* __operator_new_nothrow(std::size_t sz) throw () { void *p; - - /* malloc (0) is unpredictable; avoid it. */ - if (sz == 0) - sz = 1; - p = (void *) malloc (sz); - while (p == 0) + do { - new_handler handler = __new_handler; + std::new_handler handler = __new_handler; if (! 
handler) - return 0; + return 0; try - { - handler (); - } - catch (bad_alloc &) - { - return 0; - } + { + handler (); + } + catch (std::bad_alloc &) + { + return 0; + } - p = (void *) malloc (sz); + p = std::malloc (sz); } + while (!p); return p; } Index: gcc/libstdc++-v3/libsupc++/new_opv.cc =================================================================== RCS file: /cvsroot/gcc/gcc/libstdc++-v3/libsupc++/new_opv.cc,v retrieving revision 1.3.22.1 diff -u -r1.3.22.1 new_opv.cc --- gcc/libstdc++-v3/libsupc++/new_opv.cc 3 Jun 2003 16:53:00 -0000 1.3.22.1 +++ gcc/libstdc++-v3/libsupc++/new_opv.cc 3 Apr 2004 17:11:53 -0000 @@ -30,8 +30,3 @@ #include "new" -void * -operator new[] (std::size_t sz) throw (std::bad_alloc) -{ - return ::operator new(sz); -} Index: gcc/libstdc++-v3/libsupc++/new_opvnt.cc =================================================================== RCS file: /cvsroot/gcc/gcc/libstdc++-v3/libsupc++/new_opvnt.cc,v retrieving revision 1.3.22.1 diff -u -r1.3.22.1 new_opvnt.cc --- gcc/libstdc++-v3/libsupc++/new_opvnt.cc 3 Jun 2003 16:53:00 -0000 1.3.22.1 +++ gcc/libstdc++-v3/libsupc++/new_opvnt.cc 3 Apr 2004 17:11:53 -0000 @@ -30,8 +30,3 @@ #include "new" -void * -operator new[] (std::size_t sz, const std::nothrow_t& nothrow) throw() -{ - return ::operator new(sz, nothrow); -}
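The idea behind the patch above, keeping the common case (a successful malloc) inline and pushing the rare retry loop out of line, can be tried in user code without touching libstdc++. The sketch below is an illustration of that technique and not the patch itself: fast_alloc, slow_alloc and fast_free are hypothetical names, and std::get_new_handler assumes C++11 or later.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Out-of-line slow path, reached only when the first malloc fails.
// It follows the same shape as the retry loop in the patch: call the
// installed new-handler, then try malloc again.
void* slow_alloc(std::size_t sz) {
    void* p = nullptr;
    do {
        std::new_handler handler = std::get_new_handler();
        if (!handler)
            throw std::bad_alloc();  // no handler installed: give up
        handler();                   // may release memory, or throw
        p = std::malloc(sz);
    } while (!p);
    return p;
}

// Inline fast path: one size fix-up, one malloc call, one null test.
inline void* fast_alloc(std::size_t sz) {
    if (sz == 0)
        sz = 1;  // malloc(0) may return a null pointer; avoid it
    void* p = std::malloc(sz);
    if (!p)
        p = slow_alloc(sz);
    return p;
}

// Inline release; free() already accepts a null pointer.
inline void fast_free(void* ptr) {
    std::free(ptr);
}
```

Ordinary named functions are used here on purpose: as far as I know, the C++ standard does not allow a program's replacement for the global operator new or operator delete to be declared inline, which fits the caveat in the comment above that the inlined operators are meant as an explanation and not as a patch.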
Subject: Re: octave built under Cygwin very slow

Hi,

It's good to know that you are engaging with this one. I sent a message this morning that somehow got deleted in the process of being sent; its main content was not to forget that the most extreme manifestation of the difference is with g++ 3.3.1 (cygming special), for which

new/delete    takes 1900ns/loop
malloc/free         400ns/loop

This is how we detected this in the first place.

For reference, on the same Athlon 1700, 3.2.2 20030222 (RH 3.2.2-5) gives

new/delete    140ns/loop
malloc/free   100ns/loop

Unfortunately, I just this morning deleted the g++ 3.3.1 and replaced it with 3.2 (which does not show such aberrant behaviour, by the way), so I cannot test your patch!

Paul Thomas

epanelelytha at kellertimo dot de wrote:

>------- Additional Comments From epanelelytha at kellertimo dot de 2004-04-03 17:18 -------
>I inlined all allocation operators and they improved from 2.393s to 1.922s (C
>allocation style: 2.013s). Note that I also changed the test program to allocate
>an array of 100 unsigned ints.
>The problem with inlining them is that this can only work if <new> is included,
>so please don't understand this as a patch, but as an idea/explanation why new
>is slower than malloc.
(In reply to comment #28)
> Unfortunately, I just this morning deleted the g++ 3.3.1 and replaced it
> with 3.2 (which does not show such aberrant behaviour, by the way) , so
> I cannot test your patch!

But you can still test it with 3.2 (I'm using 3.5-tree-ssa 20040403).
Subject: Re: octave built under Cygwin very slow

Sorry, yes, you are right: I can test it, but not to see its effect on that gruesome cygming special.

Paul

epanelelytha at kellertimo dot de wrote:

>------- Additional Comments From epanelelytha at kellertimo dot de 2004-04-03 18:00 -------
>But you can still test it with 3.2 (I'm using 3.5-tree-ssa 20040403).
No feedback in 3 months
Subject: Re: new/delete much slower than malloc/free

Dear All,

I do apologise; I missed the need to give feedback and was supposing that the necessary fixes would feed through to the next gcc release bundled with Cygwin. As far as I was concerned, the problem was sufficiently "fixed" by reverting to 3.2. Obviously, I will take a look at the patches and see how they can be applied to octave. If any blinding flashes of inspiration occur, I will report back to you asap.

One thing that I get no sense of from the thread is why the Mingw/Cygwin gcc-3.3 is so very bad, even in comparison with what you guys were finding. Perhaps I should try the tests with sjlj disabled first, since the profiling at first misled us (well, me, at least) into believing that the problem lay entirely there?

Anyway, I will find a "victim" machine onto which to install Cygwin and gcc-3.3 (I might as well make the problem as bad as possible!).

Best regards and thanks for your efforts.

Paul Thomas

----- Original Message -----
From: "pinskia at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org>
To: <paulthomas2@wanadoo.fr>
Sent: Monday, July 12, 2004 4:50 PM
Subject: [Bug libstdc++/14563] new/delete much slower than malloc/free

> ------- Additional Comments From pinskia at gcc dot gnu dot org 2004-07-12 14:50 -------
> No feedback in 3 months
>
> --
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|WAITING                     |RESOLVED
>          Resolution|                            |INVALID
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14563
This doesn't seem to be resolved at all, so let's not close it.
Subject: Re: new/delete much slower than malloc/free

'tis what I thought! What would you like me to do?

Paul T

----- Original Message -----
From: "bangerth at dealii dot org" <gcc-bugzilla@gcc.gnu.org>
To: <paulthomas2@wanadoo.fr>
Sent: Monday, July 12, 2004 10:55 PM
Subject: [Bug libstdc++/14563] new/delete much slower than malloc/free

> ------- Additional Comments From bangerth at dealii dot org 2004-07-12 20:55 -------
> This doesn't seem to be resolved at all, so let's not close it.
>
> --
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|RESOLVED                    |UNCONFIRMED
>          Resolution|INVALID                     |
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14563
I've been experiencing a severe performance problem on Cygwin with gcc 3.4.2 in which programs compiled with enable-sjlj-exceptions are 6 times slower than disable-sjlj-exceptions. After reading the thread here I experimented with the new/malloc test case posted previously but modified so that the whole thing is wrapped in a try block.

Here are the results, all run on the same machine (2 GHz P4) with array_size = 100, loop_count = 3000000.

Intel Windows C++
done looping time 1=1.14
done looping time 2=1.125

VMWare SUSE Linux 9.1 gcc 3.4.2 (default sjlj-exceptions, presumably disable-)
done looping time 1=0.75
done looping time 2=0.62

Cygwin gcc 3.4.2 disable-sjlj-exceptions
done looping time 1=5.516
done looping time 2=5.265

Cygwin gcc 3.4.2 enable-sjlj-exceptions
done looping time 1=16.953
done looping time 2=5.328

Cygwin distribution gcc 3.3.1 (enable-sjlj-exceptions)
done looping time 1=17.016
done looping time 2=5.328

There seem to be 2 problems with Cygwin. First, both new & malloc are 5 or 6 times slower than on Linux or using Intel. Second, enable-sjlj-exceptions slows down new by another factor of 3 on top of this.

The full configuration for the Cygwin enable case is

Configured with: ../gcc/configure --prefix=/gcc-3.4 --with-gcc --with-gnu-ld --with-gnu-as --enable-languages=c,c++,f77 --enable-libgcj --enable-threads=posix --with-system-zlib --enable-nls --without-included-gettext --enable-interpreter --enable-sjlj-exceptions --disable-version-specific-runtime-libs --enable-shared --disable-win32-registry --enable-java-gc=boehm --disable-hash-synchronization --verbose --target=i686-pc-cygwin --host=i686-pc-cygwin --build=i686-pc-cygwin
Thread model: posix
gcc version 3.4.2 20040720 (prerelease)

My own applications are very array-intensive so it's not immediately obvious that new is the culprit in my case, but there's probably a connection on the sjlj problem.

Ron Hylton
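The two effects can be read straight off the timings in this comment. The snippet below is just that arithmetic written out; the helper ratio and the constant names are made up for the illustration, and the values are the reported seconds (time 1 is new/delete, time 2 is malloc/free).

```cpp
// How many times slower `slow` is than `fast`.
double ratio(double slow, double fast) { return slow / fast; }

// Reported timings in seconds.
const double intel_new = 1.14,          intel_malloc = 1.125;
const double linux_new = 0.75,          linux_malloc = 0.62;
const double cyg_nosjlj_new = 5.516,    cyg_nosjlj_malloc = 5.265;
const double cyg_sjlj_new = 16.953,     cyg_sjlj_malloc = 5.328;

// First problem: Cygwin without sjlj against the other two platforms.
const double new_vs_intel    = ratio(cyg_nosjlj_new, intel_new);        // about 4.8
const double malloc_vs_intel = ratio(cyg_nosjlj_malloc, intel_malloc);  // about 4.7
const double new_vs_linux    = ratio(cyg_nosjlj_new, linux_new);        // about 7.4
const double malloc_vs_linux = ratio(cyg_nosjlj_malloc, linux_malloc);  // about 8.5

// Second problem: extra cost of sjlj on new, with malloc nearly unchanged.
const double sjlj_new_penalty    = ratio(cyg_sjlj_new, cyg_nosjlj_new);       // about 3.1
const double sjlj_malloc_penalty = ratio(cyg_sjlj_malloc, cyg_nosjlj_malloc); // about 1.0
```

So the "5 or 6 times" figure sits between the Intel comparison (just under 5) and the Linux-under-VMWare comparison (roughly 7 to 8.5), while the sjlj penalty on new is close to the stated factor of 3.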
One more case:

Cygwin distribution gcc 3.3.1 -mno-cygwin
  done looping time 1=1.766
  done looping time 2=1.406
I think basically you are messed up until Cygwin switches to dwarf2 exceptions.
Andrew, I was afraid of that. Thanks for confirming it.
Subject: Re: new/delete much slower than malloc/free

Ron,

I am just back from California and found this in my in-tray. I had great difficulty with my French ISP whilst I was there, and had a number of weird replies like this one. Is there some comment on this problem from "Andrew" that I missed, or were you replying to my comment about the problem being absent in gcc-3.2?

I will give your -mno-cygwin a go with gcc-3.2, as soon as I have dealt with an overloaded in-tray.

Paul

ron_hylton at hotmail dot com wrote:
>------- Additional Comments From ron_hylton at hotmail dot com 2004-07-29 04:23 -------
>Andrew, I was afraid of that. Thanks for confirming it.
Ron, can you please attach your testcase that shows the problem to this PR? This PR is a regression on cygwin because the speed is back with 3.2.
*** Bug 18414 has been marked as a duplicate of this bug. ***
No, it's not a regression. GCC-3.2 built with sjlj shows the same problem. The "fast" version of GCC-3.2 that the OP referenced was a "cygming-special" that had Dwarf-2 EH enabled. As I indicated earlier, this experiment was dropped because of problems with Win32 API callbacks and DW-2 EH.

Danny
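To illustrate why a setjmp/longjmp scheme can cost time even when nothing is thrown, here is a deliberately simplified model. It is not GCC's implementation: `HandlerFrame`, `g_handler_chain`, `raise_error` and `guarded_call` are hypothetical names invented for this sketch. The point it tries to show is that the guarded region registers a handler and saves a context on every entry, whereas a table-based scheme such as Dwarf-2 EH does its work only when an exception actually propagates.

```cpp
#include <cassert>
#include <csetjmp>
#include <cstdlib>

// One entry in the chain of active handlers. This is a stand-in for the
// per-function context an SjLj-style runtime keeps, not GCC's real layout.
struct HandlerFrame {
    std::jmp_buf env;
    HandlerFrame* prev;
};

static HandlerFrame* g_handler_chain = nullptr;

// Model of "throw": jump to the innermost registered handler, or abort.
[[noreturn]] void raise_error() {
    if (g_handler_chain == nullptr) std::abort();
    std::longjmp(g_handler_chain->env, 1);
}

int twice(int x) { return 2 * x; }
int always_fails(int) { raise_error(); }

// Model of: try { return fn(arg); } catch (...) { return fallback; }
int guarded_call(int (*fn)(int), int arg, int fallback) {
    assert(fn != nullptr);
    HandlerFrame frame;
    frame.prev = g_handler_chain;
    g_handler_chain = &frame;        // registration: paid on every entry
    int result;
    if (setjmp(frame.env) == 0) {    // context save: also paid on every entry
        result = fn(arg);            // normal path
    } else {
        result = fallback;           // reached only through longjmp
    }
    g_handler_chain = frame.prev;    // unregistration: paid on every exit
    return result;
}
```

In this model `guarded_call(twice, 21, -1)` pays for the registration and the context save although `twice` never fails, which is the kind of per-call overhead being discussed for operator new.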
(In reply to comment #40)
> Ron, can you please attach your testcase that shows the problem to this PR?
>
> This PR is a regression on cygwin because the speed is back with 3.2.

This is the test case I was using:

#include <iostream>
#include <stdio.h>
#include <stdlib.h>   // malloc, free, exit
#include <time.h>
#include <string>
using namespace std;

int main()
{
    int array_size = 100;
    int loop_count = 3000000;
    try
    {
        // Loop 1: new[] / delete[]
        long t1 = clock();
        for (int iloop = 0; iloop < loop_count; iloop++)
        {
            int *myarray = new int [array_size];
            delete [] myarray;
        }
        long t2 = clock();
        double delt1 = (double)( t2 - t1 )/ (double)(CLOCKS_PER_SEC);
        cout << "done looping time 1=" << delt1 << endl;

        // Loop 2: malloc / free
        long t3 = clock();
        for (int jloop = 0; jloop < loop_count; jloop++)
        {
            int *myarray = (int *)malloc(array_size * sizeof(int));
            if (myarray == NULL)
            {
                printf("alloc failed\n");
                exit(1);
            }
            else
                free (myarray);
        }
        long t4 = clock();
        double delt2 = (double)( t4 - t3 )/ (double)(CLOCKS_PER_SEC);
        cout << "done looping time 2=" << delt2 << endl;
    }
    catch (...)
    {
        cout << "exception" << std::endl;
        return 1;
    }
    return 0;
}
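The same two loops can also be timed with `std::chrono::steady_clock`, as in the sketch below. This is an alternative harness, not the test case that produced the numbers in this thread: it needs a C++11 compiler and measures wall-clock time, whereas `clock()` reports processor time. The names `time_loop`, `time_new_delete` and `time_malloc_free` are made up for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>

// Runs body() loop_count times and returns the elapsed wall-clock seconds.
template <typename Body>
double time_loop(int loop_count, Body body) {
    assert(loop_count > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < loop_count; ++i) body();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Times loop_count iterations of new[] followed by delete[].
double time_new_delete(int loop_count, int array_size) {
    assert(array_size > 0);
    return time_loop(loop_count, [array_size] {
        // volatile keeps the optimizer from removing the allocation pair
        int* volatile p = new int[array_size];
        delete[] p;
    });
}

// Times loop_count iterations of malloc followed by free.
double time_malloc_free(int loop_count, int array_size) {
    assert(array_size > 0);
    return time_loop(loop_count, [array_size] {
        void* volatile p = std::malloc(array_size * sizeof(int));
        std::free(p);
    });
}
```

With the parameters used above this would be called as `time_new_delete(3000000, 100)` and `time_malloc_free(3000000, 100)`.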
(In reply to comment #40)
> Ron, can you please attach your testcase that shows the problem to this PR?
> This PR is a regression on cygwin because the speed is back with 3.2.

Here's a test case for you...
-Ken

-------------------------------------------------------
// Uncomment one of these defines.
// With the first define uncommented, I get 3.293 usec per "operator new" use.
// With the second define uncommented, I get 1.019 usec per "operator new" use.
// A high price to pay for having one's exceptions properly declared!
//#define THROW throw (std::bad_alloc)
#define THROW

// These definitions are taken straight from libstdc++.
#include "new"
#include <exception_defines.h>

using std::new_handler;
using std::bad_alloc;

extern "C" void *malloc (std::size_t);
extern new_handler __new_handler;

void *
operator new (std::size_t sz) THROW
{
  void *p;

  /* malloc (0) is unpredictable; avoid it. */
  if (sz == 0)
    sz = 1;
  p = (void *) malloc (sz);
  while (p == 0)
    {
      new_handler handler = __new_handler;
      if (! handler)
#ifdef __EXCEPTIONS
        throw bad_alloc();
#else
        std::abort();
#endif
      handler ();
      p = (void *) malloc (sz);
    }

  return p;
}

void *
operator new[] (std::size_t sz) THROW
{
  return ::operator new(sz);
}

#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <assert.h>

typedef unsigned long long u64;
typedef u64 Usec;

#ifdef WIN32
#include <Windows.h>
inline Usec Now()
{
  DWORD ticks = GetTickCount();
  return ((Usec) ticks) * 1000;
}
#else
#include <sys/types.h>
#include <sys/time.h>
inline Usec Now()
{
  struct timeval tv;
  if( gettimeofday( &tv, 0 ) )
  {
    perror( "gettimeofday" );
    exit( 1 );
  }
  return ((Usec) tv.tv_sec) * 1000000 + tv.tv_usec;
}
#endif

using namespace std;

int main()
{
  int sizeMin = 4;
  int sizeMax = 100;
  int allocsOutstanding = 1000;
  int reps = 1000;
  int allocsPerRep = 1000;
  int sizeRange = sizeMax - sizeMin;

  char ** ptrs = (char **) malloc( sizeof( char * ) * allocsOutstanding );
  memset( ptrs, 0, sizeof( char * ) * allocsOutstanding );

  Usec start = Now();
  int m = reps;
  while( m-- )
  {
    int n = allocsPerRep;
    while( n-- )
    {
      int r = rand();
      int index = r % allocsOutstanding;
      char * p = ptrs[index];
      delete[] p; // free( p );
      int size = (r % sizeRange) + sizeMin;
      p = new char[ size ]; // p = (char *) malloc( size );
      ptrs[index] = p;
    }
  }
  Usec stop = Now();

  double t = ((double) stop - start) / ((double) allocsPerRep * reps);
  printf( "cost of new + delete is about %0.3f usec\n", t );
  fflush( stdout );
}
Subject: Re: new/delete much slower than malloc/free because of sjlj exceptions

> Here's a test case for you...
> -Ken

That's interesting....

Using your test case:
(i) gcc 3.2 20020927 (prerelease): both versions take 0.62 micro-sec/new
(ii) gcc 3.1.1 (cygming special): I get 2.1 and 0.66 micro-sec/new
(iii) gcc 4.0.0 20041010 (experimental): I get 0.62 and 0.59 micro-sec/new

This latter was a tad unexpected - I built it from a snapshot on one of the German mirror sites. Does this imply that I have picked up Dwarf2 as a default?

Going back to the beginning of this rather long thread, you will note that it was building octave that first exposed this problem. I think that octave is calling new too many times anyway, for certain types of code, and I had started hanging counters on an overloaded new operator. It would not be a big deal to substitute your version and to compare the performance with THROW defined or not.

Give me a few days; the build takes a few hours under Cygwin and I have some concreting to do this weekend.... *sigh*

Regards

Paul Thomas
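The idea of hanging a counter on a replaced operator new could look roughly like the sketch below. This is an assumption about the general approach, not the code that was actually used with octave: the counter `g_new_calls` and the accessor `new_call_count` are hypothetical, the counter is not thread-safe, and the new_handler retry loop from the libstdc++ version is left out for brevity.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Number of calls to the replaced global operator new (not thread-safe).
static std::size_t g_new_calls = 0;

std::size_t new_call_count() { return g_new_calls; }

// Replacement for the global operator new: count the call, then allocate.
void* operator new(std::size_t size) {
    ++g_new_calls;
    if (size == 0) size = 1;          // malloc(0) is unpredictable; avoid it
    void* p = std::malloc(size);
    if (p == nullptr) throw std::bad_alloc();
    return p;
}

// Matching replacement so that memory from malloc is released with free.
void operator delete(void* p) noexcept { std::free(p); }
```

Reading `new_call_count()` before and after a piece of code gives the number of allocations it made through operator new. The default `operator new[]` forwards to the replaced function, so array allocations are counted as well.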
Subject: Re: new/delete much slower than malloc/free because of sjlj exceptions

Thanks, Paul. Let me know if I can help in any way. I appended the output of "gcc -v".

-Ken

===============================================
Reading specs from /usr/lib/gcc-lib/i686-pc-cygwin/3.3.3/specs
Configured with: /gcc/gcc-3.3.3-3/configure --verbose --prefix=/usr --exec-prefix=/usr --sysconfdir=/etc --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-languages=c,ada,c++,d,f77,java,objc,pascal --enable-nls --without-included-gettext --enable-libgcj --with-system-zlib --enable-interpreter --enable-threads=posix --enable-java-gc=boehm --enable-sjlj-exceptions --disable-version-specific-runtime-libs --disable-win32-registry
Thread model: posix
gcc version 3.3.3 (cygwin special)
====================================================

On 13 Nov 2004 11:03:05 -0000, paulthomas2 at wanadoo dot fr <gcc-bugzilla@gcc.gnu.org> wrote:
> That's interesting....
>
> Using your test case:
> (i) gcc 3.2 20020927 ( prerelease) both versions take 0.62micro-sec/new
> (ii) gcc 3.1.1 (cygming special) I get 2.1 and 0.66micro-sec/new
> (iii) gcc 4.0.0 20041010 (experimental) I get 0.62 and 0.59micro-sec/new
>
> This latter was a tad unexpected - I built in from a snapshot on one of the
> German mirror sites. Does this imply that I have picked up Dwarf2 as a
> default?
Subject: Re: new/delete much slower than malloc/free because of sjlj exceptions

Ken,

Did you miss the question?

Paul

>>(iii) gcc 4.0.0 20041010 (experimental) I get 0.62 and 0.59micro-sec/new
>>
>>This latter was a tad unexpected - I built in from a snapshot on one of the
>>German mirror sites. Does this imply that I have picked up Dwarf2 as a
>>default?
Subject: Re: new/delete much slower than malloc/free because of sjlj exceptions

> Did you miss the question?

Umm, apparently I did... the only thing I see in the bug log that looks like a question is this:

> Does this imply that I have picked up Dwarf2 as a default?

I don't know the answer. The only thing I can say that might be related is that there are assembly statements in my output like "call __Unwind_SjLj_Register"; that (with the --enable-sjlj-exceptions) has led me to believe I'm using SjLj exceptions.

Again, let me know if there's anything I can help with.

-Ken

On 14 Nov 2004 18:04:07 -0000, paulthomas2 at wanadoo dot fr <gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Additional Comments From paulthomas2 at wanadoo dot fr 2004-11-14 18:04 -------
> Subject: Re: new/delete much slower than malloc/free because
> of sjlj exceptions
>
> Ken,
>
> Did you miss the question?
>
> Paul
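Besides looking for `_Unwind_SjLj_Register` calls in the assembly output, one way to ask a compiler which model it uses is to test a predefined macro. The sketch below relies on `__USING_SJLJ_EXCEPTIONS__`, which GCC predefines when it is configured for setjmp/longjmp exceptions; whether every compiler version mentioned in this thread defines it is an assumption that has not been checked here. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// True when this translation unit was compiled by a GCC that uses
// setjmp/longjmp exception handling (per the predefined macro).
bool uses_sjlj_exceptions() {
#ifdef __USING_SJLJ_EXCEPTIONS__
    return true;
#else
    return false;
#endif
}

// Human-readable description of the result.
const char* exception_model_name() {
    return uses_sjlj_exceptions()
               ? "SjLj (setjmp/longjmp)"
               : "not SjLj (for example table-based Dwarf-2 unwinding)";
}

// Writes the description to the given stream.
void print_exception_model(std::FILE* out) {
    assert(out != nullptr);
    std::fprintf(out, "exception model: %s\n", exception_model_name());
}
```

Calling `print_exception_model(stdout)` from `main` in a program built with each compiler in question would show whether the gcc 4.0.0 snapshot differs from the 3.3.3 Cygwin build in this respect.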
If you used the non-throwing new, it would become faster.
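For reference, the non-throwing form of new is spelled `new (std::nothrow)` and reports failure by returning a null pointer instead of throwing `std::bad_alloc`. The sketch below only shows the syntax; whether it actually avoids the SjLj overhead discussed here is the claim made in the comment above and is not verified by this example. The helper name `alloc_and_free_nothrow` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Allocates and frees `count` ints using the non-throwing form of new.
// Returns false if the allocation failed, true otherwise.
bool alloc_and_free_nothrow(std::size_t count) {
    assert(count > 0);
    int* p = new (std::nothrow) int[count];
    if (p == nullptr) return false;   // failure is a null pointer, not an exception
    p[0] = 1;                         // touch the memory so the allocation is used
    delete[] p;
    return true;
}
```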
(In reply to comment #37)
> I think basically you are messed up until Cygwin switches to dwarf2
> exceptions.

This is now (=gcc 4.3) possible by adding --disable-sjlj-exceptions to configure. Can we close with milestone gcc-4.3.0?

Danny
As per Danny's suggestion in comment #50 (impressive...)