Bug 40852 - [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++
Version: 4.4.1
Importance: P3 normal
Target Milestone: 4.4.3
Assignee: Johannes Singler
Reported: 2009-07-24 20:15 UTC by David Jaffe
Modified: 2009-10-28 10:44 UTC

See Also:
Host: x86_64-unknown-linux-gnu
Target: x86_64-unknown-linux-gnu
Build: x86_64-unknown-linux-gnu
Known to work:
Known to fail:
Last reconfirmed: 2009-10-22 06:57:14


Attachments
Patch replacing uint64_t by double to avoid overflow, for trunk. (294 bytes, patch)
2009-10-22 07:16 UTC, Johannes Singler
Patch replacing uint64_t by double to avoid overflow, for branch 4.4. (278 bytes, patch)
2009-10-22 07:17 UTC, Johannes Singler
Patch avoid large intermediates to avoid overflow, for trunk. (651 bytes, patch)
2009-10-23 10:01 UTC, Johannes Singler

Description David Jaffe 2009-07-24 20:15:09 UTC
Parallel sorts get ~10 times slower as one increases the vector size from 4*10^9 to 5*10^9, perhaps at exactly 2^32, but this wasn't checked.  The example below is for a vector of ints but the same phenomenon is observed on a vector of long longs.

To reproduce (sort_test.cc is below):

0. Adjust 'processors' in sort_test.cc.
1. g++ -O3 -fopenmp sort_test.cc -lgomp
2. ./a.out

output:

58 seconds used in sort [for vector of size 4,000,000,000]
667 seconds used in sort [for vector of size 5,000,000,000]

gcc version information:

crd4% gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.4.1/configure --with-gmp=/broad/tools/Linux/x86_64/pkgs/gcc_4.4.1 --with-mpfr=/broad/tools/Linux/x86_64/pkgs/gcc_4.4.1 --prefix=/broad/tools/Linux/x86_64/pkgs/gcc_4.4.1
Thread model: posix
gcc version 4.4.1 (GCC) 
We first observed the problem under gcc 4.3.3.

hardware info:

crd4% uname -a
Linux crd4 2.6.16.54-0.2.5-smp #1 SMP Mon Jan 21 13:29:51 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux
This is a 32-processor machine with 256 GB of memory, but I don't think the problem is specific to this architecture.

sort_test.cc:

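// The next three lines were missing from the original report (see comment 4).
#define _GLIBCXX_PARALLEL
#include <algorithm>
#include <iomanip>
#include <iostream>
#include <omp.h>
#include <time.h>
#include <vector>
using namespace std;
int main( )
{    for ( long long  m = 4; m <= 5; m++ )
     {    const long long entries = m * (long long) 1000000000;
          const int processors = 32;   // adjust to the machine (step 0)
          vector<int> x(entries);
          // Fill the vector with filler data to sort.
          for ( long long i = 0; i < entries; i++ )
               x[i] = (i*i) % 123456789;
          time_t clock1, clock2; time( &clock1 );
          omp_set_num_threads(processors);
          // With _GLIBCXX_PARALLEL defined, this call uses the parallel-mode sort.
          sort( x.begin( ), x.end( ) );
          time( &clock2 );
          cout << clock2 - clock1 << " seconds used in sort" << endl;    }    }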
Comment 1 Richard Biener 2009-07-24 20:29:05 UTC
I suppose you are running into cache effects.  Why do you think this is a GCC bug?
Comment 2 jaffe@broadinstitute.org 2009-07-24 20:43:39 UTC
If instead of sorting a vec<int>, one sorts a vec<long long>, there is still a ten-fold
slowdown, as one increases the vector size from 4 to 5 billion.  So it's not the total
amount of memory that matters, but rather the number of entries in the vector.  I don't
think this is about cache effects.

Best,

David

Comment 3 Paolo Carlini 2009-07-24 21:15:23 UTC
Out of curiosity, did you try parallel-mode on that machine? Basically, just add -D_GLIBCXX_PARALLEL, but refer to the documentation of course:

http://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode.html#manual.ext.parallel_mode.intro

I'm also adding Johannes, in CC...

Note, I don't think we have any specific issue with the normal, serial, std::sort...
Comment 4 jaffe@broadinstitute.org 2009-07-24 21:20:06 UTC
Oh crap, yes I did, and now I see that I accidentally left off the first three lines of sort_test.cc.
They are:

#define _GLIBCXX_PARALLEL
#include <algorithm>
#include <iomanip>

David

Comment 5 Paolo Carlini 2009-07-24 21:23:52 UTC
So this issue is just that you are not completely happy with the behavior of parallel-mode. Ok... Let's see what Johannes thinks.
Comment 6 Jason Merrill 2009-10-19 18:07:49 UTC
Have you tried selecting a different sort algorithm?  The default seems to be the multi-way mergesort, but there are two quicksort options as well.
Comment 7 Johannes Singler 2009-10-20 07:46:18 UTC
Sorry, the CC has never reached me.  
So concerning comment #4:  Was the parallel mode actually activated?
The multiway mergesort needs temporary storage as large as the input.  Are you sure that the program was not swapping?
Comment 8 jaffe@broadinstitute.org 2009-10-20 10:55:10 UTC
Regarding comment #7, I just ran this now on a machine with 32 processors and 512 GB memory.

(a) Sorting 4 x 10^9 ints took 0.9 minutes.
(b) Sorting 5 x 10^9 ints took 16 minutes.

The second test used about 40 GB, which is a small fraction of the available memory.

(c) Sorting 2.5 x 10^9 structures having 2 ints each took 1.1 minutes.

Regarding comment #6, repeating (a) and (b) with __gnu_parallel::balanced_quicksort_tag( ):

(a') 6.3 minutes
(b') 8.1 minutes,

so the algorithm is slower on these data but does not exhibit the same jump in runtime.
I also tried __gnu_parallel::quicksort_tag( ) which was about the same for (b) [(a) not tested].
Comment 9 Johannes Singler 2009-10-22 06:57:14 UTC
I can reproduce the bug on my machine (2 Quadcore Nehalems, 48GB RAM)

4 x 10^9 ints: 65 seconds used in sort
5 x 10^9 ints: 193 seconds used in sort
Comment 10 Johannes Singler 2009-10-22 07:15:50 UTC
The problem is in multiseq_selection.h, where this line has an overflow

(static_cast<uint64_t>(__total) * __rank / __N - __leftsize)

if (__total * __rank) exceeds 64 bits.  The quick fix is to use a temporary double, which solves the original test case:

4 x 10^9 ints: 64 seconds used in sort
5 x 10^9 ints: 80 seconds used in sort

Find patches for branch (4.4) and trunk (4.5) attached.

However, I do not fully trust the double arithmetics yet, although some test cases work.  Does anybody else know a better way to avoid an overflow in ((a * b) / c) with only integer arithmetics and normal rounding?

Maybe I can find a way to avoid this calculation altogether.
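
To illustrate why the problem shows up between 4*10^9 and 5*10^9 elements: 2^64 is about 1.8*10^19, so a product of two values near 4*10^9 (1.6*10^19) still fits into a uint64_t, while a product of two values near 5*10^9 (2.5*10^19) wraps around.  The following is only a minimal sketch of that effect, not the code from multiseq_selection.h; the function name muldiv_wrapping is made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from libstdc++): computes a * b / c with a
// 64-bit intermediate product, which silently wraps modulo 2^64 when
// a * b does not fit.
std::uint64_t muldiv_wrapping(std::uint64_t a, std::uint64_t b,
                              std::uint64_t c)
{
  assert(c != 0);
  return a * b / c;
}
```

With a = b = c = 4000000000 the function returns 4000000000 as expected.  With a = b = c = 5000000000 the intermediate product wraps and the result is no longer 5000000000.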
Comment 11 Johannes Singler 2009-10-22 07:16:51 UTC
Created attachment 18862 [details]
Patch replacing uint64_t by double to avoid overflow, for trunk.
Comment 12 Johannes Singler 2009-10-22 07:17:35 UTC
Created attachment 18863 [details]
Patch replacing uint64_t by double to avoid overflow, for branch 4.4.
Comment 13 Johannes Singler 2009-10-22 07:42:34 UTC
(In reply to comment #10)

> However, I do not fully trust the double arithmetics yet, although some test
> cases work.

Er, this sounded a bit pessimistic, all sort tests I have tried so far work with the patch.

And some more explanation:
The overflow resulted in erratic, and thus very poor, load balancing in the merge step, causing the huge running times.
Comment 14 Pawel Sikora 2009-10-22 09:01:23 UTC
(In reply to comment #10)

> However, I do not fully trust the double arithmetics yet, although some test
> cases work.  Does anybody else know a better way to avoid an overflow in ((a *
> b) / c) with only integer arithmetics and normal rounding?

you can use a 128-bit integer type on x86-64.
Comment 15 jaffe@broadinstitute.org 2009-10-22 10:22:48 UTC
Wonderful!  Thank you very much for fixing this problem.
Comment 16 Johannes Singler 2009-10-22 16:41:16 UTC
(In reply to comment #14)
> (In reply to comment #10)
> 
> > However, I do not fully trust the double arithmetics yet, although some test
> > cases work.  Does anybody else know a better way to avoid an overflow in ((a *
> > b) / c) with only integer arithmetics and normal rounding?
> 
> you can use a 128-bit integer type on x86-64.

Very good idea.
Do you know a good #ifdef clause to check its availability?  Is it really just x86-64?
Also, I probably want to use it only when really needed, because I assume it to be implemented in software, in particular the division.
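
One way such a guard might look, as a minimal sketch: GCC and Clang predefine the macro __SIZEOF_INT128__ on targets where the __int128 extension is available, but whether a given compiler version does so should be checked.  The helper name muldiv128 is made up for this example, and the long double fallback is an assumption that may be off by one compared to truncating integer division.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: floor(a * b / c) without overflowing the
// intermediate product.  Assumes the quotient itself fits in 64 bits,
// which holds whenever b <= c (as with a rank that is at most N).
std::uint64_t muldiv128(std::uint64_t a, std::uint64_t b, std::uint64_t c)
{
  assert(c != 0);
#ifdef __SIZEOF_INT128__
  // 128-bit intermediate: exact, but the division is likely done in software.
  return static_cast<std::uint64_t>(
      static_cast<unsigned __int128>(a) * b / c);
#else
  // Assumed fallback for targets without __int128: approximate result.
  return static_cast<std::uint64_t>(static_cast<long double>(a) * b / c);
#endif
}
```

Since the 128-bit division is probably slower than a native one, a caller could first test whether the plain 64-bit product fits and only then take this path.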
Comment 17 Paolo Carlini 2009-10-22 17:46:14 UTC
Is something known about the actual size of a, b, and c? Also, I don't know what the required precision for the result is: must it be exact if representable? I suppose not, otherwise the suggestion of using double would not make sense. Depending on the answer to the above, there are various options, maybe checking for a * b overflowing (if the quantities are all positive, then checking for wraparound is easy) and then taking the appropriate actions.

Anyway, barring more sophisticated solutions, using long double seems a better idea to me, because on most widespread targets a long double is at least 80 bits, with a mantissa of at least 64 bits, thus able to exactly represent any long long integer.
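
A minimal sketch combining the two ideas above, assuming unsigned 64-bit operands: test for wraparound first, stay with exact integer arithmetic when the product fits, and use long double only otherwise.  The names product_wraps and muldiv_checked are made up for this example.  Note that the long double path rounds the product and the quotient, so it is not guaranteed to match truncating integer division in every case.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True if a * b does not fit into 64 bits (unsigned operands).
bool product_wraps(std::uint64_t a, std::uint64_t b)
{
  return b != 0 && a > std::numeric_limits<std::uint64_t>::max() / b;
}

// Hypothetical helper: a * b / c, exact when the product fits into
// 64 bits, approximated in long double otherwise.
std::uint64_t muldiv_checked(std::uint64_t a, std::uint64_t b,
                             std::uint64_t c)
{
  assert(c != 0);
  if (!product_wraps(a, b))
    return a * b / c;  // exact integer path
  // Approximate path: operands convert exactly only if long double has
  // at least 64 mantissa bits; the product and quotient are rounded.
  return static_cast<std::uint64_t>(static_cast<long double>(a) * b / c);
}
```

The mantissa width can be queried with std::numeric_limits<long double>::digits, which makes it possible to reject targets where long double is no wider than double.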
Comment 18 Johannes Singler 2009-10-23 10:00:17 UTC
(In reply to comment #17)
> Is something known about the actual size of a, b, and c? 

They can be as large as the input size.

> Also, I don't know which is the required precision for the result: must be 
> exact if representable?

In the last iteration, __n == 0 => __total == __N, and then, the result must absolutely be __rank, according to the specification.

Anyway, I think I have found a solution that is easier, faster, and avoids the large intermediate altogether (see attached patch).  It also fixes similar problems in two other locations.  However, this patch needs further thorough testing.

Also, __n == 2 ^ __r - 1, so __n + 1 == 2 ^ __r, and the divisions could be replaced by shifts.
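
The remark about shifts can be sketched as follows.  This is only an illustration for non-negative values, not the contents of the attached patch; the function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Division by (n + 1), written the plain way.
std::uint64_t div_by_division(std::uint64_t x, std::uint64_t n)
{
  assert(n != std::uint64_t(-1));  // n + 1 must not wrap to zero
  return x / (n + 1);
}

// The same division when n == 2^r - 1: n + 1 == 2^r, so dividing by it
// is a right shift by r bits (valid for unsigned or non-negative values).
std::uint64_t div_by_shift(std::uint64_t x, unsigned r)
{
  assert(r < 64);
  return x >> r;
}
```

For example, with r = 10 (so n = 1023) both functions return the same quotient for any x.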
Comment 19 Johannes Singler 2009-10-23 10:01:39 UTC
Created attachment 18878 [details]
Patch avoid large intermediates to avoid overflow, for trunk.
Comment 20 Paolo Carlini 2009-10-23 16:00:17 UTC
Excellent. Let's wait a bit for feedback from people experiencing this issue and then commit the patch, first mainline and then probably 4_4-branch too. Make sure to also regression test the fix on a "normal" ;) machine...
Comment 21 jaffe@broadinstitute.org 2009-10-27 09:45:42 UTC
I tested the patch from comment #19, sorting X billion integers on a machine having
32 processors and 256 GB memory, X = 4, 6, ..., 26.  The overall behavior is very
close to linear.  For example, X = 4 took 1.02 minutes, whereas X = 20 took 5.22
minutes.  Very nice!

Comment 22 Paolo Carlini 2009-10-27 15:53:56 UTC
Patch regtests fine on x86_64-linux. Johannes, can you prepare a ChangeLog entry, post and commit both? Thanks!
Comment 23 Johannes Singler 2009-10-28 10:04:22 UTC
Subject: Bug 40852

Author: singler
Date: Wed Oct 28 10:04:03 2009
New Revision: 153648

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=153648
Log:
2009-10-28  Johannes Singler  <singler@kit.edu>

        PR libstdc++/40852
        * include/parallel/multiseq_selection.h
        (multiseq_partition, multiseq_selection):  Avoid intermediate
        values exceeding the integer type range for very large inputs.


Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/parallel/multiseq_selection.h

Comment 24 Johannes Singler 2009-10-28 10:04:58 UTC
Subject: Bug 40852

Author: singler
Date: Wed Oct 28 10:04:35 2009
New Revision: 153649

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=153649
Log:
2009-10-28  Johannes Singler  <singler@kit.edu>

        PR libstdc++/40852
        * include/parallel/multiseq_selection.h
        (multiseq_partition, multiseq_selection):  Avoid intermediate
        values exceeding the integer type range for very large inputs.


Modified:
    branches/gcc-4_4-branch/libstdc++-v3/ChangeLog
    branches/gcc-4_4-branch/libstdc++-v3/include/parallel/multiseq_selection.h

Comment 25 Johannes Singler 2009-10-28 10:11:12 UTC
Closing this bug.
Comment 26 Paolo Carlini 2009-10-28 10:44:38 UTC
Fixed for 4.4.3 and mainline.
Comment 27 Jeffrey A. Law 2009-10-29 16:49:54 UTC
Subject: Bug 40852

Author: law
Date: Thu Oct 29 16:48:00 2009
New Revision: 153715

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=153715
Log:
Recorded merge of revisions 153580-153581,153584,153586-153600,153604,153606,153610,153613,153615-153618,153621,153643,153646-153648,153650-153652,153654-153667,153669-153671 via svnmerge from 
svn+ssh://law@gcc.gnu.org/svn/gcc/trunk

........
  r153580 | gccadmin | 2009-10-26 18:17:26 -0600 (Mon, 26 Oct 2009) | 1 line
  
  Daily bump.
........
  r153581 | paolo | 2009-10-26 19:18:10 -0600 (Mon, 26 Oct 2009) | 6 lines
  
  2009-10-26  Paolo Carlini  <paolo.carlini@oracle.com>
  
  	* include/std/chrono (duration<>::duration(const duration<>&)): Fix
  	per the straightforward resolution of DR 974.
  	* testsuite/20_util/duration/cons/dr974.cc: Add.
........
  r153584 | carrot | 2009-10-27 03:06:36 -0600 (Tue, 27 Oct 2009) | 16 lines
  
  	* target.h (have_conditional_execution): Add a new target hook function.
  	* target-def.h (TARGET_HAVE_CONDITIONAL_EXECUTION): Likewise.
  	* targhooks.h (default_have_conditional_execution): Likewise.
  	* targhooks.c (default_have_conditional_execution): Likewise.
  	* doc/tm.texi (TARGET_HAVE_CONDITIONAL_EXECUTION): Document it.
  	* config/arm/arm.c (TARGET_HAVE_CONDITIONAL_EXECUTION): Define it.
  	(arm_have_conditional_execution): New function.
  	* ifcvt.c (noce_process_if_block, find_if_header,
  	cond_exec_find_if_block, dead_or_predicable): Change the usage of macro
  	HAVE_conditional_execution to a target hook call.
  	* recog.c (peephole2_optimize): Likewise.
  	* sched-rgn.c (add_branch_dependences): Likewise.
  	* final.c (asm_insn_count, final_scan_insn): Likewise.
  	* bb-reorder.c (HAVE_conditional_execution): Remove it.
........
  r153586 | ebotcazou | 2009-10-27 04:09:04 -0600 (Tue, 27 Oct 2009) | 1 line
  
  Fix nits
........
  r153587 | jakub | 2009-10-27 04:28:48 -0600 (Tue, 27 Oct 2009) | 3 lines
  
  	PR c++/41020
  	* g++.dg/lookup/extern-c-redecl5.C: Fix up regexp.
........
  r153588 | aldyh | 2009-10-27 05:18:12 -0600 (Tue, 27 Oct 2009) | 5 lines
  
  	PR bootstrap/41451
  	* fold-const.c (fold_binary_loc): Do not call
  	protected_set_expr_location.
........
  r153589 | rguenth | 2009-10-27 05:30:59 -0600 (Tue, 27 Oct 2009) | 5 lines
  
  2009-10-27  Richard Guenther  <rguenther@suse.de>
  
  	PR lto/41821
  	* gimple.c (gimple_types_compatible_p): Handle OFFSET_TYPE.
........
  r153590 | revitale | 2009-10-27 05:46:07 -0600 (Tue, 27 Oct 2009) | 1 line
  
  Fix PR40648 -- Fix misaligned store vectorizer patch
........
  r153591 | charlet | 2009-10-27 07:06:06 -0600 (Tue, 27 Oct 2009) | 16 lines
  
  2009-10-27  Arnaud Charlet  <charlet@adacore.com>
  
  	* exp_aggr.adb: Fix comment.
  
  2009-10-27  Emmanuel Briot  <briot@adacore.com>
  
  	* prj-err.adb (Error_Msg): take into account continuation lines when
  	computing whether we have a warning.
  
  2009-10-27  Vasiliy Fofanov  <fofanov@adacore.com>
  
  	* make.adb, s-os_lib.adb, s-os_lib.ads (Create_Temp_Output_File): New
  	routine that is designed to create temp file descriptor specifically
  	for redirecting an output stream.
........
  r153592 | charlet | 2009-10-27 07:16:48 -0600 (Tue, 27 Oct 2009) | 45 lines
  
  2009-10-27  Vincent Celier  <celier@adacore.com>
  
  	* makeutl.adb (Check_Source_Info_In_ALI): Do not recompile if a subunit
  	from the runtime is found, except if gnatmake switch -a is used and this
  	subunit cannot be found.
  
  2009-10-27  Ed Schonberg  <schonberg@adacore.com>
  
  	* gnatbind.adb (gnatbind): When the -R option is selected, list subunits
  	as well, for tools that need the complete closure of the main program.
  
  2009-10-27  Sergey Rybin  <rybin@adacore.com>
  
  	* gnat_ugn.texi: Minor updates.
  
  2009-10-27  Emmanuel Briot  <briot@adacore.com>
  
  	* prj-tree.adb (Free): Fix memory leak.
  
  2009-10-27  Vasiliy Fofanov  <fofanov@adacore.com>
  
  	* adaint.c, s-os_lib.adb (__gnat_create_output_file_new): New function
  	that ensures the file that is created is new. Use this function to make
  	sure there is no race condition if several processes are creating temp
  	files concurrently.
  
  	* s-os_lib.ads: Update comment.
  
  2009-10-27  Thomas Quinot  <quinot@adacore.com>
  
  	* sem_ch12.adb: Minor reformatting
  
  2009-10-27  Javier Miranda  <miranda@adacore.com>
  
  	* exp_ch4.ads (Integer_Promotion_Possible): New subprogram.
  	* exp_ch4.adb (Integer_Promotion_Possible): New subprogram.
  	(Expand_N_Type_Conversion): Replace code that checks if the integer
  	promotion of the operands is possible by a call to the new function
  	Integer_Promotion_Possible. Minor reformating because an enclosing
  	block is now not needed.
  	* checks.adb (Apply_Arithmetic_Overflow_Check): Add missing check to
  	see if the integer promotion is possible; in such case the runtime
  	checks are not generated.
........
  r153593 | charlet | 2009-10-27 07:22:25 -0600 (Tue, 27 Oct 2009) | 17 lines
  
  2009-10-27  Thomas Quinot  <quinot@adacore.com>
  
  	* sem_ch12.adb (Install_Formal_Packages): Do not omit installation of
  	visible entities when the formal package doesn't have a box.
  
  	* checks.adb: Minor reformatting.
  
  2009-10-27  Vincent Celier  <celier@adacore.com>
  
  	* prj-part.adb (Parse): Catch exception Types.Unrecoverable_Error and
  	set Project to Empty_Node.
  
  2009-10-27  Robert Dewar  <dewar@adacore.com>
  
  	* gnatbind.adb: Minor reformatting
........
  r153594 | charlet | 2009-10-27 07:51:46 -0600 (Tue, 27 Oct 2009) | 18 lines
  
  2009-10-27  Robert Dewar  <dewar@adacore.com>
  
  	* s-os_lib.ads, s-os_lib.adb, prj-err.adb, makeutl.adb: Minor
  	reformatting.
  
  2009-10-27  Ed Schonberg  <schonberg@adacore.com>
  
  	* sem.util.ads, sem_util.adb (Denotes_Same_Object,
  	Denotes_Same_Prefix): New functions to detect overlap between actuals
  	that are not by-copy in a call, when one of them is in-out.
  	* sem_warn.ads, sem_warn.adb (Warn_On_Overlapping_Actuals): New
  	procedure,  called on a subprogram call to warn when an in-out actual
  	that is not by-copy overlaps with another actual, thus leadind to
  	potentially dangerous aliasing in the body of the called subprogram.
  	Currently the warning is under control of the -gnatX switch.
  	* sem_res.adb (resolve_call): call Warn_On_Overlapping_Actuals.
........
  r153595 | charlet | 2009-10-27 08:02:58 -0600 (Tue, 27 Oct 2009) | 6 lines
  
  2009-10-27  Robert Dewar  <dewar@adacore.com>
  
  	* sem_warn.adb, sem_util.adb, sem_util.ads: Minor reformatting. Add
  	comments.
........
  r153596 | charlet | 2009-10-27 08:07:19 -0600 (Tue, 27 Oct 2009) | 2 lines
  
  Minor doc updates.
........
  r153597 | charlet | 2009-10-27 08:14:44 -0600 (Tue, 27 Oct 2009) | 6 lines
  
  2009-10-27  Robert Dewar  <dewar@adacore.com>
  
  	* s-fileio.adb, s-fileio.ads, sem_util.adb, sem_warn.adb,
  	sem_warn.ads: Minor reformatting
........
  r153598 | rguenth | 2009-10-27 09:16:35 -0600 (Tue, 27 Oct 2009) | 5 lines
  
  2009-10-27  Richard Guenther  <rguenther@suse.de>
  
  	* tree-complex.c (expand_complex_div_wide): Check for
  	INTEGER_CST, not TREE_CONSTANT on comparison folding result.
........
  r153599 | jakub | 2009-10-27 09:50:50 -0600 (Tue, 27 Oct 2009) | 6 lines
  
  	PR c/41842
  	* c-typeck.c (convert_arguments): Return -1 if any of the arguments is
  	error_mark_node.
  
  	* gcc.dg/pr41842.c: New test.
........
  r153600 | rguenth | 2009-10-27 09:52:44 -0600 (Tue, 27 Oct 2009) | 14 lines
  
  2009-10-27  Richard Guenther  <rguenther@suse.de>
  
  	* tree-ssa-structalias.c (find_func_aliases): In IPA mode
  	handle calls to externally visible functions like in regular mode.
  	(create_variable_info_for): Do not create function infos here.
  	(have_alias_info): Remove write-only variable.
  	(solve_constraints): New function split out from common code
  	in compute_points_to_sets and ipa_pta_execute.
  	(compute_points_to_sets): Adjust.
  	(ipa_pta_execute): Likewise.  Handle clones and externally visible
  	functions like in non-IPA mode.
  
  	* gcc.dg/torture/ipa-pta-1.c: Adjust testcase.
........
  r153604 | uros | 2009-10-27 11:03:47 -0600 (Tue, 27 Oct 2009) | 3 lines
  
  	* ChangeLog: Fix formatting.
  	* testsuite/ChangeLog: Ditto.
........
  r153606 | ktietz | 2009-10-27 11:14:47 -0600 (Tue, 27 Oct 2009) | 11 lines
  
  2009-10-27  Kai Tietz <kai.tietz@onevision.com>
  
          PR/41799
          * config/i386/mingw32.h (CHECK_EXECUTE_STACK_ENABLED): New macro.
          * config/i386/mingw.opt: Add fset-stack-executable.
          * config/i386/i386.c (ix86_trampoline_init): Make call to
          emit_library_call conditional, if CHECK_EXECUTE_STACK_ENABLED is
          defined and its value is not zero.
          * doc/invoke.texi
........
  r153610 | espindola | 2009-10-27 12:17:13 -0600 (Tue, 27 Oct 2009) | 7 lines
  
  2009-10-27  Dmitry Gorbachev  <d.g.gorbachev@gmail.com>
  
  	PR lto/41652
  	* configure.ac: Call AC_SYS_LARGEFILE before AC_OUTPUT.
  	* configure: Regenerate.
........
  r153613 | ebotcazou | 2009-10-27 13:41:13 -0600 (Tue, 27 Oct 2009) | 4 lines
  
  	* raise-gcc (db_region_for): Use _Unwind_GetIPInfo instead of
  	_Unwind_GetIP if HAVE_GETIPINFO is defined.
  	(db_action_for): Likewise.
........
  r153615 | rth | 2009-10-27 14:09:07 -0600 (Tue, 27 Oct 2009) | 7 lines
  
          PR c++/41819
          * tree-eh.c (eh_region_may_contain_throw_map): Rename from
          eh_region_may_contain_throw; update users.
          (eh_region_may_contain_throw): New function.
          (lower_catch): Check flag_exceptions before creating exception region.
          (lower_eh_filter, lower_eh_must_not_throw): Likewise.
          (lower_cleanup): Tidy existing flag_exceptions check to match.
........
  r153616 | ebotcazou | 2009-10-27 14:24:31 -0600 (Tue, 27 Oct 2009) | 3 lines
  
  	* gcc-interface/decl.c (purpose_member_field): New static function.
  	(annotate_rep): Use it instead of purpose_member.
........
  r153617 | jason | 2009-10-27 15:58:09 -0600 (Tue, 27 Oct 2009) | 10 lines
  
  	Allow no-capture lambdas to convert to function pointer.
  	* semantics.c (maybe_add_lambda_conv_op): New.
  	* parser.c (cp_parser_lambda_expression): Call it.
  	(cp_parser_lambda_declarator_opt): Make op() static if
  	no captures.
  	* mangle.c (write_closure_type_name): Adjust.
  	* semantics.c (finish_this_expr): Adjust.
  	* decl.c (grok_op_properties): Allow it.
  	* call.c (build_user_type_conversion_1): Handle static conversion op.
  	(build_op_call): And op().
........
  r153618 | rth | 2009-10-27 17:25:54 -0600 (Tue, 27 Oct 2009) | 1 line
  
          * cgraphunit.c (cgraph_optimize): Maintain timevar stack properly.
........
  r153621 | gccadmin | 2009-10-27 18:16:59 -0600 (Tue, 27 Oct 2009) | 1 line
  
  Daily bump.
........
  r153643 | kkojima | 2009-10-27 22:22:21 -0600 (Tue, 27 Oct 2009) | 4 lines
  
  	* config/sh/sh.md (stuff_delay_slot): Move const_int pattern
  	inside the unspec vector.
........
  r153646 | bonzini | 2009-10-28 03:49:58 -0600 (Wed, 28 Oct 2009) | 6 lines
  
  2009-10-28  Paolo Bonzini  <bonzini@gnu.org>
  
  	* config/sh/sh.md (cbranchfp4_media): Remove hack extending
  	cstore result to DImode.
........
  r153647 | bonzini | 2009-10-28 03:54:01 -0600 (Wed, 28 Oct 2009) | 6 lines
  
  2009-10-28  Paolo Bonzini  <bonzini@gnu.org>
  
  	* expmed.c (emit_store_flag): Check costs before
  	transforming to the opposite representation.
........
  r153648 | singler | 2009-10-28 04:04:03 -0600 (Wed, 28 Oct 2009) | 8 lines
  
  2009-10-28  Johannes Singler  <singler@kit.edu>
  
          PR libstdc++/40852
          * include/parallel/multiseq_selection.h
          (multiseq_partition, multiseq_selection):  Avoid intermediate
          values exceeding the integer type range for very large inputs.
........
  r153650 | bonzini | 2009-10-28 04:17:29 -0600 (Wed, 28 Oct 2009) | 15 lines
  
  2009-10-28  Paolo Bonzini  <bonzini@gnu.org>
  
  	PR rtl-optimization/40741
  	* config/arm/arm.c (thumb1_rtx_costs): IOR or XOR with
  	a small constant is cheap.
  	* config/arm/arm.md (andsi3, iorsi3): Try to place the result of
  	force_reg on the LHS.
  	(xorsi3): Likewise, and split the XOR if the constant is complex
  	and not in Thumb mode.
  
  2009-10-28  Paolo Bonzini  <bonzini@gnu.org>
  
  	PR rtl-optimization/40741
  	* gcc.target/arm/thumb-branch1.c: New.
........
  r153651 | bonzini | 2009-10-28 04:27:15 -0600 (Wed, 28 Oct 2009) | 13 lines
  
  2009-10-28  Paolo Bonzini  <bonzini@gnu.org>
  
  	PR rtl-optimization/39715
  	* combine.c (simplify_comparison): Use extensions to
  	widen comparisons.  Try an ANDing first.
  
  testsuite:
  2009-10-28  Paolo Bonzini  <bonzini@gnu.org>
  
  	PR rtl-optimization/39715
  	* gcc.target/arm/thumb-bitfld1.c: New.
........
  r153652 | bonzini | 2009-10-28 06:37:30 -0600 (Wed, 28 Oct 2009) | 13 lines
  
  2009-10-28  Paolo Bonzini  <bonzini@gnu.org>
  
  	PR rtl-optimization/41812
  
  	Revert:
  	2009-06-27  Paolo Bonzini  <bonzini@gnu.org>
  
  	* df-problems.c (df_md_scratch): New.
  	(df_md_alloc, df_md_free): Allocate/free it.
  	(df_md_local_compute): Only include live registers in init.
  	(df_md_transfer_function): Prune the in-set computed by
  	the confluence function, and the gen-set too.
........
  r153654 | paolo | 2009-10-28 07:07:00 -0600 (Wed, 28 Oct 2009) | 6 lines
  
  2009-10-28  Paolo Carlini  <paolo.carlini@oracle.com>
  
  	* include/bits/stl_iterator_base_funcs.h: (next): Change
  	template parameter name consistently with the resolution
  	of DR 1011 ([Ready] in Santa Cruz).
........
  r153655 | rguenth | 2009-10-28 07:28:32 -0600 (Wed, 28 Oct 2009) | 14 lines
  
  2009-10-28  Richard Guenther  <rguenther@suse.de>
  
  	PR middle-end/41855
  	* tree-ssa-alias.c (refs_may_alias_p_1): Deal with CONST_DECLs
  	(ref_maybe_used_by_call_p_1): Fix bcopy handling.
  	(call_may_clobber_ref_p_1): Likewise.
  	* tree-ssa-structalias.c (find_func_aliases): Likewise.
  	* alias.c (nonoverlapping_memrefs_p): Deal with CONST_DECLs.
  
  	* gfortran.dg/lto/20091028-1_0.f90: New testcase.
  	* gfortran.dg/lto/20091028-1_1.c: Likewise.
  	* gfortran.dg/lto/20091028-2_0.f90: Likewise.
  	* gfortran.dg/lto/20091028-2_1.c: Likewise.
........
  r153656 | charlet | 2009-10-28 07:31:51 -0600 (Wed, 28 Oct 2009) | 25 lines
  
  2009-10-28  Robert Dewar  <dewar@adacore.com>
  
  	* a-ztexio.adb, a-ztexio.ads, a-witeio.ads, a-witeio.adb,
  	a-textio.ads, a-textio.adb: Reorganize (moving specs from private part
  	to body).
  	(Initialize_Standard_Files): New procedure.
  	* a-tienau.adb: Minor change to make EOF directly visible
  	* a-tirsfi.ads, a-wrstfi.adb, a-wrstfi.ads, a-zrstfi.adb,
  	a-zrstfi.ads, a-tirsfi.adb: New unit, initial version.
  	* gnat_rm.texi: Add documentation for
  	Ada.[Wide_[Wide_]]Text_IO.Reset_Standard_Files.
  	* Makefile.rtl: Add entries for
  	Ada.[Wide_[Wide_]]Text_IO.Reset_Standard_Files
  
  2009-10-28  Thomas Quinot  <quinot@adacore.com>
  
  	* exp_ch9.ads: Minor reformatting
  	* sem_ch3.adb: Minor reformatting
  	* sem_aggr.adb: Minor reformatting.
  	* sem_attr.adb: Minor reformatting
  	* tbuild.adb, tbuild.ads, par-ch4.adb, exp_ch4.adb (Tbuild.New_Op_Node):
  	New subprogram.
  	Minor code reorganization/factoring.
........
  r153657 | charlet | 2009-10-28 07:41:05 -0600 (Wed, 28 Oct 2009) | 29 lines
  
  2009-10-28  Thomas Quinot  <quinot@adacore.com>
  
  	* exp_ch4.adb (Expand_N_Type_Conversion): Perform Integer promotion for
  	the operand of the unary minus and ABS operators.
  
  	* sem_type.adb (Covers): A concurrent type and its corresponding record
  	type are compatible.
  	* exp_attr.adb (Expand_N_Attribute_Reference): Do not rewrite a 'Access
  	attribute reference for the current instance of a protected type while
  	analyzing an access discriminant constraint in a component definition.
  	Such a reference is handled in the corresponding record's init proc,
  	while initializing the constrained component.
  	* exp_ch9.adb (Expand_N_Protected_Type_Declaration): When creating the
  	corresponding record type, propagate components'
  	Has_Per_Object_Constraint flag.
  	* exp_ch3.adb (Build_Init_Procedure.Build_Init_Statements):
  	For a concurrent type, set up concurrent aspects before initializing
  	components with a per object constrain, because they may be controlled,
  	and their initialization may call entries or protected subprograms of
  	the enclosing concurrent object.
  
  2009-10-28  Emmanuel Briot  <briot@adacore.com>
  
  	* prj-nmsc.adb (Add_If_Not_In_List): New subprogram, for better sharing
  	of code.
  	(Find_Source_Dirs): resolve links if Opt.Follow_Links_For_Dirs when
  	processing the directories specified explicitly in the project file.
........
  r153658 | charlet | 2009-10-28 07:50:10 -0600 (Wed, 28 Oct 2009) | 10 lines
  
  2009-10-28  Robert Dewar  <dewar@adacore.com>
  
  	* exp_attr.adb, exp_ch9.adb, prj-nmsc.adb, tbuild.adb, ali.adb,
  	types.ads: Minor reformatting
  
  2009-10-28  Tristan Gingold  <gingold@adacore.com>
  
  	* init.c: Fix __gnat_error_handler for Darwin10 (Snow Leopard)
........
  r153659 | rguenth | 2009-10-28 07:52:20 -0600 (Wed, 28 Oct 2009) | 11 lines
  
  2009-10-28  Richard Guenther  <rguenther@suse.de>
  
  	* tree.c (free_lang_data_in_type): Do not call get_alias_set.
  	(free_lang_data): Unconditionally compute alias sets for all
  	standard integer types.  Bail out if gate bailed out previously.
  	Do not reset the types_compatible_p langhook.
  	(gate_free_lang_data): Remove.
  	(struct pass_ipa_free_lang_data): Enable unconditionally.
  	* gimple.c (gimple_get_alias_set): Use the same alias-set for
  	all pointer types.
........
  r153660 | charlet | 2009-10-28 08:07:16 -0600 (Wed, 28 Oct 2009) | 2 lines
  
  	* gcc-interface/Make-lang.in: Update dependencies.
........
  r153661 | charlet | 2009-10-28 08:09:12 -0600 (Wed, 28 Oct 2009) | 22 lines
  
  2009-10-28  Vincent Celier  <celier@adacore.com>
  
  	* prj-nmsc.adb (Add_To_Or_Remove_From_List): New name of procedure
  	Add_If_Not_In_List to account to the fact that a directory may be
  	removed from the list. Only remove directory if Removed is True.
  
  2009-10-28  Gary Dismukes  <dismukes@adacore.com>
  
  	* a-textio.ads, a-textio.ads: Put back function EOF_Char in private
  	part. Put back body of function EOF_Char.
  	* a-tienau.adb: Remove with of Interfaces.C_Streams and change EOF back
  	to EOF_Char.
  
  2009-10-28  Emmanuel Briot  <briot@adacore.com>
  
  	* prj-tree.adb (Free): Fix memory leak.
  
  2009-10-28  Thomas Quinot  <quinot@adacore.com>
  
  	* s-fileio.adb: Minor reformatting
........
  r153662 | charlet | 2009-10-28 08:14:05 -0600 (Wed, 28 Oct 2009) | 9 lines
  
  2009-10-28  Thomas Quinot  <quinot@adacore.com>
  
  	* s-crtl.ads (System.CRTL.strerror): New function.
  
  2009-10-28  Ed Schonberg  <schonberg@adacore.com>
  
  	* sem_type.adb: Add guard to recover some type errors.
........
  r153663 | charlet | 2009-10-28 08:22:09 -0600 (Wed, 28 Oct 2009) | 12 lines
  
  2009-10-28  Bob Duff  <duff@adacore.com>
  
  	* s-fileio.adb: Give more information in exception messages.
  
  2009-10-28  Robert Dewar  <dewar@adacore.com>
  
  	* gnat_ugn.texi: Document new -gnatyt requirement for space after right
  	paren if next token starts with digit or letter.
  	* styleg.adb (Check_Right_Paren): New rule for space after if next
  	character is a letter or digit.
........
  r153664 | rguenth | 2009-10-28 08:33:17 -0600 (Wed, 28 Oct 2009) | 4 lines
  
  2009-10-28  Richard Guenther  <rguenther@suse.de>
  
          * gimple.c (gimple_get_alias_set): Fix comment typo.
........
  r153665 | jakub | 2009-10-28 08:36:28 -0600 (Wed, 28 Oct 2009) | 3 lines
  
  	* var-tracking.c (emit_note_insn_var_location): Get the mode of
  	a variable part from its REG, MEM or VALUE.
........
  r153666 | jakub | 2009-10-28 08:37:24 -0600 (Wed, 28 Oct 2009) | 4 lines
  
  	* var-tracking.c (emit_note_insn_var_location): Don't call the second
  	vt_expand_loc unnecessarily when location is not a register nor
  	memory.
........
  r153667 | jakub | 2009-10-28 08:39:06 -0600 (Wed, 28 Oct 2009) | 6 lines
  
  	PR target/41762
  	* config/i386/i386.c (ix86_pic_register_p): Don't call
  	rtx_equal_for_cselib_p for VALUEs discarded as useless.
  
  	* gcc.dg/pr41762.c: New test.
........
  r153669 | jakub | 2009-10-28 08:43:04 -0600 (Wed, 28 Oct 2009) | 6 lines
  
  	PR debug/41801
  	* builtins.c (get_builtin_sync_mem): Expand loc in ptr_mode,
  	call convert_memory_address on addr.
  
  	* g++.dg/ext/sync-3.C: New test.
........
  r153670 | jakub | 2009-10-28 08:45:03 -0600 (Wed, 28 Oct 2009) | 6 lines
  
  	PR middle-end/41837
  	* ipa-struct-reorg.c (find_field_in_struct_1): Return NULL if
  	fields don't have DECL_NAME.
  
  	* gcc.dg/pr41837.c: New test.
........
  r153671 | rguenth | 2009-10-28 08:48:34 -0600 (Wed, 28 Oct 2009) | 15 lines
  
  2009-10-28  Richard Guenther  <rguenther@suse.de>
  
  	PR lto/41808
  	PR lto/41839
  	* tree-ssa.c (useless_type_conversion_p): Do not treat
  	conversions to pointers to incomplete types as useless.
  	* gimple.c (gimple_types_compatible_p): Compare struct tags,
  	not typedef names.
  
  	* gcc.dg/lto/20091027-1_0.c: New testcase.
  	* gcc.dg/lto/20091027-1_1.c: Likewise.
  	* g++.dg/lto/20091026-1_0.C: Likewise.
  	* g++.dg/lto/20091026-1_1.C: Likewise.
  	* g++.dg/lto/20091026-1_a.h: Likewise.
........

Modified:
    branches/reload-v2a/   (props changed)

Propchange: branches/reload-v2a/
            ('svnmerge-integrated' modified)