Bug 72851 - [6 Regression] memory hog with -O3 on s390x-linux-gnu
Summary: [6 Regression] memory hog with -O3 on s390x-linux-gnu
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 6.1.1
: P3 normal
Target Milestone: 6.3
Assignee: Richard Biener
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-08-09 12:17 UTC by Matthias Klose
Modified: 2016-08-23 13:53 UTC (History)
1 user (show)

See Also:
Host:
Target: s390x-linux-gnu
Build:
Known to work: 5.4.1, 6.2.1, 7.0
Known to fail: 6.1.1, 6.2.0
Last reconfirmed: 2016-08-10 00:00:00


Attachments
preprocessed source (21.47 KB, application/x-xz)
2016-08-09 12:17 UTC, Matthias Klose
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Matthias Klose 2016-08-09 12:17:47 UTC
Created attachment 39087 [details]
preprocessed source

seen with the gcc-6-branch and trunk 20160804

gcc -g -O3 -std=c99 -fPIC -pthread -c des.i

allocates all available memory, and then is killed. Workaround is to build with -O2.
Comment 1 Andreas Krebbel 2016-08-10 10:12:36 UTC
Reghunt indicates that this was introduced with:

Author: rguenth
Date: Tue Jul  7 07:59:40 2015
New Revision: 225504

URL: https://gcc.gnu.org/viewcvs?rev=225504&root=gcc&view=rev
Log:
2015-07-07  Richard Biener  <rguenther@suse.de>

	* tree-ssa-propagate.c (add_ssa_edge): Dump what edge list we
	add which use to.
	(add_control_edge): Remove excessive vertical space in dumping.
	(process_ssa_edge_worklist): Simulate at most one statement and
	return whether we did.  Do not simulate PHIs if they are in a
	BB not yet simulated.
	(ssa_propagate): Adjust to always drain the BB worklist whenever
	a BB is available there, likewise the VARYING edges list before
	the interesting edge list.
Comment 2 Richard Biener 2016-08-10 12:22:53 UTC
Huh, if the reghunt is correct it points at some latent issue.  Will try to reproduce with a cross.
Comment 3 Richard Biener 2016-08-10 12:33:48 UTC
Doesn't reproduce with a cross.  I've configured with just --target=s390x-suse-linux thus end up with

> ./xgcc -B. -S des.i -g -O3 -std=c99 -fPIC -pthread -v
Reading specs from ./specs
COLLECT_GCC=./xgcc
Target: s390x-suse-linux
Configured with: /space/rguenther/src/svn/gcc-6-branch/configure --target=s390x-suse-linux --enable-languages=c,c++ --enable-checking=yes
Thread model: posix
gcc version 6.1.1 20160719 [gcc-6-branch revision 238088] (GCC) 
COLLECT_GCC_OPTIONS='-B' '.' '-S' '-g' '-O3' '-std=c99' '-fPIC' '-pthread' '-v' '-m64' '-mzarch' '-march=z900'
 ./cc1 -fpreprocessed des.i -quiet -dumpbase des.i -m64 -mzarch -march=z900 -auxbase des -g -O3 -std=c99 -version -fPIC -o des.s

not sure if -march is the one you see this with.

Ah, using -march=z10 is required to reproduce it and it iterates forever in VRP.

Investiating.
Comment 4 Richard Biener 2016-08-10 12:47:12 UTC
Reduced testcase:

typedef unsigned char uint8_t;
typedef unsigned long int uint64_t;
union unaligned_64 {
    uint64_t l;
}
__attribute__((packed)) __attribute__((may_alias));
typedef struct AVDES {
    uint64_t round_keys[3][16];
} AVDES;
static const uint8_t PC1_shuffle[] = {
    64-57,64-49,64-41,64-33,64-25,64-17,64-9,     64-1,64-58,64-50,64-42,64-34,64-26,64-18,     64-10,64-2,64-59,64-51,64-43,64-35,64-27,     64-19,64-11,64-3,64-60,64-52,64-44,64-36,     64-63,64-55,64-47,64-39,64-31,64-23,64-15,     64-7,64-62,64-54,64-46,64-38,64-30,64-22,     64-14,64-6,64-61,64-53,64-45,64-37,64-29,     64-21,64-13,64-5,64-28,64-20,64-12,64-4 };
static const uint8_t PC2_shuffle[] = {
    56-14,56-17,56-11,56-24,56-1,56-5,     56-3,56-28,56-15,56-6,56-21,56-10,     56-23,56-19,56-12,56-4,56-26,56-8,     56-16,56-7,56-27,56-20,56-13,56-2,     56-41,56-52,56-31,56-37,56-47,56-55,     56-30,56-40,56-51,56-45,56-33,56-48,     56-44,56-49,56-39,56-56,56-34,56-53,     56-46,56-42,56-50,56-36,56-29,56-32 };
static uint64_t shuffle(uint64_t in, const uint8_t *shuffle, int shuffle_len)
{
  int i;
  uint64_t res = 0;
  for (i = 0; i < shuffle_len; i++)
    res += res + ((in >> *shuffle++) & 1);
  return res;
}
void gen_roundkeys(uint64_t K[16], uint64_t key)
{
  int i;
  uint64_t CDn = shuffle(key, PC1_shuffle, sizeof(PC1_shuffle));
  for (i = 0; i < 16; i++)
    K[i] = shuffle(CDn, PC2_shuffle, sizeof(PC2_shuffle));
}
Comment 5 Matthias Klose 2016-08-10 12:49:59 UTC
configured --with-arch=zEC12, I can reproduce it with a cross as well. so maybe check with -march=zEC12?
Comment 6 Richard Biener 2016-08-10 13:45:56 UTC
Ok, so with clever stmt / SSA use ordering you can get to quadratic propagation as VRP allows ranges to arbitrarily grow:

Found new range for res_561: [0, 20368]
Found new range for res_561: [0, 20370]
Found new range for res_561: [0, 20372]
Found new range for res_561: [0, 20374]
Found new range for res_561: [0, 20376]
Found new range for res_561: [0, 20378]
Found new range for res_561: [0, 20380]
Found new range for res_561: [0, 20382]
Found new range for res_561: [0, 20384]
Found new range for res_561: [0, 20386]
Found new range for res_561: [0, 20388]
Found new range for res_561: [0, 20390]
Found new range for res_561: [0, 20392]
Found new range for res_561: [0, 20394]
Found new range for res_561: [0, 20396]
...

the SSA worklist in the SSA propagator is just a stack that is pushed/popped to/from.

Ideally that worklist would be processed in stmt order but that requires
sorting the worklist vector or changing the worklist data structure to
sth "better" (a balanced tree?).  Slightly "randomizing" the worklist processing, say by

Index: gcc/tree-ssa-propagate.c
===================================================================
--- gcc/tree-ssa-propagate.c    (revision 238469)
+++ gcc/tree-ssa-propagate.c    (working copy)
@@ -422,7 +422,8 @@ process_ssa_edge_worklist (vec<gimple *>
       basic_block bb;
 
       /* Pull the statement to simulate off the worklist.  */
-      gimple *stmt = worklist->pop ();
+      gimple *stmt = (*worklist)[0];
+      worklist->unordered_remove (0);
 
       /* If this statement was already visited by simulate_block, then
         we don't need to visit it again here.  */

fixes the testcase.  Of course it's just a matter of pure luck that it
re-appears (with the current processing order it's just most likely given
its depth-first processing nature).  Picking a truly random element from
the worklist would be better than the above (but truly random has to be
predictable to not break bootstrap).
Comment 7 Richard Biener 2016-08-12 07:35:12 UTC
Author: rguenth
Date: Fri Aug 12 07:34:40 2016
New Revision: 239405

URL: https://gcc.gnu.org/viewcvs?rev=239405&root=gcc&view=rev
Log:
2016-08-12  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/72851
	* tree-ssa-propagate.c: Include cfganal.h.  Rewrite block and stmt
	worklists to use bitmaps indexed in execution order.
	(executable_blocks, cfg_blocks_num, cfg_blocks_tail, cfg_blocks_head,
	bb_in_list, interesting_ssa_edges, varying_ssa_edges): Remove.
	(cfg_blocks): Make a bitmap.
	(bb_to_cfg_order, cfg_order_to_bb, ssa_edge_worklist, uid_to_stmt):
	New globals.
	(cfg_blocks_empty_p): Adjust.
	(cfg_blocks_add): Likewise.
	(cfg_blocks_get): Likewise.
	(add_ssa_edge): Likewise.
	(add_control_edge): Likewise.
	(simulate_stmt): Likewise.
	(process_ssa_edge_worklist): Likewise.
	(simulate_block): Likewise.
	(ssa_prop_init): Compute PRE order and stmt UIDs.
	(ssa_prop_fini): Adjust.
	(ssa_propagate): Adjust.

	* gcc.dg/torture/pr72851.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/torture/pr72851.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-propagate.c
Comment 8 Richard Biener 2016-08-12 09:35:52 UTC
Fixed on trunk.  Not sure what to do on the branch - nothing for 6.2 at least.
Comment 9 Richard Biener 2016-08-23 13:49:33 UTC
Author: rguenth
Date: Tue Aug 23 13:49:00 2016
New Revision: 239699

URL: https://gcc.gnu.org/viewcvs?rev=239699&root=gcc&view=rev
Log:
2016-08-23  Richard Biener  <rguenther@suse.de>

	Backport from mainline
	2016-08-16  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/76783
	* tree-ssa-propagate.c (ssa_prop_init): Use RPO order.  Clear
	BB visited flags at start.

	* gcc.dg/pr76783.c: New testcase.
	* gcc.dg/tree-ssa/pr69270-2.c: Adjust.

	2016-08-12  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/72851
	* tree-ssa-propagate.c: Include cfganal.h.  Rewrite block and stmt
	worklists to use bitmaps indexed in execution order.
	(executable_blocks, cfg_blocks_num, cfg_blocks_tail, cfg_blocks_head,
	bb_in_list, interesting_ssa_edges, varying_ssa_edges): Remove.
	(cfg_blocks): Make a bitmap.
	(bb_to_cfg_order, cfg_order_to_bb, ssa_edge_worklist, uid_to_stmt):
	New globals.
	(cfg_blocks_empty_p): Adjust.
	(cfg_blocks_add): Likewise.
	(cfg_blocks_get): Likewise.
	(add_ssa_edge): Likewise.
	(add_control_edge): Likewise.
	(simulate_stmt): Likewise.
	(process_ssa_edge_worklist): Likewise.
	(simulate_block): Likewise.
	(ssa_prop_init): Compute PRE order and stmt UIDs.
	(ssa_prop_fini): Adjust.
	(ssa_propagate): Adjust.

	* gcc.dg/torture/pr72851.c: New testcase.

Added:
    branches/gcc-6-branch/gcc/testsuite/gcc.dg/pr76783.c
    branches/gcc-6-branch/gcc/testsuite/gcc.dg/torture/pr72851.c
Modified:
    branches/gcc-6-branch/gcc/ChangeLog
    branches/gcc-6-branch/gcc/testsuite/ChangeLog
    branches/gcc-6-branch/gcc/testsuite/gcc.dg/tree-ssa/pr69270-2.c
    branches/gcc-6-branch/gcc/tree-ssa-propagate.c
Comment 10 Richard Biener 2016-08-23 13:53:53 UTC
Fixed.