Created attachment 39087 [details] preprocessed source seen with the gcc-6-branch and trunk 20160804 gcc -g -O3 -std=c99 -fPIC -pthread -c des.i allocates all available memory, and then is killed. Workaround is to build with -O2.
Reghunt indicates that this was introduced with: Author: rguenth Date: Tue Jul 7 07:59:40 2015 New Revision: 225504 URL: https://gcc.gnu.org/viewcvs?rev=225504&root=gcc&view=rev Log: 2015-07-07 Richard Biener <rguenther@suse.de> * tree-ssa-propagate.c (add_ssa_edge): Dump what edge list we add which use to. (add_control_edge): Remove excessive vertical space in dumping. (process_ssa_edge_worklist): Simulate at most one statement and return whether we did. Do not simulate PHIs if they are in a BB not yet simulated. (ssa_propagate): Adjust to always drain the BB worklist whenever a BB is available there, likewise the VARYING edges list before the interesting edge list.
Huh, if the reghunt is correct it points at some latent issue. Will try to reproduce with a cross.
Doesn't reproduce with a cross. I've configured with just --target=s390x-suse-linux thus end up with > ./xgcc -B. -S des.i -g -O3 -std=c99 -fPIC -pthread -v Reading specs from ./specs COLLECT_GCC=./xgcc Target: s390x-suse-linux Configured with: /space/rguenther/src/svn/gcc-6-branch/configure --target=s390x-suse-linux --enable-languages=c,c++ --enable-checking=yes Thread model: posix gcc version 6.1.1 20160719 [gcc-6-branch revision 238088] (GCC) COLLECT_GCC_OPTIONS='-B' '.' '-S' '-g' '-O3' '-std=c99' '-fPIC' '-pthread' '-v' '-m64' '-mzarch' '-march=z900' ./cc1 -fpreprocessed des.i -quiet -dumpbase des.i -m64 -mzarch -march=z900 -auxbase des -g -O3 -std=c99 -version -fPIC -o des.s not sure if -march is the one you see this with. Ah, using -march=z10 is required to reproduce it and it iterates forever in VRP. Investiating.
Reduced testcase: typedef unsigned char uint8_t; typedef unsigned long int uint64_t; union unaligned_64 { uint64_t l; } __attribute__((packed)) __attribute__((may_alias)); typedef struct AVDES { uint64_t round_keys[3][16]; } AVDES; static const uint8_t PC1_shuffle[] = { 64-57,64-49,64-41,64-33,64-25,64-17,64-9, 64-1,64-58,64-50,64-42,64-34,64-26,64-18, 64-10,64-2,64-59,64-51,64-43,64-35,64-27, 64-19,64-11,64-3,64-60,64-52,64-44,64-36, 64-63,64-55,64-47,64-39,64-31,64-23,64-15, 64-7,64-62,64-54,64-46,64-38,64-30,64-22, 64-14,64-6,64-61,64-53,64-45,64-37,64-29, 64-21,64-13,64-5,64-28,64-20,64-12,64-4 }; static const uint8_t PC2_shuffle[] = { 56-14,56-17,56-11,56-24,56-1,56-5, 56-3,56-28,56-15,56-6,56-21,56-10, 56-23,56-19,56-12,56-4,56-26,56-8, 56-16,56-7,56-27,56-20,56-13,56-2, 56-41,56-52,56-31,56-37,56-47,56-55, 56-30,56-40,56-51,56-45,56-33,56-48, 56-44,56-49,56-39,56-56,56-34,56-53, 56-46,56-42,56-50,56-36,56-29,56-32 }; static uint64_t shuffle(uint64_t in, const uint8_t *shuffle, int shuffle_len) { int i; uint64_t res = 0; for (i = 0; i < shuffle_len; i++) res += res + ((in >> *shuffle++) & 1); return res; } void gen_roundkeys(uint64_t K[16], uint64_t key) { int i; uint64_t CDn = shuffle(key, PC1_shuffle, sizeof(PC1_shuffle)); for (i = 0; i < 16; i++) K[i] = shuffle(CDn, PC2_shuffle, sizeof(PC2_shuffle)); }
configured --with-arch=zEC12, I can reproduce it with a cross as well. so maybe check with -march=zEC12?
Ok, so with clever stmt / SSA use ordering you can get to quadratic propagation as VRP allows ranges to arbitrarily grow: Found new range for res_561: [0, 20368] Found new range for res_561: [0, 20370] Found new range for res_561: [0, 20372] Found new range for res_561: [0, 20374] Found new range for res_561: [0, 20376] Found new range for res_561: [0, 20378] Found new range for res_561: [0, 20380] Found new range for res_561: [0, 20382] Found new range for res_561: [0, 20384] Found new range for res_561: [0, 20386] Found new range for res_561: [0, 20388] Found new range for res_561: [0, 20390] Found new range for res_561: [0, 20392] Found new range for res_561: [0, 20394] Found new range for res_561: [0, 20396] ... the SSA worklist in the SSA propagator is just a stack that is pushed/popped to/from. Ideally that worklist would be processed in stmt order but that requires sorting the worklist vector or changing the worklist data structure to sth "better" (a balanced tree?). Slightly "randomizing" the worklist processing, say by Index: gcc/tree-ssa-propagate.c =================================================================== --- gcc/tree-ssa-propagate.c (revision 238469) +++ gcc/tree-ssa-propagate.c (working copy) @@ -422,7 +422,8 @@ process_ssa_edge_worklist (vec<gimple *> basic_block bb; /* Pull the statement to simulate off the worklist. */ - gimple *stmt = worklist->pop (); + gimple *stmt = (*worklist)[0]; + worklist->unordered_remove (0); /* If this statement was already visited by simulate_block, then we don't need to visit it again here. */ fixes the testcase. Of course it's just a matter of pure luck that it re-appears (with the current processing order it's just most likely given its depth-first processing nature). Picking a truly random element from the worklist would be better than the above (but truly random has to be predictable to not break bootstrap).
Author: rguenth Date: Fri Aug 12 07:34:40 2016 New Revision: 239405 URL: https://gcc.gnu.org/viewcvs?rev=239405&root=gcc&view=rev Log: 2016-08-12 Richard Biener <rguenther@suse.de> PR tree-optimization/72851 * tree-ssa-propagate.c: Include cfganal.h. Rewrite block and stmt worklists to use bitmaps indexed in execution order. (executable_blocks, cfg_blocks_num, cfg_blocks_tail, cfg_blocks_head, bb_in_list, interesting_ssa_edges, varying_ssa_edges): Remove. (cfg_blocks): Make a bitmap. (bb_to_cfg_order, cfg_order_to_bb, ssa_edge_worklist, uid_to_stmt): New globals. (cfg_blocks_empty_p): Adjust. (cfg_blocks_add): Likewise. (cfg_blocks_get): Likewise. (add_ssa_edge): Likewise. (add_control_edge): Likewise. (simulate_stmt): Likewise. (process_ssa_edge_worklist): Likewise. (simulate_block): Likewise. (ssa_prop_init): Compute PRE order and stmt UIDs. (ssa_prop_fini): Adjust. (ssa_propagate): Adjust. * gcc.dg/torture/pr72851.c: New testcase. Added: trunk/gcc/testsuite/gcc.dg/torture/pr72851.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-propagate.c
Fixed on trunk. Not sure what to do on the branch - nothing for 6.2 at least.
Author: rguenth Date: Tue Aug 23 13:49:00 2016 New Revision: 239699 URL: https://gcc.gnu.org/viewcvs?rev=239699&root=gcc&view=rev Log: 2016-08-23 Richard Biener <rguenther@suse.de> Backport from mainline 2016-08-16 Richard Biener <rguenther@suse.de> PR tree-optimization/76783 * tree-ssa-propagate.c (ssa_prop_init): Use RPO order. Clear BB visited flags at start. * gcc.dg/pr76783.c: New testcase. * gcc.dg/tree-ssa/pr69270-2.c: Adjust. 2016-08-12 Richard Biener <rguenther@suse.de> PR tree-optimization/72851 * tree-ssa-propagate.c: Include cfganal.h. Rewrite block and stmt worklists to use bitmaps indexed in execution order. (executable_blocks, cfg_blocks_num, cfg_blocks_tail, cfg_blocks_head, bb_in_list, interesting_ssa_edges, varying_ssa_edges): Remove. (cfg_blocks): Make a bitmap. (bb_to_cfg_order, cfg_order_to_bb, ssa_edge_worklist, uid_to_stmt): New globals. (cfg_blocks_empty_p): Adjust. (cfg_blocks_add): Likewise. (cfg_blocks_get): Likewise. (add_ssa_edge): Likewise. (add_control_edge): Likewise. (simulate_stmt): Likewise. (process_ssa_edge_worklist): Likewise. (simulate_block): Likewise. (ssa_prop_init): Compute PRE order and stmt UIDs. (ssa_prop_fini): Adjust. (ssa_propagate): Adjust. * gcc.dg/torture/pr72851.c: New testcase. Added: branches/gcc-6-branch/gcc/testsuite/gcc.dg/pr76783.c branches/gcc-6-branch/gcc/testsuite/gcc.dg/torture/pr72851.c Modified: branches/gcc-6-branch/gcc/ChangeLog branches/gcc-6-branch/gcc/testsuite/ChangeLog branches/gcc-6-branch/gcc/testsuite/gcc.dg/tree-ssa/pr69270-2.c branches/gcc-6-branch/gcc/tree-ssa-propagate.c
Fixed.