Summary: | [12/13/14/15 Regression] Many false positive warnings with recursive function | ||
---|---|---|---|
Product: | gcc | Reporter: | Andreas Rheinhardt <andreas.rheinhardt> |
Component: | ipa | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | NEW --- | ||
Severity: | normal | CC: | aldyh, amacleod, dimhen, fxue, fxue, jakub, jamborm, marxin, msebor |
Priority: | P2 | Keywords: | diagnostic, missed-optimization |
Version: | 10.0 | ||
Target Milestone: | 12.5 | ||
Host: | Target: | ||
Build: | Known to work: | ||
Known to fail: | 10.3.0, 11.2.0, 12.0 | Last reconfirmed: | 2021-11-09 00:00:00 |
Bug Depends on: | |||
Bug Blocks: | 56456, 88443 | ||
Attachments: | A file that produces false warnings when compiled with -O3. |
Description
Andreas Rheinhardt
2021-09-28 08:40:00 UTC
Confirmed with GCC 10 through 12. Presumably all three warnings are false positives here, including -Waggressive-loop-optimizations. The encode_block.constprop definition shows the first statement that triggers the -Warray-bounds: __attribute__((access ("^0[7]", ))) int encode_block.constprop (int[256] * blk2, unsigned int level) { ... <bb 2> [local count: 118111600]: __builtin_memset (&MEM <int[7][256]> [(void *)&block2 + 4398046509056B], 0, 17179869176); ivtmp.329_244 = (unsigned long) &MEM <int[7][256]> [(void *)&block2 + 1024B]; _157 = (unsigned long) &block2; _46 = _157 + 17179870192; The access to block2 is to the global definition of the array, not the argument (I have renamed the latter to blk2 before producing the dump to avoid confusion). Since block2 is a 2 X 256 matrix the memset access is out of bounds. The warning is doing its job pointing it out. The whole function must be unreachable and shouldn't be emitted. A simple workaround is to add at the begining of the function: if (level == 0) return 0; What I found is since we don't run loop copy header until after IPA-CP, we get some interesting IR where we don't jump thread the level == 0 case correctly. Started with r10-5098-g9b14fc3326e087975653b1af8ac54114041cde51 + Creating a specialized node of encode_block/0. + replacing param #0 block2 with const &block2 + replacing param #1 level with const 4 + Creating a specialized node of encode_block/0. + replacing param #0 block2 with const &block2 + replacing param #1 level with const 3 + Creating a specialized node of encode_block/0. + replacing param #0 block2 with const &block2 + replacing param #1 level with const 2 + Creating a specialized node of encode_block/0. + replacing param #0 block2 with const &block2 + replacing param #1 level with const 1 + Creating a specialized node of encode_block/0. + replacing param #0 block2 with const &block2 + replacing param #1 level with const 0 + Creating a specialized node of encode_block/0. + replacing param #0 block2 with const &block2 + replacing param #1 level with const 4294967295 + Creating a specialized node of encode_block/0. + replacing param #0 block2 with const &block2 + replacing param #1 level with const 4294967294 Maybe the IPA transformation should just use VRP info to guide it. We don't have anything that says level is [0, 5], that is there purely from the fact that the caller calls it with 5, but we know that it isn't -1U: # RANGE [0, 4294967294] _9 = level_19(D) + 4294967295; _23 = encode_block (block2_21(D), _9); _26 = encode_block (block2_21(D), _9); so we shouldn't create that specialized node and those that are needed just because we've created them. In the ipa-cp dump it even mentions it: Node: encode_block/0: param [0]: &block2 [loc_time: 0, loc_size: 0, prop_time: 0, prop_size: 0] ctxs: VARIABLE Bits: value = 0x0, mask = 0xfffffffffffffffc int[256] * [1B, +INF] AGGS VARIABLE param [1]: VARIABLE 5 [loc_time: 147.512, loc_size: 27, prop_time: 760193, prop_size: 216] 4 [loc_time: 147.512, loc_size: 27, prop_time: 143948, prop_size: 189] 3 [loc_time: 147.512, loc_size: 27, prop_time: 31125.7, prop_size: 162] 2 [loc_time: 147.512, loc_size: 27, prop_time: 7822.76, prop_size: 135] 1 [loc_time: 147.512, loc_size: 27, prop_time: 2325.83, prop_size: 108] 0 [loc_time: 147.512, loc_size: 27, prop_time: 825.122, prop_size: 81] 4294967295 [loc_time: 147.512, loc_size: 27, prop_time: 342.227, prop_size: 54] 4294967294 [loc_time: 147.512, loc_size: 27, prop_time: 147.512, prop_size: 27] ctxs: VARIABLE Bits unusable (BOTTOM) unsigned int [0, 4294967294] AGGS VARIABLE but doesn't connect that when the param has value range of [0, 4294967294] and 4294967295 is outside of that range, it doesn't make sense to specialize on it. It is in /* Recursively generate lattice values with a limited count. */ FOR_EACH_VEC_ELT (val_seeds, i, src_val) { for (int j = 1; j < max_recursive_depth; j++) { tree cstval = get_val_across_arith_op (opcode, opnd1_type, opnd2, src_val, res_type); if (!cstval || !ipacp_value_safe_for_type (res_type, cstval)) break; ret |= dest_lat->add_value (cstval, cs, src_val, src_idx, src_offset, &src_val, j); gcc_checking_assert (src_val); } } (but there is another spot doing the similar thing) where it would be nice to also break if cstval is non-NULL and safe for type, but is outside of the value range. I have no idea how to get from this spot at that value range though. (In reply to Jakub Jelinek from comment #6) > It is in > /* Recursively generate lattice values with a limited count. */ > FOR_EACH_VEC_ELT (val_seeds, i, src_val) > { > for (int j = 1; j < max_recursive_depth; j++) > { > tree cstval = get_val_across_arith_op (opcode, opnd1_type, > opnd2, > src_val, res_type); > if (!cstval > || !ipacp_value_safe_for_type (res_type, cstval)) > break; > > ret |= dest_lat->add_value (cstval, cs, src_val, src_idx, > src_offset, &src_val, j); > gcc_checking_assert (src_val); > } > } > (but there is another spot doing the similar thing) where it would be nice > to also break if cstval is non-NULL and safe for type, but is outside of the > value range. I have no idea how to get from this spot at that value range > though. By default, ipcp is told to clone a recursive function 8 times, that exceeds value space of index in this case. We could rely on ipa fnsummary on condition predicate of a call to avoid generating never-executed copy. I will take it. I am about to thest the following patch. In longer-run, it would be better to never generate lattice values outside of the value_range but there is an ordering problem, we need the complete VR info before we can use it. I plan to rearrange IPA-CP into making multiple passes over the lattice dependency graph and this should quite naturally be solved by doing this kind of resursive-value-generation only in second and later passes. diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc index 453e9c93cc3..cbbb8bbc80a 100644 --- a/gcc/ipa-cp.cc +++ b/gcc/ipa-cp.cc @@ -6154,8 +6154,16 @@ decide_whether_version_node (struct cgraph_node *node) { ipcp_value<tree> *val; for (val = lat->values; val; val = val->next) - ret |= decide_about_value (node, i, -1, val, &avals, - &self_gen_clones); + { + if (!plats->m_value_range.bottom_p () + && !plats->m_value_range.m_vr.contains_p (val->value)) + { + gcc_checking_assert (val->self_recursion_generated_p ()); + continue; + } + ret |= decide_about_value (node, i, -1, val, &avals, + &self_gen_clones); + } } if (!plats->aggs_bottom) The patch had to be tweaked a bit but I have proposed the following one on the mailing list: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590371.html (In reply to Martin Jambor from comment #8) > I am about to thest the following patch. In longer-run, it would be better > to never generate lattice values outside of the value_range but there is an > ordering problem, we need the complete VR info before we can use it. I plan > to rearrange IPA-CP into making multiple passes over the lattice dependency > graph and this should quite naturally be solved by doing this kind of > resursive-value-generation only in second and later passes. > > > diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc > index 453e9c93cc3..cbbb8bbc80a 100644 > --- a/gcc/ipa-cp.cc > +++ b/gcc/ipa-cp.cc > @@ -6154,8 +6154,16 @@ decide_whether_version_node (struct cgraph_node *node) > { > ipcp_value<tree> *val; > for (val = lat->values; val; val = val->next) > - ret |= decide_about_value (node, i, -1, val, &avals, > - &self_gen_clones); > + { > + if (!plats->m_value_range.bottom_p () > + && !plats->m_value_range.m_vr.contains_p (val->value)) > + { > + gcc_checking_assert (val->self_recursion_generated_p ()); > + continue; > + } > + ret |= decide_about_value (node, i, -1, val, &avals, > + &self_gen_clones); > + } > } > > if (!plats->aggs_bottom) Here is a complication that value range for recursion index might not be not easily computed, and could not prevent IPA-CP generating useless copy. Constraint of recursion index comes from "block2[level][x]", not its value range deduced from condition predicate (level > 0). Change the case to cover up value range of "level", and we will get same warning. So in the circumstances, one way for us is to disable warning for these copies? extern int block2[7][256]; extern unsigned G_START; extern unsigned G_SCALE; static int encode_block(int block2[7][256], unsigned level) { int best_score = 0; for (unsigned x = G_START; x < G_SCALE * level; x++) { int v = block2[1][x]; block2[level][x] = 0; best_score += v * v; } if (G_SCALE * level > G_START && best_score > 64) { int score = 0; score += encode_block(block2, level - 1); score += encode_block(block2, level - 1); if (score < best_score) { best_score = score; } } return best_score; } I am very well aware that my patch was just a mitigation, not something that would avoid the problem under all circumstances. We can attempt to look at array access indices during the summary creation phase and save constraints on parameters in order to not create the clones. But even then, inlining and late optimizations can expose such invalid array accesses in the clones that we still create and that are practically impossible to know about at IPA-CP time. It would be great if we finally invented a way to communicate to users that a warning comes from a function specialized for a given context or inlined at a particular point. Then the user would see that compiler created some dead code and might think it is stupid, but would at least know what is going on (see PR 102061 and the discussion in PR 60761). Limiting cloning if we know from VR that we should, like my patch does, is still a good thing to do, I think. The master branch has been updated by Martin Jambor <jamborm@gcc.gnu.org>: https://gcc.gnu.org/g:cf68f5a6d20db2aee2f3e674ad3f10e1c458edf9 commit r12-7937-gcf68f5a6d20db2aee2f3e674ad3f10e1c458edf9 Author: Martin Jambor <mjambor@suse.cz> Date: Thu Mar 31 17:14:42 2022 +0200 ipa-cp: Do not create clones for values outside known value range (PR 102513) PR 102513 shows we emit bogus array access warnings when IPA-CP creates clones specialized for values which it deduces from arithmetic jump functions describing self-recursive calls. Those can however be avoided if we consult the IPA-VR information that the same pass also has. The patch below does that at the stage when normally values are only examined for profitability. It would be better not to create lattices describing such bogus values in the first place, however that presents an ordering problem, the pass currently propagates all information, and so both constants and VR, in no particular order when processing SCCs, and so this approach seemed much simpler. I plan to rearrange the pass so that it clones in multiple passes over the call graph (or rather the lattice dependence graph) and it feels natural to only do propagation for these kinds of recursion in the second or later passes, which would fix the issue more elegantly. gcc/ChangeLog: 2022-02-14 Martin Jambor <mjambor@suse.cz> PR ipa/102513 * ipa-cp.cc (decide_whether_version_node): Skip scalar values which do not fit the known value_range. gcc/testsuite/ChangeLog: 2022-02-14 Martin Jambor <mjambor@suse.cz> PR ipa/102513 * gcc.dg/ipa/pr102513.c: New test. Mitigated using value-range on master (soon to be GCC 12). On 11 and earlier, the determined IPA value-range is "variable" - I have not looked at why - so the same approach cannot be taken there. I am un-assigning myself so that whoever wants to explore other options how to prevent/mitigate this issue does not feel hindered. (But I'll keep the issue on my list to ponder about.) GCC 10.4 is being released, retargeting bugs to GCC 10.5. GCC 10 branch is being closed. GCC 11 branch is being closed. |