Bug 102513

Summary: [12/13/14/15 Regression] Many false positive warnings with recursive function
Product: gcc Reporter: Andreas Rheinhardt <andreas.rheinhardt>
Component: ipaAssignee: Not yet assigned to anyone <unassigned>
Status: NEW ---    
Severity: normal CC: aldyh, amacleod, dimhen, fxue, fxue, jakub, jamborm, marxin, msebor
Priority: P2 Keywords: diagnostic, missed-optimization
Version: 10.0   
Target Milestone: 12.5   
Host: Target:
Build: Known to work:
Known to fail: 10.3.0, 11.2.0, 12.0 Last reconfirmed: 2021-11-09 00:00:00
Bug Depends on:    
Bug Blocks: 56456, 88443    
Attachments: A file that produces false warnings when compiled with -O3.

Description Andreas Rheinhardt 2021-09-28 08:40:00 UTC
Created attachment 51514 [details]
A file that produces false warnings when compiled with -O3.

extern int block2[7][256];

static int encode_block(int block2[7][256], unsigned level)
{
    int best_score = 0;

    for (unsigned x = 0; x < level; x++) {
        int v = block2[1][x];
        block2[level][x] = 0;
        best_score      += v * v;
    }

    if (level > 0 && best_score > 64) {
        int score = 0;

        score += encode_block(block2, level - 1);
        score += encode_block(block2, level - 1);

        if (score < best_score) {
            best_score = score;
        }
    }

    return best_score;
}

int foo(void)
{
    return encode_block(block2, 5);
}

As is easily seen, level in the above code is always in the range 0..5 and therefore all accesses are safe. Yet when one compiles the above with -O3 (-O2 is quiet), one receives several warnings:

../gcc_bug.c: In function 'encode_block.constprop':
../gcc_bug.c:9:26: warning: iteration 256 invokes undefined behavior [-Waggressive-loop-optimizations]
    9 |         block2[level][x] = 0;
      |         ~~~~~~~~~~~~~~~~~^~~
../gcc_bug.c:7:5: note: within this loop
    7 |     for (unsigned x = 0; x < level; x++) {
      |     ^~~
../gcc_bug.c:9:26: warning: iteration 256 invokes undefined behavior [-Waggressive-loop-optimizations]
    9 |         block2[level][x] = 0;
      |         ~~~~~~~~~~~~~~~~~^~~
../gcc_bug.c:7:5: note: within this loop
    7 |     for (unsigned x = 0; x < level; x++) {
      |     ^~~
../gcc_bug.c:9:26: warning: iteration 256 invokes undefined behavior [-Waggressive-loop-optimizations]
    9 |         block2[level][x] = 0;
      |         ~~~~~~~~~~~~~~~~~^~~
../gcc_bug.c:7:5: note: within this loop
    7 |     for (unsigned x = 0; x < level; x++) {
      |     ^~~
../gcc_bug.c:9:26: warning: iteration 256 invokes undefined behavior [-Waggressive-loop-optimizations]
    9 |         block2[level][x] = 0;
      |         ~~~~~~~~~~~~~~~~~^~~
../gcc_bug.c:7:5: note: within this loop
    7 |     for (unsigned x = 0; x < level; x++) {
      |     ^~~
../gcc_bug.c:9:26: warning: iteration 256 invokes undefined behavior [-Waggressive-loop-optimizations]
    9 |         block2[level][x] = 0;
      |         ~~~~~~~~~~~~~~~~~^~~
../gcc_bug.c:7:5: note: within this loop
    7 |     for (unsigned x = 0; x < level; x++) {
      |     ^~~
../gcc_bug.c:9:26: warning: iteration 256 invokes undefined behavior [-Waggressive-loop-optimizations]
    9 |         block2[level][x] = 0;
      |         ~~~~~~~~~~~~~~~~~^~~
../gcc_bug.c:7:5: note: within this loop
    7 |     for (unsigned x = 0; x < level; x++) {
      |     ^~~
../gcc_bug.c:9:26: warning: iteration 256 invokes undefined behavior [-Waggressive-loop-optimizations]
    9 |         block2[level][x] = 0;
      |         ~~~~~~~~~~~~~~~~~^~~
../gcc_bug.c:7:5: note: within this loop
    7 |     for (unsigned x = 0; x < level; x++) {
      |     ^~~
../gcc_bug.c:9:26: warning: '__builtin_memset' writing 17179869176 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
    9 |         block2[level][x] = 0;
      |         ~~~~~~~~~~~~~~~~~^~~
../gcc_bug.c:9:26: warning: '__builtin_memset' writing 17179869172 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
    9 |         block2[level][x] = 0;
      |         ~~~~~~~~~~~~~~~~~^~~
../gcc_bug.c:9:26: warning: '__builtin_memset' writing 17179869168 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
    9 |         block2[level][x] = 0;
      |         ~~~~~~~~~~~~~~~~~^~~
../gcc_bug.c:9:26: warning: '__builtin_memset' writing 17179869168 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
    9 |         block2[level][x] = 0;
      |         ~~~~~~~~~~~~~~~~~^~~
In function 'encode_block',
    inlined from 'encode_block.constprop' at ../gcc_bug.c:17:18:
../gcc_bug.c:9:26: warning: '__builtin_memset' writing 17179869172 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
    9 |         block2[level][x] = 0;
      |         ~~~~~~~~~~~~~~~~~^~~
../gcc_bug.c:9:26: warning: '__builtin_memset' writing 17179869168 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
    9 |         block2[level][x] = 0;
      |         ~~~~~~~~~~~~~~~~~^~~
../gcc_bug.c:9:26: warning: '__builtin_memset' writing 17179869168 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
    9 |         block2[level][x] = 0;
      |         ~~~~~~~~~~~~~~~~~^~~

If one adds -Wall, one gets -Warray-bounds warnings instead of -Wstringop-overflow= warnings.
This happens in latest git master; I bisected it to 9b14fc3326e087975653b1af8ac54114041cde51. None of the other open -Warray-bounds/-Wstringop-overflow= warnings have been traced back to this commit, so I believe this bug report not to be a duplicate of one of the other open bugs.

PS: The above is based upon https://github.com/FFmpeg/FFmpeg/blob/56e9e0273a79fb4ff4276503e69981f8e383f22b/libavcodec/svq1enc.c#L91
This code is covered by our regression tests and none of them fail, so GCC does not seem to miscompile it (despite the warnings indicating GCC's internal state to be garbage).
(Somehow the warning is not triggered by 9b14fc3326e087975653b1af8ac54114041cde51 for this file: The warnings only appeared with GCC 11, GCC 10 is fine. I did not bisect this further. For another one of our files (https://github.com/FFmpeg/FFmpeg/blob/8be701d9f7f77ff2282cc7fe6e0791ca5419de70/libavfilter/vf_find_rect.c#L157) the warnings appeared with 9b14fc3326e087975653b1af8ac54114041cde51 and are also in GCC 10, yet somehow not in GCC 11 and also not in GCC master. I also did not bisect this further, as the underlying bug is probably the same for all, so fixing the testcase should fix all of it.)
Comment 1 Martin Sebor 2021-11-09 18:27:21 UTC
Confirmed with GCC 10 through 12.  Presumably all three warnings are false positives here, including -Waggressive-loop-optimizations.

The encode_block.constprop definition shows the first statement that triggers the -Warray-bounds:

__attribute__((access ("^0[7]", )))
int encode_block.constprop (int[256] * blk2, unsigned int level)
{
...
  <bb 2> [local count: 118111600]:
  __builtin_memset (&MEM <int[7][256]> [(void *)&block2 + 4398046509056B], 0, 17179869176);
  ivtmp.329_244 = (unsigned long) &MEM <int[7][256]> [(void *)&block2 + 1024B];
  _157 = (unsigned long) &block2;
  _46 = _157 + 17179870192;

The access to block2 is to the global definition of the array, not the argument (I have renamed the latter to blk2 before producing the dump to avoid confusion).  Since block2 is a 2 X 256 matrix the memset access is out of bounds.  The warning is doing its job pointing it out.  The whole function must be unreachable and shouldn't be emitted.
Comment 2 Andrew Pinski 2021-11-24 08:10:59 UTC
A simple workaround is to add at the begining of the function:
if (level == 0) return 0;

What I found is since we don't run loop copy header until after IPA-CP, we get some interesting IR where we don't jump thread the level == 0 case correctly.
Comment 3 Jakub Jelinek 2022-01-20 12:05:23 UTC
Started with r10-5098-g9b14fc3326e087975653b1af8ac54114041cde51
Comment 4 Jakub Jelinek 2022-01-20 12:26:22 UTC
+  Creating a specialized node of encode_block/0.
+    replacing param #0 block2 with const &block2
+    replacing param #1 level with const 4
+  Creating a specialized node of encode_block/0.
+    replacing param #0 block2 with const &block2
+    replacing param #1 level with const 3
+  Creating a specialized node of encode_block/0.
+    replacing param #0 block2 with const &block2
+    replacing param #1 level with const 2
+  Creating a specialized node of encode_block/0.
+    replacing param #0 block2 with const &block2
+    replacing param #1 level with const 1
+  Creating a specialized node of encode_block/0.
+    replacing param #0 block2 with const &block2
+    replacing param #1 level with const 0
+  Creating a specialized node of encode_block/0.
+    replacing param #0 block2 with const &block2
+    replacing param #1 level with const 4294967295
+  Creating a specialized node of encode_block/0.
+    replacing param #0 block2 with const &block2
+    replacing param #1 level with const 4294967294

Maybe the IPA transformation should just use VRP info to guide it.
We don't have anything that says level is [0, 5], that is there purely from the
fact that the caller calls it with 5, but we know that it isn't -1U:
  # RANGE [0, 4294967294]
  _9 = level_19(D) + 4294967295;
  _23 = encode_block (block2_21(D), _9);
  _26 = encode_block (block2_21(D), _9);
so we shouldn't create that specialized node and those that are needed just because we've created them.
Comment 5 Jakub Jelinek 2022-01-20 12:45:24 UTC
In the ipa-cp dump it even mentions it:
  Node: encode_block/0:
    param [0]: &block2 [loc_time: 0, loc_size: 0, prop_time: 0, prop_size: 0]
         ctxs: VARIABLE
         Bits: value = 0x0, mask = 0xfffffffffffffffc
         int[256] * [1B, +INF]
        AGGS VARIABLE
    param [1]: VARIABLE
               5 [loc_time: 147.512, loc_size: 27, prop_time: 760193, prop_size: 216]
               4 [loc_time: 147.512, loc_size: 27, prop_time: 143948, prop_size: 189]
               3 [loc_time: 147.512, loc_size: 27, prop_time: 31125.7, prop_size: 162]
               2 [loc_time: 147.512, loc_size: 27, prop_time: 7822.76, prop_size: 135]
               1 [loc_time: 147.512, loc_size: 27, prop_time: 2325.83, prop_size: 108]
               0 [loc_time: 147.512, loc_size: 27, prop_time: 825.122, prop_size: 81]
               4294967295 [loc_time: 147.512, loc_size: 27, prop_time: 342.227, prop_size: 54]
               4294967294 [loc_time: 147.512, loc_size: 27, prop_time: 147.512, prop_size: 27]
         ctxs: VARIABLE
         Bits unusable (BOTTOM)
         unsigned int [0, 4294967294]
        AGGS VARIABLE

but doesn't connect that when the param has value range of [0, 4294967294] and 4294967295
is outside of that range, it doesn't make sense to specialize on it.
Comment 6 Jakub Jelinek 2022-01-20 13:58:54 UTC
It is in
      /* Recursively generate lattice values with a limited count.  */
      FOR_EACH_VEC_ELT (val_seeds, i, src_val)
        {
          for (int j = 1; j < max_recursive_depth; j++)
            {
              tree cstval = get_val_across_arith_op (opcode, opnd1_type, opnd2,
                                                     src_val, res_type);
              if (!cstval
                  || !ipacp_value_safe_for_type (res_type, cstval))
                break;

              ret |= dest_lat->add_value (cstval, cs, src_val, src_idx,
                                          src_offset, &src_val, j);
              gcc_checking_assert (src_val);
            }
        }
(but there is another spot doing the similar thing) where it would be nice to also break if cstval is non-NULL and safe for type, but is outside of the value range.  I have no idea how to get from this spot at that value range though.
Comment 7 Feng Xue 2022-02-06 00:37:53 UTC
(In reply to Jakub Jelinek from comment #6)
> It is in
>       /* Recursively generate lattice values with a limited count.  */
>       FOR_EACH_VEC_ELT (val_seeds, i, src_val)
>         {
>           for (int j = 1; j < max_recursive_depth; j++)
>             {
>               tree cstval = get_val_across_arith_op (opcode, opnd1_type,
> opnd2,
>                                                      src_val, res_type);
>               if (!cstval
>                   || !ipacp_value_safe_for_type (res_type, cstval))
>                 break;
> 
>               ret |= dest_lat->add_value (cstval, cs, src_val, src_idx,
>                                           src_offset, &src_val, j);
>               gcc_checking_assert (src_val);
>             }
>         }
> (but there is another spot doing the similar thing) where it would be nice
> to also break if cstval is non-NULL and safe for type, but is outside of the
> value range.  I have no idea how to get from this spot at that value range
> though.

By default, ipcp is told to clone a recursive function 8 times, that exceeds value space of index in this case. We could rely on ipa fnsummary on condition predicate of a call to avoid generating never-executed copy. I will take it.
Comment 8 Martin Jambor 2022-02-14 13:58:57 UTC
I am about to thest the following patch.  In longer-run, it would be better to never generate lattice values outside of the value_range but there is an ordering problem, we need the complete VR info before we can use it.  I plan to rearrange IPA-CP into making multiple passes over the lattice dependency graph and this should quite naturally be solved by doing this kind of resursive-value-generation only in second and later passes. 


diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 453e9c93cc3..cbbb8bbc80a 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -6154,8 +6154,16 @@ decide_whether_version_node (struct cgraph_node *node)
        {
          ipcp_value<tree> *val;
          for (val = lat->values; val; val = val->next)
-           ret |= decide_about_value (node, i, -1, val, &avals,
-                                      &self_gen_clones);
+           {
+             if (!plats->m_value_range.bottom_p ()
+                 && !plats->m_value_range.m_vr.contains_p (val->value))
+               {
+                 gcc_checking_assert (val->self_recursion_generated_p ());
+                 continue;
+               }
+             ret |= decide_about_value (node, i, -1, val, &avals,
+                                        &self_gen_clones);
+           }
        }
 
       if (!plats->aggs_bottom)
Comment 9 Martin Jambor 2022-02-14 18:19:32 UTC
The patch had to be tweaked a bit but I have proposed the following one on the mailing list:

https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590371.html
Comment 10 Feng Xue 2022-02-15 04:50:23 UTC
(In reply to Martin Jambor from comment #8)
> I am about to thest the following patch.  In longer-run, it would be better
> to never generate lattice values outside of the value_range but there is an
> ordering problem, we need the complete VR info before we can use it.  I plan
> to rearrange IPA-CP into making multiple passes over the lattice dependency
> graph and this should quite naturally be solved by doing this kind of
> resursive-value-generation only in second and later passes. 
> 
> 
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index 453e9c93cc3..cbbb8bbc80a 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -6154,8 +6154,16 @@ decide_whether_version_node (struct cgraph_node *node)
>         {
>           ipcp_value<tree> *val;
>           for (val = lat->values; val; val = val->next)
> -           ret |= decide_about_value (node, i, -1, val, &avals,
> -                                      &self_gen_clones);
> +           {
> +             if (!plats->m_value_range.bottom_p ()
> +                 && !plats->m_value_range.m_vr.contains_p (val->value))
> +               {
> +                 gcc_checking_assert (val->self_recursion_generated_p ());
> +                 continue;
> +               }
> +             ret |= decide_about_value (node, i, -1, val, &avals,
> +                                        &self_gen_clones);
> +           }
>         }
>  
>        if (!plats->aggs_bottom)

Here is a complication that value range for recursion index might not be not easily computed, and could not prevent IPA-CP generating useless copy. Constraint of recursion index comes from "block2[level][x]", not its value range deduced from condition predicate (level > 0). Change the case to cover up value range of "level", and we will get same warning. So in the circumstances, one way for us is to disable warning for these copies?

extern int block2[7][256];

extern unsigned G_START;
extern unsigned G_SCALE;

static int encode_block(int block2[7][256], unsigned level)
{
    int best_score = 0;

    for (unsigned x = G_START; x < G_SCALE * level; x++) {
        int v = block2[1][x];
        block2[level][x] = 0;
        best_score      += v * v;
    }

    if (G_SCALE * level > G_START && best_score > 64) {
        int score = 0;

        score += encode_block(block2, level - 1);
        score += encode_block(block2, level - 1);

        if (score < best_score) {
            best_score = score;
        }
    }

    return best_score;
}
Comment 11 Martin Jambor 2022-02-17 15:25:41 UTC
I am very well aware that my patch was just a mitigation, not
something that would avoid the problem under all circumstances.  We
can attempt to look at array access indices during the summary
creation phase and save constraints on parameters in order to not
create the clones.  But even then, inlining and late optimizations can
expose such invalid array accesses in the clones that we still create
and that are practically impossible to know about at IPA-CP time.

It would be great if we finally invented a way to communicate to users
that a warning comes from a function specialized for a given context
or inlined at a particular point.  Then the user would see that
compiler created some dead code and might think it is stupid, but
would at least know what is going on (see PR 102061 and the discussion
in PR 60761).

Limiting cloning if we know from VR that we should, like my patch
does, is still a good thing to do, I think.
Comment 12 GCC Commits 2022-03-31 15:17:47 UTC
The master branch has been updated by Martin Jambor <jamborm@gcc.gnu.org>:

https://gcc.gnu.org/g:cf68f5a6d20db2aee2f3e674ad3f10e1c458edf9

commit r12-7937-gcf68f5a6d20db2aee2f3e674ad3f10e1c458edf9
Author: Martin Jambor <mjambor@suse.cz>
Date:   Thu Mar 31 17:14:42 2022 +0200

    ipa-cp: Do not create clones for values outside known value range (PR 102513)
    
    PR 102513 shows we emit bogus array access warnings when IPA-CP
    creates clones specialized for values which it deduces from arithmetic
    jump functions describing self-recursive calls.  Those can however be
    avoided if we consult the IPA-VR information that the same pass also
    has.
    
    The patch below does that at the stage when normally values are only
    examined for profitability.  It would be better not to create lattices
    describing such bogus values in the first place, however that presents
    an ordering problem, the pass currently propagates all information,
    and so both constants and VR, in no particular order when processing
    SCCs, and so this approach seemed much simpler.
    
    I plan to rearrange the pass so that it clones in multiple passes over
    the call graph (or rather the lattice dependence graph) and it feels
    natural to only do propagation for these kinds of recursion in the
    second or later passes, which would fix the issue more elegantly.
    
    gcc/ChangeLog:
    
    2022-02-14  Martin Jambor  <mjambor@suse.cz>
    
            PR ipa/102513
            * ipa-cp.cc (decide_whether_version_node): Skip scalar values
            which do not fit the known value_range.
    
    gcc/testsuite/ChangeLog:
    
    2022-02-14  Martin Jambor  <mjambor@suse.cz>
    
            PR ipa/102513
            * gcc.dg/ipa/pr102513.c: New test.
Comment 13 Martin Jambor 2022-03-31 16:37:09 UTC
Mitigated using value-range on master (soon to be GCC 12).  On 11 and earlier, the determined IPA value-range is "variable" - I have not looked at why - so the same approach cannot be taken there.

I am un-assigning myself so that whoever wants to explore other options how to prevent/mitigate this issue does not feel hindered.  (But I'll keep the issue on my list to ponder about.)
Comment 14 Jakub Jelinek 2022-06-28 10:46:28 UTC
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
Comment 15 Richard Biener 2023-07-07 10:41:03 UTC
GCC 10 branch is being closed.
Comment 16 Richard Biener 2024-07-19 13:14:02 UTC
GCC 11 branch is being closed.