Bug 100801 - [11/12 Regression] Aggressive loop optimizations cause incorrect warning
Summary: [11/12 Regression] Aggressive loop optimizations cause incorrect warning
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 11.1.0
Importance: P2 normal
Target Milestone: 11.5
Assignee: Richard Biener
URL:
Keywords: diagnostic, missed-optimization
Depends on:
Blocks:
 
Reported: 2021-05-27 14:03 UTC by Joel Linn
Modified: 2024-03-11 03:42 UTC
CC List: 5 users

See Also:
Host:
Target:
Build:
Known to work: 8.5.0
Known to fail: 10.1.0, 9.1.0
Last reconfirmed: 2021-05-28 00:00:00



Description Joel Linn 2021-05-27 14:03:54 UTC
The following warning is triggered

> $ gcc-11 -c constproploopopt.c -O2 -Wall -mavx -g
> constproploopopt.c: In function ‘test’:
> constproploopopt.c:22:18: warning: iteration 4611686018427387903 invokes undefined behavior [-Waggressive-loop-optimizations]
>    22 |     dest[i] = src[i];
>       |                  ^
> constproploopopt.c:21:12: note: within this loop
>    21 |   for (; i < count; ++i) {  // handle residual elements
>       |          ~~^~~~~~~

by this (minimal) code:

> #include <stdint.h>
> #include <stdio.h>
> #if defined(_MSC_VER)
> #include <intrin.h>
> #else
> #include <x86intrin.h>
> #endif
> 
> void copy_32_unaligned(uint32_t* dest, const uint32_t* src, size_t count) {
>   // invariant/nop
>   __m128i shufmask =  _mm_set_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
> 
>   size_t i;
>   for (i = 0; i + 4 <= count; i += 4) {
>     __m128i input = _mm_loadu_si128((const __m128i*)(&src[i]));
>     __m128i output = input;
>     // no warning without the shuffle:
>     output = _mm_shuffle_epi8(input, shufmask);
>     _mm_storeu_si128((__m128i*)(&dest[i]), output);
>   }
>   for (; i < count; ++i) {  // handle residual elements
>     dest[i] = src[i];
>   }
> }
> 
> void test(uint32_t* buf1, uint32_t* buf2) {
>     copy_32_unaligned(buf2, buf1,
>                       // multiples of 4 greater than or equal to 12 trigger it:
>                       12);
> }

Judging from the objdump output, I believe the generated code is correct. The warning seems to be spurious in this context, especially since the "residual" loop is skipped entirely when count is a multiple of 4 (count = n*4).
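A minimal sketch that replays only the two loop headers of copy_32_unaligned illustrates the reasoning above: after the vector loop, the residual loop runs exactly count % 4 times, so for count = 12 it never executes. The helper names residual_iterations and check_residual_iterations are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: replays the two loop headers of copy_32_unaligned
   without touching memory and counts how often the residual body runs. */
size_t residual_iterations(size_t count) {
  size_t i, n = 0;
  for (i = 0; i + 4 <= count; i += 4) {
    /* vector body omitted: only the induction variable matters here */
  }
  for (; i < count; ++i) {
    ++n; /* one scalar copy per leftover element */
  }
  return n;
}

/* Hypothetical self-check: for small counts the residual loop handles
   exactly count % 4 elements, so it is skipped for multiples of 4. */
void check_residual_iterations(void) {
  for (size_t count = 0; count < 64; ++count)
    assert(residual_iterations(count) == count % 4);
}
```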
Comment 1 Richard Biener 2021-05-28 10:54:26 UTC
Confirmed.  We're warning on the obviously unreachable loop since:

<bb 4> [local count: 1073741824]:
_35 = 12;
if (_35 != 12)
  goto <bb 7>; [75.00%]
else
  goto <bb 6>; [25.00%]

$3 = <basic_block 0x7ffff434b680 (4)>

since we apply final value replacement to 'i' in sccp but do not propagate it
before the next number-of-iterations analysis in ivcanon:

          NEXT_PASS (pass_tree_loop_init);
          NEXT_PASS (pass_tree_unswitch);
          NEXT_PASS (pass_scev_cprop);          <<<< final value repl.
          NEXT_PASS (pass_loop_split);
          NEXT_PASS (pass_loop_versioning);
          NEXT_PASS (pass_loop_jam);
          /* All unswitching, final value replacement and splitting can expose
             empty loops.  Remove them now.  */
          NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
          NEXT_PASS (pass_iv_canon);           <<<<< warning
          NEXT_PASS (pass_loop_distribution);
          NEXT_PASS (pass_linterchange);
          NEXT_PASS (pass_copy_prop);          <<<<<< propagation

IIRC final value replacement used to propagate and fold but I(?) removed this
at some point.
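Worked out by hand (an illustration of what final value replacement derives, not GCC's internal representation, and ignoring wraparound of i + 4 for huge counts), the exit value of i from the vector loop is a closed-form function of count. With count = 12 it is the constant 12, so the guard in front of the residual loop, the `_35 != 12` test in the dump above, is trivially false once that constant is actually propagated:

```latex
\begin{aligned}
i_k &= 4k, \qquad \text{body runs while } 4k + 4 \le \mathit{count}
  \;\Longrightarrow\; N = \left\lfloor \mathit{count}/4 \right\rfloor \text{ iterations},\\
i_{\text{exit}} &= 4N = \mathit{count} - (\mathit{count} \bmod 4),\\
\mathit{count} = 12 &\;\Longrightarrow\; i_{\text{exit}} = 12,
  \qquad i_{\text{exit}} < \mathit{count} \iff 12 < 12 \;(\text{false}).
\end{aligned}
```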
Comment 2 Joel Linn 2021-05-28 12:45:21 UTC
Great. In the meantime I will use 
> if (count % 4 == 0) __builtin_unreachable();
at the start of the residual for loop's body to suppress the warning, as suggested by Martin Sebor: https://gcc.gnu.org/pipermail/gcc-help/2021-May/140339.html
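For illustration, the workaround from this comment can be placed inside a portable stand-in for copy_32_unaligned. The name copy_32_with_hint is hypothetical, and the SSE load/shuffle/store is replaced by a plain four-element copy so the sketch builds without -mavx; whether the hint silences the warning in a particular GCC version is not verified here. Because the residual body is only entered when count % 4 != 0, the __builtin_unreachable() branch never executes and does not change behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable stand-in for copy_32_unaligned: the SSE
   load/shuffle/store is replaced by a plain four-element copy so the
   sketch compiles without -mavx; only the loop structure matters. */
void copy_32_with_hint(uint32_t *dest, const uint32_t *src, size_t count) {
  size_t i;
  for (i = 0; i + 4 <= count; i += 4) {
    for (size_t k = 0; k < 4; ++k)
      dest[i + k] = src[i + k];
  }
  for (; i < count; ++i) {  /* handle residual elements */
    /* The body is only reached when count % 4 != 0, so this branch is
       genuinely unreachable; it lets GCC assume the loop is dead when
       count is a known multiple of 4 (e.g. 12 after inlining). */
    if (count % 4 == 0)
      __builtin_unreachable();
    dest[i] = src[i];
  }
}
```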
Comment 3 Matthijs van Duin 2021-07-31 00:41:45 UTC
Simpler testcase:

/* compiler flags required to trigger warning: -O1 -ftree-vrp  */

static void foo(int *x, long n)
{
	long i = 0;
	for (; i + 4 <= n; i += 4) {
	}
	for (; i < n; i++) {
		x[i]++;
	}
}

void bar(int *x)
{
	foo(x, 128);
}


A workaround with no runtime overhead is adding the following check between the loops:

if (__builtin_constant_p(n % 4 == 0) && n % 4 == 0)
	return;
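Comment 3's testcase with the suggested check placed between the two loops might look like this (a sketch; behavior is unchanged because for n % 4 == 0 the second loop would run zero times anyway). __builtin_constant_p yields 0 whenever GCC cannot fold its argument to a constant, so for a non-constant n the whole condition folds to false and nothing is tested at run time.

```c
#include <stddef.h>

static void foo(int *x, long n)
{
	long i = 0;
	for (; i + 4 <= n; i += 4) {
	}
	/* Only taken when GCC can prove n % 4 == 0 at compile time, e.g.
	   after foo is inlined into bar with n == 128; the residual loop
	   below would execute zero times in that case anyway. */
	if (__builtin_constant_p(n % 4 == 0) && n % 4 == 0)
		return;
	for (; i < n; i++) {
		x[i]++;
	}
}

void bar(int *x)
{
	foo(x, 128);
}
```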
Comment 4 Andrew Pinski 2021-12-20 00:55:11 UTC
Note that the warning already started in GCC 9, so it is a regression.
Comment 5 Richard Biener 2022-05-27 09:45:32 UTC
GCC 9 branch is being closed
Comment 6 Jakub Jelinek 2022-06-28 10:45:12 UTC
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
Comment 7 Richard Biener 2023-07-07 10:40:06 UTC
GCC 10 branch is being closed.
Comment 8 Jeffrey A. Law 2024-03-11 03:42:33 UTC
Works with gcc-13 and the trunk.  Adjusting markers.