Bug 113703 - ivopts miscompiles loop
Summary: ivopts miscompiles loop
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 14.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: needs-bisection, wrong-code
Depends on:
Blocks:
 
Reported: 2024-02-01 11:17 UTC by Krister Walfridsson
Modified: 2024-02-12 02:41 UTC
CC List: 3 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 14.0, 4.8.1, 7.5.0
Last reconfirmed: 2024-02-05 00:00:00


Description Krister Walfridsson 2024-02-01 11:17:00 UTC
The following function (gcc.dg/tree-ssa/ivopts-lt.c) is miscompiled when compiled with -O1 for x86_64:

#include "stdint.h"

void
f1 (char *p, uintptr_t i, uintptr_t n)
{
  p += i;
  do
    {
      *p = '\0';
      p += 1;
      i++;
    }
  while (i < n);
}


The IR after cunroll looks like:

void f1 (char * p, uintptr_t i, uintptr_t n)
{
  <bb 2>:
  p_6 = p_4(D) + i_5(D);

  <bb 3>:
  # p_1 = PHI <p_6(2), p_9(5)>
  # i_2 = PHI <i_5(D)(2), i_10(5)>
  *p_1 = 0;
  p_9 = p_1 + 1;
  i_10 = i_2 + 1;
  if (i_10 < n_11(D))
    goto <bb 5>;
  else
    goto <bb 4>;

  <bb 5>:
  goto <bb 3>;

  <bb 4>:
  return;
}


This is then changed by ivopts to

void f1 (char * p, uintptr_t i, uintptr_t n)
{
  sizetype _13;
  char * _14;

  <bb 2>:
  p_6 = p_4(D) + i_5(D);
  _13 = n_11(D) - i_5(D);
  _14 = p_6 + _13;

  <bb 3>:
  # p_1 = PHI <p_6(2), p_9(5)>
  MEM[(char *)p_1] = 0;
  p_9 = p_1 + 1;
  if (p_9 < _14)
    goto <bb 5>;
  else
    goto <bb 4>;

  <bb 5>:
  goto <bb 3>;

  <bb 4>:
  return;
}


Suppose the function gets called with the values:

  p = 0x0002ffffffffffff
  i = 0xffff000000000001
  n = 0xdffd7fffffffffff

The original function writes 0 to address 0x0002000000000000, and then exits.

The optimized function wraps around when calculating _14: _13 = n - i wraps to 0xdffe7ffffffffffe, which gives _14 = 0xe0007ffffffffffe, and the function does the equivalent of
  memset(0x0002000000000000, 0, 0xdffe7ffffffffffe);
Comment 1 Richard Biener 2024-02-01 14:12:49 UTC
I think the point is we fail to represent

Analyzing # of iterations of loop 1
  exit condition [i_5(D) + 1, + , 1] < n_11(D)
  bounds on difference of bases: -18446744073709551615 ... 18446744073709551615
  result:
    zero if i_5(D) + 1 > n_11(D)
    # of iterations (n_11(D) - i_5(D)) + 18446744073709551615, bounded by 18446744073709551615
  number of iterations (n_11(D) - i_5(D)) + 18446744073709551615; zero if i_5(D) + 1 > n_11(D)

specifically the 'zero if i_5(D) + 1 > n_11(D)'

I think may_eliminate_iv is wrong here, maybe not considering overflow
of the niter expression?

I wonder if it is possible to write a runtime testcase that FAILs with
reasonable memory requirement/layout.
Comment 2 Krister Walfridsson 2024-02-01 15:01:06 UTC
Here is a runtime testcase:

#include <sys/mman.h>
#include <unistd.h>
#include <stdint.h>

__attribute__((noipa))
void f1 (char *p, uintptr_t i, uintptr_t n)
{
  p += i;
  do
    {
      *p = '\0';
      p += 1;
      i++;
    }
  while (i < n);
}

int main()
{
  long pgsz = sysconf (_SC_PAGESIZE);
  void *p = mmap (NULL, pgsz * 2, PROT_READ|PROT_WRITE,
     MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
  if (p == MAP_FAILED)
    return 0;
  mprotect (p+pgsz, pgsz, PROT_NONE);
  uintptr_t n = -3 - (uintptr_t)p;
  f1 (p+2, -2, n);
  return 0;
}
Comment 3 Krister Walfridsson 2024-02-01 15:30:36 UTC
Oops. I messed up the test case...  It "works", but the actual values do not make sense...

The following is better:

int main()
{
  long pgsz = sysconf (_SC_PAGESIZE);
  void *p = mmap (NULL, pgsz * 2, PROT_READ|PROT_WRITE,
     MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
  if (p == MAP_FAILED)
    return 0;
  mprotect (p+pgsz, pgsz, PROT_NONE);
  uintptr_t n = -2 - (uintptr_t)(p+pgsz);
  f1 (p+pgsz, -2, n);
  return 0;
}
Comment 4 Richard Biener 2024-02-05 10:04:55 UTC
Confirmed.
Comment 5 Richard Biener 2024-02-06 11:33:32 UTC
It's going wrong in iv_elimination_compare_lt, which tries to handle exactly this kind of loop:

   We aim to handle the following situation:

   sometype *base, *p;
   int a, b, i;
  
   i = a;
   p = p_0 = base + a;
  
   do
     {
       bla (*p);
       p++;
       i++;
     }
   while (i < b);
    
   Here, the number of iterations of the loop is (a + 1 > b) ? 0 : b - a - 1.
   We aim to optimize this to

   p = p_0 = base + a;
   do
     {
       bla (*p);
       p++;
     }
   while (p < p_0 - a + b);
    
   This preserves the correctness, since the pointer arithmetics does not
   overflow.  More precisely:
  
   1) if a + 1 <= b, then p_0 - a + b is the final value of p, hence there is no
      overflow in computing it or the values of p.
   2) if a + 1 > b, then we need to verify that the expression p_0 - a does not
      overflow.  To prove this, we use the fact that p_0 = base + a.

there's either a hole in that logic or the implementation is off.

  /* Finally, check that CAND->IV->BASE - CAND->IV->STEP * A does not
     overflow.  */
  offset = fold_build2 (MULT_EXPR, TREE_TYPE (cand->iv->step),
                        cand->iv->step,
                        fold_convert (TREE_TYPE (cand->iv->step), a));
  if (!difference_cannot_overflow_p (data, cand->iv->base, offset))
    return false;

where 'A' is 'i', CAND->IV->BASE is 'p + i' and CAND->IV->STEP is 1
as 'sizetype'.

That just checks that (p + i) - i doesn't overflow.

Somehow it fails to prove that p + b (here p + n) doesn't overflow, since we end up with
p' < (p + i) + (n - i) aka p' < p + n.