[Bug tree-optimization/103721] New: [12 regression] wrong code generated for loop with conditional (jump threading?)

sss@li-snyder.org gcc-bugzilla@gcc.gnu.org
Tue Dec 14 22:19:20 GMT 2021


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103721

            Bug ID: 103721
           Summary: [12 regression] wrong code generated for loop with
                    conditional (jump threading?)
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sss@li-snyder.org
  Target Milestone: ---

hi -

With a recent checkout of gcc12 (20211214), on a x86_64-pc-linux-gnu host,
the following source is miscompiled with -O2:

-- x.cc --------------------------------------------------
bool flag();
const int* bar();

const int* f (const int* world)
{
  const int* searchVolume = world;
  const int* currentVolume = nullptr;
  while (currentVolume != searchVolume && searchVolume) {
    currentVolume = searchVolume;
    if (flag())
      searchVolume = bar();
  }
  return (currentVolume);
}
----------------------------------------------------------


This can be demonstrated with this test harness:


-- test.cc -------------------------------------------------
#include <stdio.h>

const int* f (const int* world);
bool flag() { return true; }

int ii[3] = {0};
int ipos = 0;

const int *bar() {
  if (ipos == 2)
    return nullptr;
  return &ii[++ipos];
}

int main()
{
  const int* i = f (&ii[0]);
  printf ("%d\n", i - &ii[0]);
  return 0;
}
------------------------------------------------------------

I expect this to print 2, which it does when compiled with -O0 or O1.
But with -O2:

$ g++ -c -O2 x.cc -o x.o
$ g++ -o test  test.cc x.o
$ ./test
0


Here is the generated code for the function f (directives removed
for readability):

_Z1fPKi:
.LFB0:
        pushq   %rbx
        movq    %rdi, %rbx
        testq   %rdi, %rdi
        je      .L2
        call    _Z4flagv
        testb   %al, %al
        jne     .L10
.L2:
        movq    %rbx, %rax
        popq    %rbx
        ret
.L10:
        call    _Z3barv
        movq    %rbx, %rax
        popq    %rbx
        ret


As one can see, in the generated code, the loop is not executed more
than once.  It appears as if the fact that searchVolume can be changed
within the if statement is being lost --- if the conditional is commented
out, then things work as expected.

Things seem to go bad by at least the threadfull1 pass.
In the previous pass, 110t.mergephi2, we have:

-----------------------------------------------------------------

;; Function f (_Z1fPKi, funcdef_no=0, decl_uid=2370, cgraph_uid=1,
symbol_order=0)

Removing basic block 6
const int * f (const int * world)
{
  const int * currentVolume;
  const int * searchVolume;
  bool _1;
  bool _2;
  bool _3;
  bool _14;
  const int * _16;

  <bb 2> [local count: 118111600]:
  goto <bb 8>; [100.00%]

  <bb 3> [local count: 955630225]:
  _14 = flag ();
  if (_14 != 0)
    goto <bb 4>; [33.00%]
  else
    goto <bb 8>; [67.00%]

  <bb 4> [local count: 315357972]:
  _16 = bar ();

  <bb 8> [local count: 1073741824]:
  # searchVolume_4 = PHI <_16(4), world_7(D)(2), searchVolume_4(3)>
  # currentVolume_5 = PHI <searchVolume_4(4), 0B(2), searchVolume_4(3)>
  _1 = searchVolume_4 != currentVolume_5;
  _2 = searchVolume_4 != 0B;
  _3 = _1 & _2;
  if (_3 != 0)
    goto <bb 3>; [89.00%]
  else
    goto <bb 7>; [11.00%]

  <bb 7> [local count: 118111600]:
  return currentVolume_5;

}



-----------------------------------------------------------------

which does seem to represent the loop correctly.
But in 111t.threadfull1, this has been changed to:

------------------------------------------------------------------


;; Function f (_Z1fPKi, funcdef_no=0, decl_uid=2370, cgraph_uid=1,
symbol_order=0)

Disambiguating loop 1 with multiple latches
Merged latch edges of loop 1
;; 2 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 3 4 8 9 7
;;
;; Loop 1
;;  header 9, latch 8
;;  depth 1, outer 0
;;  nodes: 9 8 4 3
;; 2 succs { 9 }
;; 3 succs { 4 8 }
;; 4 succs { 8 }
;; 8 succs { 9 }
;; 9 succs { 3 7 }
;; 7 succs { 1 }
Jump threading proved probability of edge 9->7 too small (it is 11.0% (guessed)
should be always (guessed))

SSA replacement table
N_i -> { O_1 ... O_j } means that N_i replaces O_1, ..., O_j

searchVolume_12 -> { searchVolume_4 }
currentVolume_17 -> { currentVolume_5 }
.MEM_18 -> { .MEM_6 }
_19 -> { _1 }
_20 -> { _2 }
_21 -> { _3 }
currentVolume_22 -> { currentVolume_5 }
.MEM_23 -> { .MEM_6 }
Incremental SSA update started at block: 9
Number of blocks in CFG: 11
Number of blocks to update: 6 ( 55%)


Merging blocks 2 and 9
Merging blocks 8 and 10
const int * f (const int * world)
{
  const int * currentVolume;
  const int * searchVolume;
  bool _1;
  bool _2;
  bool _3;
  bool _14;
  const int * _16;
  bool _19;
  bool _20;
  bool _21;

  <bb 2> [local count: 118111600]:
  _1 = world_7(D) != 0B;
  _2 = world_7(D) != 0B;
  _3 = _1 & _2;
  if (_3 != 0)
    goto <bb 3>; [97.00%]
  else
    goto <bb 6>; [3.00%]

  <bb 3> [local count: 955630225]:
  _14 = flag ();
  if (_14 != 0)
    goto <bb 4>; [33.00%]
  else
    goto <bb 5>; [67.00%]

  <bb 4> [local count: 315357972]:
  _16 = bar ();

  <bb 5> [local count: 955630224]:
  # searchVolume_11 = PHI <_16(4), world_7(D)(3)>
  # currentVolume_8 = PHI <world_7(D)(4), world_7(D)(3)>
  _19 = currentVolume_8 != searchVolume_11;
  _20 = searchVolume_11 != 0B;
  _21 = _19 & _20;

  <bb 6> [local count: 118111600]:
  # currentVolume_22 = PHI <0B(2), currentVolume_8(5)>
  return currentVolume_22;

}

------------------------------------------------------------------

which has no backwards branch.


More information about the Gcc-bugs mailing list