Bug 96351 - missed opportunity to optimize out redundant loop
Summary: missed opportunity to optimize out redundant loop
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 11.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: VRP
  Show dependency treegraph
 
Reported: 2020-07-28 07:32 UTC by Fei Yang
Modified: 2020-11-26 08:21 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-07-28 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Fei Yang 2020-07-28 07:32:56 UTC
inline unsigned int
stringLen(const short* const src)
{
    if (src == 0 || *src == 0) {
        return 0;
    } else {
        const short* pszTmp = src + 1;

        while (*pszTmp)
            ++pszTmp;

        return (unsigned int)(pszTmp - src);
    }
}

extern void bar();

void foo(const short* const str) {
    unsigned int len = stringLen(str);
    if (!len) {
        bar();
    }
}

When stringLen is inlined into foo, the else block in stringLen can be simplified into non-zero, thus eliminating the while loop. This looks like a tree VRP issue, but this pass does not work as expected for this test case.

$ g++ -S -O2 foo.cpp -fdump-tree-vrp

Consider function foo, value ranges after VRP does not help here:
 48
 49 .MEM_1: <<< error >>> VARYING
 50 str_3(D): const short int * const VARYING
 51 _6: short int VARYING
 52 str_7: const short int * const [1B, +INF]  EQUIVALENCES: { str_3(D) } (1 elements)
 53 pszTmp_8: const short int * [1B, +INF]  EQUIVALENCES: { pszTmp_10 } (1 elements)
 54 pszTmp_9: const short int * const [1B, +INF]
 55 pszTmp_10: const short int * const [1B, +INF]
 56 _11: short int VARYING
 57 pszTmp_12: const short int * [1B, +INF]
 58 _13: unsigned int [0, 0]
 59 _14: long int VARYING
 60 _15: long int [-4611686018427387904, 4611686018427387903]
 61 _16: unsigned int VARYING
 62 _18: unsigned int [0, 0]
 63 pszTmp_19: const short int * [1B, +INF]  EQUIVALENCES: { pszTmp_10 } (1 elements)

 ......

 93   <bb 4> [local count: 439750964]:
 94   pszTmp_9 = str_3(D) + 2;
 95
 96   <bb 5> [local count: 3997736055]:
 97   # pszTmp_10 = PHI <pszTmp_9(4), pszTmp_12(6)>
 98   _11 = *pszTmp_10;
 99   if (_11 == 0)
100     goto <bb 7>; [11.00%]
101   else
102     goto <bb 6>; [89.00%]
103
104   <bb 6> [local count: 3557985095]:
105   pszTmp_12 = pszTmp_10 + 2;
106   goto <bb 5>; [100.00%]
107
108   <bb 7> [local count: 439750964]:
109   # pszTmp_8 = PHI <pszTmp_10(5)>
110   _14 = pszTmp_8 - str_3(D);
111   _15 = _14 /[ex] 2;
112   _16 = (unsigned int) _15;
113   if (_16 == 0)
114     goto <bb 8>; [3.91%]
115   else
116     goto <bb 9>; [96.09%]
117
118   <bb 8> [local count: 354334798]:
119   bar ();
120
121   <bb 9> [local count: 1073741824]:
122   return;

Any suggestions to proceed?
Comment 1 Richard Biener 2020-07-28 08:49:53 UTC
VRP doesn't work "backwards", it also does an incredibly bad job at
tracking pointer ranges where it simply prefers to track non-NULL.
Here we'd need to track symbolic [str_7 + [2, +INF] ] or so which
it cannot do either.

Eventually other analyses (SCEV?) could derive sth for

  # pszTmp_8 = PHI <pszTmp_19(8)>
  _14 = pszTmp_8 - str_7;
  _15 = _14 /[ex] 2;

but even there, since we do not know the number of iterations, the
representation will have its limits.
Comment 2 Andrew Macleod 2020-11-20 18:54:34 UTC
I had to do some tweaking and manually inline it, but eventually I managed to get:

 <bb 4> :
  pszTmp_11 = str_8(D) + 2;
  goto <bb 6>; [INV]

  <bb 5> :
  pszTmp_13 = pszTmp_5 + 2;

  <bb 6> :
  # pszTmp_5 = PHI <pszTmp_11(4), pszTmp_13(5)>
  _1 = *pszTmp_5;
  if (_1 != 0)
    goto <bb 5>; [INV]
  else
    goto <bb 7>; [INV]

  <bb 7> :
  # pszTmp_16 = PHI <pszTmp_5(6)>
  _2 = pszTmp_16 - str_8(D);
  _3 = _2 /[ex] 2;
  len_12 = (unsigned int) _3;

<bb 8> :
  # len_4 = PHI <0(2), len_12(7), 0(3)>
  if (len_4 == 0)
    goto <bb 9>; [INV]
  else
    goto <bb 10>; [INV]


Nothing we can do right now.  Its possible that the relational work in the next release *might* be able to sort something out, but thats probably down the road.

We'll be able to record that
a) pszTmp_11 > str_8(D)
b) pszTmp_13 > pszTmp_5
c) when we see 
# pszTmp_5 = PHI <pszTmp_11(4), pszTmp_13(5)>
we might possibly be able to deduce that 
d) pszTmp_5 > str_8(D),  which then leads to
e) pszTmp_16 > str_8(D).

which means we could know that _2 > 0.

The tricky bit will be knowing that not only is _2 > 0, but its actually [2, +INF].  Initial plans for relations are not to track possible offsets, but it could be follow up work.

If we can know that, then we'd be able to calculate len_12 = [1, +INF/2].

With a non-zero range for len_12, then it would be possible for the threader to create 2 execution paths.. the zero and the non-zero path.

on the non-zero path , there would no longer need to be a reference to len_12, and dead code could conceivably eliminate the enter else block.

I'd keep this around as a reference, but I don't see this happening any time soon. If the threaders get a revamp next year... then maybe!