Bug 117638 - No loop splitting and bounds check not optimized out with -D_GLIBCXX_ASSERTIONS
Summary: No loop splitting and bounds check not optimized out with -D_GLIBCXX_ASSERTIONS
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 15.0
: P3 normal
Target Milestone: 14.3
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: std::vector
  Show dependency treegraph
 
Reported: 2024-11-17 10:28 UTC by Sam James
Modified: 2024-12-27 16:46 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-11-18 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Sam James 2024-11-17 10:28:51 UTC
With a modified version of g++.dg/tree-ssa/loop-split-1.C (s/reserve/resize):
```
#include <vector>
#include <cmath>

constexpr unsigned s = 100000000;

int main()
{
    std::vector<float> a, b, c;
    a.resize(s);
    b.resize(s);
    c.resize(s);

    for(unsigned i = 0; i < s; ++i)
    {
        if(i == 0)
            a[i] = b[i] * c[i];
        else
            a[i] = (b[i] + c[i]) * c[i-1] * std::log(i);
    }
}
```

I see the following:
* With GCC 13 using -D_GLIBCXX_ASSERTIONS, we split the loop.
* With GCC 14 and trunk using -D_GLIBCXX_ASSERTIONS, we can't.
* We don't seem able to optimise out the bounds check with GCC 14 and trunk (and we do it on every iteration!)
Comment 1 Sam James 2024-11-17 10:35:34 UTC
(In reply to Sam James from comment #0)
> I see the following:
> * With GCC 13 using -D_GLIBCXX_ASSERTIONS, we split the loop.
> * With GCC 14 and trunk using -D_GLIBCXX_ASSERTIONS, we can't.
> * We don't seem able to optimise out the bounds check with GCC 14 and trunk
> (and we do it on every iteration!)

I think it's all the same thing (the inability to loop-split here).
Comment 2 Richard Biener 2024-11-18 08:51:27 UTC
Did the standard library change or loop splitting?  (there were a few correctness fixes there)
Comment 3 Sam James 2024-11-18 10:24:26 UTC
Ah, I made a mistake: with 14, we start to handle the non-assertions case.
Comment 4 Jan Hubicka 2024-12-27 16:46:33 UTC
Both with assertions or without we offline _M_default_append which would be better inlined. It is because main is known to be called once.

One difference is that non-assertion clobbers the vectors prior construction



   <bb 2> [local count: 10737416]:
-  MEM[(struct _Vector_impl_data *)&a] ={v} {CLOBBER(bob)};
-  MEM[(struct _Vector_impl_data *)&a]._M_finish = 0B;
-  MEM[(struct _Vector_impl_data *)&b] ={v} {CLOBBER(bob)};
-  MEM[(struct _Vector_impl_data *)&b]._M_finish = 0B;
-  MEM[(struct _Vector_impl_data *)&c] ={v} {CLOBBER(bob)};
-  MEM[(struct _Vector_impl_data *)&c]._M_finish = 0B;
   MEM <float *> [(struct vector *)&a] = 0B;
+  MEM <float *> [(struct vector *)&a + 8B] = 0B;
   MEM <float *> [(struct vector *)&a + 16B] = 0B;
   std::vector<float>::_M_default_append (&a, 100000000);
 
   <bb 3> [local count: 10737416]:
-  a$_M_start_113 = MEM <float *> [(struct vector *)&a];
-  a$_M_end_of_storage_114 = MEM <float *> [(struct vector *)&a + 16B];
+  a$_M_start_127 = MEM <float *> [(struct vector *)&a];
+  a$_M_finish_128 = MEM <float *> [(struct vector *)&a + 8B];
+  a$_M_end_of_storage_140 = MEM <float *> [(struct vector *)&a + 16B];
   MEM <float *> [(struct vector *)&b] = 0B;
+  MEM <float *> [(struct vector *)&b + 8B] = 0B;
   MEM <float *> [(struct vector *)&b + 16B] = 0B;
   std::vector<float>::_M_default_append (&b, 100000000);
   goto <bb 5>; [100.00%]

It is not clear to me why CLOBBER is gone...

However the main problem is that we do not know # of iterations of the loop.
With assertions enabled we know that loop will terminate once it reaches the vector size but we do not know its size which is computed as:

  std::vector<float>::_M_default_append (&c, 100000000);
  goto <bb 7>; [100.00%]
  
  <bb 6> [count: 0]:
<L13>:
  c$_M_start_188 = MEM <float *> [(struct vector *)&c];
  if (c$_M_start_188 != 0B)
    goto <bb 32>; [0.00%]
  else
    goto <bb 33>; [0.00%]

  c$_M_start_198 = MEM <float *> [(struct vector *)&c];
  c$_M_finish_199 = MEM <float *> [(struct vector *)&c + 8B];
  c$_M_end_of_storage_187 = MEM <float *> [(struct vector *)&c + 16B];
  _194 = b$_M_finish_174 - b$_M_start_97;
  __dif_195 = _194 /[ex] 4;

So I think we would need to either inline _M_realloc_append (bypasing called once heuristics and based on knowledge that vector is empty at the begining)

We also can not optimize the loop exit condition since we do now know that the vector does not point to s.