Bug 117639 - Modified loop-split-1.C doesn't recognise non-escaping std::vector<float>
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 15.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: std::vector
Reported: 2024-11-17 10:32 UTC by Sam James
Modified: 2024-12-27 16:00 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-11-18 00:00:00


Description Sam James 2024-11-17 10:32:24 UTC
Tweaked g++.dg/tree-ssa/loop-split-1.C:
```
#include <vector>
#include <cmath>

constexpr unsigned s = 100000000;

void p()
{
    std::vector<float> a, b, c;
    a.resize(s);
    b.resize(s);
    c.resize(s);

    for(unsigned i = 0; i < s; ++i)
    {
        if(i == 0)
            a[i] = b[i] * c[i];
        else
            a[i] = (b[i] + c[i]) * c[i-1] * std::log(i);
    }
}
```

Clang optimises this out to just ret.
Comment 1 Jan Hubicka 2024-11-18 12:41:36 UTC
Clang inlines _M_realloc_append at -O2/-O3, while we do not even at -O3. In general, not inlining makes sense, since the function is quite big and called infrequently. Maybe we can propagate some information to make it look smaller to the inliner in this special case.

Without inlining, it is hard to track what _M_realloc_append does to the array and to optimize out the paired allocations.
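
To illustrate what "paired allocations" means here, a minimal sketch, assuming the allocation and deallocation are both visible in one function (which is the situation inlining would create). The helper name fill_and_discard is hypothetical and not part of libstdc++ or the testcase. The buffer never escapes and is never read after the stores, so a compiler is permitted to drop the whole new[]/delete[] pair together with the loop; whether a given GCC version actually does so is not something this sketch demonstrates.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, for illustration only.
// The buffer is allocated, written, and freed inside this function.
// Nothing reads it afterwards and its address never escapes, so the
// stores are dead and the new[]/delete[] pair may be removed.
std::size_t fill_and_discard(std::size_t n)
{
    float *buf = new float[n]();          // value-initialised, similar to vector::resize
    for (std::size_t i = 0; i < n; ++i)
        buf[i] = static_cast<float>(i) * 2.0f;
    delete[] buf;                         // paired deallocation in the same function
    return n;                             // the result does not depend on buf
}
```

In the testcase the same pattern is hidden behind the out-of-line append helper, so the optimizer cannot see that the vectors do not escape.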

~/trunk-install-new5/bin/g++ -O3 lp.C -fdump-tree-all-details --param max-inline-insns-auto=500

yields the following optimized dump:

void p ()
{
  unsigned int i;
  double _58;
  signed int _292;
        
  <bb 2> [local count: 566793954]:

  <bb 3> [local count: 495962352]:
  # i_81 = PHI <1(2), i_53(5)>
  _292 = (signed int) i_81;
  _58 = (double) _292;
  if (_58 u> 0.0)
    goto <bb 5>; [99.95%]
  else
    goto <bb 4>; [0.05%]
        
  <bb 4> [local count: 247978]:
  __builtin_log (_58);
        
  <bb 5> [local count: 495962352]:
  i_53 = i_81 + 1;
  if (i_53 != 100000000)
    goto <bb 3>; [97.84%]
  else
    goto <bb 6>; [2.16%]
  
  <bb 6> [local count: 10737416]:
  return;
    
}         

So we keep the loop around because we do not eliminate the call to log, which may set errno. -fno-math-errno solves that. Does std::log need to set errno?
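
For reference, a small sketch of how the errno side effect can be observed. The helper names errno_after_log and math_sets_errno are hypothetical. The C library specification allows log to report a pole error for a zero argument and a domain error for a negative one, and whether that is done through errno is implementation-defined and advertised by math_errhandling. The sketch assumes a build without -fno-math-errno or -ffast-math.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>

// Hypothetical helper: calls std::log and returns the errno value it left behind.
int errno_after_log(double x)
{
    volatile double arg = x;            // volatile keeps the call from being folded away
    errno = 0;
    volatile double r = std::log(arg);  // may set errno if the implementation uses MATH_ERRNO
    (void)r;
    return errno;
}

// True when this implementation reports math errors through errno.
bool math_sets_errno()
{
    return (math_errhandling & MATH_ERRNO) != 0;
}
```

On an implementation where math_sets_errno() is true, errno_after_log(0.0) is expected to report ERANGE, which is the kind of side effect that keeps the guarded __builtin_log call in the dump above alive. For strictly positive arguments errno stays untouched.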
Comment 2 Andrew Pinski 2024-12-21 09:14:42 UTC
One thing I should note is that with -std=c++20 added, _M_realloc_append is inlined and this gets optimized at -O3.
Comment 3 Jan Hubicka 2024-12-27 16:00:38 UTC
With -O3 -std=c++20
https://godbolt.org/z/3WKnn8rax
we inline, but still get stuck on the loop that calls log and may modify errno.

Without -std=c++20 we hit the --param max-inline-insns-auto limit. We need --param max-inline-insns-auto=55 to inline; the default is 30.

IPA function summary for void std::vector<_Tp, _Alloc>::_M_default_append(size_type) [with _Tp = float; _Alloc = std::allocator<float>]/1347 inlinable
  global time:     38.660448
  self size:       57
  global size:     72
  min size:       0
  self stack:      0
  global stack:    0
  estimated growth:1
    size:0.000000, time:0.000000
    size:3.000000, time:2.000000,  executed if:(not inlined)
    size:2.000000, time:2.000000,  nonconst if:(op1 changed)
    size:0.500000, time:0.250000,  executed if:(op0 not sra candidate) && (op1 != 0) && (not inlined),  nonconst if:(op0 not sra candidate) && (op0[ref offset: 64] changed) && (op1 != 0) && (not inlined)
    size:0.500000, time:0.250000,  executed if:(op0 not sra candidate) && (op1 != 0),  nonconst if:(op0 not sra candidate) && (op0[ref offset: 64] changed) && (op1 != 0)
    size:0.500000, time:0.250000,  executed if:(op0 not sra candidate) && (op1 != 0) && (not inlined),  nonconst if:(op0[ref offset: 0] changed) && (op0 not sra candidate) && (op1 != 0) && (not inlined)
    size:0.500000, time:0.250000,  executed if:(op0 not sra candidate) && (op1 != 0),  nonconst if:(op0[ref offset: 0] changed) && (op0 not sra candidate) && (op1 != 0)
    size:5.000000, time:2.170000,  executed if:(op1 != 0),  nonconst if:(op0[ref offset: 64] changed || op0[ref offset: 0] changed) && (op1 != 0)
    size:0.500000, time:0.250000,  executed if:(op0 not sra candidate) && (op1 != 0) && (not inlined),  nonconst if:(op0[ref offset: 128] changed) && (op0 not sra candidate) && (op1 != 0) && (not inlined)
    size:0.500000, time:0.250000,  executed if:(op0 not sra candidate) && (op1 != 0),  nonconst if:(op0[ref offset: 128] changed) && (op0 not sra candidate) && (op1 != 0)
    size:2.000000, time:1.000000,  executed if:(op1 != 0),  nonconst if:(op0[ref offset: 64] changed || op0[ref offset: 128] changed) && (op1 != 0)
    size:2.000000, time:1.000000,  executed if:(op1 != 0),  nonconst if:(op1 changed || op0[ref offset: 64] changed || op0[ref offset: 128] changed) && (op1 != 0)
    size:8.000000, time:2.680000,  executed if:(op1 != 0),  nonconst if:(op1 changed || op0[ref offset: 64] changed || op0[ref offset: 0] changed) && (op1 != 0)
    size:10.000000, time:3.061500,  executed if:(op1 != 0)
    size:2.500000, time:0.752500,  executed if:(op0 not sra candidate) && (op1 != 0) && (not inlined)
    size:2.500000, time:0.752500,  executed if:(op0 not sra candidate) && (op1 != 0)
    size:2.000000, time:0.670000,  executed if:(op1 != 0),  nonconst if:(op0[ref offset: 0] changed) && (op1 != 0)
    size:6.000000, time:1.500000,  executed if:(op1 != 0),  nonconst if:(op1 changed) && (op1 != 0)
    size:2.000000, time:0.330000,  executed if:(op1,(# + 18446744073709551615) != 0) && (op1 != 0),  nonconst if:(op1,(# + 18446744073709551615) != 0) && (op1 changed) && (op1 != 0)
    size:10.000000, time:11.670000,  executed if:(op1,(# + 18446744073709551615) != 0) && (op1 != 0)
  calls:
    void operator delete(void*, std::size_t)/1478 function body not available
      freq:0.18 loop depth: 0 size: 3 time: 12 predicate: (op0[ref offset: 0] != 0B) && (op1 != 0)
    void* __builtin_memcpy(void*, const void*, long unsigned int)/1477 function body not available
      freq:0.14 loop depth: 0 size: 4 time: 13 predicate: (op1 != 0)
    static _ForwardIterator std::__uninitialized_default_n_1<true>::__uninit_default_n(_ForwardIterator, _Size) [with _ForwardIterator = float*; _Size = long unsigned int]/1480 inlined
      freq:0.34
      Stack frame offset 0, callee self size 0
    allocate.isra/1427 inlined
      freq:0.34
      Stack frame offset 0, callee self size 0
      long int __builtin_expect(long int, long int)/1473 function body not available
        freq:0.34 loop depth: 0 size: 0 time:  0 predicate: (op1 != 0)
         op1 is compile time invariant
         op1 points to local or readonly memory
      void __builtin_unreachable()/1471 unreachable
        freq:0.00 loop depth: 0 size: 0 time:  0 predicate: (false)
      void __builtin_unreachable()/1471 unreachable
        freq:0.00 loop depth: 0 size: 0 time:  0 predicate: (false)
      void* operator new(std::size_t)/1476 function body not available
        freq:0.30 loop depth: 0 size: 3 time: 12 predicate: (op1 != 0)
    void std::__throw_length_error(const char*)/1472 function body not available
      freq:0.00 loop depth: 0 size: 2 time: 11 predicate: (op1 != 0)
       op0 is compile time invariant
    static _ForwardIterator std::__uninitialized_default_n_1<true>::__uninit_default_n(_ForwardIterator, _Size) [with _ForwardIterator = float*; _Size = long unsigned int]/1480 inlined
      freq:0.16
      Stack frame offset 0, callee self size 0
    void __builtin_unreachable()/1471 function body not available
      freq:0.00 loop depth: 0 size: 0 time:  0 predicate: (op1 != 0)
    void __builtin_unreachable()/1471 function body not available
      freq:0.00 loop depth: 0 size: 0 time:  0 predicate: (op1 != 0)