Tweaked g++.dg/tree-ssa/loop-split-1.C:

```
#include <vector>
#include <cmath>

constexpr unsigned s = 100000000;

void p()
{
    std::vector<float> a, b, c;
    a.resize(s);
    b.resize(s);
    c.resize(s);

    for(unsigned i = 0; i < s; ++i)
    {
        if(i == 0)
            a[i] = b[i] * c[i];
        else
            a[i] = (b[i] + c[i]) * c[i-1] * std::log(i);
    }
}
```

Clang optimises this out to just ret.
clang inlines _M_realloc_append at -O2/-O3 while we don't, even at -O3. In general not inlining makes sense, since it is quite big and called infrequently. Maybe we can propagate some stuff to make it smaller for the inliner in this special case. Without inlining it is hard to track what _M_realloc_append does to the array and to optimize out paired allocations.

~/trunk-install-new5/bin/g++ -O3 lp.C -fdump-tree-all-details --param max-inline-insns-auto=500

yields the following optimized dump:

void p ()
{
  unsigned int i;
  double _58;
  signed int _292;

  <bb 2> [local count: 566793954]:

  <bb 3> [local count: 495962352]:
  # i_81 = PHI <1(2), i_53(5)>
  _292 = (signed int) i_81;
  _58 = (double) _292;
  if (_58 u> 0.0)
    goto <bb 5>; [99.95%]
  else
    goto <bb 4>; [0.05%]

  <bb 4> [local count: 247978]:
  __builtin_log (_58);

  <bb 5> [local count: 495962352]:
  i_53 = i_81 + 1;
  if (i_53 != 100000000)
    goto <bb 3>; [97.84%]
  else
    goto <bb 6>; [2.16%]

  <bb 6> [local count: 10737416]:
  return;

}

so we keep the code around because we do not eliminate log. -fno-math-errno solves that. Does std::log need to set errno?
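On the errno question: the C rules for `<math.h>`, which `std::log` inherits, say that a domain error (for example log of a negative number) sets errno to EDOM, and a pole error (log of zero) sets it to ERANGE, whenever `math_errhandling & MATH_ERRNO` is nonzero. So on such an implementation the call has a visible side effect unless the compiler can prove the argument is in range. The following is a minimal sketch for checking what a given libc does. The helper names `log_sets_errno` and `uses_math_errno` are made up for illustration and are not part of the test case.

```cpp
#include <cerrno>
#include <cmath>

// Hypothetical helper: true if the implementation reports math errors
// through errno (as opposed to floating-point exceptions only).
bool uses_math_errno()
{
    return (math_errhandling & MATH_ERRNO) != 0;
}

// Hypothetical helper: true if std::log(x) left errno nonzero.
// The volatile copies keep the compiler from folding the call away
// or dropping its result.
bool log_sets_errno(double x)
{
    volatile double in = x;
    errno = 0;
    volatile double out = std::log(in);
    (void)out;
    return errno != 0;
}
```

Calling `log_sets_errno(-1.0)` is expected to return true only when `uses_math_errno()` is true; for an in-range argument such as 2.0 it should return false either way.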
One thing I should note is that adding -std=c++20 inlines _M_realloc_append and this gets optimized with -O3.
With -O3 -std=c++20 (https://godbolt.org/z/3WKnn8rax) we inline, but still get stuck on the loop calling log and modifying errno.

Without -std=c++20 we hit the --param max-inline-insns-auto limit. The default is 30, and we need --param max-inline-insns-auto=55 to inline.

IPA function summary for void std::vector<_Tp, _Alloc>::_M_default_append(size_type) [with _Tp = float; _Alloc = std::allocator<float>]/1347 inlinable
  global time: 38.660448
  self size: 57
  global size: 72
  min size: 0
  self stack: 0
  global stack: 0
  estimated growth:1
    size:0.000000, time:0.000000
    size:3.000000, time:2.000000, executed if:(not inlined)
    size:2.000000, time:2.000000, nonconst if:(op1 changed)
    size:0.500000, time:0.250000, executed if:(op0 not sra candidate) && (op1 != 0) && (not inlined), nonconst if:(op0 not sra candidate) && (op0[ref offset: 64] changed) && (op1 != 0) && (not inlined)
    size:0.500000, time:0.250000, executed if:(op0 not sra candidate) && (op1 != 0), nonconst if:(op0 not sra candidate) && (op0[ref offset: 64] changed) && (op1 != 0)
    size:0.500000, time:0.250000, executed if:(op0 not sra candidate) && (op1 != 0) && (not inlined), nonconst if:(op0[ref offset: 0] changed) && (op0 not sra candidate) && (op1 != 0) && (not inlined)
    size:0.500000, time:0.250000, executed if:(op0 not sra candidate) && (op1 != 0), nonconst if:(op0[ref offset: 0] changed) && (op0 not sra candidate) && (op1 != 0)
    size:5.000000, time:2.170000, executed if:(op1 != 0), nonconst if:(op0[ref offset: 64] changed || op0[ref offset: 0] changed) && (op1 != 0)
    size:0.500000, time:0.250000, executed if:(op0 not sra candidate) && (op1 != 0) && (not inlined), nonconst if:(op0[ref offset: 128] changed) && (op0 not sra candidate) && (op1 != 0) && (not inlined)
    size:0.500000, time:0.250000, executed if:(op0 not sra candidate) && (op1 != 0), nonconst if:(op0[ref offset: 128] changed) && (op0 not sra candidate) && (op1 != 0)
    size:2.000000, time:1.000000, executed if:(op1 != 0), nonconst if:(op0[ref offset: 64] changed || op0[ref offset: 128] changed) && (op1 != 0)
    size:2.000000, time:1.000000, executed if:(op1 != 0), nonconst if:(op1 changed || op0[ref offset: 64] changed || op0[ref offset: 128] changed) && (op1 != 0)
    size:8.000000, time:2.680000, executed if:(op1 != 0), nonconst if:(op1 changed || op0[ref offset: 64] changed || op0[ref offset: 0] changed) && (op1 != 0)
    size:10.000000, time:3.061500, executed if:(op1 != 0)
    size:2.500000, time:0.752500, executed if:(op0 not sra candidate) && (op1 != 0) && (not inlined)
    size:2.500000, time:0.752500, executed if:(op0 not sra candidate) && (op1 != 0)
    size:2.000000, time:0.670000, executed if:(op1 != 0), nonconst if:(op0[ref offset: 0] changed) && (op1 != 0)
    size:6.000000, time:1.500000, executed if:(op1 != 0), nonconst if:(op1 changed) && (op1 != 0)
    size:2.000000, time:0.330000, executed if:(op1,(# + 18446744073709551615) != 0) && (op1 != 0), nonconst if:(op1,(# + 18446744073709551615) != 0) && (op1 changed) && (op1 != 0)
    size:10.000000, time:11.670000, executed if:(op1,(# + 18446744073709551615) != 0) && (op1 != 0)
  calls:
    void operator delete(void*, std::size_t)/1478 function body not available
      freq:0.18 loop depth: 0 size: 3 time: 12 predicate: (op0[ref offset: 0] != 0B) && (op1 != 0)
    void* __builtin_memcpy(void*, const void*, long unsigned int)/1477 function body not available
      freq:0.14 loop depth: 0 size: 4 time: 13 predicate: (op1 != 0)
    static _ForwardIterator std::__uninitialized_default_n_1<true>::__uninit_default_n(_ForwardIterator, _Size) [with _ForwardIterator = float*; _Size = long unsigned int]/1480 inlined
      freq:0.34
      Stack frame offset 0, callee self size 0
    allocate.isra/1427 inlined
      freq:0.34
      Stack frame offset 0, callee self size 0
    long int __builtin_expect(long int, long int)/1473 function body not available
      freq:0.34 loop depth: 0 size: 0 time: 0 predicate: (op1 != 0)
      op1 is compile time invariant
      op1 points to local or readonly memory
    void __builtin_unreachable()/1471 unreachable
      freq:0.00 loop depth: 0 size: 0 time: 0 predicate: (false)
    void __builtin_unreachable()/1471 unreachable
      freq:0.00 loop depth: 0 size: 0 time: 0 predicate: (false)
    void* operator new(std::size_t)/1476 function body not available
      freq:0.30 loop depth: 0 size: 3 time: 12 predicate: (op1 != 0)
    void std::__throw_length_error(const char*)/1472 function body not available
      freq:0.00 loop depth: 0 size: 2 time: 11 predicate: (op1 != 0)
      op0 is compile time invariant
    static _ForwardIterator std::__uninitialized_default_n_1<true>::__uninit_default_n(_ForwardIterator, _Size) [with _ForwardIterator = float*; _Size = long unsigned int]/1480 inlined
      freq:0.16
      Stack frame offset 0, callee self size 0
    void __builtin_unreachable()/1471 function body not available
      freq:0.00 loop depth: 0 size: 0 time: 0 predicate: (op1 != 0)
    void __builtin_unreachable()/1471 function body not available
      freq:0.00 loop depth: 0 size: 0 time: 0 predicate: (op1 != 0)