[Bug c++/78180] New: Poor optimization of std::array on gcc 4.8/5.4/6.2 as compared to simple raw array
barry.revzin at gmail dot com
gcc-bugzilla@gcc.gnu.org
Tue Nov 1 21:00:00 GMT 2016
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78180
Bug ID: 78180
Summary: Poor optimization of std::array on gcc 4.8/5.4/6.2 as compared to simple raw array
Product: gcc
Version: 6.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: barry.revzin at gmail dot com
Target Milestone: ---
Here is a complete benchmark comparing a few simple operations on a
std::array<int64_t, 128> versus an int64_t[128]. I'm using
https://github.com/google/benchmark and compiling with -std=c++11 -O3
-D_GLIBCXX_USE_CXX11_ABI=0:
=============================================================
#include <array>
#include <cstdint>
#include <benchmark/benchmark_api.h>

template <class C>
class Rolling
{
    C times_{};
    uint32_t idx_;
    const uint32_t size_;

public:
    Rolling(uint32_t size)
        : idx_(0)
        , size_(size)
    { }

    void add(int64_t t)
    {
        times_[idx_] = t;
        ++idx_;
        if (idx_ == size_) {
            idx_ = 0;
        }
    }

    bool exceeded(int64_t now, int64_t intv)
    {
        return now - times_[idx_] < intv;
    }
};

template <class C>
void BM_Rolling(benchmark::State& state)
{
    Rolling<C> r(100);
    int64_t i = 0;
    int64_t exc = 0;
    while (state.KeepRunning()) {
        for (int i = 0; i < state.range(0); ++i) {
            r.add(i);
            if (r.exceeded(i, 1000000)) {
                benchmark::DoNotOptimize(++exc);
            }
        }
    }
}

#define JOIN(...) __VA_ARGS__
BENCHMARK_TEMPLATE(BM_Rolling, int64_t[128])->Range(8, 8<<10);
BENCHMARK_TEMPLATE(BM_Rolling, JOIN(std::array<int64_t, 128>))->Range(8, 8<<10);
BENCHMARK_MAIN();
=============================================================
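To make the benchmarked operations concrete, here is a small sketch of what add() and exceeded() compute, independent of the benchmark library. The Rolling class is copied from the report; the aliases Raw and Arr and the helper exceeded_after_three_adds are hypothetical names introduced only for this illustration, and the window size of 3 is chosen to keep the arithmetic easy to follow.

```cpp
#include <array>
#include <cstdint>

// Same class as in the report, reproduced so this sketch is self-contained.
template <class C>
class Rolling
{
    C times_{};
    std::uint32_t idx_;
    const std::uint32_t size_;

public:
    Rolling(std::uint32_t size)
        : idx_(0)
        , size_(size)
    { }

    void add(std::int64_t t)
    {
        times_[idx_] = t;
        ++idx_;
        if (idx_ == size_) {
            idx_ = 0;  // wrap around: idx_ now points at the oldest entry
        }
    }

    bool exceeded(std::int64_t now, std::int64_t intv)
    {
        // true when the entry at idx_ is less than intv older than now
        return now - times_[idx_] < intv;
    }
};

// Hypothetical aliases for the two container types being compared.
using Raw = std::int64_t[128];
using Arr = std::array<std::int64_t, 128>;

// Hypothetical helper: fill a window of 3 with timestamps 10, 11, 12,
// then ask whether the oldest one (10) is within intv of now.
template <class C>
bool exceeded_after_three_adds(std::int64_t now, std::int64_t intv)
{
    Rolling<C> r(3);
    r.add(10);
    r.add(11);
    r.add(12);  // idx_ wraps to 0, whose slot holds 10
    return r.exceeded(now, intv);
}
```

With now = 13 and intv = 5 the comparison is 13 - 10 = 3 < 5, so the helper returns true for both container types; with now = 20 it is 10 < 5, which is false. Both instantiations execute the same source-level operations, which is why identical timings would be the natural expectation.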
This yields the following performance numbers (similar across 4.8.2, 5.4.0, and
6.2.0):
Run on (16 X 3199.66 MHz CPU s)
2016-11-01 15:56:13
Benchmark                                           Time        CPU  Iterations
-------------------------------------------------------------------------------
BM_Rolling<JOIN(std::array<int64_t, 128>)>/8       18 ns      18 ns    39568747
BM_Rolling<JOIN(std::array<int64_t, 128>)>/64     135 ns     134 ns     5218330
BM_Rolling<JOIN(std::array<int64_t, 128>)>/512   1084 ns    1031 ns      678795
BM_Rolling<JOIN(std::array<int64_t, 128>)>/4k    8221 ns    8185 ns       85583
BM_Rolling<JOIN(std::array<int64_t, 128>)>/8k   16975 ns   16520 ns       42752
BM_Rolling<int64_t[128]>/8                         15 ns      15 ns    45940368
BM_Rolling<int64_t[128]>/64                       112 ns     111 ns     6301196
BM_Rolling<int64_t[128]>/512                      821 ns     817 ns      858168
BM_Rolling<int64_t[128]>/4k                      6538 ns    6496 ns      108570
BM_Rolling<int64_t[128]>/8k                     12957 ns   12902 ns       53582
That is a large performance gap between std::array and the raw array (the
std::array version takes roughly 20-30% longer at every size), where I
wouldn't expect any. When compiling with clang, I don't see any gap at all
(though for both containers, clang's performance is significantly worse than
gcc's).
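As background on why no gap would be expected: std::array is an aggregate wrapping a built-in array, so the two types should have the same object layout. The sketch below checks this at compile time and at run time. The aliases Raw and Arr and the function elements_start_at_object are hypothetical names for this illustration. The sizeof equality holds on libstdc++ and other common implementations, but it is an assumption here rather than a guarantee spelled out by the standard.

```cpp
#include <array>
#include <cstdint>
#include <type_traits>

// Hypothetical aliases for the two container types from the benchmark.
using Raw = std::int64_t[128];
using Arr = std::array<std::int64_t, 128>;

// std::array is a standard-layout aggregate.
static_assert(std::is_standard_layout<Arr>::value,
              "std::array should be standard-layout");

// Assumption: true on libstdc++ and other common standard libraries,
// not explicitly mandated by the standard.
static_assert(sizeof(Arr) == sizeof(Raw),
              "std::array adds no storage beyond the raw array");
static_assert(alignof(Arr) == alignof(Raw),
              "std::array has the same alignment as the raw array");

// Hypothetical helper: returns true when the elements begin at the address
// of the std::array object itself, i.e. there is no header before the data.
bool elements_start_at_object()
{
    Arr a{};
    return static_cast<const void*>(&a) == static_cast<const void*>(a.data());
}
```

If these checks pass, element access through operator[] on either type amounts to the same address computation, so any timing difference would have to come from how the optimizer treats the wrapper rather than from the data layout.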