The value of std::experimental::memory_alignment appears to be too large for fixed-size vectors that are larger than the native vector length. For example, on both x86 AVX512 and ARM NEON, the alignment requirement for a 12-element double vector reported by libstdc++ is 128 bytes, even though there are no instructions that require such a huge alignment (the maximum is 512 bits/64 bytes for AVX512).

https://godbolt.org/z/45hEWG14b

#include <experimental/simd>
namespace stdx = std::experimental;
using simd12 = stdx::simd<double, stdx::simd_abi::fixed_size<12>>;
static_assert(stdx::memory_alignment_v<simd12> <= 64);
// static assertion failed: the comparison reduces to '(128 <= 64)'

Looking at bits/simd.h, it appears that the alignment for fixed_size_simd<T, N> is computed simply as bit_ceil(sizeof(T) * N), which is very conservative.

Having different alignment requirements for large SIMD vectors makes allocating buffers that are reused with different vector lengths quite tricky.
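To make the arithmetic concrete, here is a minimal sketch of the rounding described above. The helper name bit_ceil_sketch is hypothetical and only mirrors what the report says bits/simd.h appears to do; it is not the libstdc++ implementation, and it assumes an 8-byte double.

```cpp
#include <cstddef>

// Hypothetical stand-in for std::bit_ceil (C++20): the smallest power of two >= x.
constexpr std::size_t bit_ceil_sketch(std::size_t x) {
    std::size_t p = 1;
    while (p < x) {
        p <<= 1;
    }
    return p;
}

// Assumes sizeof(double) == 8.
// 12 doubles occupy 96 bytes, which rounds up to 128.
static_assert(bit_ceil_sketch(8 * 12) == 128, "12 doubles round up to 128 bytes");
// 8 doubles occupy exactly 64 bytes, the AVX512 register width.
static_assert(bit_ceil_sketch(8 * 8) == 64, "8 doubles stay at 64 bytes");
```

Under this rule, any element count between 9 and 16 doubles ends up with a 128-byte alignment, which matches the value reported for the 12-element case.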
> Having different alignment requirements for large SIMD vectors makes allocating buffers that are reused with different vector lengths quite tricky.

Not really. Considering operator new gets passed the alignment since c++14.
(In reply to Andrew Pinski from comment #1)
> Not really. Considering operator new gets passed the alignment since c++14.

s/c++14/c++17/
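For reference, a minimal sketch of what the aligned allocation mentioned above can look like with the C++17 aligned operator new. The helper names (allocate_aligned_doubles, free_aligned_doubles, is_aligned) are made up for this illustration; only ::operator new, ::operator delete and std::align_val_t come from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical helper: allocate storage for n doubles with the given alignment.
// The alignment must be a power of two.
inline double* allocate_aligned_doubles(std::size_t n, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    void* p = ::operator new(n * sizeof(double), std::align_val_t{alignment});
    return static_cast<double*>(p);
}

// Hypothetical helper: release storage obtained from allocate_aligned_doubles.
// The same alignment must be passed back to operator delete.
inline void free_aligned_doubles(double* p, std::size_t alignment) {
    ::operator delete(p, std::align_val_t{alignment});
}

// Check whether a pointer is a multiple of the given alignment.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A caller could pass stdx::memory_alignment_v<V> for whichever simd type V it intends to use, for example 128 for the 12-element fixed_size vector from the original report.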
Thank you for taking the time to report this inefficiency. But this is working as intended. The design goal of the fixed_size ABI was an ABI-stable "this is never going to break on ABI boundaries" type. The only way to guarantee that without an oracle that can predict the future is to use the most conservative alignment. FWIW, I always thought that fixed_size is an interesting experiment and potentially an ABI tag someone at some point might want. But I don't think it should have been the prominent "I want a fixed number of elements" ABI that it is in the TS. This is going to be better with C++26 ... if we make it. :)
Thank you, I appreciate the quick responses.

> The design goal of the fixed_size ABI was an ABI-stable "this is never going to break on ABI boundaries" type.

You're right, ABI stability is not something I considered.

> Not really. Considering operator new gets passed the alignment since c++17.

Indeed, but allocation is not the issue I was struggling with: I wanted to have the simple requirement that "all array arguments should be aligned to the native SIMD size" for the user-facing APIs, and then decide on the optimal SIMD size in the underlying algorithm, as an implementation detail. The algorithm may encounter remainders that are not multiples of the native SIMD size, and I was using the fixed_size_simd type to handle those.

As a work-around, I'll just have to implement the remainders manually using multiple smaller native SIMD types and/or masks, rather than using fixed_size_simd.
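One way such a mask-based remainder could look is sketched below, assuming libstdc++'s <experimental/simd> and an input pointer aligned to stdx::memory_alignment_v of the native type. The function name sum and the choice of a reduction as the example algorithm are made up for illustration; this is not code from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <experimental/simd>

namespace stdx = std::experimental;
using native = stdx::native_simd<double>;

// Hypothetical example: sum n doubles from a native_simd-aligned array.
// Full vectors use aligned loads; the remainder uses a masked load,
// so no element past data[n - 1] is read.
inline double sum(const double* data, std::size_t n) {
    assert(reinterpret_cast<std::uintptr_t>(data)
               % stdx::memory_alignment_v<native> == 0);
    native acc(0.0);
    std::size_t i = 0;
    for (; i + native::size() <= n; i += native::size()) {
        acc += native(data + i, stdx::vector_aligned);
    }
    if (i < n) {
        // Lane indices 0, 1, 2, ... compared against the number of leftover elements.
        const native lane([](auto k) { return static_cast<double>(k); });
        const auto mask = lane < native(static_cast<double>(n - i));
        native tail(0.0);
        // Only lanes selected by the mask are loaded; the others stay 0.
        where(mask, tail).copy_from(data + i, stdx::vector_aligned);
        acc += tail;
    }
    return stdx::reduce(acc);
}
```

Because i is always a multiple of native::size(), the masked load starts at a native-aligned address, so the only alignment requirement on the caller is the native one.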
Wrt. working on a larger data set you might be interested in:
https://github.com/mattkretz/vir-simd?tab=readme-ov-file#simd-execution-policy-p0350

For the problem you seem to describe, I like to have a native_simd-aligned array of scalars and then iterate over it using native_simd. If your algorithm allows, the simplest epilogue is to allocate some extra values (this allocation is free, because of alignment and how allocators work) and then simply process a few more inputs and ignore the outputs from the padding.
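A minimal sketch of that padding approach, assuming libstdc++'s <experimental/simd>. The names padded_size and scale are hypothetical, and the element-wise scaling merely stands in for whatever the real algorithm computes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <experimental/simd>

namespace stdx = std::experimental;
using native = stdx::native_simd<double>;

// Round n up to the next multiple of the native SIMD width.
constexpr std::size_t padded_size(std::size_t n) {
    return (n + native::size() - 1) / native::size() * native::size();
}

// Hypothetical example: scale n values in place.
// Precondition: data is native_simd-aligned and holds padded_size(n)
// initialized elements. The padding is processed like real input and
// its results are simply ignored by the caller.
inline void scale(double* data, std::size_t n, double factor) {
    assert(reinterpret_cast<std::uintptr_t>(data)
               % stdx::memory_alignment_v<native> == 0);
    const std::size_t end = padded_size(n);
    for (std::size_t i = 0; i < end; i += native::size()) {
        native v(data + i, stdx::vector_aligned);
        v *= native(factor);
        v.copy_to(data + i, stdx::vector_aligned);
    }
}
```

The loop then needs no epilogue at all. The padding elements should still be initialized (for example to zero) so that the extra lanes do not read indeterminate values.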