This issue occurs when the UIntType parameter is 64-bits, the platform being compiled for can't use 128-bit integers (for example, 32-bit x86), and the LCG parameters are chosen to meet certain requirements. An example of code that hit this error is this: #include <random> int main() { std::linear_congruential_engine<unsigned long long int, 864691128455135232ULL, 12347ULL, 4052555153018976267ULL> gen; gen(); } This compiles fine when __int128 is present, and fails when it isn't. When I compile this with the -m32 flag, it prints out a lot of errors. To summarize the problem, the error happens when determining which template specification of _Select_uint_least_t to use in random.h. When determining how to define operator() for the engine, it realizes that the A value is large enough to overflow the 64-bit result type, and that the precondition for Schrage's method isn't met (M % A > M / A). Because of this, it attempts to find a larger integer type to use for the computation using _Select_uint_least_t. Since the result type is already 64-bits, and the 128-bit integer isn't available, it ends up not finding a large enough integer type, and hits the static assert in the unspecified variant of _Select_uint_least_t. This isn't a simple issue to resolve. The static assert that gets hit even says "sorry, would be too much trouble for a slow result". However, as far as I can tell, this instantiation of linear_congruential_engine is valid in the standard, so it shouldn't result in an error. This instantiation doesn't produce any compile errors when I tried it with MSVC, Boost, and libc++'s implementation of linear_congruential_engine (though libc++ incorrectly uses Schrage's method). While a solution would be slow, it's probably needed for full standards compliance.
Created attachment 57345 [details] Use 64-bit integers to do 128-bit arithmetic This patch defines a custom type that implements the necessary 128-bit arithmetic for linear_congruential_engine<uint64_t, ...> without __int128. It's an order of magnitude slower than using __int128 natively on x86_64, but it gives the right results.
(In reply to Lewis Fox from comment #0) > libc++'s > implementation of linear_congruential_engine (though libc++ incorrectly uses > Schrage's method). I haven't checked what libc++ actually does, but it gives consistent results for -m32 and -m64. However, apart from the first one, its results don't match what libstdc++'s. Libstdc++ and Boost give the same results: 864691128455147579 20120904253298372 499698276788149646 1283859513473858339 2211358525164835637 1565794316594927048 433037752037085551 3876224293909804757 842334705173512733 614226679431558764 3068544072891686009 780734117187050918 1963664848033545542 3791874062716605653 366668634810437006 1377239172204792920 1808622055437870614 464217244571328665 2604268511046649463 2831603438105397950 3223716036697852745 784694016100218461 1453099037541982349 2259076772827085306 2590915097279144408 3831773502251465123 320081643915990134 2376324005742734684 2037587339862127157 893313058057432898 118956510249458009 973816490706440657 2982490375421836301 1236945357523402079 3905294429470053119 2304284986302062495 2454379621599061616 1553811809817066890 2904663527490058979 2904663527490058979 2904663527490058979 2904663527490058979 2904663527490058979 2904663527490058979 2904663527490058979 etc. Libc++ gets stuck oscillating between two values right away: 864691128455147579 1965622972376959002 4035225266123976763 1965622972376959002 4035225266123976763 1965622972376959002 4035225266123976763 1965622972376959002 4035225266123976763 1965622972376959002 4035225266123976763 1965622972376959002 4035225266123976763 1965622972376959002
Based on that, I'm not sure what libc++ does is actually better than what libstdc++ does today (i.e. refuse to compile if the results would be wrong). I think the patch above would make sense though. I'm sure it could be optimized, I was more concerned with getting the results right than with performance. The first loop in operator% could just be a single left shift by the difference in the number of leading zeros between __l and __x. And halve() should just be a right shift (and "halve" isn't a reserved name).
(In reply to Jonathan Wakely from comment #1) > Created attachment 57345 [details] > Use 64-bit integers to do 128-bit arithmetic > > This patch defines a custom type that implements the necessary 128-bit > arithmetic for linear_congruential_engine<uint64_t, ...> without __int128. > > It's an order of magnitude slower than using __int128 natively on x86_64, > but it gives the right results. if (__builtin_uaddll_overflow(__l._M_lo, __c, &__l._M_lo)) __l._M_hi++; Why not just __l._M_hi += __builtin_uaddll_overflow(__l._M_lo, __c, &__l._M_lo); and similarly for subtraction? Is the reason for using the clang compat builtins instead of __builtin_{add,sub,mul}_overflow the compatibility with clang versions which don't support these?
Created attachment 57346 [details] Improved operator%
(In reply to Jakub Jelinek from comment #4) > Why not just > __l._M_hi += __builtin_uaddll_overflow(__l._M_lo, __c, > &__l._M_lo); > and similarly for subtraction? No good reason - I'll change it. > Is the reason for using the clang compat builtins instead of > __builtin_{add,sub,mul}_overflow the compatibility with clang versions which > don't > support these? Again, no reason, I was just using the ones specific to the type because all the types are fixed. Thanks for the suggestions.
Created attachment 57347 [details] With Jakub's suggested changes
if (!__l._M_hi) { __l._M_lo %= __m; return __l; } auto __n = __l._M_hi ? __builtin_clzll(__l._M_hi) : 64 + __builtin_clzll(__l._M_lo); So, on the __n initializer you already know that __l._M_hi is non-zero, no need to test it again. Sure, the compiler will optimize it, but the shorter the method in the header the better.
Oops yes!
As discussed on IRC, probably the if (!__builtin_mul_overflow(__l._M_lo, __x, &__lo)) optimization isn't a good idea, because most likely the compiler will in that case expand that to a full 64x64->128 multiplication plus testing that the upper 64 bits are or aren't zero, so computing everything this function needs and doing it again if there is overflow, without possibility to query what it computed.
(In reply to Jakub Jelinek from comment #10) > As discussed on IRC, probably the > if (!__builtin_mul_overflow(__l._M_lo, __x, &__lo)) > optimization isn't a good idea, because most likely the compiler will in > that case > expand that to a full 64x64->128 multiplication plus testing that the upper > 64 bits are or aren't zero, so computing everything this function needs and > doing it again if there is overflow, without possibility to query what it > computed. Though, if it is common enough, one could try to optimize the __ll[0] == 0 && __xx[0] == 0 case, one can do then either 32x32->64 or 64x64->64 multiplication and be done with it. But if it is rare in random's usage, it would just make the code larger.
(In reply to Jonathan Wakely from comment #2) My original comment about libc++ was in reference to the LLVM bugzilla report #27839: https://bugs.llvm.org/show_bug.cgi?id=27839 It looks like the issue you discovered is LLVM bugzilla report #34206: https://bugs.llvm.org/show_bug.cgi?id=34206 It seems like since I made that comment here, libc++ has updated to fix the misuse of Schrage's algorithm (though, looking at the current source code, it still looks wrong to me), so it does mean my initial comment is a little out of date. Either way, though, this issue wasn't in comparison to libc++, but rather that libstdc++ seems to contradict the C++ standard. For reference, MSVC doesn't have a native 128-bit integer type, but still handles these correctly by using 64-bit integer arithmetic (though MSVC could still optimize their implementation for x86_64 using intrinsics if they wanted to). This is a bit of an edge case that I don't think most users will encounter, so performance is probably less important here than accuracy. I'd personally prioritize minimizing branches (i.e. improving simplicity) than optimizing the operand sizes for performance, but that's just my opinion.
(In reply to Jakub Jelinek from comment #11) > Though, if it is common enough, one could try to optimize the __ll[0] == 0 > && __xx[0] == 0 case, one can do then either 32x32->64 or 64x64->64 > multiplication and be done with it. But if it is rare in random's usage, it > would just make the code larger. We will only use this new type when the calculation a*(m-1)+c doesn't fit in 64 bits, and the input values should be uniformly distributed in [0,m) for a good choice of parameters (and for a bad choice of parameters, you have bigger problems than the calculation being slow!) For a small value of a and large value of c we would use this new type but the multiplication step would overflow infrequently, but I don't think that's a good choice of parameters and not worth optimizing for. So I think we should just use the full 64x64 multiplication unconditionally. For operator% the if (__l._M_hi == 0) branch is probably worth it, because the general case is so much more expensive.
(In reply to Lewis Fox from comment #12) > (In reply to Jonathan Wakely from comment #2) > > My original comment about libc++ was in reference to the LLVM bugzilla > report #27839: https://bugs.llvm.org/show_bug.cgi?id=27839 Thanks, that got copied to github as https://github.com/llvm/llvm-project/issues/28213 > It looks like the issue you discovered is LLVM bugzilla report #34206: > https://bugs.llvm.org/show_bug.cgi?id=34206 And that is now https://github.com/llvm/llvm-project/issues/33554 > It seems like since I made that comment here, libc++ has updated to fix the > misuse of Schrage's algorithm (though, looking at the current source code, > it still looks wrong to me), so it does mean my initial comment is a little > out of date. Unsurprising when it took me more than 5 years to look into it properly ;-) > This is a bit of an edge case that I don't think most users will encounter, > so performance is probably less important here than accuracy. 100% agreed > I'd personally > prioritize minimizing branches (i.e. improving simplicity) than optimizing > the operand sizes for performance, but that's just my opinion. Agreed again, for although as I said in comment 13 I think the extra branch in operator% might be worthwhile. Maybe with __builtin_expect(__l._M_hi == 0, 0)) as a branch prediction hint.
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>: https://gcc.gnu.org/g:c9ce332b557bb95987d038d98ea929cdfd1eae1d commit r14-8998-gc9ce332b557bb95987d038d98ea929cdfd1eae1d Author: Jonathan Wakely <jwakely@redhat.com> Date: Wed Feb 7 11:31:10 2024 +0000 libstdc++: Use 128-bit arithmetic for std::linear_congruential_engine [PR87744] For 32-bit targets without __int128 we need to implement the LCG transition function by hand using 64-bit types. We can also slightly simplify the __mod function by using if-constexpr unconditionally, disabling -Wc++17-extensions warnings with diagnostic pragmas. libstdc++-v3/ChangeLog: PR libstdc++/87744 * include/bits/random.h [!__SIZEOF_INT128__] (_Select_uint_least_t): Define specialization for 64-bit generators with non-power-of-two modulus and large constants. (__mod): Use if constexpr unconditionally. * testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error line number. * testsuite/26_numerics/random/linear_congruential_engine/87744.cc: New test.
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>: https://gcc.gnu.org/g:7f3d900684ad989164114f25eb46a33871c6533d commit r14-9028-g7f3d900684ad989164114f25eb46a33871c6533d Author: Jonathan Wakely <jwakely@redhat.com> Date: Wed Feb 7 11:31:10 2024 +0000 libstdc++: Fix FAIL: 26_numerics/random/pr60037-neg.cc [PR113931] PR libstdc++/87744 PR libstdc++/113931 libstdc++-v3/ChangeLog: * testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error line number.
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>: https://gcc.gnu.org/g:c74131e77f1a6b7afe700d3526a8992dc9744b0c commit r14-9038-gc74131e77f1a6b7afe700d3526a8992dc9744b0c Author: Jonathan Wakely <jwakely@redhat.com> Date: Wed Feb 7 11:31:10 2024 +0000 libstdc++: Fix FAIL: 26_numerics/random/pr60037-neg.cc again [PR113961] PR libstdc++/87744 PR libstdc++/113961 libstdc++-v3/ChangeLog: * testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error line number.