Bug 87744 - Some valid instantiations of linear_congruential_engine produce compiler errors when __int128 isn't available
Summary: Some valid instantiations of linear_congruential_engine produce compiler erro...
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 7.3.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: rejects-valid
Depends on:
Blocks:
 
Reported: 2018-10-25 05:52 UTC by Lewis Fox
Modified: 2024-02-16 19:12 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2018-10-25 00:00:00


Attachments
Use 64-bit integers to do 128-bit arithmetic (1.11 KB, patch)
2024-02-06 21:09 UTC, Jonathan Wakely
Details | Diff
Improved operator% (1.10 KB, patch)
2024-02-06 21:46 UTC, Jonathan Wakely
Details | Diff
With Jakub's suggested changes (2.74 KB, patch)
2024-02-06 21:51 UTC, Jonathan Wakely
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Lewis Fox 2018-10-25 05:52:07 UTC
This issue occurs when the UIntType parameter is 64-bits, the platform being compiled for can't use 128-bit integers (for example, 32-bit x86), and the LCG parameters are chosen to meet certain requirements. An example of code that hit this error is this:

#include <random>
int main() {
    std::linear_congruential_engine<unsigned long long int, 864691128455135232ULL, 12347ULL, 4052555153018976267ULL> gen;
    gen();
}

This compiles fine when __int128 is present, and fails when it isn't. When I compile this with the -m32 flag, it prints out a lot of errors.

To summarize the problem, the error happens when determining which template specification of _Select_uint_least_t to use in random.h. When determining how to define operator() for the engine, it realizes that the A value is large enough to overflow the 64-bit result type, and that the precondition for Schrage's method isn't met (M % A > M / A). Because of this, it attempts to find a larger integer type to use for the computation using _Select_uint_least_t. Since the result type is already 64-bits, and the 128-bit integer isn't available, it ends up not finding a large enough integer type, and hits the static assert in the unspecified variant of _Select_uint_least_t.

This isn't a simple issue to resolve. The static assert that gets hit even says "sorry, would be too much trouble for a slow result". However, as far as I can tell, this instantiation of linear_congruential_engine is valid in the standard, so it shouldn't result in an error. This instantiation doesn't produce any compile errors when I tried it with MSVC, Boost, and libc++'s implementation of linear_congruential_engine (though libc++ incorrectly uses Schrage's method). While a solution would be slow, it's probably needed for full standards compliance.
Comment 1 Jonathan Wakely 2024-02-06 21:09:39 UTC
Created attachment 57345 [details]
Use 64-bit integers to do 128-bit arithmetic

This patch defines a custom type that implements the necessary 128-bit arithmetic for linear_congruential_engine<uint64_t, ...> without __int128.

It's an order of magnitude slower than using __int128 natively on x86_64, but it gives the right results.
Comment 2 Jonathan Wakely 2024-02-06 21:21:55 UTC
(In reply to Lewis Fox from comment #0)
> libc++'s
> implementation of linear_congruential_engine (though libc++ incorrectly uses
> Schrage's method).

I haven't checked what libc++ actually does, but it gives consistent results for -m32 and -m64. However, apart from the first one, its results don't match what libstdc++'s.

Libstdc++ and Boost give the same results:

864691128455147579
20120904253298372
499698276788149646
1283859513473858339
2211358525164835637
1565794316594927048
433037752037085551
3876224293909804757
842334705173512733
614226679431558764
3068544072891686009
780734117187050918
1963664848033545542
3791874062716605653
366668634810437006
1377239172204792920
1808622055437870614
464217244571328665
2604268511046649463
2831603438105397950
3223716036697852745
784694016100218461
1453099037541982349
2259076772827085306
2590915097279144408
3831773502251465123
320081643915990134
2376324005742734684
2037587339862127157
893313058057432898
118956510249458009
973816490706440657
2982490375421836301
1236945357523402079
3905294429470053119
2304284986302062495
2454379621599061616
1553811809817066890
2904663527490058979
2904663527490058979
2904663527490058979
2904663527490058979
2904663527490058979
2904663527490058979
2904663527490058979
etc.

Libc++ gets stuck  oscillating between two values right away:

864691128455147579
1965622972376959002
4035225266123976763
1965622972376959002
4035225266123976763
1965622972376959002
4035225266123976763
1965622972376959002
4035225266123976763
1965622972376959002
4035225266123976763
1965622972376959002
4035225266123976763
1965622972376959002
Comment 3 Jonathan Wakely 2024-02-06 21:28:11 UTC
Based on that, I'm not sure what libc++ does is actually better than what libstdc++ does today (i.e. refuse to compile if the results would be wrong).

I think the patch above would make sense though. I'm sure it could be optimized, I was more concerned with getting the results right than with performance.

The first loop in operator% could just be a single left shift by the difference in the number of leading zeros between __l and __x. And halve() should just be a right shift (and "halve" isn't a reserved name).
Comment 4 Jakub Jelinek 2024-02-06 21:32:58 UTC
(In reply to Jonathan Wakely from comment #1)
> Created attachment 57345 [details]
> Use 64-bit integers to do 128-bit arithmetic
> 
> This patch defines a custom type that implements the necessary 128-bit
> arithmetic for linear_congruential_engine<uint64_t, ...> without __int128.
> 
> It's an order of magnitude slower than using __int128 natively on x86_64,
> but it gives the right results.

	    if (__builtin_uaddll_overflow(__l._M_lo, __c, &__l._M_lo))
	      __l._M_hi++;

Why not just
            __l._M_hi += __builtin_uaddll_overflow(__l._M_lo, __c, &__l._M_lo);
and similarly for subtraction?
Is the reason for using the clang compat builtins instead of __builtin_{add,sub,mul}_overflow the compatibility with clang versions which don't
support these?
Comment 5 Jonathan Wakely 2024-02-06 21:46:31 UTC
Created attachment 57346 [details]
Improved operator%
Comment 6 Jonathan Wakely 2024-02-06 21:48:17 UTC
(In reply to Jakub Jelinek from comment #4)
> Why not just
>             __l._M_hi += __builtin_uaddll_overflow(__l._M_lo, __c,
> &__l._M_lo);
> and similarly for subtraction?

No good reason - I'll change it.

> Is the reason for using the clang compat builtins instead of
> __builtin_{add,sub,mul}_overflow the compatibility with clang versions which
> don't
> support these?

Again, no reason, I was just using the ones specific to the type because all the types are fixed.

Thanks for the suggestions.
Comment 7 Jonathan Wakely 2024-02-06 21:51:24 UTC
Created attachment 57347 [details]
With Jakub's suggested changes
Comment 8 Jakub Jelinek 2024-02-06 21:51:51 UTC
	    if (!__l._M_hi)
	      {
		__l._M_lo %= __m;
		return __l;
	      }
	    auto __n = __l._M_hi ? __builtin_clzll(__l._M_hi)
		       : 64 + __builtin_clzll(__l._M_lo);
So, on the __n initializer you already know that __l._M_hi is non-zero, no need to test it again.  Sure, the compiler will optimize it, but the shorter the method in the header the better.
Comment 9 Jonathan Wakely 2024-02-06 22:04:22 UTC
Oops yes!
Comment 10 Jakub Jelinek 2024-02-06 22:12:15 UTC
As discussed on IRC, probably the
if (!__builtin_mul_overflow(__l._M_lo, __x, &__lo))
optimization isn't a good idea, because most likely the compiler will in that case
expand that to a full 64x64->128 multiplication plus testing that the upper 64 bits are or aren't zero, so computing everything this function needs and doing it again if there is overflow, without possibility to query what it computed.
Comment 11 Jakub Jelinek 2024-02-06 22:14:44 UTC
(In reply to Jakub Jelinek from comment #10)
> As discussed on IRC, probably the
> if (!__builtin_mul_overflow(__l._M_lo, __x, &__lo))
> optimization isn't a good idea, because most likely the compiler will in
> that case
> expand that to a full 64x64->128 multiplication plus testing that the upper
> 64 bits are or aren't zero, so computing everything this function needs and
> doing it again if there is overflow, without possibility to query what it
> computed.

Though, if it is common enough, one could try to optimize the __ll[0] == 0 && __xx[0] == 0 case, one can do then either 32x32->64 or 64x64->64 multiplication and be done with it.  But if it is rare in random's usage, it would just make the code larger.
Comment 12 Lewis Fox 2024-02-07 10:14:42 UTC
(In reply to Jonathan Wakely from comment #2)

My original comment about libc++ was in reference to the LLVM bugzilla report #27839: https://bugs.llvm.org/show_bug.cgi?id=27839

It looks like the issue you discovered is LLVM bugzilla report #34206: https://bugs.llvm.org/show_bug.cgi?id=34206

It seems like since I made that comment here, libc++ has updated to fix the misuse of Schrage's algorithm (though, looking at the current source code, it still looks wrong to me), so it does mean my initial comment is a little out of date.

Either way, though, this issue wasn't in comparison to libc++, but rather that libstdc++ seems to contradict the C++ standard. For reference, MSVC doesn't have a native 128-bit integer type, but still handles these correctly by using 64-bit integer arithmetic (though MSVC could still optimize their implementation for x86_64 using intrinsics if they wanted to).

This is a bit of an edge case that I don't think most users will encounter, so performance is probably less important here than accuracy. I'd personally prioritize minimizing branches (i.e. improving simplicity) than optimizing the operand sizes for performance, but that's just my opinion.
Comment 13 Jonathan Wakely 2024-02-07 10:38:41 UTC
(In reply to Jakub Jelinek from comment #11)
> Though, if it is common enough, one could try to optimize the __ll[0] == 0
> && __xx[0] == 0 case, one can do then either 32x32->64 or 64x64->64
> multiplication and be done with it.  But if it is rare in random's usage, it
> would just make the code larger.

We will only use this new type when the calculation a*(m-1)+c doesn't fit in 64 bits, and the input values should be uniformly distributed in [0,m) for a good choice of parameters (and for a bad choice of parameters, you have bigger problems than the calculation being slow!)

For a small value of a and large value of c we would use this new type but the multiplication step would overflow infrequently, but I don't think that's a good choice of parameters and not worth optimizing for.

So I think we should just use the full 64x64 multiplication unconditionally.

For operator% the if (__l._M_hi == 0) branch is probably worth it, because the general case is so much more expensive.
Comment 14 Jonathan Wakely 2024-02-07 10:55:32 UTC
(In reply to Lewis Fox from comment #12)
> (In reply to Jonathan Wakely from comment #2)
> 
> My original comment about libc++ was in reference to the LLVM bugzilla
> report #27839: https://bugs.llvm.org/show_bug.cgi?id=27839

Thanks, that got copied to github as https://github.com/llvm/llvm-project/issues/28213
 
> It looks like the issue you discovered is LLVM bugzilla report #34206:
> https://bugs.llvm.org/show_bug.cgi?id=34206

And that is now https://github.com/llvm/llvm-project/issues/33554
 
> It seems like since I made that comment here, libc++ has updated to fix the
> misuse of Schrage's algorithm (though, looking at the current source code,
> it still looks wrong to me), so it does mean my initial comment is a little
> out of date.

Unsurprising when it took me more than 5 years to look into it properly ;-)

> This is a bit of an edge case that I don't think most users will encounter,
> so performance is probably less important here than accuracy.

100% agreed

> I'd personally
> prioritize minimizing branches (i.e. improving simplicity) than optimizing
> the operand sizes for performance, but that's just my opinion.

Agreed again, for although as I said in comment 13 I think the extra branch in operator% might be worthwhile. Maybe with __builtin_expect(__l._M_hi == 0, 0)) as a branch prediction hint.
Comment 15 GCC Commits 2024-02-15 11:44:08 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:c9ce332b557bb95987d038d98ea929cdfd1eae1d

commit r14-8998-gc9ce332b557bb95987d038d98ea929cdfd1eae1d
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Feb 7 11:31:10 2024 +0000

    libstdc++: Use 128-bit arithmetic for std::linear_congruential_engine [PR87744]
    
    For 32-bit targets without __int128 we need to implement the LCG
    transition function by hand using 64-bit types.
    
    We can also slightly simplify the __mod function by using if-constexpr
    unconditionally, disabling -Wc++17-extensions warnings with diagnostic
    pragmas.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/87744
            * include/bits/random.h [!__SIZEOF_INT128__] (_Select_uint_least_t):
            Define specialization for 64-bit generators with
            non-power-of-two modulus and large constants.
            (__mod): Use if constexpr unconditionally.
            * testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
            line number.
            * testsuite/26_numerics/random/linear_congruential_engine/87744.cc:
            New test.
Comment 16 GCC Commits 2024-02-16 10:52:51 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:7f3d900684ad989164114f25eb46a33871c6533d

commit r14-9028-g7f3d900684ad989164114f25eb46a33871c6533d
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Feb 7 11:31:10 2024 +0000

    libstdc++: Fix FAIL: 26_numerics/random/pr60037-neg.cc [PR113931]
    
            PR libstdc++/87744
            PR libstdc++/113931
    
    libstdc++-v3/ChangeLog:
    
            * testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
            line number.
Comment 17 GCC Commits 2024-02-16 19:12:18 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:c74131e77f1a6b7afe700d3526a8992dc9744b0c

commit r14-9038-gc74131e77f1a6b7afe700d3526a8992dc9744b0c
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Feb 7 11:31:10 2024 +0000

    libstdc++: Fix FAIL: 26_numerics/random/pr60037-neg.cc again [PR113961]
    
            PR libstdc++/87744
            PR libstdc++/113961
    
    libstdc++-v3/ChangeLog:
    
            * testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
            line number.