I'd like to see Clang's _ExtInt(N) arbitrary-precision fixed-width integers (https://clang.llvm.org/docs/LanguageExtensions.html#extended-integer-types) in GCC. I have in mind at least one scenario where they would help:

- Bignum: they are much simpler to use than GMP.

I have a project of mine where I ended up defining my own type made of an array: 'typedef uint64_t uint512_a[8];'. Having the ability to handle 'typedef unsigned _ExtInt(512) uint512;' as easily as __int128 would be great.
This also triggers the following wish: 'widthof(t)', which would be equivalent to 'sizeof(t) * CHAR_BIT' for normal types, but would be equal to N in the case of _ExtInt(N). It could also be used to get the exact bit width of a bit-field. This would help in having a generic TYPE_MAX(t) macro (yet another wish, although that one would be for glibc once widthof() is in GCC), which could be implemented as:

#define ISSIGNED(t) (((t) - 1) < 0)
#define __STYPE_MAX(t) (((((t) 1 << widthof(t) - 2)) - 1) << 1) + 1)
#define __UTYPE_MAX(t) ((t) -1)
#define TYPE_MAX(t) (ISSIGNED(t) ? __STYPE_MAX(t) : __UTYPE_MAX(t))
#define TYPE_MIN(t) ((t) ~TYPE_MAX(t))

These macros could be used for *any* integer type, including _ExtInt() and bit-fields, if the compiler provided widthof().
There was a missing comma. Fix: #define __STYPE_MAX(t) (((((t) 1 << (widthof(t) - 2)) - 1) << 1) + 1)
D'oh. s/comma/parenthesis/
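To make the wish concrete, a quick usage sketch (hypothetical, of course, since widthof() doesn't exist yet; this assumes the fixed __STYPE_MAX above):

  typedef unsigned _ExtInt(512) uint512;
  uint512 umax = TYPE_MAX (uint512);   /* all 512 bits set */
  long smax = TYPE_MAX (long);         /* LONG_MAX, computed via widthof (long) */
  long smin = TYPE_MIN (long);         /* LONG_MIN */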
We should implement whatever is standardized or close to being standardized, rather than the clang extension. See http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2709.pdf
Thanks for that info. It's nice to see the standard is considering that. Yes, we should add what the standard is going to add, so I'd wait to see what the standard decides in the end. Cheers, Alex
It's _BitInt in C2X, see N2763 for the final wording. The operator for computing the width of a type, present in earlier drafts, was removed to get a minimal version into C2X; I don't know if it will be re-proposed in time for C2X (as it's a previously proposed feature, the deadline would be the January meeting of WG14; the deadline two weeks ago was for *initial* proposal of new features).

Note the importance of agreeing the ABI with the relevant ABI working groups! And that the ABI needs to be clear about whether padding bits have defined values (in the in-memory representation, and in argument passing and return). A proposed x86_64 ABI is on a branch of the ABI repository (not yet on master). I'm not aware of proposals for the ABI for other architectures. As a required C2X feature, we'll need some default for the ABI for architectures that haven't made another choice (which might closely follow what x86_64 does, for example), but we should try to get architecture maintainers all to explicitly consider the ABI issue, and to work with ABI maintainers / other implementations where applicable to agree what the ABI should be on that architecture. Anyone interested in this feature can work with ABI working groups for various architectures *now* to agree on what the ABI should be; they don't need to be working on an implementation. (For GCC, we also need an ABI for _Complex _BitInt, unless we disallow that, though complex integers are outside the scope of C2X.)

There is a proposal (N2858) for printf/scanf support for _BitInt. Implementing that in glibc (or any other C library) will require architecture-specific code to read the values of arguments, given that (a) we can't call va_arg with a type of run-time-determined size, unless there is a pretty small limit on the sizes we support with _BitInt, and (b) we might want to support that feature in glibc well before it's desirable to depend on a GCC with _BitInt support for building glibc. ABIs like the x86_64 one (treating the argument like a standard integer type or a struct) should be OK for that purpose, since once the size gets sufficiently large (depending on the ABI), a repeated sequence of va_arg calls should suffice to read the structure elements (or a va_arg call for a pointer type, when passed by reference).
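(Purely to illustrate that last point, a sketch of the "repeated va_arg calls" approach, assuming an ABI that passes a 256-bit _BitInt like a struct of four 64-bit chunks; this is not an actual glibc interface:)

#include <stdarg.h>
#include <stdint.h>

static void
print_bitint256 (const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  uint64_t limb[4];                    /* least-significant chunk first */
  for (int i = 0; i < 4; i++)
    limb[i] = va_arg (ap, uint64_t);
  /* ... format the 256-bit value from limb[0..3] here ... */
  va_end (ap);
}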
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2763.pdf
Is there any proposal regarding suffixes for constants? I didn't see it in the main proposal for _BitInt(). I mean something like 1u8 to create a constant of type unsigned _BitInt(8).

---

@Joseph Regarding your request for help, I didn't answer because I didn't consider myself qualified to do that. However, I would love to help if I can, so if you point me to something I could help with, I'll be happy to try :)
N2775 (hopefully to be considered at the Jan/Feb 2022 WG14 meeting) is the proposal for constant suffixes.
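For example (my reading of the proposal: the constant gets the minimal-width bit-precise type that can represent its value, rather than encoding the width in the suffix as in the 1u8 idea above):

  unsigned _BitInt(8) x = 200uwb;   /* 200 needs 8 value bits */
  _BitInt(3) y = 3wb;               /* 3 needs 2 value bits plus a sign bit */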
The x86-64 psABI has been changed for this:
https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/8ca45392570e96920f8a15d903d6122f6d263cd0
but the state of the padding bits isn't mentioned there anywhere.
Also, I'm not sure I understand the
  "\texttt{_BitInt(N)} types are byte-aligned to the next greatest power-of-2 up to 64 bits."
sentence, because for N <= 64 different rules apply (size and alignment are the same as the smallest standard integral type that can contain them), so IMHO it should just say that the N > 64 bit-precise types are 64-bit aligned.
(In reply to Jakub Jelinek from comment #11) > The x86-64 psABI has been changed for this: > https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/ > 8ca45392570e96920f8a15d903d6122f6d263cd0 > but the state of the padding bits isn't mentioned there anywhere. It was done on purpose. Here are discussions about padding: https://reviews.llvm.org/D108643 > Also, not sure I understand the > \texttt{_BitInt(N)} types are byte-aligned to the next greatest power-of-2 > up to 64 bits. > sentence because for N <= 64 there are different rules that apply (size and > alignment > same as smallest standard integral type that can contain them) and so IMHO > it should just > say that the N > 64 bit-precise types are 64-bit aligned. It sounds reasonable.
On Tue, 25 Oct 2022, jakub at gcc dot gnu.org via Gcc-bugs wrote:

> The x86-64 psABI has been changed for this:
> https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/8ca45392570e96920f8a15d903d6122f6d263cd0
> but the state of the padding bits isn't mentioned there anywhere.

I think the words "The value of the unused bits beyond the width of the \texttt{_BitInt(N)} value but within the size of the \texttt{_BitInt(N)} are unspecified when stored in memory or register." are what deal with padding (both padding within sizeof(_BitInt(N)) bytes, and bytes within a register or stack slot used for argument passing / return but outside sizeof(_BitInt(N)) bytes). (Of course different architectures might make different choices for how to handle padding.)

I filed https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/300 in July to request an ABI for _BitInt on RISC-V. I've just now filed https://github.com/ARM-software/abi-aa/issues/175 to request such an ABI for both 32-bit and 64-bit Arm, and https://gitlab.com/x86-psABIs/i386-ABI/-/issues/5 to request such an ABI for 32-bit x86. I don't know if there are other psABIs with public issue trackers where such issues can be filed (but we'll need some sensible default anyway for architectures where we can't get an ABI properly specified in an upstream-maintained ABI document).
(In reply to joseph@codesourcery.com from comment #13) > https://gitlab.com/x86-psABIs/i386-ABI/-/issues/5 to request such an ABI > for 32-bit x86. I don't know if there are other psABIs with public issue > trackers where such issues can be filed (but we'll need some sensible > default anyway for architectures where we can't get an ABI properly > specified in an upstream-maintained ABI document). ia32 psABI will follow x86-64 psABI.
PowerPC I think does, not sure about s390.
(In reply to Jakub Jelinek from comment #15) > PowerPC I think does, not sure about s390. Does what?
See: https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/38
(In reply to Segher Boessenkool from comment #16) > (In reply to Jakub Jelinek from comment #15) > > PowerPC I think does, not sure about s390. > > Does what? Have a public place to submit issues against the powerpc abis.
(In reply to Segher Boessenkool from comment #16) > (In reply to Jakub Jelinek from comment #15) > > PowerPC I think does, not sure about s390. > > Does what? Published psABI which ought to specify how to pass/return _BitInt(N) and unsigned _BitInt(N).
(In reply to Andrew Pinski from comment #18) > (In reply to Segher Boessenkool from comment #16) > > (In reply to Jakub Jelinek from comment #15) > > > PowerPC I think does, not sure about s390. > > > > Does what? > > Have a public place to submit issues against the powerpc abis. Only the ELFv2 ABI really (it's on github). The rest doesn't have (public) maintained documents at all.
(In reply to Jakub Jelinek from comment #19)
> (In reply to Segher Boessenkool from comment #16)
> > (In reply to Jakub Jelinek from comment #15)
> > > PowerPC I think does, not sure about s390.
> >
> > Does what?
>
> Published psABI which ought to specify how to pass/return _BitInt(N) and
> unsigned _BitInt(N).

psABI is an x86 thing? But there are various ABIs for PowerPC that have public documentation, six or so, and GCC has support for most of those. None of them are "processor specific" (most are OS specific, instead), and they differ in very fundamental things, in places. They are closely related as well of course, either because there is an obvious choice, or because of history.

Many of those ABIs have not seen updates for decades, and are unlikely to anymore. OTOH the GCC support for them has been updated over time; there often is only one sane choice anyway.

We'll make decisions on what ELFv2 will do for _BitInt when it is closer in time than it is now. The only interesting choice is whether values in memory have undefined bits -- and they likely should, simply because all other padding bits are undefined as well.
(In reply to Jakub Jelinek from comment #15) > PowerPC I think does, not sure about s390. For s390x see here: https://github.com/IBM/s390x-abi
Seems LLVM currently only supports _BitInt up to 128, which is kind of useless for users; those sizes can be easily handled as bit-fields and performing normal arithmetic on them.

As for the implementation, I'd like to brainstorm about it a little bit.

I'd say we want a new tree code for it, say BITINT_TYPE. TYPE_PRECISION unfortunately is only 10-bit, which is not enough, so the full precision would need to be specified somewhere else. And have targetm specify the ABI details: the size of a limb (which would need to be exposed to libgcc with -fbuilding-libgcc), unless it is everywhere the same, whether the limbs are ordered least significant to most significant or vice versa, and whether the highest limb is sign/zero extended or unspecified beyond the precision.

We'll need to handle the wide constants somehow, but we have a problem with wide ints in that widest_int is not wide enough to handle arbitrarily long constants.

Shall the type be a GIMPLE reg type? I assume for _BitInt <= 128 (or <= 64 when TImode isn't supported) we just want to keep the new type on the function parameter/return value boundaries and use INTEGER_TYPEs from, say, gimplification onwards. What about the large ones? Say for arbitrary size generic vectors we keep them in SSA form until late (generic vector lowering) and at that point lower them; perhaps we could do the same for _BitInt? The unary as well as most of the binary operations can be handled by simple loops over extraction of limbs from the large number; then there is multiplication and division/modulo. I think the latter is why LLVM restricts it to 128 bits right now. https://gcc.gnu.org/pipermail/gcc/2022-May/thread.html#238657 was a proposal from the LLVM side, but I don't see it being actually developed further and don't see it on LLVM trunk.

I wonder if for these libgcc APIs (and, is just __divmod/__udivmod enough, or do we want also multiplication, or for -Os purposes also other APIs?) it wouldn't be better to have more GMP/mpn-like APIs where we don't specify the number of limbs like in the above thread, but the number of bits, and perhaps don't specify it just for one argument but for multiple, so that we can then for the lowering match sign/zero extensions of the arguments and can handle say _BitInt(2048) / _BitInt(16) efficiently. Thoughts on this?
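(Just to make the "simple loops over extraction of limbs" part concrete, a sketch of what the lowered form of an addition of two n-limb values could look like; the limb order, the carry handling and all names are of course just assumptions:)

#include <stddef.h>

/* Sketch: r = u + v for two n-limb operands, least-significant limb first.  */
static void
bitint_add (unsigned long long *r, const unsigned long long *u,
            const unsigned long long *v, size_t n)
{
  unsigned long long carry = 0;
  for (size_t i = 0; i < n; i++)
    {
      unsigned long long s = u[i] + carry;
      carry = s < carry;                /* carry out of u[i] + incoming carry */
      r[i] = s + v[i];
      carry += r[i] < s;                /* carry out of s + v[i] */
    }
}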
(In reply to Jakub Jelinek from comment #23) > What about the large ones? Say for arbitrary size generic vectors we keep > them in SSA form until late (generic vector lowering) and at that point > lower, perhaps we could do the same for _BitInt? The unary as well as most > of binary operations can be handled by simple loops over extraction of > limbs from the large number, then there is multiplication and > division/modulo. I think the latter is why LLVM restricts it to 128 bits > right now, Right. > https://gcc.gnu.org/pipermail/gcc/2022-May/thread.html#238657 > was an proposal from the LLVM side but I don't see it being actually further > developed and don't see it on LLVM trunk. I think work on it stalled after that thread. See also https://discourse.llvm.org/t/rfc-add-support-for-division-of-large-bitint-builtins-selectiondag-globalisel-clang/60329/
On Wed, 26 Oct 2022, jakub at gcc dot gnu.org via Gcc-bugs wrote:

> Seems LLVM currently only supports _BitInt up to 128, which is kind of useless
> for users, those sizes can be easily handled as bitfields and performing normal
> arithmetics on them.

Well, it would be useful for users of 32-bit targets who want 128-bit arithmetic, since we only support __int128 for 64-bit targets.

> As for implementation, I'd like to brainstorm about it a little bit.
> I'd say we want a new tree code for it, say BITINT_TYPE.

OK. The signed and unsigned types of each precision do need to be distinguished from all the existing kinds of integer types (including the ones used for bit-fields: _BitInt types aren't subject to integer promotions, whereas bit-fields narrower than int are). In general the types operate like integer types (in terms of allowed operations etc.) so INTEGRAL_TYPE_P would be true for them. The main difference at front-end level is the lack of integer promotions, so that arithmetic can be carried out directly on narrower-than-int operands (but a bit-field declared with a _BitInt type gets promoted to that _BitInt type, e.g. unsigned _BitInt(7):2 acts as unsigned _BitInt(7) in arithmetic). Unlike the bit-field types, there's no such thing as a signed _BitInt(1); signed bit-precise integer types must have at least two bits.

> TYPE_PRECISION unfortunately is only 10-bit, that is not enough, so it
> would need the full precision to be specified somewhere else.

That may complicate things because of code expecting TYPE_PRECISION to be meaningful for all integer types. But that could be addressed without needing to review every use of TYPE_PRECISION by e.g. changing TYPE_PRECISION to check wherever the _BitInt precision is specified, and instead using e.g. TYPE_RAW_PRECISION for direct access to the tree field (so only lvalue uses of TYPE_PRECISION would then need updating, other accesses would automatically get the full precision).

> And have targetm specify the ABI details (size of a limb (which would need to
> be exposed to libgcc with -fbuilding-libgcc), unless it is everywhere the same
> whether the limbs are least significant to most significant or vice versa, and
> whether the highest limb is sign/zero extended or unspecified beyond the
> precision.

I haven't seen an ABI specified for any architecture supporting big-endian yet, but I'd tend to expect such architectures to use big-endian ordering for the _BitInt representation to be consistent with existing integer types.

> What about the large ones?

I think we can at least slightly simplify things by assuming for now _BitInt multiplication / division / modulo are unlikely to be used much for arguments large enough that Karatsuba or asymptotically faster algorithms become relevant; that is, that naive quadratic-time algorithms are sufficient for those operations.
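(A rough sketch, just to make the TYPE_PRECISION / TYPE_RAW_PRECISION idea concrete; the BITINT_TYPE code and its precision accessor are hypothetical here:)

/* Direct access to the existing narrow tree field, as today.  */
#define TYPE_RAW_PRECISION(NODE) (TYPE_CHECK (NODE)->type_common.precision)

/* Rvalue uses keep working unchanged and automatically get the full
   precision; only lvalue uses would need to switch to a setter.  */
#define TYPE_PRECISION(NODE) \
  (TREE_CODE (NODE) == BITINT_TYPE \
   ? bitint_type_precision (NODE) : TYPE_RAW_PRECISION (NODE))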
Some random comments.

I wouldn't go with a new tree code; given the semantics are those of INTEGER_TYPE, it should be an INTEGER_TYPE. The TYPE_PRECISION issue is real - we have 16 spare bits in tree_type_common so we could possibly afford to make it 16 bits. Does the C standard limit the number of bits? Does it allow implementation-defined limits?

As for the SSA representation and "lowering", this feels much like Middle-End Array Expressions in the end. I agree that first and foremost we should have the types as registers, but then we can simply lower early to a representation supported by the target? AKA make _BitInt(199) intfast_t[n] with an appropriate 'n' and lower all accesses, doing arithmetic either via builtins or internal functions on the whole object.

Constants are tricky indeed, but I suppose there's no way to write a 199-bit integer constant in source? We can always resort to constants of the intfast_t[n] representation (aka a CTOR).

That said, if C allows us to limit to 128 bits then let's do that for now. 32-bit targets will still see all the complication when we give that a stab.
(In reply to Richard Biener from comment #26) > Does the C standard limit the number of bits? Does it allow > implementation defined limits? The latter. limits.h defines BITINT_MAXWIDTH, which must be at least as large as number of bits in unsigned long long. AFAIK LLVM plans 8388608 maximum (but due to the missing library support uses 128 as maximum right now). > Constants are tricky indeed but I suppose there's no way to write a > 199 bit integer constant in source? We can always resort to constants > of the intfast_t[n] representation (aka a CTOR). One can specify even very large constants in the source. 123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789uwb will be _BitInt with the minimum number of bits to store the above unsigned constant. > That said, if C allows us to limit to 128bits then let's do that for now. > 32bit targets will still see all the complication when we give that a stab. I'm afraid once we define BITINT_MAXWIDTH, it will become part of the ABI, so we can't increase it afterwards. Anyway, I'm afraid we probably don't have enough time to implement this properly in stage1, so might need to target GCC 14 with it. Unless somebody spends on it the remaining 2 weeks full time.
On Fri, 28 Oct 2022, jakub at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989 > > --- Comment #27 from Jakub Jelinek <jakub at gcc dot gnu.org> --- > (In reply to Richard Biener from comment #26) > > Does the C standard limit the number of bits? Does it allow > > implementation defined limits? > > The latter. limits.h defines BITINT_MAXWIDTH, which must be at least as large > as number of bits in unsigned long long. AFAIK LLVM plans 8388608 maximum (but > due to the missing library support uses 128 as maximum right now). > > > Constants are tricky indeed but I suppose there's no way to write a > > 199 bit integer constant in source? We can always resort to constants > > of the intfast_t[n] representation (aka a CTOR). > > One can specify even very large constants in the source. > 123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789uwb > will be _BitInt with the minimum number of bits to store the above unsigned > constant. > > > That said, if C allows us to limit to 128bits then let's do that for now. > > 32bit targets will still see all the complication when we give that a stab. > > I'm afraid once we define BITINT_MAXWIDTH, it will become part of the ABI, so > we can't increase it afterwards. Quite likely yes (OTOH __BIGGEST_ALIGNMENT__ changed as well). That also means BITINT_MAXWIDTH should eventually be decided by the ABI groups? I also can hardly see any use for very big N other than "oh, cool". I mean, we don't have _Float(N) either for N == 65000 even though what would be cool as well. > Anyway, I'm afraid we probably don't have enough time to implement this > properly in stage1, so might need to target GCC 14 with it. Unless somebody > spends on it > the remaining 2 weeks full time. It's absolutely a GCC 14 task given the ABI and library issue.
Hi!

On 10/28/22 12:51, rguenther at suse dot de wrote:
> Quite likely yes (OTOH __BIGGEST_ALIGNMENT__ changed as well). That
> also means BITINT_MAXWIDTH should eventually be decided by the ABI
> groups?
>
> I also can hardly see any use for very big N other than "oh, cool". I
> mean, we don't have _Float(N) either for N == 65000 even though what
> would be cool as well.

I do have a use. Okay, I don't need 8M bits, but 1k is something that would help me. Basically, it's a transparent bignum library, for which I can use most standard C features.

BTW, it would also be nice if stdc_count_ones(3) were implemented to support very wide _BitInt()s as an extension (C23 only guarantees support for _BitInt()s that match a standard or extended type). I have some program that works with 512x512 matrices, represented as arrays of 512 members of uint64_t[8], and it popcounts rows, which now means looping over an array of uint64_t[8] and using the builtin popcount. And I'm not sure if I could still optimize it a little bit more. If I could just call the type-generic stdc_count_ones(), and know that the implementation has written a quite optimal loop, that would be great (both for simplicity and performance). (A small sketch of what I mean is below.)

Cheers,
Alex

> >> Anyway, I'm afraid we probably don't have enough time to implement this
> >> properly in stage1, so might need to target GCC 14 with it. Unless somebody
> >> spends on it the remaining 2 weeks full time.
>
> It's absolutely a GCC 14 task given the ABI and library issue.
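A sketch of what I mean (uint512 and a stdc_count_ones() that accepts wide _BitInts are the assumptions here; the generic version is not something that works today):

#include <stdint.h>
#include <stdbit.h>   /* C23; stdc_count_ones on wide _BitInt is the wish */

typedef unsigned _BitInt(512) uint512;

/* What I do today.  */
unsigned
row_ones_manual (const uint64_t row[8])
{
  unsigned n = 0;
  for (int i = 0; i < 8; i++)
    n += __builtin_popcountll (row[i]);
  return n;
}

/* What I'd like to write.  */
unsigned
row_ones_generic (uint512 row)
{
  return stdc_count_ones (row);
}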
I have a use case for up to 1k bits, except I don't need division. It will come in handy while translating the P4 language (https://p4.org/p4-spec/docs/P4-16-v-1.2.3.html) to C. P4 supports any bit size you want, and there are some uses for > 128 for crypto; usually just a storage area for the key at that point.
On Fri, 28 Oct 2022, rguenth at gcc dot gnu.org via Gcc-bugs wrote:

> I wouldn't go with a new tree code, given semantics are INTEGER_TYPE it should
> be an INTEGER_TYPE.

Implementation note in that case: bit-precise integer types aren't allowed as underlying types for enums, so the code in c-parser.cc:c_parser_enum_specifier checking underlying types:

      else if (TREE_CODE (specs->type) != INTEGER_TYPE
               && TREE_CODE (specs->type) != BOOLEAN_TYPE)
        {
          error_at (enum_loc, "invalid %<enum%> underlying type");

would then need to check that the type isn't a bit-precise type.
On Fri, 28 Oct 2022, jakub at gcc dot gnu.org via Gcc-bugs wrote:

> > That said, if C allows us to limit to 128bits then let's do that for now.
> > 32bit targets will still see all the complication when we give that a stab.
>
> I'm afraid once we define BITINT_MAXWIDTH, it will become part of the ABI, so
> we can't increase it afterwards.

I don't think it's part of the ABI; I think it's always OK to increase BITINT_MAXWIDTH, as long as the wider types don't need more alignment than the previous choice of max_align_t. Thus, starting with a 128-bit limit (or indeed a 64-bit limit on 32-bit platforms, so that all the types fit within existing modes supported for arithmetic), and adding support for wider _BitInt later, would be a reasonable thing to do. (You still have ABI considerations even with such a limit: apart from the padding question, on x86_64 the ABI says _BitInt(128) is 64-bit aligned but __int128 is 128-bit aligned.)

> Anyway, I'm afraid we probably don't have enough time to implement this
> properly in stage1, so might need to target GCC 14 with it. Unless somebody
> spends on it
> the remaining 2 weeks full time.

I think https://gcc.gnu.org/pipermail/gcc/2022-October/239704.html is still current as a list of C2x language features likely not to make it into GCC 13. (I hope to get auto and constexpr done in the next two weeks, and the other C2x language features not on that list are done.)
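(To spell out that alignment difference, per the proposed x86_64 ABI as described above -- illustrative only:)

_Static_assert (_Alignof (__int128) == 16, "");
_Static_assert (sizeof (_BitInt(128)) == 16, "");
_Static_assert (_Alignof (_BitInt(128)) == 8, "");  /* 64-bit, not 128-bit, aligned */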
(In reply to joseph@codesourcery.com from comment #32)
> On Fri, 28 Oct 2022, jakub at gcc dot gnu.org via Gcc-bugs wrote:
>
> > > That said, if C allows us to limit to 128bits then let's do that for now.
> > > 32bit targets will still see all the complication when we give that a stab.
> >
> > I'm afraid once we define BITINT_MAXWIDTH, it will become part of the ABI, so
> > we can't increase it afterwards.
>
> I don't think it's part of the ABI; I think it's always OK to increase
> BITINT_MAXWIDTH, as long as the wider types don't need more alignment than
> the previous choice of max_align_t.

It's not part of the ABI until people put _BitInt(BITINT_MAXWIDTH) on ABI boundaries of their libraries. If a ridiculously large BITINT_MAXWIDTH does nothing more than discourage use of _BitInt(BITINT_MAXWIDTH) in general, then that's already great. We don't need another intmax.

Also, I don't want to think about the max N for _BitInt(N), similarly to how I don't want to think about the max N for int[N]. There might be implementation limits, but they should be high enough that I don't have to think about them for everyday coding.

> Thus, starting with a 128-bit limit (or indeed a 64-bit limit on 32-bit
> platforms, so that all the types fix within existing modes supported for
> arithmetic), and adding support for wider _BitInt later, would be a
> reasonable thing to do.

I disagree.

> (You still have ABI considerations even with such a limit: apart from the
> padding question, on x86_64 the ABI says _BitInt(128) is 64-bit aligned
> but __int128 is 128-bit aligned.)
>
> > Anyway, I'm afraid we probably don't have enough time to implement this
> > properly in stage1, so might need to target GCC 14 with it. Unless somebody
> > spends on it
> > the remaining 2 weeks full time.
>
> I think https://gcc.gnu.org/pipermail/gcc/2022-October/239704.html is
> still current as a list of C2x language features likely not to make it
> into GCC 13. (I hope to get auto and constexpr done in the next two
> weeks, and the other C2x language features not on that list are done.)
I am currently using clang's support for up to 256-bit integers for crypto-related use cases, and also non-power-of-2 integers such as 160 bits. These are not just used as storage; we are performing integer math on them and using the __builtin_checked family of functions. I understand that the standard family of checked functions that replace these builtin functions will be used instead when implemented on clang. Limiting this to 128 bits, while being standard compliant, would not allow us to compile on GCC. Thanks
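(A rough sketch of the kind of code meant here; the typedef names and exact widths are just illustrative, and whether a given compiler accepts _BitInt operands in the overflow-checked builtins is an assumption:)

typedef unsigned _BitInt(256) u256;
typedef unsigned _BitInt(160) u160;

_Bool
checked_mul (u256 a, u256 b, u256 *r)
{
  return __builtin_mul_overflow (a, b, r);   /* overflow-checked multiply */
}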
Created attachment 55055 [details]
gcc14-set-precision.patch

Untested preparation patch which prepares for the https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989#c25 idea of keeping 16-bit precision for all types but the new bit-precise integer types, and 32-bit precision for those. Unfortunately it isn't just a matter of starting to use SET_TYPE_PRECISION where TYPE_PRECISION is used as an lvalue: the current TYPE_PRECISION definition, an unsigned:16 non-static data member, acts for -Wsign-compare purposes as either signed or unsigned int and no warning is emitted, while even if the new larger precision in some types were unsigned:31, using those two options in a conditional leads to -Wsign-compare warnings, because all of a sudden the macro is considered to be either int or unsigned depending on how exactly it is defined. There are more -Wsign-compare warnings if TYPE_PRECISION is signed int than when it is unsigned int, so I want to implement the latter, and this patch also adjusts all spots I've noticed to avoid the -Wsign-compare warnings. Precision is never negative...
Created attachment 55056 [details]
gcc14-bitint-wip.patch

Just WIP on top of the above patch, which does parsing of the _BitInt type specifier in C and introduces BITINT_TYPE (I'm afraid we can't use INTEGER_TYPE for that, both because it can have a different calling/returning convention in different ABIs and because we need more than 16-bit precision for it as well), but still doesn't use it (right where it would create it, it stops for now and pretends it is an integer). I've also added wb/WB suffix parsing on the libcpp side, but that is where I stopped today. Obviously for CPP_N_BITINT we need a different interpretation of the number, because cpp_interpret_integer can handle at most 128-bit integers (and of course even for the integers that fit into 128 bits with wb/WB suffixes we also want to use the right type; but I guess we can use INTEGER_CSTs for them). I'm afraid we'll need some other TREE_CODE for bit-precise integer constants which don't fit into widest_int (perhaps better for all that don't fit into 128 bits), because the amount of code that assumes wi::to_widest works on INTEGER_CSTs is huge.

As I said earlier, I think something during gimplification or soon after it could remap small _BitInts (up to 128-bit, resp. 64-bit when TImode isn't supported) to normal integral types except on the function boundaries (where ABI conventions can result in different rules for them), but probably we can't make INTEGER_TYPE <-> BITINT_TYPE conversions useless, because _BitInt could be e.g. passed to varargs.

Looking at what clang does, they seem to have raised the limit from 128 to 8388608, but in many cases they emit extremely terrible code. Everything is done inline without library support, and even for huge numbers it doesn't use any loops, so it is extremely cache unfriendly. I think we should do something like that solely for very small cases, otherwise use loops and either let normal unrolling do its job, or say do 4 limbs in the loop body at a time or something similar. And it would be nice if the ranger could at least discover ranges of how many real bits each SSA_NAME can contain (with bits above those being zero or sign extended) so that we could use more efficient additions/subtractions/multiplications/divisions etc.
If _BitInt constants aren't INTEGER_CST, then all places that expect that any integer constant expression is folded to an INTEGER_CST will need updating to handle whatever tree code is used for _BitInt constants. (In some places that may be needed for correctness, in other places - where a large value wouldn't actually be valid - only for proper diagnostics about an invalid value, if INTEGER_CST is still used for smaller _BitInt constants.)
I guess there are other options.

If we could make wide_int/widest_int non-POD, one option would be to turn their storage into a union of the normal small case we use now everywhere (i.e. the fixed one) and one where the val array is not stored directly in the storage but pointed to by some pointer. E.g.

class GTY(()) wide_int_storage
{
private:
  HOST_WIDE_INT val[WIDE_INT_MAX_ELTS];
  unsigned int len;
  unsigned int precision;

could be

private:
  union { HOST_WIDE_INT val[WIDE_INT_MAX_ELTS]; HOST_WIDE_INT *valp; };
  unsigned int len;
  unsigned int precision;

and decide which one is which based on len > WIDE_INT_MAX_ELTS or something similar.
Or, if we can't afford to make it non-POD, perhaps valp would refer to an obstack destroyed at the end of each pass or something similar.

Another problem is with INTEGER_CST (note, if we lower this stuff before expansion, hopefully we wouldn't need something similar for rtxes). Currently INTEGER_CST has:

  /* The number of HOST_WIDE_INTs in an INTEGER_CST.  */
  struct {
    /* The number of HOST_WIDE_INTs if the INTEGER_CST is accessed in
       its native precision.  */
    unsigned char unextended;

    /* The number of HOST_WIDE_INTs if the INTEGER_CST is extended to
       wider precisions based on its TYPE_SIGN.  */
    unsigned char extended;

    /* The number of HOST_WIDE_INTs if the INTEGER_CST is accessed in
       offset_int precision, with smaller integers being extended
       according to their TYPE_SIGN.  This is equal to one of the two
       fields above but is cached for speed.  */
    unsigned char offset;
  } int_length;

Now, this obviously limits the largest representable constants to 0xFF HOST_WIDE_INTs, i.e. at most 16320 bits. We have 8 spare bits there, so one possibility would be to add a flag there and, if that flag is true, ignore the int_length.{unextended,extended,offset} fields and instead stick that info somewhere into the val array. Or kill TREE_INT_CST_OFFSET_NUNITS (replace it with TREE_INT_CST_EXT_NUNITS (t) <= OFFSET_INT_ELTS ? TREE_INT_CST_EXT_NUNITS (t) : TREE_INT_CST_NUNITS (t)) and turn unextended/extended into unsigned short. Then we can handle at most _BitInt(4194240), slightly more than 2 times lower than what LLVM chose; I guess that would still be acceptable.
(In reply to Jakub Jelinek from comment #38) > I guess there are other options. > If we could make wide_int/widest_int non-POD, one option would be to turn > their storage into a union of the normal small case we use now everywhere > (i.e. fixed one) and one where the val array is not stored directly in the > storage but pointed to by some pointer. > E.g. > class GTY(()) wide_int_storage > { > private: > HOST_WIDE_INT val[WIDE_INT_MAX_ELTS]; > unsigned int len; > unsigned int precision; > could be > private: > union { HOST_WIDE_INT val[WIDE_INT_MAX_ELTS]; HOST_WIDE_INT *valp; }; > unsigned int len; > unsigned int precision; > and decide which one is which based on len > WIDE_INT_MAX_ELTS or something > similar. > Or, if we can't affort to make it non-POD, perhaps valp would refer to > obstack destroyed at the end of each pass or something similar. > Another problem is with INTEGER_CST (note, if we lower this stuff before > expansion hopefully we wouldn't need something similar for rtxes). > Currently INTEGER_CST has: > /* The number of HOST_WIDE_INTs in an INTEGER_CST. */ > struct { > /* The number of HOST_WIDE_INTs if the INTEGER_CST is accessed in > its native precision. */ > unsigned char unextended; > > /* The number of HOST_WIDE_INTs if the INTEGER_CST is extended to > wider precisions based on its TYPE_SIGN. */ > unsigned char extended; > > /* The number of HOST_WIDE_INTs if the INTEGER_CST is accessed in > offset_int precision, with smaller integers being extended > according to their TYPE_SIGN. This is equal to one of the two > fields above but is cached for speed. */ > unsigned char offset; > } int_length; > Now, this obviously limits the largest representable constants to 0xFF > HOST_WIDE_INTs, It might be possible to elide 'offset' given it is just a cache. Also 'extended' can possibly be computed as well.
Created attachment 55094 [details]
gcc14-bitint-wip.patch

So, on IRC we've agreed with Richi that, given the limits we have in the compiler (what wide_int/widest_int can represent at most without making the types have an optional arbitrary-length indirect payload, what INTEGER_CST can handle (right now 255 64-bit limbs), and the TYPE_PRECISION limitation (max 65535 precision)), it would be best to first try to implement _BitInt support with a small BITINT_MAXWIDTH (in particular, what fits into wide_int, which is e.g. on x86_64 575 bits) and only when that implementation is complete, attempt to lift some of the limits (start with the wide_int/widest_int one; INTEGER_CST could be handled by bumping the 2 counters from 8-bit to 16-bit and killing the cache; with that we'd be at 65535 as BITINT_MAXWIDTH, and whether we'd want to grow it further is a question).

This patch implements some WIP; as the testcases show, it can already do something, but it doesn't have any of the argument/return value passing code implemented, nor the needed middle-end changes (promoting as much as possible to small INTEGER_TYPEs early for small BITINT_TYPEs and adding a lowering pass which will turn the larger ones into loops etc.). Also, wb/uwb constants aren't really done yet.
(In reply to Jakub Jelinek from comment #40) > Created attachment 55094 [details] > gcc14-bitint-wip.patch > > So, on IRC we've agreed with Richi that given the limits we have in the > compiler > (what wide_int/widest_int can represent at most without making the types have > optional arbitrary length indirect payload, what INTEGER_CST can handle > (right > now 255 64-bit limbs) and TYPE_PRECISION limitation (max 65535 precision)) > it would be best to first try to implement _BitInt support with small > BITINT_MAXWIDTH (in particular, what fits into wide_int, which is e.g. on > x86_64 > 575 bits) and only when the implementation of that is complete, attempt to > lift > up some of the limits (start with the wide_int/widest_int one, INTEGER_CST > could > be handled by bumping the 2 counters from 8-bit to 16-bit and killing the > cache, > with that we'd be at 65535 as BITINT_MAXWIDTH and whether we'd want to grow > it > further is a question). > > This patch implements some WIP, as the testcases show, it can already do > something, but doesn't have any of the argument/return value passing code > implemented, nor middle-end needed changes (promoting as much as possible to > small INTEGER_TYPEs early for small BITINT_TYPEs and adding a lowering pass > which will turn the larger ones into loops etc.). Also, wb/uwb constants > aren't > really done yet. Another idea is to have a large BITINT_MAXWIDTH (up to what TYPE_PRECISION supports) but restrict constant folding to the cases we can represent in INTEGER_CST. For the cases where the language requires constant evaluation we'd then sorry (). I think we should be able to handle all-ones encoded and since constant initializers are restricted it should handle most practical cases already.
Created attachment 55141 [details] gcc14-bitint-wip.patch Further progress, _BitInt constants now seem to work (up to the __BITINT_MAXWIDTH__ limit, currently 575 bits) and folding can fold expressions involving those. No code generation yet though.
Created attachment 55148 [details]
gcc14-bitint-wip.patch

Another update. This version can emit _BitInt(N) values in non-automatic variable initializers, and handles passing/returning _BitInt(N); for N <= 64 (i.e. what fits into a single limb), from what I can see, handling it in GIMPLE passes and even expansion/RTL seems to work.

Now, as discussed earlier, for N > GET_MODE_PRECISION (limb_mode) I think we want to lower it in some pass in between IPA and vectorization. For N which fits into DImode if the limb is 32-bit (currently no target does that as we have just x86-64 support) or which fits into TImode for 64-bit if TImode is supported, I guess we want to map the arithmetic to TImode arithmetic, for say 2-4x larger sizes emit code for arithmetic (except perhaps multiplication/division) inline as straight-line code, and for even larger ones as loops. In the last case, a question is if we could use e.g. TARGET_MEM_REF for the variable offset in those loops on the vars even when they aren't TREE_ADDRESSABLE (but would force them into memory during expansion).
On Wed, 24 May 2023, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989
>
> --- Comment #43 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Created attachment 55148 [details]
> --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55148&action=edit
> gcc14-bitint-wip.patch
>
> Another update. This version can emit _BitInt(N) values in non-automatic
> variable initializers, handles passing/returning _BitInt(N) and for N <= 64
> (i.e. what fits into a single limb) from what I can see handling it in GIMPLE
> passes and and even expansion/RTL seems to work.
> Now, as discussed earlier, for N > GET_MODE_PRECISION (limb_mode) I think we
> want to lower it in some pass in between IPA and vectorization. For N which
> fits into DImode if limb is 32-bit (currently no target does that as we have
> just x86-64 support) or which fits into TImode for 64-bit if TImode is
> supported, I guess we want to map arithmetics
> to TImode arithmetics, for say 2-4x larger emit code for arithmetics (except
> perhaps multiplication/division) inline as straight line code and for even
> larger as loops.
> In the last case, a question is if we could use e.g. TARGET_MEM_REF for the
> variable offset in those loops on the vars even when they aren't
> TREE_ADDRESSABLE (but would force them into memory during expansion).

Note you should use TARGET_MEM_REF only when it describes the actual addressing mode you want to use. Otherwise just synthesize ARRAY_REFs like

  ARRAY_REF <VIEW_CONVERT_EXPR (limb[]) <limbs>, index>

with an appropriate VLA limb[] array type.

I'd do the lowering right before pass_complete_unrolli and generally emit the loopy form (another pass placement is required in the -Og pipeline).
Let's consider some simple testcase (where one doesn't really mix different _BitInt sizes etc.).

_BitInt(512)
foo (_BitInt(512) a, _BitInt(512) b, _BitInt(512) c, _BitInt(512) d)
{
  return (a + b) - (c + d);
}

With the patch, this now ICEs during expansion, because while we can handle copying of even the larger _BitInt vars, we don't handle (nor plan to) +/- etc. during expansion for those; it would be in the earlier lowering pass.
If I'd emit straight-line code here, I suppose I could use BIT_FIELD_REFs/BIT_INSERT_EXPRs, but if I want loopy code, as you wrote, perhaps ARRAY_REF on VCE could work fine for the input operands, but dunno what to use for the result of the operation; forcing it into a VAR_DECL I'm afraid will mean we can't coalesce it much, the above would force the 2 + results and 1 - result into VAR_DECLs.
Could we e.g. allow BIT_INSERT_EXPRs or have some new ref for this purpose to update a single limb in a BITINT_TYPE SSA_NAME?

Now, looking at what we do right now, the detailed expand dump before the emergency dump shows:

Partition map

Partition 0 (_1 - 1 )
Partition 1 (_2 - 2 )
Partition 2 (_3 - 3 )
Partition 3 (a_4(D) - 4 )
Partition 4 (b_5(D) - 5 )
Partition 5 (c_6(D) - 6 )
Partition 6 (d_7(D) - 7 )

which I believe means it didn't actually coalesce anything at all. For the larger BITINT_TYPEs it will be very much desirable to coalesce as much as possible; given that none of the default def SSA_NAMEs are really used afterwards, I'd think ideally we'd do

a += b
c += d
result = a - c

For at least multiplication/division and I assume conversions to/from floating point (and decimal), we'll need some library calls. One question is what ABI to use for them, whether to e.g. pass a pointer to the limbs (and when -fbuilding-libgcc predefine macros on what mode is the limb mode, whether the limbs are ordered from least significant to most or vice versa, etc.) and in addition to that the precision in bits for each argument and whether it is zero or sign extended from that, so that we could e.g. handle more efficiently

_BitInt(16384) foo (unsigned _BitInt(2048) a, _BitInt(1024) b) { return (_BitInt(16384)) a * b; }

by passing e.g. _mulwhatever (&res, 16384, &a, 2048, &b, -1024), where -1024 would mean 1024 bits sign extended, 2048 would mean 2048 bits zero extended, and the result is 16384 bits. And for GIMPLE a question is how to express it before expansion, whether we use some ifn that is then lowered.
On Wed, 24 May 2023, jakub at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989 > > --- Comment #45 from Jakub Jelinek <jakub at gcc dot gnu.org> --- > Let's consider some simple testcase (where one doesn't really mix different > _BitInt sizes etc.). > _BitInt(512) > foo (_BitInt(512) a, _BitInt(512) b, _BitInt(512) c, _BitInt(512) d) > { > return (a + b) - (c + d); > } > With the patch, this now ICEs during expansion, because while we can handle > copying of even the larger _BitInt vars, we don't handle (nor plan to) +/- etc. > during expansion for that, it would be in the earlier lowering pass. > If I'd emit straight line code here, I suppose I could use > BIT_FIELD_REFs/BIT_INSERT_EXPRs, but if I want loopy code, as you wrote perhaps > ARRAY_REF on VCE could work fine for the input operands, but dunno what to use > for the > result of the operation, forcing it into a VAR_DECL I'm afraid will mean we > can't coalesce it much, the above would force the 2 + results and 1 - result > into VAR_DECLs. > Could we e.g. allow BIT_INSERT_EXPRs or have some new ref for this purpose to > update a single limb in a BITTYPE_INT SSA_NAME? I think for complex expressions that involve SSA temporaries the lowering pass has to be more complex as well and gather as much of the expression as possible so it can avoid _BitInt typed temporaries but instead create for (...) { limb_t tem1 = a[i] + b[i]; limb_t tem2 = c[i] + d[i]; limb_t tem3 = tem1 - tem2; res[i] = tem3; } but yes, for the result you want to force a VAR_DECL (I suppose DECL_RESULT for the above example will be one). I'd probably avoid rewriting user variables into SSA form and only have temporaries created by gimplifications in SSA form. You should be able to use DECL_NOT_GIMPLE_REG_P to force this and make sure update-address-taken leaves things this way unless, say, the user variable is only initialized by a constant?
But then the pass effectively has to do lifetime analysis of the _BitInt(N) for N > 128 etc. SSA_NAMEs and perform the partitioning of those SSA_NAMEs into VAR_DECLs/PARM_DECLs/RESULT_DECLs, so that we don't blow away the local stack; perhaps as you wrote with some local subgraphs turned into a loop which will handle multiple operations together instead of just one operation per loop. Or just use different VAR_DECLs but stick in clobbers where they will be dead and hope out of ssa can merge those. Anyway, more work than I hoped. Though, perhaps it can be also done incrementally, with bare minimum first and improvements later.
> On 24.05.2023 at 16:18, jakub at gcc dot gnu.org <gcc-bugzilla@gcc.gnu.org> wrote:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989
>
> --- Comment #47 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> But then the pass effectively has to do lifetime analysis of the _BitInt(N) for
> N > 128 etc. SSA_NAMEs and perform the partitioning of those SSA_NAMEs into
> VAR_DECLs/PARM_DECLs/RESULT_DECLs, so that we don't blow away the local stack;
> perhaps as you wrote with some local subgraphs turned into a loop which will
> handle multiple operations together instead of just one operation per loop.
> Or just use different VAR_DECLs but stick in clobbers where they will be dead
> and hope out of ssa can merge those.
> Anyway, more work than I hoped.
> Though, perhaps it can be also done incrementally, with bare minimum first and
> improvements later.

Sure, this is just what I think users will expect. We don't have the high-level infrastructure to do this afterwards, such as loop fusion and variable contraction (well, in theory graphite can do it, but even there we lack the actual transform bits).
Created attachment 55151 [details] gcc14-bitint-wip.patch Added a testcase with various operations with _BitInt(N) operands and tweaked c-typeck.cc/fold-const.cc to accept those.
Created attachment 55169 [details]
gcc14-bitint-wip.patch

Update, this time with the addition of a libgcc _BitInt multiplication libcall (but not really wiring it up on the compiler side yet; that would be part of the new _BitInt lowering pass). The function currently is

void __mulbitint3 (__bitint_limb *ret, int retprec,
                   const __bitint_limb *u, int uprec,
                   const __bitint_limb *v, int vprec);

which allows mixing different precisions (at compile time, or at runtime using the bitint_reduce_prec function); while in GIMPLE before the _BitInt lowering pass MULT_EXPR will obviously have the same precision for the result and both operands, the lowering pass could spot zero or sign extensions from narrower _BitInts for the operands, or VRP could figure out smaller ranges of values for the operands. Negative uprec or vprec would mean the operand is sign extended from precision -[uv]prec; positive means it is zero extended from [uv]prec. u/v could be the same or overlapping, but as the function writes the result before consuming all inputs, it doesn't allow aliasing between the operands and the return value. Also, at least in the x86-64 psABI, _BitInt(N) for N < 64 is special and it isn't expected this function would really be used for multiplication of such _BitInts, but of course if say multiplying _BitInt(512) by _BitInt(24), it is expected the lowering pass would push those 24 bits into a 64-bit, 64-bit aligned limb and pass 24 for that operand. For inputs it assumes bits above the precision but still within a limb are uninitialized (and so zero or sign extends when reading them); for the output it always writes full limbs (with hopefully proper zero/sign extensions). The implemented algorithm is the base schoolbook multiplication; if really needed, we could do Karatsuba for larger inputs.

What do you think about this API? Shall I continue and create a similar API for divmod?

Also, I wonder what to do about _BitInt(N) in __builtin_mul_overflow{,_p}. One option would be to say that negative retprec is a request to return a nonzero result for the overflow case, but I wonder how much larger the routine would be in that case. Or we could have two routines, one for multiplication and one for multiplication with overflow checking. Yet another possibility is to do the dumb thing on the compiler side: call the multiplication with a temporary result large enough that it would never overflow and check for the overflow on the caller side.
Note, I've only tested it so far on

_BitInt(256) a = 0x1234ab461289cdab8d111007b461289cdab8d1wb;
_BitInt(256) b = 0x2385eabcd072311074bcaa385eabcd07111007b46128wb;
_BitInt(384) c = (_BitInt(384)) 0x1234ab461289cdab8d111007b461289cdab8d1wb
                 * 0x2385eabcd072311074bcaa385eabcd07111007b46128wb;
_BitInt(384) d;
extern void __mulbitint3 (unsigned long *, int, const unsigned long *, int,
                          const unsigned long *, int);

void
foo ()
{
  __mulbitint3 (&d, 384, &a, 256, &b, 196);
}

multiplication, nothing else; I guess it will be easier to test once we can emit it from the compiler. And obviously no testing of the big-endian limb ordering handling until we add some arch that will support it (if we do that at all).
(In reply to H.J. Lu from comment #14)
> (In reply to joseph@codesourcery.com from comment #13)
> > https://gitlab.com/x86-psABIs/i386-ABI/-/issues/5 to request such an ABI
> > for 32-bit x86. I don't know if there are other psABIs with public issue
> > trackers where such issues can be filed (but we'll need some sensible
> > default anyway for architectures where we can't get an ABI properly
> > specified in an upstream-maintained ABI document).
>
> ia32 psABI will follow x86-64 psABI.

Is it a good idea to use 64-bit limbs and 64-bit alignment for the ia32 ABI? I mean, it is fine that _BitInt(N) for N in 33..64 has the size/alignment/passing of long long, but I wonder if for N > 64 the ABI shouldn't use 32-bit limbs, 32-bit alignment and passing as a struct containing the 32-bit limbs.
Created attachment 55240 [details]
gcc14-bitint-wip.patch

Further updates. This introduces a new bitintlower (and bitintlower0) pass and categorizes _BitInt types into 4 categories: small, which are kept as is, as they work out of the box; middle, which already have more than one limb, but for which there exists a supported DImode or TImode type that covers the precision (here lowering is done by casting to INTEGER_TYPE and back); large, which are up to double that size (so they will be lowered to straight-line code); and huge, which will use loops. The lowering is so far implemented for the middle _BitInts. Added some runtime testsuite coverage for the small and middle _BitInts (so on x86-64 up to 128 bits).
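(Illustratively, on x86-64 with a 64-bit limb and TImode available, my reading of those category boundaries would be roughly:)

/* _BitInt(63)   small  - handled like an ordinary integer type
   _BitInt(128)  middle - lowered by casting to/from a TImode integer
   _BitInt(256)  large  - lowered to straight-line per-limb code
   _BitInt(512)  huge   - lowered to loops over the limbs            */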
On Fri, 2 Jun 2023, jakub at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989 > > Jakub Jelinek <jakub at gcc dot gnu.org> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > Attachment #55169 [details]|0 |1 > is obsolete| | > > --- Comment #53 from Jakub Jelinek <jakub at gcc dot gnu.org> --- > Created attachment 55240 [details] > --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55240&action=edit > gcc14-bitint-wip.patch > > Further updates. This introduces a new bitintlower (and bitintlower0) pass, > categorizes > _BitInt types into 4 categories (small, which are kept as is as they work out > of the box, middle, which have already more than one limb, but there exists > DImode or TImode > type which is supported and covers the precision, here lowering is done by > casting to > INTEGER_TYPE and back, large which is up to double that size (so it will be > lowered to straight line code) and huge, which will use loops. The lowering is > so far implemented for the middle _BitInts. > Added some runtime testsuite coverage for the small and middle _BitInts (so on > x86-64 up to 128 bits). At least for -Os we probably want to consider moving everything but small and maybe middle to out of line library functions?
(In reply to rguenther@suse.de from comment #54)
> At least for -Os we probably want to consider moving everything but
> small and maybe middle to out of line library functions?

Not sure about that; we need to judge the space savings vs. having all those routines in libgcc_s.so.1, where the size price would be paid by all processes even when they don't use the large/huge _BitInt at all. I certainly plan to have multiplication and division/modulo in libgcc_s.so.1. Admittedly, some entrypoints could be just in libgcc.a and not libgcc_s.so.1. Don't we already have a case for that - the DFP stuff?

There are very cheap operations (say bitwise &/|/^/~) which have no dependencies between limbs, then some with small dependencies (e.g. +/- or shifts or rotates by constant), but e.g. shifts/rotates by a variable count are already going to be ugly at least for the huge ones.
Created attachment 55244 [details]
gcc14-bitint-wip-inc.patch

Incremental patch on top of the above patch.

I've tried to make some progress and implement the simplest large _BitInt cases, &/|/^/~, but ran into a problem there: both BIT_FIELD_REF and BIT_INSERT_EXPR disallow operating on non-mode precisions, while for _BitInt I think it would be really useful to use them on the large/huge _BitInts (which I will force into memory during expansion most likely). Sure, for huge _BitInts, what is handled in the loop will use either ARRAY_REF on VIEW_CONVERT_EXPR for operands or TARGET_MEM_REFs on VAR_DECLs for the results in the loop, but even for those there is the partial most significant limb in some cases that needs to be handled separately.

So, do you think it is ok to make an exception for BIT_FIELD_REF/BIT_INSERT_EXPR and allow them on non-mode precision BITINT_TYPEs (the incremental patch enables that), plus handle it during the expansion?

Another thing: I started to think about PLUS_EXPR/MINUS_EXPR. We have the __builtin_ia32_addcarryx_u64/__builtin_ia32_sbb_u64 builtins on x86-64, but from what I can see we don't really pattern recognize even a simple add + adc.

Given:

void
foo (unsigned long *p, unsigned long *q, unsigned long *r)
{
  unsigned long p0 = p[0], q0 = q[0];
  unsigned long p1 = p[1], q1 = q[1];
  unsigned long r0 = p0 + q0;
  unsigned long r1 = p1 + q1 + (r0 < p0);
  r[0] = r0;
  r[1] = r1;
}

void
bar (unsigned long *p, unsigned long *q, unsigned long *r)
{
  unsigned long p0 = p[0], q0 = q[0];
  unsigned long p1 = p[1], q1 = q[1];
  unsigned long p2 = p[2], q2 = q[2];
  unsigned long r0 = p0 + q0;
  unsigned long r1 = p1 + q1 + (r0 < p0);
  unsigned long r2 = p2 + q2 + (r1 < p1 || r1 < q1);
  r[0] = r0;
  r[1] = r1;
  r[2] = r2;
}

llvm seems to pattern recognize foo, but doesn't pattern recognize bar as add; adc; adc (is that actually correct C for that though?).

So, shouldn't we implement clang's https://clang.llvm.org/docs/LanguageExtensions.html#multiprecision-arithmetic-builtins builtins (at least the __builtin_{add,sub}c{,l,ll} builtins), lower them into ifns early (similarly to .{ADD,SUB}_OVERFLOW returning a complex integer with 2 returns) and add optabs so that targets can implement those efficiently?
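(For comparison, a sketch of the same carry chain from bar written with the existing __builtin_add_overflow builtins -- i.e. the kind of open-coded form that pattern recognition would have to turn into add; adc; adc; whether this is the best shape for matching is an open question:)

void
bar_overflow (unsigned long *p, unsigned long *q, unsigned long *r)
{
  unsigned long p0 = p[0], q0 = q[0];
  unsigned long p1 = p[1], q1 = q[1];
  unsigned long p2 = p[2], q2 = q[2];
  unsigned long r0, r1, r2, t;
  unsigned long c1, c2;
  c1 = __builtin_add_overflow (p0, q0, &r0);
  c2 = __builtin_add_overflow (p1, q1, &t);
  c2 += __builtin_add_overflow (t, c1, &r1);   /* c2 stays 0 or 1 */
  __builtin_add_overflow (p2, q2, &t);
  __builtin_add_overflow (t, c2, &r2);         /* final carry dropped */
  r[0] = r0;
  r[1] = r1;
  r[2] = r2;
}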
(In reply to Jakub Jelinek from comment #56)
> Created attachment 55244 [details]
> gcc14-bitint-wip-inc.patch
>
> Incremental patch on top of the above patch.
>
> I've tried to make some progress and implement the simplest large _BitInt
> cases, &/|/^/~, but ran into a problem there, both BIT_FIELD_REF and
> BIT_INSERT_EXPR disallow operating on non-mode precisions, while for _BitInt
> I think it would be really useful to use them on the large/huge _BitInts
> (which I will force into memory during expansion most likely). Sure, for
> huge _BitInts, what is handled in the loop will use either ARRAY_REF on
> VIEW_CONVERT_EXPR for operands or TARGET_MEM_REFs on VAR_DECLs for the
> results in the loop, but even for those there is the partial most
> significant limb in some cases that needs to be handled separately.
>
> So, do you think it is ok to make an exception for
> BIT_FIELD_REF/BIT_INSERT_EXPR and allow them on non-mode precision
> BITINT_TYPEs (the incremental patch enables that) plus handle it during the
> expansion?

The incremental patch doesn't implement the expansion part, right? The problem is that BIT_* are specified to work on the in-memory representation and a non-mode precision entity doesn't have this specified - you'd have to extend / shift that to some mode to be able to store it. So to extract from or insert into some bit-precision entity you have to perform this conversion somehow.

Why do you have this anyway? Is it really that the ABIs(?) allow for the padding up to limb size of the partial limb to be not present (aka in unmapped memory?)? Why can't you work on the full limb and just "pollute" the padding at will but then also zero-extend on loads?

> Another thing, started to think about PLUS_EXPR/MINUS_EXPR, we have
> __builtin_ia32_addcarryx_u64/__builtin_ia32_sbb_u64 builtins on x86-64, but
> from what I can see don't really pattern recognize even simple add + adc.
>
> [foo and bar examples quoted above snipped]
>
> llvm seems to pattern recognize foo, but doesn't pattern recognize bar as
> add; adc; adc (is that actually a correct C for that though?).
>
> So, shouldn't we implement the clang's
> https://clang.llvm.org/docs/LanguageExtensions.html#multiprecision-arithmetic-builtins
> builtins (add least the __builtin_{add,sub}c{,l,ll} builtins), lower them
> into ifns early (similarly to .{ADD,SUB}_OVERFLOW returning complex integer
> with 2 returns) and add optabs so that targets can implement those
> efficiently?

Improving code-gen for add-with-carry would indeed be nice. I'm not sure we need the user-visible builtins though; matching the open-coded variants to appropriate IFNs would work. But can the _OVERFLOW variants not be used here, at least for unsigned?
(In reply to Richard Biener from comment #57) > (In reply to Jakub Jelinek from comment #56) > > Created attachment 55244 [details] > > gcc14-bitint-wip-inc.patch > > > > Incremental patch on top of the above patch. > > > > I've tried to make some progress and implement the simplest large _BitInt > > cases, > > &/|/^/~, but ran into a problem there, both BIT_FIELD_REF and > > BIT_INSERT_EXPR disallow > > operating on non-mode precisions, while for _BitInt I think it would be > > really useful > > to use them on the large/huge _BitInts (which I will force into memory > > during expansion most likely). Sure, for huge _BitInts, what is handled in > > the loop will use either > > ARRAY_REF on VIEW_CONVERT_EXPR for operands or TARGET_MEM_REFs on VAR_DECLs > > for the results in the loop, but even for those there is the partial most > > significant limb in some cases that needs to be handled separately. > > > > So, do you think it is ok to make an exception for > > BIT_FIELD_REF/BIT_INSERT_EXPR and > > allow them on non-mode precision BITINT_TYPEs (the incremental patch enables > > that) plus > > handle it during the expansion? > > The incremental patch doesn't implement the expansion part, right? The Not yet. > problem is that BIT_* are specified to work on the in-memory representation > and a non-mode precision entity doesn't have this specified - you'd have > to extend / shift that to some mode to be able to store it. One thing is that the checking and expansion constraints preclude using it even on full limbs of a _BitInt which has precision in multiples of limb precision. Say _BitInt(192) has on x86-64 3 64-bit limbs, but the type necessarily has BLKmode, because there are no 192-bit modes. If we allowed BIT_FIELD_REF/BIT_INSERT_EXPR on non-type_has_mode_precision_p BITINT_TYPEs, perhaps we could restrict it to the cases we really need and which can be easily implemented. That is, they'd need to extract or insert bits within the same single limb, making it effectively operate on mode precision of the limb for all the limbs other than the most significant partial one if any, and in the case of the most significant one it could either ignore the padding bits above it or sign/zero extend into the padding bits when touching the MSB bit (depending on if target says those bits are well defined or undefined). > Improving code-gen for add-with carry would be indeed nice, I'm not sure > we need the user-visible builtins though, matching the open-coded variants > to appropriate IFNs would work. But can the _OVERFLOW variants not be > used here, at least for unsigned? Yeah, just noticed the clang builtins are badly designed, see PR79173 for that, so will try to introduce a new ifns and pattern detect them somewhere.
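For illustration, here is a minimal plain-C sketch of a correct unsigned carry chain written with the existing __builtin_add_overflow builtin, i.e. one kind of open-coded multi-limb addition that such pattern matching could target; it is only a sketch for a 3-limb add, not the IFN lowering discussed above:

void
add3 (const unsigned long *p, const unsigned long *q, unsigned long *r)
{
  /* Low limb: plain overflowing add, carry out in c.  */
  unsigned long c = __builtin_add_overflow (p[0], q[0], &r[0]);
  /* Middle limb: add both operands, then the incoming carry.  The two
     partial adds can produce at most one carry between them, so |= is
     enough to accumulate the carry out.  */
  unsigned long c2 = __builtin_add_overflow (p[1], q[1], &r[1]);
  c2 |= __builtin_add_overflow (r[1], c, &r[1]);
  /* Top limb: carry in, carry out discarded.  */
  r[2] = p[2] + q[2] + c2;
}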
(In reply to Jakub Jelinek from comment #58) > (In reply to Richard Biener from comment #57) > > (In reply to Jakub Jelinek from comment #56) > > > Created attachment 55244 [details] > > > gcc14-bitint-wip-inc.patch > > > > > > Incremental patch on top of the above patch. > > > > > > I've tried to make some progress and implement the simplest large _BitInt > > > cases, > > > &/|/^/~, but ran into a problem there, both BIT_FIELD_REF and > > > BIT_INSERT_EXPR disallow > > > operating on non-mode precisions, while for _BitInt I think it would be > > > really useful > > > to use them on the large/huge _BitInts (which I will force into memory > > > during expansion most likely). Sure, for huge _BitInts, what is handled in > > > the loop will use either > > > ARRAY_REF on VIEW_CONVERT_EXPR for operands or TARGET_MEM_REFs on VAR_DECLs > > > for the results in the loop, but even for those there is the partial most > > > significant limb in some cases that needs to be handled separately. > > > > > > So, do you think it is ok to make an exception for > > > BIT_FIELD_REF/BIT_INSERT_EXPR and > > > allow them on non-mode precision BITINT_TYPEs (the incremental patch enables > > > that) plus > > > handle it during the expansion? > > > > The incremental patch doesn't implement the expansion part, right? The > > Not yet. > > > problem is that BIT_* are specified to work on the in-memory representation > > and a non-mode precision entity doesn't have this specified - you'd have > > to extend / shift that to some mode to be able to store it. > > One thing is that the checking and expansion constraints preclude using it > even on > full limbs of a _BitInt which has precision in multiples of limb precision. > Say _BitInt(192) has on x86-64 3 64-bit limbs, but the type necessarily has > BLKmode, > because there are no 192-bit modes. > If we allowed BIT_FIELD_REF/BIT_INSERT_EXPR on non-type_has_mode_precision_p > BITINT_TYPEs, perhaps we could restrict it to the cases we really need and > which can be easily implemented. That is, they'd need to extract or insert > bits within the same single limb, making it effectively operate on mode > precision of the limb for all the limbs other than the most significant > partial one if any, and in the case of the most significant one it could > either ignore the padding bits above it or sign/zero extend > into the padding bits when touching the MSB bit (depending on if target says > those bits are well defined or undefined). Oh, so BITINT_TYPE is INTEGRAL_TYPE_P but not INTEGER_TYPE (I think we don't have any BLKmode integer types?). I think the intent was to restrict the operation on actual mode entities, BLKmode means memory where it isn't necessary to restrict things. So you could add a BLKmode exception here (but then for _BitInt<63> you will likely get DImode?) Can't you use a MEM_REF to extract limb-size INTEGER_TYPE from the _BitInt<> and then operate on those with BIT_FIELD_REF and BIT_INSERT_EXPR? Of course when the whole _BitInt<> is a SSA name MEM_REF won't work (but when you use ARRAY_REF/VIEW_CONVERT the same holds true) > > Improving code-gen for add-with carry would be indeed nice, I'm not sure > > we need the user-visible builtins though, matching the open-coded variants > > to appropriate IFNs would work. But can the _OVERFLOW variants not be > > used here, at least for unsigned? > > Yeah, just noticed the clang builtins are badly designed, see PR79173 for > that, > so will try to introduce a new ifns and pattern detect them somewhere.
(In reply to Richard Biener from comment #59)
> Oh, so BITINT_TYPE is INTEGRAL_TYPE_P but not INTEGER_TYPE (I think we
> don't have any BLKmode integer types?).

Yes. Some BITINT_TYPEs have BLKmode.

> I think the intent was to
> restrict the operation on actual mode entities, BLKmode means memory
> where it isn't necessary to restrict things. So you could add
> a BLKmode exception here (but then for _BitInt<63> you will likely
> get DImode?)

Sure, _BitInt<63> has DImode, _BitInt<127> has TImode if it is supported. TYPE_MODE is set according to the rules for structures (so that it would help with function_arg etc. implementation on some targets), so I think say OImode for _BitInt<254> isn't impossible.

> Can't you use a MEM_REF to extract limb-size INTEGER_TYPE from the
> _BitInt<> and then operate on those with BIT_FIELD_REF and BIT_INSERT_EXPR?
> Of course when the whole _BitInt<> is a SSA name MEM_REF won't work
> (but when you use ARRAY_REF/VIEW_CONVERT the same holds true)

I wanted to avoid forcing the smaller _BitInt results into VAR_DECLs and only do it for the ones where I'd use loops (the huge category).
The plan for loops is to do 2 limbs per iteration initially, plus, if there is an odd number of limbs or a partial limb, 1-2 limbs handled after the loop. So, the large category where a loop isn't used would be up to 3 full limbs or 3 full limbs + 1 partial.
On Mon, 5 Jun 2023, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989
>
> --- Comment #60 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #59)
> > Oh, so BITINT_TYPE is INTEGRAL_TYPE_P but not INTEGER_TYPE (I think we
> > don't have any BLKmode integer types?).
>
> Yes. Some BITINT_TYPEs have BLKmode.
>
> > I think the intent was to
> > restrict the operation on actual mode entities, BLKmode means memory
> > where it isn't necessary to restrict things. So you could add
> > a BLKmode exception here (but then for _BitInt<63> you will likely
> > get DImode?)
>
> Sure, _BitInt<63> has DImode, _BitInt<127> has TImode if it is supported.
> TYPE_MODE is set according to the rules for structures (so that it would help
> with function_arg etc. implementation on some targets), so I think say OImode
> for _BitInt<254> isn't impossible.
>
> > Can't you use a MEM_REF to extract limb-size INTEGER_TYPE from the
> > _BitInt<> and then operate on those with BIT_FIELD_REF and BIT_INSERT_EXPR?
> > Of course when the whole _BitInt<> is a SSA name MEM_REF won't work
> > (but when you use ARRAY_REF/VIEW_CONVERT the same holds true)
>
> I wanted to avoid forcing the smaller _BitInt results into VAR_DECLs and only
> do it
> for the ones where I'd use loops (the huge category).
> The plan for loops is to do 2 limbs per iteration initially, plus if there is
> odd number of limbs or even with partial limb 1-2 limbs done after the loop.
> So, the large
> category where loop isn't used would be up to 3 full limbs or 3 full limbs + 1
> partial.

So for the large case you are not using BIT_FIELD_REF on _BitInt<>? But for the small case like _BitInt<63> with DImode you want to do that, and the variables are also likely in SSA form, right?

How is endianness defined? Probably per limb? Consider

 unsigned _BitInt<16> a, b;
 unsigned _BitInt<32> c;
 c = ((_BitInt<32>)a << 16) | (_BitInt<32>)b;

(not sure whether the casts are required). It's all difficult enough if you don't need to wrap your heads around padding. Note that followup optimization passes will refrain from touching the !type_has_mode_precision cases because of padding. So I think it would be good to work on full-limb precision for the actual operations. It should be possible to VIEW_CONVERT a _BitInt<63> to _BitInt<64>, aka VIEW_CONVERT to the mode's precision bit-int variant here (or to the actual integer mode integer type, which would be even better).
What the patch including the incremental one currently does is:
1) small _BitInt (on x86-64 N <= 64) - the BITINT_TYPEs are kept as is in the IL and expanded, they always have non-BLKmode (QI/HI/SI/DI) and are handled like any other INTEGER_TYPEs (except preserved in calls to ensure correct ABI passing)
2) middle _BitInt (on x86-64 N <= 128) - I keep in the IL just copy operations and casts between them and INTEGER_TYPE (TImode in this case), actual arithmetic is done on the INTEGER_TYPE
3) large _BitInt (on x86-64 that will be N <= 255) - the intent was to use BIT_FIELD_REFs/BIT_INSERT_EXPR to make the IL simple and perform stuff on the up to 4 limbs in this case in straight line code
4) huge _BitInt (on x86-64 N > 255) - use loops, VAR_DECL destination, VCE+ARRAY_REF on the sources; dunno yet if I can get good code by making the VAR_DECL clobbered immediately after I load a SSA_NAME from it (whether out of SSA/expansion could then extend the lifetime of the VAR_DECL, or if I should have some pass do that, or the bitint pass figure out the last use and put the clobber only after that, or even replace the SSA_NAME uses with accesses to the VAR_DECL).
Anyway, I think I'll work on the add/sub with carry now and continue on _BitInt only after that.
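As a plain-C illustration of that classification (x86-64 thresholds as listed above, 64-bit limbs; this just summarizes the comment, it is not code from the patch):

#include <stdio.h>

static const char *
bitint_category (int n)
{
  if (n <= 64)
    return "small";   /* kept in the IL, non-BLKmode */
  if (n <= 128)
    return "middle";  /* arithmetic done on a TImode INTEGER_TYPE */
  if (n <= 255)
    return "large";   /* straight line code over the limbs */
  return "huge";      /* loop over the limbs */
}

int
main (void)
{
  const int tests[] = { 63, 64, 65, 128, 129, 192, 255, 256, 575 };
  for (unsigned i = 0; i < sizeof tests / sizeof tests[0]; ++i)
    printf ("_BitInt(%d): %s, %d limb(s)\n", tests[i],
            bitint_category (tests[i]), (tests[i] + 63) / 64);
  return 0;
}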
(In reply to Jakub Jelinek from comment #62) > What the patch including incremental one currently does is: > 1) small _BitInt (on x86-64 N <= 64) - the BITINT_TYPEs are kept as is in > the IL > and expanded, they always have non-BLKmode (QI/HI/SI/DI) and are handled > like any > other INTEGER_TYPEs (except preserved in calls to ensure correct ABI > passing) > 2) middle _BitInt (on x86-64 N <= 128) - I keep in the IL just copy > operations and > casts between them and INTEGER_TYPE (TImode in this case), actual > arithmetics is > done on the INTEGER_TYPE > 3) large _BitInt (on x86-64 that will be <= 255) - the intent was using > BIT_FIELD_REFs/BIT_INSERT_EXPR to make the IL simple and perform stuff on > the > up to 4 limbs in this case in straight line code So these large _BitInt already have BLKmode? If so I'd suggest to initially handle them like the huge _BitInt code but "unrolled" and iterate on the code-gen later - I can have a look once one can play with the actual code and testcases. > 4) huge _BitInt (on x86-64 N > 255) use loops, VAR_DECL destination, > VCE+ARRAY_REF > on the sources, > dunno yet if I can get good code by making the VAR_DECL clobbered > immediately after > I load a SSA_NAME from it (whether out of SSA/expansion could then extend > the > lifetime of the VAR_DECL, or if I should have some pass do that, or the > bitint pass > figure out the last use and put clobber only after that, or even replace > the SSA_NAME > uses with accesses to VAR_DECL > > Anyway, I think I'll work now on the add/sub with carry now and continue on > _BitInt only after that.
Created attachment 55327 [details]
gcc14-bitint-wip.patch

Some further progress. I found that out of SSA coalescing coalesces only a very small subset of SSA_NAMEs; for _BitInt we need to coalesce significantly more and try to use as few VAR_DECL arrays as possible so that we don't blow away stack sizes. So, I'm trying to find the large/huge _BitInt SSA_NAMEs, quickly find out some which won't be needed as they could be handled inside of a single loop (to be improved later), then do aggressive coalescing on those and eventually map those SSA_NAMEs to VAR_DECLs.

On
void
foo (_BitInt(192) *x, _BitInt(192) *y, _BitInt(135) *z, _BitInt(135) *w)
{
  _BitInt(192) a;
  if (x[0] == y[0])
    a = 123wb;
  else if (x[0] == y[1])
    a = y[2];
  else if (x[0] == y[2])
    a = y[3];
  else
    a = 0wb;
  x[4] = a;
  x[5] = x[0] == y[0] ? x[6] : x[0] == y[1] ? x[7] : x[0] == y[2] ? x[8] : x[9];
  x[0] &= y[0];
  x[1] |= y[1];
  x[2] ^= y[2];
  x[3] = ~y[3];
  z[0] &= w[0];
  z[1] |= w[1];
  z[2] ^= w[2];
  z[3] = ~w[3];
}
I'm seeing weird results though, e.g.
  _1 = *x_32(D);
  _2 = *y_33(D);
  if (_1 == _2)
but After Coalescing: Partition map
Partition 0 (_1 - 1 2 3 4 5 6 7 8 10 11 13 14 16 29 30 34 35 37 38 39 40 )
Partition 1 (_9 - 9 )
Partition 2 (_12 - 12 )
Partition 3 (_15 - 15 )
Partition 4 (_17 - 17 )
Partition 5 (_18 - 18 19 21 22 24 25 27 )
Partition 6 (_20 - 20 )
Partition 7 (_23 - 23 )
Partition 8 (_26 - 26 )
Partition 9 (_28 - 28 )
Partition 10 (x_32(D) - 32 )
Partition 11 (y_33(D) - 33 )
Partition 12 (z_46(D) - 46 )
Partition 13 (w_47(D) - 47 )
Obviously, _1 and _2 need to conflict because they have overlapping live ranges (sure, later on loads from memory should be handled in a smarter way, no need to copy it into another array if at the point of a single use within the same bb (at least) the memory couldn't be clobbered yet).
Created attachment 55329 [details]
gcc14-bitint-wip.patch

Sorry for the false alarm, that was my screw-up on the coalescing side, now fixed. Here is an updated version, which already creates the temporary variables for each of the partitions, so the next step will be to start implementing the operations.

One thing I still have to figure out is loads from memory into large/huge _BitInt. I think we could in that case avoid copying into a temporary VAR_DECL if we can prove that in all the use stmts the memory they are loading from couldn't be clobbered (and for the case of a loop merging multiple operations together, in the last statement from those), but those statements might very well not have vops, so I'm unsure how to find out the current vop SSA_NAME so that I can ask the alias oracle.
Created attachment 55364 [details]
gcc14-bitint-wip.patch

Updated patch. This can already do some simple lowering of the large/huge _BitInt operations, like:

void
foo (_BitInt(192) *x, _BitInt(192) *y, _BitInt(135) *z, _BitInt(135) *w)
{
  x[0] &= y[0];
  x[1] |= y[1];
  x[2] ^= y[2];
  x[3] = ~y[3];
  z[0] &= w[0];
  z[1] |= w[1];
  z[2] ^= w[2];
  z[3] = ~w[3];
}

_BitInt(517) a, b, c, d, e, f;

void
bar (void)
{
  a &= b;
  c |= b;
  d ^= b;
  e = ~f;
}

Additions/subtractions/left shift by small constant next.
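Roughly, the lowered form of the bitwise cases in the first testcase corresponds at the C level (assuming 64-bit limbs; the actual pass of course emits GIMPLE, not C) to limb-wise loops like:

void
and192 (unsigned long *x, const unsigned long *y)
{
  /* _BitInt(192): three full limbs, no partial top limb.  */
  for (int i = 0; i < 3; ++i)
    x[i] &= y[i];
}

void
not135 (unsigned long *z, const unsigned long *w)
{
  /* _BitInt(135): two full limbs plus a 7-bit partial top limb; what
     ends up in the padding bits above bit 134 depends on whether the
     ABI defines them.  */
  for (int i = 0; i < 3; ++i)
    z[i] = ~w[i];
}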
Created attachment 55376 [details] gcc14-bitint-wip.patch Further update which handles additions/subtractions/negations.
Created attachment 55386 [details] gcc14-bitint-wip.patch Further progress, this handles also constants, left shifts by small amount (0 to limb_prec - 1) and ==/!= comparisons and calls.
Created attachment 55392 [details] gcc14-bitint-wip.patch Now with some runtime test coverage for +,-,|,^,&,~,<<small,==,!= for unsigned _BitInt(N) for N 135, 192, 255, 256, 259, 508, 512, 575 to test large/huge with various numbers of limbs and partial limbs, plus associated fixes so that it works correctly.
For right shifts, I wonder if we shouldn't emit inline (perhaps with the exception of -Os) something like:

__attribute__((noipa)) void
ashiftrt575 (unsigned long *p, unsigned long *q, int n)
{
  int prec = 575;
  int n1 = n & 63;
  int n2 = n / 64;
  int n3 = n1 != 0;
  int n4 = (-n1) & 63;
  unsigned long ext;
  int i;
  for (i = n2; i < prec / 64 - n3; ++i)
    p[i - n2] = (q[i] >> n1) | (q[i + n3] << n4);
  ext = ((signed long) (q[prec / 64] << (64 - (prec & 63)))) >> (64 - (prec & 63));
  if (n1 && i == prec / 64 - n3)
    {
      p[i - n2] = (q[i] >> n1) | (ext << n4);
      ++i;
    }
  i -= n2;
  p[i] = ((signed long) ext) >> n1;
  ext = ((signed long) ext) >> 63;
  for (++i; i < prec / 64 + 1; ++i)
    p[i] = ext;
}

__attribute__((noipa)) void
lshiftrt575 (unsigned long *p, unsigned long *q, int n)
{
  int prec = 575;
  int n1 = n & 63;
  int n2 = n / 64;
  int n3 = n1 != 0;
  int n4 = (-n1) & 63;
  unsigned long ext;
  int i;
  for (i = n2; i < prec / 64 - n3; ++i)
    p[i - n2] = (q[i] >> n1) | (q[i + n3] << n4);
  ext = q[prec / 64] & ((1UL << (prec % 64)) - 1);
  if (n1 && i == prec / 64 - n3)
    {
      p[i - n2] = (q[i] >> n1) | (ext << n4);
      ++i;
    }
  i -= n2;
  p[i] = ext >> n1;
  ext = 0;
  for (++i; i < prec / 64 + 1; ++i)
    p[i] = 0;
}

(for _BitInt(575) and 64-bit limb little endian). If the shift count is constant, it will allow further optimizations, and if e.g. get_nonzero_bits tells us that n is variable but a multiple of the limb precision, we can optimize some more as well. Looking at what LLVM does, they seem to sign extend in memory to twice as many bits and then just use an unrolled loop without any conditionals, but that doesn't look good for memory usage etc.
Created attachment 55416 [details]
gcc14-bitint-wip.patch

Updated patch which newly handles arbitrary shifts and debug info.
Created attachment 55427 [details] gcc14-bitint-wip.patch Testsuite coverage for shifts and </<=/>/>= comparisons and associated fixes discovered by that.
Created attachment 55435 [details]
gcc14-bitint-wip.patch

WIP on casts: casts from non-_BitInt integers or small/middle _BitInt to large/huge _BitInt are tested (at least when used as operands of mergeable operations or comparisons/shifts/stores), casts between different precision large/huge _BitInt are implemented but so far untested, and casts from large/huge _BitInt to non-_BitInt integers or small/middle _BitInt are yet to be implemented. After that multiplication/division/modulo, then casts from/to floating point.
Created attachment 55482 [details] gcc14-bitint-wip.patch Further progress, bitint-22.c ICEs, so there is still further work needed in handle_cast, but getting closer. Also, fixed up the liveness analysis during bitint coalescing.
Created attachment 55499 [details] gcc14-bitint-wip.patch Cast fixes, now it passes the whole testsuite.
Created attachment 55500 [details] gcc14-bitint-wip.patch Now with support for INTEGER_CST PHI arguments. Will start work on large/huge _BitInt multiplication/division next.
Created attachment 55522 [details] gcc14-bitint-wip.patch Working multiplication now, division/modulo next.
Created attachment 55530 [details] gcc14-bitint-wip.patch Division/modulo should now work too.
Created attachment 55538 [details]
gcc14-bitint-wip.patch

double -> large/huge signed/unsigned _BitInt support. Works on 8 conversions; I will need to add larger testcase coverage and, when happy with double, add it for float, long double and __float128 as well. And then large/huge signed/unsigned _BitInt -> floating point support.
Created attachment 55542 [details] gcc14-bitint-wip.patch Now float,double,long double,__float128 -> {signed,unsigned} _BitInt(N) conversions seem to work (at least on the testsuite coverage).
Created attachment 55545 [details] gcc14-bitint-wip.patch _BitInt -> double conversion (float, long double, __float128, _Float16 and __bf16 conversions still to be implemented).
Created attachment 55561 [details] gcc14-bitint-wip.patch Remaining _BitInt to floating point conversions.
Created attachment 55562 [details] gcc14-bitint-wip.patch Now with support for passing large/huge _BitInt(N) INTEGER_CSTs as function arguments (although the RTL could be improved later), -fnon-call-exceptions support for large/huge _BitInt(N) loads/stores/divide/modulo and large/huge _BitInt(N) -> floating point conversions and support for uninited large/huge _BitInt SSA_NAMEs. Next will be ubsan and __builtin_*_overflow.
Created attachment 55567 [details]
gcc14-bitint-wip.patch

Actually implemented support for switches first. The switch lowering pass already has most of the needed support, so if we detect a switch indexed by a large/huge _BitInt, all we need to do is lower it at the start of the bitintlower pass, with a small tweak in the switchlower pass to transform jump tables indexed by large/huge _BitInt into ones indexed by unsigned long long; switchlower never creates clusters whose range doesn't fit into 64 bits, which makes this possible.
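Conceptually (a hand-written C sketch of the transformation described above, not the GIMPLE the pass emits), a jump table cluster over a huge _BitInt is indexed through a 64-bit value after subtracting the cluster's low bound:

void
f (unsigned _BitInt(575) x)
{
  /* Original: switch (x) with a cluster of cases 1000 .. 1003.  */
  if (x >= 1000uwb && x <= 1003uwb)
    /* The cluster range fits into 64 bits, so the jump table can be
       indexed by (unsigned long long) (x - low).  */
    switch ((unsigned long long) (x - 1000uwb))
      {
      case 0: /* ... */ break;
      case 1: /* ... */ break;
      case 2: /* ... */ break;
      case 3: /* ... */ break;
      }
}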
Created attachment 55572 [details]
gcc14-bitint-wip.patch

At least the x86-64 _BitInt psABI says that the padding bits are undefined and the various other psABI proposals do that as well. Though, when looking at RTL expansion, we were doing REDUCE_BIT_FIELD after operations, meaning that we effectively relied on those bits, at least for small/middle _BitInt, being sign or zero extended. This change tries to force sign/zero extensions when reading _BitInt from memory, parameters etc.
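For reference, a plain-C sketch (64-bit limbs; the actual change is in the RTL expansion, not C) of what zero/sign extending the partial most significant limb on load amounts to:

unsigned long
top_limb_zext (const unsigned long *limbs, int prec)
{
  int bits = prec % 64;
  if (bits == 0)
    return limbs[prec / 64 - 1];        /* top limb is full, nothing to mask */
  /* Clear the padding bits above bit (prec % 64) of the partial limb.  */
  return limbs[prec / 64] & ((1UL << bits) - 1);
}

unsigned long
top_limb_sext (const unsigned long *limbs, int prec)
{
  int bits = prec % 64;
  if (bits == 0)
    return limbs[prec / 64 - 1];
  /* Copy the sign bit of the value into the padding bits.  */
  int shift = 64 - bits;
  return (unsigned long) (((long) (limbs[prec / 64] << shift)) >> shift);
}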
Created attachment 55592 [details]
gcc14-bitint-wip.patch

small/medium _BitInt __builtin_{add,sub,mul}_overflow support (with testsuite coverage) and large/huge _BitInt __builtin_mul_overflow support (just compile tested on a simple testcase; more testing will need to wait until __builtin_{add,sub}_overflow support is added for large/huge _BitInt).
Created attachment 55596 [details]
gcc14-bitint-wip.patch

large/huge _BitInt __builtin_{add,sub}_overflow mostly implemented (I've left 2 spots to finish - gcc_unreachable () - which only trigger rarely). Though, e.g. in the bitint-41.c test the t113sub t122mul t125mul t127mul t160sub t171mul t174mul t176mul functions still abort, so that is to be debugged next week; then ubsan, inline asm and then hopefully submit.
Created attachment 55628 [details] gcc14-bitint-wip.patch Updated version which passes all the __builtin_*_overflow{,_p} tests. I also used gcov on gimple-lower-bitint.cc to make sure testsuite coverage covers almost everything in the file.
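A small user-level usage example of what those tests exercise (nothing from the patch itself, just code the new support is meant to handle):

unsigned _BitInt(255) r;

int
add_overflows (unsigned _BitInt(255) a, unsigned _BitInt(255) b)
{
  /* Result stored into r, return value says whether it overflowed.  */
  return __builtin_add_overflow (a, b, &r);
}

int
mul_fits_in_128 (unsigned _BitInt(255) a, unsigned _BitInt(255) b)
{
  /* The _p variant only checks, it does not store the result; the third
     argument only supplies the result type.  */
  return !__builtin_mul_overflow_p (a, b, (unsigned _BitInt(128)) 0);
}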
Created attachment 55637 [details]
gcc14-bitint-wip.patch

Updated patch with -fsanitize=undefined _BitInt support. Some of the runtime messages are inaccurate and some are totally incorrect, but I'm afraid I can't do much until libubsan adds support for _BitInt, which I've requested in https://github.com/llvm/llvm-project/issues/64100
For +-* overflow the messages look good up to (inclusive) _BitInt(128) on 64-bit arches (or _BitInt(64) on 32-bit ones); larger ones print <unknown> instead of numbers and claim unsigned integer overflow rather than signed (but I think that is still better than clang, where the emitted code just crashes or prints random numbers). For / overflow, again up to _BitInt(128) it works fine, otherwise it prints division by zero rather than minimum / -1. For shifts with non-mode precision _BitInts, even small ones, there are various inaccuracies, because libubsan thinks the mode precision is the precision of the type.
Created attachment 55642 [details]
gcc14-bitint-wip.patch

Inline asm support with large/huge _BitInt (of limited usefulness, it mostly makes sense with the g constraint), abs/absu/min/max fixes (I had a bug in one testcase which prevented those bugs from being seen) and one .{ADD,SUB}_OVERFLOW fix; all the torture bitint run tests now pass even with -fsanitize=undefined. I still have to do something about stmt_ends_bb_p calls with large/huge _BitInt lhs and deal with debuginfo, then bootstrap/regtest it as a whole and submit.
Created attachment 55649 [details] gcc14-bitint.patch Full patch including ChangeLog I'll submit after testing finishes.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:b129d6b5f5f13995d57d677afcb3e94d0d9c327f commit r14-3119-gb129d6b5f5f13995d57d677afcb3e94d0d9c327f Author: Jakub Jelinek <jakub@redhat.com> Date: Thu Aug 10 09:22:03 2023 +0200 expr: Small optimization [PR102989] Small optimization to avoid testing modifier multiple times. 2023-08-10 Jakub Jelinek <jakub@redhat.com> PR c/102989 * expr.cc (expand_expr_real_1) <case MEM_REF>: Add an early return for EXPAND_WRITE or EXPAND_MEMORY modifiers to avoid testing it multiple times.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:d5ad55a83d504df582d1e6f1c168454a028c0437 commit r14-3120-gd5ad55a83d504df582d1e6f1c168454a028c0437 Author: Jakub Jelinek <jakub@redhat.com> Date: Thu Aug 10 09:23:08 2023 +0200 lto-streamer-in: Adjust assert [PR102989] With _BitInt(575) or any other _BitInt(513) or larger constants we can run into this assertion. MAX_BITSIZE_MODE_ANY_INT is just a value from which WIDE_INT_MAX_PRECISION is derived. 2023-08-10 Jakub Jelinek <jakub@redhat.com> PR c/102989 * lto-streamer-in.cc (lto_input_tree_1): Assert TYPE_PRECISION is up to WIDE_INT_MAX_PRECISION rather than MAX_BITSIZE_MODE_ANY_INT.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:8afe9d5d2fdd047cbd4e3531170af6b66d30e74a commit r14-3128-g8afe9d5d2fdd047cbd4e3531170af6b66d30e74a Author: Jakub Jelinek <jakub@redhat.com> Date: Thu Aug 10 17:29:23 2023 +0200 phiopt: Fix phiopt ICE on vops [PR102989] I've ran into ICE on gcc.dg/torture/bitint-42.c with -O1 or -Os when enabling expensive tests, and unfortunately I can't reproduce without _BitInt. The IL before phiopt3 has: <bb 87> [local count: 203190070]: # .MEM_428 = VDEF <.MEM_367> bitint.159 = VIEW_CONVERT_EXPR<unsigned long[8]>(*.LC3); goto <bb 89>; [100.00%] <bb 88> [local count: 203190070]: # .MEM_427 = VDEF <.MEM_367> bitint.159 = VIEW_CONVERT_EXPR<unsigned long[8]>(*.LC4); <bb 89> [local count: 406380139]: # .MEM_368 = PHI <.MEM_428(87), .MEM_427(88)> # VUSE <.MEM_368> _123 = VIEW_CONVERT_EXPR<unsigned long[8]>(r495[i_107].D.2780)[0]; and factor_out_conditional_operation is called on the vop PHI, it sees it has exactly two operands and defining statements of both PHI arguments are converts (VCEs in this case), so it thinks it is a good idea to try to optimize that and while doing that it constructs void type SSA_NAMEs and the like. 2023-08-10 Jakub Jelinek <jakub@redhat.com> PR c/102989 * tree-ssa-phiopt.cc (single_non_singleton_phi_for_edges): Never return virtual phis and return NULL if there is a virtual phi where the arguments from E0 and E1 edges aren't equal.
Just as a heads up: there is an ongoing conversation in the x86-64 psABI group about adjusting `_BitInt(128)` to have the same alignment as `__int128`, which would help address some of the issues mentioned here. Please join the discussion if you have any comments: https://groups.google.com/g/x86-64-abi/c/-JeR9HgUU20
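For anyone who wants to check what their compiler currently does there, a trivial test (the printed values depend on the compiler version and on how that psABI discussion is resolved):

#include <stdio.h>

int
main (void)
{
  printf ("_Alignof (_BitInt(128)) = %zu\n", _Alignof (_BitInt(128)));
  printf ("_Alignof (__int128)     = %zu\n", _Alignof (__int128));
  return 0;
}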
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:4f4fa2501186e43d115238ae938b3df322c9e02a commit r14-3745-g4f4fa2501186e43d115238ae938b3df322c9e02a Author: Jakub Jelinek <jakub@redhat.com> Date: Wed Sep 6 17:25:49 2023 +0200 Middle-end _BitInt support [PR102989] The following patch introduces the middle-end part of the _BitInt support, a new BITINT_TYPE, handling it where needed, except the lowering pass and sanitizer support. 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 * tree.def (BITINT_TYPE): New type. * tree.h (TREE_CHECK6, TREE_NOT_CHECK6): Define. (NUMERICAL_TYPE_CHECK, INTEGRAL_TYPE_P): Include BITINT_TYPE. (BITINT_TYPE_P): Define. (CONSTRUCTOR_BITFIELD_P): Return true even for BLKmode bit-fields if they have BITINT_TYPE type. (tree_check6, tree_not_check6): New inline functions. (any_integral_type_check): Include BITINT_TYPE. (build_bitint_type): Declare. * tree.cc (tree_code_size, wide_int_to_tree_1, cache_integer_cst, build_zero_cst, type_hash_canon_hash, type_cache_hasher::equal, type_hash_canon): Handle BITINT_TYPE. (bitint_type_cache): New variable. (build_bitint_type): New function. (signed_or_unsigned_type_for, verify_type_variant, verify_type): Handle BITINT_TYPE. (tree_cc_finalize): Free bitint_type_cache. * builtins.cc (type_to_class): Handle BITINT_TYPE. (fold_builtin_unordered_cmp): Handle BITINT_TYPE like INTEGER_TYPE. * cfgexpand.cc (expand_debug_expr): Punt on BLKmode BITINT_TYPE INTEGER_CSTs. * convert.cc (convert_to_pointer_1, convert_to_real_1, convert_to_complex_1): Handle BITINT_TYPE like INTEGER_TYPE. (convert_to_integer_1): Likewise. For BITINT_TYPE don't check GET_MODE_PRECISION (TYPE_MODE (type)). * doc/generic.texi (BITINT_TYPE): Document. * doc/tm.texi.in (TARGET_C_BITINT_TYPE_INFO): New. * doc/tm.texi: Regenerated. * dwarf2out.cc (base_type_die, is_base_type, modified_type_die, gen_type_die_with_usage): Handle BITINT_TYPE. (rtl_for_decl_init): Punt on BLKmode BITINT_TYPE INTEGER_CSTs or handle those which fit into shwi. * expr.cc (expand_expr_real_1): Define EXTEND_BITINT macro, reduce to bitfield precision reads from BITINT_TYPE vars, parameters or memory locations. Expand large/huge BITINT_TYPE INTEGER_CSTs into memory. * fold-const.cc (fold_convert_loc, make_range_step): Handle BITINT_TYPE. (extract_muldiv_1): For BITINT_TYPE use TYPE_PRECISION rather than GET_MODE_SIZE (SCALAR_INT_TYPE_MODE). (native_encode_int, native_interpret_int, native_interpret_expr): Handle BITINT_TYPE. * gimple-expr.cc (useless_type_conversion_p): Make BITINT_TYPE to some other integral type or vice versa conversions non-useless. * gimple-fold.cc (gimple_fold_builtin_memset): Punt for BITINT_TYPE. (clear_padding_unit): Mention in comment that _BitInt types don't need to fit either. (clear_padding_bitint_needs_padding_p): New function. (clear_padding_type_may_have_padding_p): Handle BITINT_TYPE. (clear_padding_type): Likewise. * internal-fn.cc (expand_mul_overflow): For unsigned non-mode precision operands force pos_neg? to 1. (expand_MULBITINT, expand_DIVMODBITINT, expand_FLOATTOBITINT, expand_BITINTTOFLOAT): New functions. * internal-fn.def (MULBITINT, DIVMODBITINT, FLOATTOBITINT, BITINTTOFLOAT): New internal functions. * internal-fn.h (expand_MULBITINT, expand_DIVMODBITINT, expand_FLOATTOBITINT, expand_BITINTTOFLOAT): Declare. * match.pd (non-equality compare simplifications from fold_binary): Punt if TYPE_MODE (arg1_type) is BLKmode. 
* pretty-print.h (pp_wide_int): Handle printing of large precision wide_ints which would buffer overflow digit_buffer. * stor-layout.cc (finish_bitfield_representative): For bit-fields with BITINT_TYPE, prefer representatives with precisions in multiple of limb precision. (layout_type): Handle BITINT_TYPE. Handle COMPLEX_TYPE with BLKmode element type and assert it is BITINT_TYPE. * target.def (bitint_type_info): New C target hook. * target.h (struct bitint_info): New type. * targhooks.cc (default_bitint_type_info): New function. * targhooks.h (default_bitint_type_info): Declare. * tree-pretty-print.cc (dump_generic_node): Handle BITINT_TYPE. Handle printing large wide_ints which would buffer overflow digit_buffer. * tree-ssa-sccvn.cc: Include target.h. (eliminate_dom_walker::eliminate_stmt): Punt for large/huge BITINT_TYPE. * tree-switch-conversion.cc (jump_table_cluster::emit): For more than 64-bit BITINT_TYPE subtract low bound from expression and cast to 64-bit integer type both the controlling expression and case labels. * typeclass.h (enum type_class): Add bitint_type_class enumerator. * varasm.cc (output_constant): Handle BITINT_TYPE INTEGER_CSTs. * vr-values.cc (check_for_binary_op_overflow): Use widest2_int rather than widest_int. (simplify_using_ranges::simplify_internal_call_using_ranges): Use unsigned_type_for rather than build_nonstandard_integer_type.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:a9d6c7fbeb374365058ffe2b9815d2b4b7193d38 commit r14-3746-ga9d6c7fbeb374365058ffe2b9815d2b4b7193d38 Author: Jakub Jelinek <jakub@redhat.com> Date: Wed Sep 6 17:27:41 2023 +0200 _BitInt lowering support [PR102989] The following patch adds a new bitintlower lowering pass which lowers most operations on medium _BitInt into operations on corresponding integer types, large _BitInt into straight line code operating on 2 or more limbs and finally huge _BitInt into a loop plus optional straight line code. As the only supported architecture is little-endian, the lowering only supports little-endian for now, because it would be impossible to test it all for big-endian. Rest is written with any endian support in mind, but of course only little-endian has been actually tested. I hope it is ok to add big-endian support to the lowering pass incrementally later when first big-endian target shows with the backend support. There are 2 possibilities of adding such support, one would be minimal one, just tweak limb_access function and perhaps one or two other spots and transform there the indexes from little endian (index 0 is least significant) to big endian for just the memory access. Advantage is I think maintainance costs, disadvantage is that the loops will still iterate from 0 to some number of limbs and we'd rely on IVOPTs or something similar changing it later if needed. Or we could make those indexes endian related everywhere, though I'm afraid that would be several hundreds of changes. For switches indexed by large/huge _BitInt the patch invokes what the switch lowering pass does (but only on those specific switches, not all of them); the switch lowering breaks the switches into clusters and none of the clusters can have a range which doesn't fit into 64-bit UWHI, everything else will be turned into a tree of comparisons. For clusters normally emitted as smaller switches, because we already have a guarantee that the low .. high range is at most 64 bits, the patch forces subtraction of the low and turns it into a 64-bit switch. This is done before the actual pass starts. Similarly, we cancel lowering of certain constructs like ABS_EXPR, ABSU_EXPR, MIN_EXPR, MAX_EXPR and COND_EXPR and turn those back to simpler comparisons etc., so that fewer operations need to be lowered later. 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 * Makefile.in (OBJS): Add gimple-lower-bitint.o. * passes.def: Add pass_lower_bitint after pass_lower_complex and pass_lower_bitint_O0 after pass_lower_complex_O0. * tree-pass.h (PROP_gimple_lbitint): Define. (make_pass_lower_bitint_O0, make_pass_lower_bitint): Declare. * gimple-lower-bitint.h: New file. * tree-ssa-live.h (struct _var_map): Add bitint member. (init_var_map): Adjust declaration. (region_contains_p): Handle map->bitint like map->outofssa_p. * tree-ssa-live.cc (init_var_map): Add BITINT argument, initialize map->bitint and set map->outofssa_p to false if it is non-NULL. * tree-ssa-coalesce.cc: Include gimple-lower-bitint.h. (build_ssa_conflict_graph): Call build_bitint_stmt_ssa_conflicts if map->bitint. (create_coalesce_list_for_region): For map->bitint ignore SSA_NAMEs not in that bitmap, and allow res without default def. (compute_optimized_partition_bases): In map->bitint mode try hard to coalesce any SSA_NAMEs with the same size. (coalesce_bitint): New function. 
(coalesce_ssa_name): In map->bitint mode, or map->bitmap into used_in_copies and call coalesce_bitint. * gimple-lower-bitint.cc: New file.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:b38deff6127778fed453bb647e32738ba5c78e33 commit r14-3747-gb38deff6127778fed453bb647e32738ba5c78e33 Author: Jakub Jelinek <jakub@redhat.com> Date: Wed Sep 6 17:29:17 2023 +0200 i386: Enable _BitInt on x86-64 [PR102989] The following patch enables _BitInt support on x86-64, the only target which has _BitInt specified in psABI. 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 * config/i386/i386.cc (classify_argument): Handle BITINT_TYPE. (ix86_bitint_type_info): New function. (TARGET_C_BITINT_TYPE_INFO): Redefine.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:95521e15b6ef00c192a1bbd7c13b5f35395c7c9e commit r14-3748-g95521e15b6ef00c192a1bbd7c13b5f35395c7c9e Author: Jakub Jelinek <jakub@redhat.com> Date: Wed Sep 6 17:30:07 2023 +0200 ubsan: _BitInt -fsanitize=undefined support [PR102989] The following patch introduces some -fsanitize=undefined support for _BitInt, but some of the diagnostics is limited by lack of proper support in the library. I've filed https://github.com/llvm/llvm-project/issues/64100 to request proper support, for now some of the diagnostics might have less or more confusing or inaccurate wording but UB should still be diagnosed when it happens. 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 gcc/ * internal-fn.cc (expand_ubsan_result_store): Add LHS, MODE and DO_ERROR arguments. For non-mode precision BITINT_TYPE results check if all padding bits up to mode precision are zeros or sign bit copies and if not, jump to DO_ERROR. (expand_addsub_overflow, expand_neg_overflow, expand_mul_overflow): Adjust expand_ubsan_result_store callers. * ubsan.cc: Include target.h and langhooks.h. (ubsan_encode_value): Pass BITINT_TYPE values which fit into pointer size converted to pointer sized integer, pass BITINT_TYPE values which fit into TImode (if supported) or DImode as those integer types or otherwise for now punt (pass 0). (ubsan_type_descriptor): Handle BITINT_TYPE. For pstyle of UBSAN_PRINT_FORCE_INT use TK_Integer (0x0000) mode with a TImode/DImode precision rather than TK_Unknown used otherwise for large/huge BITINT_TYPEs. (instrument_si_overflow): Instrument BITINT_TYPE operations even when they don't have mode precision. * ubsan.h (enum ubsan_print_style): New enumerator. gcc/c-family/ * c-ubsan.cc (ubsan_instrument_shift): Use UBSAN_PRINT_FORCE_INT for type0 type descriptor.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:7a610d44d855424518ecb4429ea5226ed2c32543 commit r14-3749-g7a610d44d855424518ecb4429ea5226ed2c32543 Author: Jakub Jelinek <jakub@redhat.com> Date: Wed Sep 6 17:31:23 2023 +0200 libgcc: Generated tables for _BitInt <-> _Decimal* conversions [PR102989] The following patch adds a header with generated helper tables to support computation of powers of 10 from 10^0 to 10^6111 inclusive into a sufficiently large array of _BitInt limbs. This is split from the rest of the libgcc _BitInt support because it is quite large and together it would run into gcc-patches mail length limits. 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 libgcc/ * soft-fp/bitintpow10.h: New file.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:2ce182e258d3ab11310442d5f4dd1d063018aca9 commit r14-3750-g2ce182e258d3ab11310442d5f4dd1d063018aca9 Author: Jakub Jelinek <jakub@redhat.com> Date: Wed Sep 6 17:33:05 2023 +0200 libgcc _BitInt support [PR102989] This patch adds the library helpers for multiplication, division + modulo and casts from and to floating point (both binary and decimal). As described in the intro, the first step is try to reduce further the passed in precision by skipping over most significant limbs with just zeros or sign bit copies. For multiplication and division I've implemented a simple algorithm, using something smarter like Karatsuba or Toom N-Way might be faster for very large _BitInts (which we don't support right now anyway), but could mean more code in libgcc, which maybe isn't what people are willing to accept. For the to/from floating point conversions the patch uses soft-fp, because it already has tons of handy macros which can be used for that. In theory it could be implemented using {,unsigned} long long or {,unsigned} __int128 to/from floating point conversions with some frexp before/after, but at that point we already need to force it into integer registers and analyze it anyway. Plus, for 32-bit arches there is no __int128 that could be used for XF/TF mode stuff. I know that soft-fp is owned by glibc and I think the op-common.h change should be propagated there, but the bitint stuff is really GCC specific and IMHO doesn't belong into the glibc copy. 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 libgcc/ * config/aarch64/t-softfp (softfp_extras): Use += rather than :=. * config/i386/64/t-softfp (softfp_extras): Likewise. * config/i386/libgcc-glibc.ver (GCC_14.0.0): Export _BitInt support routines. * config/i386/t-softfp (softfp_extras): Add fixxfbitint and bf, hf and xf mode floatbitint. (CFLAGS-floatbitintbf.c, CFLAGS-floatbitinthf.c): Add -msse2. * config/riscv/t-softfp32 (softfp_extras): Use += rather than :=. * config/rs6000/t-e500v1-fp (softfp_extras): Likewise. * config/rs6000/t-e500v2-fp (softfp_extras): Likewise. * config/t-softfp (softfp_floatbitint_funcs): New. (softfp_bid_list): New. (softfp_func_list): Add sf and df mode from and to _BitInt libcalls. (softfp_bid_file_list): New. (LIB2ADD_ST): Add $(softfp_bid_file_list). * config/t-softfp-sfdftf (softfp_extras): Add fixtfbitint and floatbitinttf. * config/t-softfp-tf (softfp_extras): Likewise. * libgcc2.c (bitint_reduce_prec): New inline function. (BITINT_INC, BITINT_END): Define. (bitint_mul_1, bitint_addmul_1): New helper functions. (__mulbitint3): New function. (bitint_negate, bitint_submul_1): New helper functions. (__divmodbitint4): New function. * libgcc2.h (LIBGCC2_UNITS_PER_WORD): When building _BitInt support libcalls, redefine depending on __LIBGCC_BITINT_LIMB_WIDTH__. (__mulbitint3, __divmodbitint4): Declare. * libgcc-std.ver.in (GCC_14.0.0): Export _BitInt support routines. * Makefile.in (lib2funcs): Add _mulbitint3. (LIB2_DIVMOD_FUNCS): Add _divmodbitint4. * soft-fp/bitint.h: New file. * soft-fp/fixdfbitint.c: New file. * soft-fp/fixsfbitint.c: New file. * soft-fp/fixtfbitint.c: New file. * soft-fp/fixxfbitint.c: New file. * soft-fp/floatbitintbf.c: New file. * soft-fp/floatbitintdf.c: New file. * soft-fp/floatbitinthf.c: New file. * soft-fp/floatbitintsf.c: New file. * soft-fp/floatbitinttf.c: New file. * soft-fp/floatbitintxf.c: New file. 
* soft-fp/op-common.h (_FP_FROM_INT): Add support for rsize up to 4 * _FP_W_TYPE_SIZE rather than just 2 * _FP_W_TYPE_SIZE. * soft-fp/bitintpow10.c: New file. * soft-fp/fixsdbitint.c: New file. * soft-fp/fixddbitint.c: New file. * soft-fp/fixtdbitint.c: New file. * soft-fp/floatbitintsd.c: New file. * soft-fp/floatbitintdd.c: New file. * soft-fp/floatbitinttd.c: New file.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:8c984a1c3693df63520558631c827bb2c2d8b5bc commit r14-3751-g8c984a1c3693df63520558631c827bb2c2d8b5bc Author: Jakub Jelinek <jakub@redhat.com> Date: Wed Sep 6 17:34:49 2023 +0200 C _BitInt support [PR102989] This patch adds the C FE support, c-family support, small libcpp change so that 123wb and 42uwb suffixes are handled plus glimits.h change to define BITINT_MAXWIDTH macro. The previous patches really do nothing without this, which enables all the support. 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 gcc/ * glimits.h (BITINT_MAXWIDTH): Define if __BITINT_MAXWIDTH__ is predefined. gcc/c-family/ * c-common.cc (c_common_reswords): Add _BitInt as keyword. (unsafe_conversion_p): Handle BITINT_TYPE like INTEGER_TYPE. (c_common_signed_or_unsigned_type): Handle BITINT_TYPE. (c_common_truthvalue_conversion, c_common_get_alias_set, check_builtin_function_arguments): Handle BITINT_TYPE like INTEGER_TYPE. (sync_resolve_size): Add ORIG_FORMAT argument. If FETCH && !ORIG_FORMAT, type is BITINT_TYPE, return -1 if size isn't one of 1, 2, 4, 8 or 16 or if it is 16 but TImode is not supported. (atomic_bitint_fetch_using_cas_loop): New function. (resolve_overloaded_builtin): Adjust sync_resolve_size caller. If -1 is returned, use atomic_bitint_fetch_using_cas_loop to lower it. Formatting fix. (keyword_begins_type_specifier): Handle RID_BITINT. * c-common.h (enum rid): Add RID_BITINT enumerator. * c-cppbuiltin.cc (c_cpp_builtins): For C call targetm.c.bitint_type_info and predefine __BITINT_MAXWIDTH__ and for -fbuilding-libgcc also __LIBGCC_BITINT_LIMB_WIDTH__ and __LIBGCC_BITINT_ORDER__ macros if _BitInt is supported. * c-lex.cc (interpret_integer): Handle CPP_N_BITINT. * c-pretty-print.cc (c_pretty_printer::simple_type_specifier, c_pretty_printer::direct_abstract_declarator, c_pretty_printer::direct_declarator, c_pretty_printer::declarator): Handle BITINT_TYPE. (pp_c_integer_constant): Handle printing of large precision wide_ints which would buffer overflow digit_buffer. * c-warn.cc (conversion_warning, warnings_for_convert_and_check, warnings_for_convert_and_check): Handle BITINT_TYPE like INTEGER_TYPE. gcc/c/ * c-convert.cc (c_convert): Handle BITINT_TYPE like INTEGER_TYPE. * c-decl.cc (check_bitfield_type_and_width): Allow BITINT_TYPE bit-fields. (finish_struct): Prefer to use BITINT_TYPE for BITINT_TYPE bit-fields if possible. (declspecs_add_type): Formatting fixes. Handle cts_bitint. Adjust for added union in *specs. Handle RID_BITINT. (finish_declspecs): Handle cts_bitint. Adjust for added union in *specs. * c-parser.cc (c_keyword_starts_typename, c_token_starts_declspecs, c_parser_declspecs, c_parser_gnu_attribute_any_word): Handle RID_BITINT. (c_parser_omp_clause_schedule): Handle BITINT_TYPE like INTEGER_TYPE. * c-tree.h (enum c_typespec_keyword): Mention _BitInt in comment. Add cts_bitint enumerator. (struct c_declspecs): Move int_n_idx and floatn_nx_idx into a union and add bitint_prec there as well. * c-typeck.cc (c_common_type, comptypes_internal): Handle BITINT_TYPE. (perform_integral_promotions): Promote BITINT_TYPE bit-fields to their declared type. (build_array_ref, build_unary_op, build_conditional_expr, build_c_cast, convert_for_assignment, digest_init, build_binary_op): Handle BITINT_TYPE. * c-fold.cc (c_fully_fold_internal): Handle BITINT_TYPE like INTEGER_TYPE. * c-aux-info.cc (gen_type): Handle BITINT_TYPE. libcpp/ * expr.cc (interpret_int_suffix): Handle wb and WB suffixes. 
* include/cpplib.h (CPP_N_BITINT): Define.
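A small user-level example of what that commit enables (C2x mode; this is illustrative code, not part of the patch):

#include <limits.h>

_BitInt(192) a = 123wb;              /* wb suffix: signed _BitInt constant */
unsigned _BitInt(135) b = 42uwb;     /* uwb suffix: unsigned _BitInt constant */

#ifdef BITINT_MAXWIDTH
unsigned _BitInt(BITINT_MAXWIDTH) c; /* widest width the target supports */
#endif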
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:faff31701d50fab08d75fbb13affc82cff74a72c commit r14-3752-gfaff31701d50fab08d75fbb13affc82cff74a72c Author: Jakub Jelinek <jakub@redhat.com> Date: Wed Sep 6 17:36:41 2023 +0200 testsuite part 1 for _BitInt support [PR102989] This patch adds first part of the testsuite support. When creating the testcases, I've been using https://defuse.ca/big-number-calculator.htm tool, a randombitint tool I wrote (posted as a reply to the first series) plus LLVM trunk on godbolt and the WIP GCC support checking if both compilers agree on stuff (and in case of differences tried constant evaluation etc.). The whole testsuite has been also tested with make -j32 -k check-gcc GCC_TEST_RUN_EXPENSIVE=1 \ RUNTESTFLAGS='GCC_TEST_RUN_EXPENSIVE=1 --target_board=unix\{-m32,-m64\} ubsan.exp=bitint*.c dg.exp=bitint* dg-torture.exp=bitint*' to verify it in all modes, normally I'm limitting the torture tests to just -O0 and -O2 because they are quite large and expensive. Generally it is needed to test different _BitInt precisions because they are lowered differently (the small vs. medium vs. large vs. huge, precision of multiples of limb precision or some other etc.). 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 gcc/testsuite/ * lib/target-supports.exp (check_effective_target_bitint, check_effective_target_bitint128, check_effective_target_bitint575): New effective targets. * gcc.dg/bitint-1.c: New test. * gcc.dg/bitint-2.c: New test. * gcc.dg/bitint-3.c: New test. * gcc.dg/bitint-4.c: New test. * gcc.dg/bitint-5.c: New test. * gcc.dg/bitint-6.c: New test. * gcc.dg/bitint-7.c: New test. * gcc.dg/bitint-8.c: New test. * gcc.dg/bitint-9.c: New test. * gcc.dg/bitint-10.c: New test. * gcc.dg/bitint-11.c: New test. * gcc.dg/bitint-12.c: New test. * gcc.dg/bitint-13.c: New test. * gcc.dg/bitint-14.c: New test. * gcc.dg/bitint-15.c: New test. * gcc.dg/bitint-16.c: New test. * gcc.dg/bitint-17.c: New test. * gcc.dg/bitint-18.c: New test. * gcc.dg/torture/bitint-1.c: New test. * gcc.dg/torture/bitint-2.c: New test. * gcc.dg/torture/bitint-3.c: New test. * gcc.dg/torture/bitint-4.c: New test. * gcc.dg/torture/bitint-5.c: New test. * gcc.dg/torture/bitint-6.c: New test. * gcc.dg/torture/bitint-7.c: New test. * gcc.dg/torture/bitint-8.c: New test. * gcc.dg/torture/bitint-9.c: New test. * gcc.dg/torture/bitint-10.c: New test. * gcc.dg/torture/bitint-11.c: New test. * gcc.dg/torture/bitint-12.c: New test. * gcc.dg/torture/bitint-13.c: New test. * gcc.dg/torture/bitint-14.c: New test. * gcc.dg/torture/bitint-15.c: New test. * gcc.dg/torture/bitint-16.c: New test. * gcc.dg/torture/bitint-17.c: New test. * gcc.dg/torture/bitint-18.c: New test. * gcc.dg/torture/bitint-19.c: New test.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:a2f50aa2c578eb0572935e61818e1f2b18b53fd6 commit r14-3753-ga2f50aa2c578eb0572935e61818e1f2b18b53fd6 Author: Jakub Jelinek <jakub@redhat.com> Date: Wed Sep 6 17:37:53 2023 +0200 testsuite part 2 for _BitInt support [PR102989] This is second part of the testcase additions in order to fit into mailing lists limits. Most of these tests are for the floating point conversions, atomics, __builtin_*_overflow and -fsanitize=undefined. 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 gcc/testsuite/ * gcc.dg/torture/bitint-20.c: New test. * gcc.dg/torture/bitint-21.c: New test. * gcc.dg/torture/bitint-22.c: New test. * gcc.dg/torture/bitint-23.c: New test. * gcc.dg/torture/bitint-24.c: New test. * gcc.dg/torture/bitint-25.c: New test. * gcc.dg/torture/bitint-26.c: New test. * gcc.dg/torture/bitint-27.c: New test. * gcc.dg/torture/bitint-28.c: New test. * gcc.dg/torture/bitint-29.c: New test. * gcc.dg/torture/bitint-30.c: New test. * gcc.dg/torture/bitint-31.c: New test. * gcc.dg/torture/bitint-32.c: New test. * gcc.dg/torture/bitint-33.c: New test. * gcc.dg/torture/bitint-34.c: New test. * gcc.dg/torture/bitint-35.c: New test. * gcc.dg/torture/bitint-36.c: New test. * gcc.dg/torture/bitint-37.c: New test. * gcc.dg/torture/bitint-38.c: New test. * gcc.dg/torture/bitint-39.c: New test. * gcc.dg/torture/bitint-40.c: New test. * gcc.dg/torture/bitint-41.c: New test. * gcc.dg/torture/bitint-42.c: New test. * gcc.dg/atomic/stdatomic-bitint-1.c: New test. * gcc.dg/atomic/stdatomic-bitint-2.c: New test. * gcc.dg/dfp/bitint-1.c: New test. * gcc.dg/dfp/bitint-2.c: New test. * gcc.dg/dfp/bitint-3.c: New test. * gcc.dg/dfp/bitint-4.c: New test. * gcc.dg/dfp/bitint-5.c: New test. * gcc.dg/dfp/bitint-6.c: New test. * gcc.dg/ubsan/bitint-1.c: New test. * gcc.dg/ubsan/bitint-2.c: New test. * gcc.dg/ubsan/bitint-3.c: New test.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:f76ae4369cb6f38e17510704e5b6e53847d2a648 commit r14-3754-gf76ae4369cb6f38e17510704e5b6e53847d2a648 Author: Jakub Jelinek <jakub@redhat.com> Date: Wed Sep 6 17:39:15 2023 +0200 C _BitInt incremental fixes [PR102989] On Wed, Aug 09, 2023 at 09:17:57PM +0000, Joseph Myers wrote: > > - _Complex _BitInt(N) isn't supported; again mainly because none of the psABIs > > mention how those should be passed/returned; in a limited way they are > > supported internally because the internal functions into which > > __builtin_{add,sub,mul}_overflow{,_p} is lowered return COMPLEX_TYPE as a > > hack to return 2 values without using references/pointers > > What happens when the usual arithmetic conversions are applied to > operands, one of which is a complex integer type and the other of which is > a wider _BitInt type? I don't see anything in the code to disallow this > case (which would produce an expression with a _Complex _BitInt type), or > any testcases for it. I've added a sorry for that case (+ return the narrower COMPLEX_TYPE). Also added testcase to verify we don't create VECTOR_TYPEs of BITINT_TYPE even if they have mode precision and suitable size (others were rejected already before). > Other testcases I think should be present (along with any corresponding > changes needed to the code itself): > > * Verifying that the new integer constant suffix is rejected for C++. Done. > * Verifying appropriate pedwarn-if-pedantic for the new constant suffix > for versions of C before C2x (and probably for use of _BitInt type > specifiers before C2x as well) - along with the expected -Wc11-c2x-compat > handling (in C2x mode) / -pedantic -Wno-c11-c2x-compat in older modes. Done. Here is an incremental patch which does that. 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 gcc/c/ * c-decl.cc (finish_declspecs): Emit pedwarn_c11 on _BitInt. * c-typeck.cc (c_common_type): Emit sorry for common type between _Complex integer and larger _BitInt and return the _Complex integer. gcc/c-family/ * c-attribs.cc (type_valid_for_vector_size): Reject vector types with BITINT_TYPE elements even if they have mode precision and suitable size. gcc/testsuite/ * gcc.dg/bitint-19.c: New test. * gcc.dg/bitint-20.c: New test. * gcc.dg/bitint-21.c: New test. * gcc.dg/bitint-22.c: New test. * gcc.dg/bitint-23.c: New test. * gcc.dg/bitint-24.c: New test. * gcc.dg/bitint-25.c: New test. * gcc.dg/bitint-26.c: New test. * gcc.dg/bitint-27.c: New test. * g++.dg/ext/bitint1.C: New test. * g++.dg/ext/bitint2.C: New test. * g++.dg/ext/bitint3.C: New test. * g++.dg/ext/bitint4.C: New test. libcpp/ * expr.cc (cpp_classify_number): Diagnose wb literal suffixes for -pedantic* before C2X or -Wc11-c2x-compat.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:f6e0ec5696ec5f52baed71fe23f978bcef80d458 commit r14-3755-gf6e0ec5696ec5f52baed71fe23f978bcef80d458 Author: Jakub Jelinek <jakub@redhat.com> Date: Wed Sep 6 17:42:37 2023 +0200 libgcc _BitInt helper documentation [PR102989] On Mon, Aug 21, 2023 at 05:32:04PM +0000, Joseph Myers wrote: > I think the libgcc functions (i.e. those exported by libgcc, to which > references are generated by the compiler) need documenting in libgcc.texi. > Internal functions or macros in the libgcc patch need appropriate comments > specifying their semantics; especially FP_TO_BITINT and FP_FROM_BITINT > which have a lot of arguments and no comments saying what the semantics of > the macros and their arguments are supposed to me. Here is an incremental patch which does that. 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 gcc/ * doc/libgcc.texi (Bit-precise integer arithmetic functions): Document general rules for _BitInt support library functions and document __mulbitint3 and __divmodbitint4. (Conversion functions): Document __fix{s,d,x,t}fbitint, __floatbitint{s,d,x,t,h,b}f, __bid_fix{s,d,t}dbitint and __bid_floatbitint{s,d,t}d. libgcc/ * libgcc2.c (bitint_negate): Add function comment. * soft-fp/bitint.h (bitint_negate): Add function comment. (FP_TO_BITINT, FP_FROM_BITINT): Add comment explaining the macros.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:c62c82dc98dcb7420498b7114bf4cd2ec1a81405 commit r14-3756-gc62c82dc98dcb7420498b7114bf4cd2ec1a81405 Author: Jakub Jelinek <jakub@redhat.com> Date: Wed Sep 6 17:47:49 2023 +0200 Add further _BitInt <-> floating point tests [PR102989] Here are just the testsuite additions from libgcc _BitInt patch review. On Fri, Sep 01, 2023 at 09:48:22PM +0000, Joseph Myers wrote: > 1. Test overflowing conversions to integers (including from inf or NaN) > raise FE_INVALID. (Note: it's not specified in the standard whether > inexact conversions to integers raise FE_INEXACT or not, so testing that > seems less important.) This is in gcc.dg/bitint-28.c (FE_INVALID) and gcc.dg/bitint-29.c (FE_INEXACT) for binary and dfp/bitint-8.c new tests. > 2. Test conversions from integers to floating point raise FE_INEXACT when > inexact, together with FE_OVERFLOW when overflowing (while exact > conversions don't raise exceptions). This is in gcc.dg/bitint-30.c new test. > 3. Test conversions from integers to floating point respect the rounding > mode. This is in gcc.dg/bitint-31.c new test. > 4. Test converting floating-point values in the range (-1.0, 0.0] to both > unsigned and signed _BitInt; I didn't see such tests for binary floating > types, only for decimal types, and the decimal tests didn't include tests > of negative zero itself as the value converted to _BitInt. This is done as incremental changes to existing tests. > 5. Test conversions of noncanonical BID zero to integers (these tests > would be specific to BID). See below for a bug in this area. This is done in dfp/bitint-7.c test. 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 * gcc.dg/torture/bitint-21.c (main): Add tests for -1 for signed only, -1 + epsilon, another (-1, 0) range value and -0. * gcc.dg/torture/bitint-22.c (main): Likewise. * gcc.dg/bitint-28.c: New test. * gcc.dg/bitint-29.c: New test. * gcc.dg/bitint-30.c: New test. * gcc.dg/bitint-31.c: New test. * gcc.dg/dfp/bitint-1.c (main): Add tests for -1 for signed only, -1 + epsilon and -0. * gcc.dg/dfp/bitint-2.c (main): Likewise. * gcc.dg/dfp/bitint-3.c (main): Likewise. * gcc.dg/dfp/bitint-7.c: New test. * gcc.dg/dfp/bitint-8.c: New test.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:3ad9948b3e716885ce66bdf1c8e053880a843a2b commit r14-3757-g3ad9948b3e716885ce66bdf1c8e053880a843a2b Author: Jakub Jelinek <jakub@redhat.com> Date: Wed Sep 6 17:49:44 2023 +0200 _BitInt profile fixes [PR102989] On Thu, Aug 24, 2023 at 03:14:32PM +0200, Jan Hubicka via Gcc-patches wrote: > this patch extends the verifier to check that all probabilities and counts are > initialized if profile is supposed to be present. This is a bit complicated > by the possibility that we inline a !flag_guess_branch_probability function > into a function with profile defined and in this case we need to stop > verification. For this reason I added a flag to the cfg structure tracking this. This patch broke a couple of _BitInt tests (in the admittedly still uncommitted series - still waiting for review of the C FE bits). Here is a minimal patch to make it work again, though I'm not sure whether in the if_then_else and if_then_if_then_else cases I shouldn't scale the counts of the other bbs as well. The if_then method creates if (COND) new_bb1; in the middle of some pre-existing bb (with PROB that COND is true), if_then_else creates if (COND) new_bb1; else new_bb2; and if_then_if_then_else creates if (COND1) { if (COND2) new_bb2; else new_bb1; } with PROB1 and PROB2 probabilities that COND1 and COND2 are true. The lowering happens shortly after IPA. 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 * gimple-lower-bitint.cc (bitint_large_huge::if_then_else, bitint_large_huge::if_then_if_then_else): Use make_single_succ_edge rather than make_edge, initialize bb->count.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:dce6f6a974d4ecce8491c989c35e23c59223f762 commit r14-3758-gdce6f6a974d4ecce8491c989c35e23c59223f762 Author: Jakub Jelinek <jakub@redhat.com> Date: Wed Sep 6 17:50:49 2023 +0200 Handle BITINT_TYPE in build_{,minus_}one_cst [PR102989] Recent match.pd changes trigger an ICE in build_minus_one_cst; apparently I forgot to handle BITINT_TYPE in these functions (while I had handled it in build_zero_cst). 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 * tree.cc (build_one_cst, build_minus_one_cst): Handle BITINT_TYPE like INTEGER_TYPE.
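For illustration, a hypothetical reduced example (not the actual regressing testcase) of the kind of folding that needs a 1/-1 constant of BITINT_TYPE; simplifications such as -(~x) -> x + 1 and ~(-x) -> x - 1 have to build such constants:

_BitInt(135) f1 (_BitInt(135) x) { return -(~x); }  /* may fold to x + 1 */
_BitInt(135) f2 (_BitInt(135) x) { return ~(-x); }  /* may fold to x - 1 */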
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:52e2aaaa70e847d240fb68a27c88ee60189515a6 commit r14-3759-g52e2aaaa70e847d240fb68a27c88ee60189515a6 Author: Jakub Jelinek <jakub@redhat.com> Date: Wed Sep 6 17:52:24 2023 +0200 Additional _BitInt test coverage [PR102989] On Tue, Sep 05, 2023 at 10:40:26PM +0000, Joseph Myers wrote: > Additional tests I think should be added (for things I expect should > already work): > > * Tests for BITINT_MAXWIDTH in <limits.h>. Test that it's defined for > C2x, but not defined for C11/C17 (the latter independent of whether the > target has _BitInt support). Test the value as well: _BitInt > (BITINT_MAXWIDTH) should be OK (both signed and unsigned) but _BitInt > (BITINT_MAXWIDTH + 1) should not be OK. Also test that BITINT_MAXWIDTH >= > ULLONG_WIDTH. > > * Test _BitInt (N) where N is a constexpr variable or enum constant (I > expect these should work - the required call to convert_lvalue_to_rvalue > for constexpr to work is present - but I don't see such tests in the > testsuite). > > * Test that -funsigned-bitfields does not affect the signedness of _BitInt > (N) bit-fields (the standard wording isn't entirely clear, but that's > what's implemented in the patches). > > * Test the errors for _Sat used with _BitInt (though such a test might not > actually run at present because no target supports both features). The following patch does that, and it also adds testsuite coverage for most of the new changes requested in review of the C _BitInt support patch. 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 * gcc.dg/bitint-2.c (foo): Add tests for constexpr var or enumerator arguments of _BitInt. * gcc.dg/bitint-31.c: Remove forgotten 0 &&. * gcc.dg/bitint-32.c: New test. * gcc.dg/bitint-33.c: New test. * gcc.dg/bitint-34.c: New test. * gcc.dg/bitint-35.c: New test. * gcc.dg/bitint-36.c: New test. * gcc.dg/fixed-point/bitint-1.c: New test.
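A rough sketch of such checks (not the actual new tests; assumes -std=c23 and a target with _BitInt support; note the C2X requirement compares widths, i.e. BITINT_MAXWIDTH >= ULLONG_WIDTH):

#include <limits.h>

#ifndef BITINT_MAXWIDTH
#error "BITINT_MAXWIDTH should be defined by <limits.h> in C2x/C23 mode"
#endif

_Static_assert (BITINT_MAXWIDTH >= ULLONG_WIDTH, "BITINT_MAXWIDTH too small");

unsigned _BitInt(BITINT_MAXWIDTH) widest_ok;           /* accepted */
/* unsigned _BitInt(BITINT_MAXWIDTH + 1) too_wide; */  /* should be rejected */

enum { N = 59 };
_BitInt(N) from_enum_constant;     /* width given by an enum constant */

constexpr int W = 61;              /* C23 constexpr object */
_BitInt(W) from_constexpr;         /* width given by a constexpr variable */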
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:18c90eaa25363d34b5bef444fbbad04f5da2522d commit r14-3774-g18c90eaa25363d34b5bef444fbbad04f5da2522d Author: Jakub Jelinek <jakub@redhat.com> Date: Thu Sep 7 11:17:04 2023 +0200 middle-end: Avoid calling targetm.c.bitint_type_info inside of gcc_assert [PR102989] On Thu, Sep 07, 2023 at 10:36:02AM +0200, Thomas Schwinge wrote: > Minor comment/question: are we doing away with the property that > 'assert'-like "calls" must not have side effects? Per 'gcc/system.h', > this is "OK" for 'gcc_assert' for '#if ENABLE_ASSERT_CHECKING' or > '#elif (GCC_VERSION >= 4005)' -- that is, GCC 4.5, which is always-true, > thus the "offending" '#else' is never active. However, it's different > for standard 'assert' and 'gcc_checking_assert', so I'm not sure if > that's a good property for 'gcc_assert' only? For example, see also > <https://gcc.gnu.org/PR6906> "warn about asserts with side effects", or > recent <https://gcc.gnu.org/PR111144> > "RFE: could -fanalyzer warn about assertions that have side effects?". You're right, the #define gcc_assert(EXPR) ((void)(0 && (EXPR))) fallback definition is incompatible with the way I've used it, so for --disable-checking built by non-GCC it would not work properly. 2023-09-07 Jakub Jelinek <jakub@redhat.com> PR c/102989 * expr.cc (expand_expr_real_1): Don't call targetm.c.bitint_type_info inside gcc_assert, as later code relies on it filling info variable. * gimple-fold.cc (clear_padding_bitint_needs_padding_p, clear_padding_type): Likewise. * varasm.cc (output_constant): Likewise. * fold-const.cc (native_encode_int, native_interpret_int): Likewise. * stor-layout.cc (finish_bitfield_representative, layout_type): Likewise. * gimple-lower-bitint.cc (bitint_precision_kind): Likewise.
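A small standalone illustration (not GCC source) of why side effects must not be placed inside gcc_assert: with a disabled, 0 && (EXPR)-style fallback definition the argument is never evaluated, so any work the assertion was relied on to do is silently lost:

#include <stdio.h>

/* Mimics the kind of disabled-assert fallback quoted above. */
#define my_assert(EXPR) ((void)(0 && (EXPR)))

static int get_info (int *out) { *out = 42; return 1; }

int main (void)
{
  int info = 0;
  my_assert (get_info (&info));   /* get_info is never called */
  printf ("info = %d\n", info);   /* prints 0, not 42 */
  return 0;
}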
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:0d00385eaf72ccacff17935b0d214a26773e095f commit r14-4592-g0d00385eaf72ccacff17935b0d214a26773e095f Author: Jakub Jelinek <jakub@redhat.com> Date: Thu Oct 12 16:01:12 2023 +0200 wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989] As mentioned in the _BitInt support thread, _BitInt(N) is currently limited by the wide_int/widest_int maximum precision limitation, which is depending on target 191, 319, 575 or 703 bits (one less than WIDE_INT_MAX_PRECISION). That is fairly low limit for _BitInt, especially on the targets with the 191 bit limitation. The following patch bumps that limit to 16319 bits on all arches (which support _BitInt at all), which is the limit imposed by INTEGER_CST representation (unsigned char members holding number of HOST_WIDE_INT limbs). In order to achieve that, wide_int is changed from a trivially copyable type which contained just an inline array of WIDE_INT_MAX_ELTS (3, 5, 9 or 11 limbs depending on target) limbs into a non-trivially copy constructible, copy assignable and destructible type which for the usual small cases (up to WIDE_INT_MAX_INL_ELTS which is the former WIDE_INT_MAX_ELTS) still uses an inline array of limbs, but for larger precisions uses heap allocated limb array. This makes wide_int unusable in GC structures, so for dwarf2out which was the only place which needed it there is a new rwide_int type (restricted wide_int) which supports only up to RWIDE_INT_MAX_ELTS limbs inline and is trivially copyable (dwarf2out should never deal with large _BitInt constants, those should have been lowered earlier). Similarly, widest_int has been changed from a trivially copyable type which contained also an inline array of WIDE_INT_MAX_ELTS limbs (but unlike wide_int didn't contain precision and assumed that to be WIDE_INT_MAX_PRECISION) into a non-trivially copy constructible, copy assignable and destructible type which has always WIDEST_INT_MAX_PRECISION precision (32640 bits currently, twice as much as INTEGER_CST limitation allows) and unlike wide_int decides depending on get_len () value whether it uses an inline array (again, up to WIDE_INT_MAX_INL_ELTS) or heap allocated one. In wide-int.h this means we need to estimate an upper bound on how many limbs will wide-int.cc (usually, sometimes wide-int.h) need to write, heap allocate if needed based on that estimation and upon set_len which is done at the end if we guessed over WIDE_INT_MAX_INL_ELTS and allocated dynamically, while we actually need less than that copy/deallocate. The unexact guesses are needed because the exact computation of the length in wide-int.cc is sometimes quite complex and especially canonicalize at the end can decrease it. widest_int is again because of this not usable in GC structures, so cfgloop.h has been changed to use fixed_wide_int_storage <WIDE_INT_MAX_INL_PRECISION> and punt if we'd have larger _BitInt based iterators, programs having more than 128-bit iterators will be hopefully rare and I think it is fine to treat loops with more than 2^127 iterations as effectively possibly infinite, omp-general.cc is changed to use fixed_wide_int_storage <1024>, as it better should support scores with the same precision on all arches. Code which used WIDE_INT_PRINT_BUFFER_SIZE sized buffers for printing wide_int/widest_int into buffer had to be changed to use XALLOCAVEC for larger lengths. 
On x86_64, the patch in --enable-checking=yes,rtl,extra configured bootstrapped cc1plus enlarges the .text section by 1.01% - from 0x25725a5 to 0x25e5555 and similarly at least when compiling insn-recog.cc with the usual bootstrap option slows compilation down by 1.01%, user 4m22.046s and 4m22.384s on vanilla trunk vs. 4m25.947s and 4m25.581s on patched trunk. I'm afraid some code size growth and compile time slowdown is unavoidable in this case, we use wide_int and widest_int everywhere, and while the rare cases are marked with UNLIKELY macros, it still means extra checks for it. The patch also regresses +FAIL: gm2/pim/fail/largeconst.mod, -O +FAIL: gm2/pim/fail/largeconst.mod, -O -g +FAIL: gm2/pim/fail/largeconst.mod, -O3 -fomit-frame-pointer +FAIL: gm2/pim/fail/largeconst.mod, -O3 -fomit-frame-pointer -finline-functions +FAIL: gm2/pim/fail/largeconst.mod, -Os +FAIL: gm2/pim/fail/largeconst.mod, -g +FAIL: gm2/pim/fail/largeconst2.mod, -O +FAIL: gm2/pim/fail/largeconst2.mod, -O -g +FAIL: gm2/pim/fail/largeconst2.mod, -O3 -fomit-frame-pointer +FAIL: gm2/pim/fail/largeconst2.mod, -O3 -fomit-frame-pointer -finline-functions +FAIL: gm2/pim/fail/largeconst2.mod, -Os +FAIL: gm2/pim/fail/largeconst2.mod, -g tests, which previously were rejected with error: constant literal '12345678912345678912345679123456789123456789123456789123456789123456791234567891234567891234567891234567891234567912345678912345678912345678912345678912345679123456789123456789' exceeds internal ZTYPE range kind of errors, but now are accepted. Seems the FE tries to parse constants into widest_int in that case and only diagnoses if widest_int overflows, that seems wrong, it should at least punt if stuff doesn't fit into WIDE_INT_MAX_PRECISION, but perhaps far less than that, if it wants support for middle-end for precisions above 128-bit, it better should be using BITINT_TYPE. Will file a PR and defer to Modula2 maintainer. 2023-10-12 Jakub Jelinek <jakub@redhat.com> PR c/102989 * wide-int.h: Adjust file comment. (WIDE_INT_MAX_INL_ELTS): Define to former value of WIDE_INT_MAX_ELTS. (WIDE_INT_MAX_INL_PRECISION): Define. (WIDE_INT_MAX_ELTS): Change to 255. Assert that WIDE_INT_MAX_INL_ELTS is smaller than WIDE_INT_MAX_ELTS. (RWIDE_INT_MAX_ELTS, RWIDE_INT_MAX_PRECISION, WIDEST_INT_MAX_ELTS, WIDEST_INT_MAX_PRECISION): Define. (WI_BINARY_RESULT_VAR, WI_UNARY_RESULT_VAR): Change write_val callers to pass 0 as a new argument. (class widest_int_storage): Likewise. (widest_int, widest2_int): Change typedefs to use widest_int_storage rather than fixed_wide_int_storage. (enum wi::precision_type): Add INL_CONST_PRECISION enumerator. (struct binary_traits): Add partial specializations for INL_CONST_PRECISION. (generic_wide_int): Add needs_write_val_arg static data member. (int_traits): Likewise. (wide_int_storage): Replace val non-static data member with a union u of it and HOST_WIDE_INT *valp. Declare copy constructor, copy assignment operator and destructor. Add unsigned int argument to write_val. (wide_int_storage::wide_int_storage): Initialize precision to 0 in the default ctor. Remove unnecessary {}s around STATIC_ASSERTs. Assert in non-default ctor T's precision_type is not INL_CONST_PRECISION and allocate u.valp for large precision. Add copy constructor. (wide_int_storage::~wide_int_storage): New. (wide_int_storage::operator=): Add copy assignment operator.
In assignment operator remove unnecessary {}s around STATIC_ASSERTs, assert ctor T's precision_type is not INL_CONST_PRECISION and if precision changes, deallocate and/or allocate u.valp. (wide_int_storage::get_val): Return u.valp rather than u.val for large precision. (wide_int_storage::write_val): Likewise. Add an unused unsigned int argument. (wide_int_storage::set_len): Use write_val instead of writing val directly. (wide_int_storage::from, wide_int_storage::from_array): Adjust write_val callers. (wide_int_storage::create): Allocate u.valp for large precisions. (wi::int_traits <wide_int_storage>::get_binary_precision): New. (fixed_wide_int_storage::fixed_wide_int_storage): Make default ctor defaulted. (fixed_wide_int_storage::write_val): Add unused unsigned int argument. (fixed_wide_int_storage::from, fixed_wide_int_storage::from_array): Adjust write_val callers. (wi::int_traits <fixed_wide_int_storage>::get_binary_precision): New. (WIDEST_INT): Define. (widest_int_storage): New template class. (wi::int_traits <widest_int_storage>): New. (trailing_wide_int_storage::write_val): Add unused unsigned int argument. (wi::get_binary_precision): Use wi::int_traits <WI_BINARY_RESULT (T1, T2)>::get_binary_precision rather than get_precision on get_binary_result. (wi::copy): Adjust write_val callers. Don't call set_len if needs_write_val_arg. (wi::bit_not): If result.needs_write_val_arg, call write_val again with upper bound estimate of len. (wi::sext, wi::zext, wi::set_bit): Likewise. (wi::bit_and, wi::bit_and_not, wi::bit_or, wi::bit_or_not, wi::bit_xor, wi::add, wi::sub, wi::mul, wi::mul_high, wi::div_trunc, wi::div_floor, wi::div_ceil, wi::div_round, wi::divmod_trunc, wi::mod_trunc, wi::mod_floor, wi::mod_ceil, wi::mod_round, wi::lshift, wi::lrshift, wi::arshift): Likewise. (wi::bswap, wi::bitreverse): Assert result.needs_write_val_arg is false. (gt_ggc_mx, gt_pch_nx): Remove generic template for all generic_wide_int, instead add functions and templates for each storage of generic_wide_int. Make functions for generic_wide_int <wide_int_storage> and templates for generic_wide_int <widest_int_storage <N>> deleted. (wi::mask, wi::shifted_mask): Adjust write_val calls. * wide-int.cc (zeros): Decrease array size to 1. (BLOCKS_NEEDED): Use CEIL. (canonize): Use HOST_WIDE_INT_M1. (wi::from_buffer): Pass 0 to write_val. (wi::to_mpz): Use CEIL. (wi::from_mpz): Likewise. Pass 0 to write_val. Use WIDE_INT_MAX_INL_ELTS instead of WIDE_INT_MAX_ELTS. (wi::mul_internal): Use WIDE_INT_MAX_INL_PRECISION instead of MAX_BITSIZE_MODE_ANY_INT in automatic array sizes, for prec above WIDE_INT_MAX_INL_PRECISION estimate precision from lengths of operands. Use XALLOCAVEC allocated buffers for prec above WIDE_INT_MAX_INL_PRECISION. (wi::divmod_internal): Likewise. (wi::lshift_large): For len > WIDE_INT_MAX_INL_ELTS estimate it from xlen and skip. (rshift_large_common): Remove xprecision argument, add len argument with len computed in caller. Don't return anything. (wi::lrshift_large, wi::arshift_large): Compute len here and pass it to rshift_large_common, for lengths above WIDE_INT_MAX_INL_ELTS using estimations from xlen if possible. (assert_deceq, assert_hexeq): For lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer. (test_printing): Use WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION. * wide-int-print.h (WIDE_INT_PRINT_BUFFER_SIZE): Use WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION. 
* wide-int-print.cc (print_decs, print_decu, print_hex): For lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer. * tree.h (wi::int_traits<extended_tree <N>>): Change precision_type to INL_CONST_PRECISION for N == ADDR_MAX_PRECISION. (widest_extended_tree): Use WIDEST_INT_MAX_PRECISION instead of WIDE_INT_MAX_PRECISION. (wi::ints_for): Use int_traits <extended_tree <N> >::precision_type instead of hard coded CONST_PRECISION. (widest2_int_cst): Use WIDEST_INT_MAX_PRECISION instead of WIDE_INT_MAX_PRECISION. (wi::extended_tree <N>::get_len): Use WIDEST_INT_MAX_PRECISION rather than WIDE_INT_MAX_PRECISION. (wi::ints_for::zero): Use wi::int_traits <wi::extended_tree <N> >::precision_type instead of wi::CONST_PRECISION. * tree.cc (build_replicated_int_cst): Formatting fix. Use WIDE_INT_MAX_INL_ELTS rather than WIDE_INT_MAX_ELTS. * print-tree.cc (print_node): Don't print TREE_UNAVAILABLE on INTEGER_CSTs, TREE_VECs or SSA_NAMEs. * double-int.h (wi::int_traits <double_int>::precision_type): Change to INL_CONST_PRECISION from CONST_PRECISION. * poly-int.h (struct poly_coeff_traits): Add partial specialization for wi::INL_CONST_PRECISION. * cfgloop.h (bound_wide_int): New typedef. (struct nb_iter_bound): Change bound type from widest_int to bound_wide_int. (struct loop): Change nb_iterations_upper_bound, nb_iterations_likely_upper_bound and nb_iterations_estimate type from widest_int to bound_wide_int. * cfgloop.cc (record_niter_bound): Return early if wi::min_precision of i_bound is too large for bound_wide_int. Adjustments for the widest_int to bound_wide_int type change in non-static data members. (get_estimated_loop_iterations, get_max_loop_iterations, get_likely_max_loop_iterations): Adjustments for the widest_int to bound_wide_int type change in non-static data members. * tree-vect-loop.cc (vect_transform_loop): Likewise. * tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations): Use XALLOCAVEC allocated buffer for i_bound len above WIDE_INT_MAX_INL_ELTS. (record_estimate): Return early if wi::min_precision of i_bound is too large for bound_wide_int. Adjustments for the widest_int to bound_wide_int type change in non-static data members. (wide_int_cmp): Use bound_wide_int instead of widest_int. (bound_index): Use bound_wide_int instead of widest_int. (discover_iteration_bound_by_body_walk): Likewise. Use widest_int::from to convert it to widest_int when passed to record_niter_bound. (maybe_lower_iteration_bound): Use widest_int::from to convert it to widest_int when passed to record_niter_bound. (estimate_numbers_of_iteration): Don't record upper bound if loop->nb_iterations has too large precision for bound_wide_int. (n_of_executions_at_most): Use widest_int::from. * tree-ssa-loop-ivcanon.cc (remove_redundant_iv_tests): Adjust for the widest_int to bound_wide_int changes. * match.pd (fold_sign_changed_comparison simplification): Use wide_int::from on wi::to_wide instead of wi::to_widest. * value-range.h (irange::maybe_resize): Avoid using memcpy on non-trivially copyable elements. * value-range.cc (irange_bitmask::dump): Use XALLOCAVEC allocated buffer for mask or value len above WIDE_INT_PRINT_BUFFER_SIZE. * fold-const.cc (fold_convert_const_int_from_int, fold_unary_loc): Use wide_int::from on wi::to_wide instead of wi::to_widest. * tree-ssa-ccp.cc (bit_value_binop): Zero extend r1max from width before calling wi::udiv_trunc. * lto-streamer-out.cc (output_cfg): Adjustments for the widest_int to bound_wide_int type change in non-static data members. 
* lto-streamer-in.cc (input_cfg): Likewise. (lto_input_tree_1): Use WIDE_INT_MAX_INL_ELTS rather than WIDE_INT_MAX_ELTS. For length above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer. Formatting fix. * data-streamer-in.cc (streamer_read_wide_int, streamer_read_widest_int): Likewise. * tree-affine.cc (aff_combination_expand): Use placement new to construct name_expansion. (free_name_expansion): Destruct name_expansion. * gimple-ssa-strength-reduction.cc (struct slsr_cand_d): Change index type from widest_int to offset_int. (class incr_info_d): Change incr type from widest_int to offset_int. (alloc_cand_and_find_basis, backtrace_base_for_ref, restructure_reference, slsr_process_ref, create_mul_ssa_cand, create_mul_imm_cand, create_add_ssa_cand, create_add_imm_cand, slsr_process_add, cand_abs_increment, replace_mult_candidate, replace_unconditional_candidate, incr_vec_index, create_add_on_incoming_edge, create_phi_basis_1, replace_conditional_candidate, record_increment, record_phi_increments_1, phi_incr_cost_1, phi_incr_cost, lowest_cost_path, total_savings, ncd_with_phi, ncd_of_cand_and_phis, nearest_common_dominator_for_cands, insert_initializers, all_phi_incrs_profitable_1, replace_one_candidate, replace_profitable_candidates): Use offset_int rather than widest_int and wi::to_offset rather than wi::to_widest. * real.cc (real_to_integer): Use WIDE_INT_MAX_INL_ELTS rather than 2 * WIDE_INT_MAX_ELTS and for words above that use XALLOCAVEC allocated buffer. * tree-ssa-loop-ivopts.cc (niter_for_exit): Use placement new to construct tree_niter_desc and destruct it on failure. (free_tree_niter_desc): Destruct tree_niter_desc if value is non-NULL. * gengtype.cc (main): Remove widest_int handling. * graphite-isl-ast-to-gimple.cc (widest_int_from_isl_expr_int): Use WIDEST_INT_MAX_ELTS instead of WIDE_INT_MAX_ELTS. * gimple-ssa-warn-alloca.cc (pass_walloca::execute): Use WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION and assert get_len () fits into it. * value-range-pretty-print.cc (vrange_printer::print_irange_bitmasks): For mask or value lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer. * gimple-ssa-sprintf.cc (adjust_range_for_overflow): Use wide_int::from on wi::to_wide instead of wi::to_widest. * omp-general.cc (score_wide_int): New typedef. (omp_context_compute_score): Use score_wide_int instead of widest_int and adjust for those changes. (struct omp_declare_variant_entry): Change score and score_in_declare_simd_clone non-static data member type from widest_int to score_wide_int. (omp_resolve_late_declare_variant, omp_resolve_declare_variant): Use score_wide_int instead of widest_int and adjust for those changes. (omp_lto_output_declare_variant_alt): Likewise. (omp_lto_input_declare_variant_alt): Likewise. * godump.cc (go_output_typedef): Assert get_len () is smaller than WIDE_INT_MAX_INL_ELTS. gcc/c-family/ * c-warn.cc (match_case_to_enum_1): Use wi::to_wide just once instead of 3 times, assert get_len () is smaller than WIDE_INT_MAX_INL_ELTS. gcc/testsuite/ * gcc.dg/bitint-38.c: New test.
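The storage scheme the commit message describes (an inline limb array for the common small cases, a heap-allocated buffer beyond WIDE_INT_MAX_INL_ELTS, with write_val handed an upper-bound length estimate) can be pictured with a simplified standalone sketch. This is plain C for illustration only, not the actual C++ wide_int/widest_int classes; the names and the limb count are made up:

#include <stdlib.h>

#define MAX_INL 9   /* stand-in for WIDE_INT_MAX_INL_ELTS; illustrative value */

struct bigval {
  unsigned len;                            /* limbs currently in use */
  union {
    unsigned long long inl[MAX_INL];       /* small case: inline storage */
    unsigned long long *heap;              /* large case: heap allocation */
  } u;
};

/* Return a buffer able to hold an estimated upper bound of limbs;
   error handling omitted for brevity.  */
static unsigned long long *
bigval_write_val (struct bigval *v, unsigned estimated_len)
{
  v->len = estimated_len;
  if (estimated_len <= MAX_INL)
    return v->u.inl;
  v->u.heap = malloc (estimated_len * sizeof (unsigned long long));
  return v->u.heap;
}

static void
bigval_release (struct bigval *v)
{
  if (v->len > MAX_INL)
    free (v->u.heap);
}

int main (void)
{
  struct bigval v;
  unsigned long long *p = bigval_write_val (&v, 4);  /* fits inline, no malloc */
  p[0] = 1; p[1] = 2; p[2] = 3; p[3] = 4;
  bigval_release (&v);
  return 0;
}

As the commit message notes, the real classes additionally shrink and copy back in set_len when the estimate turns out to exceed the final length.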
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:cb0119242317c2a6f3127b4acff6aadbfd1dfbc4 commit r14-4635-gcb0119242317c2a6f3127b4acff6aadbfd1dfbc4 Author: Jakub Jelinek <jakub@redhat.com> Date: Sat Oct 14 09:35:44 2023 +0200 middle-end: Allow _BitInt(65535) [PR102989] The following patch lifts further restrictions which limited _BitInt to at most 16319 bits up to 65535. The problem was mainly in INTEGER_CST representation, which had 3 unsigned char members to describe lengths in number of 64-bit limbs, which it wanted to fit into 32 bits. This patch removes the third one which was just a cache to save a few compile time cycles for wi::to_offset and enlarges the other two members to unsigned short. Furthermore, the same problem has been in some uses of trailing_wide_int* (in value-range-storage*) and value-range-storage* itself, while other uses of trailing_wide_int* have been fine (e.g. CONST_POLY_INT, where no constants will be larger than 3/5/9/11 limbs depending on target, so 255 limit is plenty). The patch turns all those length representations to be unsigned short for consistency, so value-range-storage* can handle even 16320-65535 bits BITINT_TYPE ranges. The cc1plus growth is about 16K, so not really significant for 38M .text section. Note, the reason for the new limit is unsigned int precision : 16; TYPE_PRECISION limit, if we wanted to overcome that, TYPE_PRECISION would need to use some other member for BITINT_TYPE from all the others and we could reach that way 4194239 limit (65535 * 64 - 1, again implied by INTEGER_CST and value-range-storage*). Dunno if that is worth it or if it is something we want to do for GCC 14 though. 2023-10-14 Jakub Jelinek <jakub@redhat.com> PR c/102989 gcc/ * tree-core.h (struct tree_base): Remove int_length.offset member, change type of int_length.unextended and int_length.extended from unsigned char to unsigned short. * tree.h (TREE_INT_CST_OFFSET_NUNITS): Remove. (wi::extended_tree <N>::get_len): Don't use TREE_INT_CST_OFFSET_NUNITS, instead compute it at runtime from TREE_INT_CST_EXT_NUNITS and TREE_INT_CST_NUNITS. * tree.cc (wide_int_to_tree_1): Don't assert TREE_INT_CST_OFFSET_NUNITS value. (make_int_cst): Don't initialize TREE_INT_CST_OFFSET_NUNITS. * wide-int.h (WIDE_INT_MAX_ELTS): Change from 255 to 1024. (WIDEST_INT_MAX_ELTS): Change from 510 to 2048, adjust comment. (trailing_wide_int_storage): Change m_len type from unsigned char * to unsigned short *. (trailing_wide_int_storage::trailing_wide_int_storage): Change second argument from unsigned char * to unsigned short *. (trailing_wide_ints): Change m_max_len type from unsigned char to unsigned short. Change m_len element type from struct{unsigned char len;} to unsigned short. (trailing_wide_ints <N>::operator []): Remove .len from m_len accesses. * value-range-storage.h (irange_storage::lengths_address): Change return type from const unsigned char * to const unsigned short *. (irange_storage::write_lengths_address): Change return type from unsigned char * to unsigned short *. * value-range-storage.cc (irange_storage::write_lengths_address): Likewise. (irange_storage::lengths_address): Change return type from const unsigned char * to const unsigned short *. (write_wide_int): Change len argument type from unsigned char *& to unsigned short *&. (irange_storage::set_irange): Change len variable type from unsigned char * to unsigned short *. (read_wide_int): Change len argument type from unsigned char to unsigned short. 
Use trailing_wide_int_storage <unsigned short> instead of trailing_wide_int_storage and trailing_wide_int <unsigned short> instead of trailing_wide_int. (irange_storage::get_irange): Change len variable type from unsigned char * to unsigned short *. (irange_storage::size): Multiply n by sizeof (unsigned short) in len_size variable initialization. (irange_storage::dump): Change len variable type from unsigned char * to unsigned short *. gcc/cp/ * module.cc (trees_out::start, trees_in::start): Remove TREE_INT_CST_OFFSET_NUNITS handling. gcc/testsuite/ * gcc.dg/bitint-38.c: Change into dg-do run test, in addition to checking the addition, division and right shift results at compile time check it also at runtime. * gcc.dg/bitint-39.c: New test.
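In practice this means declarations up to the new limit should now be accepted on targets with _BitInt support (a tiny sketch, not the actual bitint-38.c/bitint-39.c tests):

unsigned _BitInt(65535) huge;            /* now accepted */
/* unsigned _BitInt(65536) too_wide; */  /* still rejected: above BITINT_MAXWIDTH */

unsigned _BitInt(65535) inc (unsigned _BitInt(65535) x) { return x + 1; }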
This comment is to acknowledge the bug in cc1gm2 behind the false positives in gm2/pim/fail/largeconst.mod and gm2/pim/fail/largeconst2.mod when encountering large ZTYPE constants. Will fix - and thanks for the data type hint.
Created attachment 56482: modula2: proposed fix for largeconst.mod Here is a patch set for the modula2 front end which re-implements the ZTYPE overflow detection. Bootstrapped on x86_64; all regression tests pass. The ZTYPE in ISO Modula-2 is used to denote intermediate ordinal type const expressions, and these are always converted into the appropriate language or user ordinal type prior to code generation. The increase in the number of bits supported by _BitInt causes the modula2 largeconst.mod regression failure tests to pass. The largeconst.mod test constant has been increased so that it should still fail; however, the char-at-a-time overflow check is now too slow to detect the failure. The overflow detection for the ZTYPE has therefore been rewritten to check against exceeding WIDE_INT_MAX_PRECISION (many orders of magnitude faster). gcc/m2/ChangeLog: * gm2-compiler/SymbolTable.mod (OverflowZType): Import from m2expr. (ConstantStringExceedsZType): Remove import. (GetConstLitType): Replace ConstantStringExceedsZType with OverflowZType. * gm2-gcc/m2decl.cc (m2decl_ConstantStringExceedsZType): Remove. (m2decl_BuildConstLiteralNumber): Re-write. * gm2-gcc/m2decl.def (ConstantStringExceedsZType): Remove. * gm2-gcc/m2decl.h (m2decl_ConstantStringExceedsZType): Remove. * gm2-gcc/m2expr.cc (m2expr_StrToWideInt): Rewrite to check overflow. (m2expr_OverflowZType): New function. (ToWideInt): New function. * gm2-gcc/m2expr.def (OverflowZType): New procedure function declaration. * gm2-gcc/m2expr.h (m2expr_OverflowZType): New prototype. gcc/testsuite/ChangeLog: * gm2/pim/fail/largeconst.mod: Updated foo to an outrageous value. * gm2/pim/fail/largeconst2.mod: Duplicate test removed.
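For illustration, one way such a cheap rejection test can work is to bound the required bits from the digit count alone instead of converting the literal a character at a time. This is only a sketch of the idea, not the actual m2expr.cc code (ToWideInt/OverflowZType), and ZTYPE_MAX_BITS is a made-up stand-in for WIDE_INT_MAX_PRECISION:

#include <stdbool.h>
#include <string.h>

#define ZTYPE_MAX_BITS 16320   /* stand-in for WIDE_INT_MAX_PRECISION; illustrative */

/* A decimal literal with d digits is at least 10**(d-1), so it needs more than
   (d-1) * log2(10) bits.  4096/1234 (~3.3193) is slightly below log2(10)
   (~3.3219), so the computed bound never exceeds the true requirement: if even
   this lower bound is above ZTYPE_MAX_BITS, the constant certainly overflows
   and can be rejected without any per-digit work.  */
static bool
ztype_definitely_overflows (const char *decimal_digits)
{
  size_t d = strlen (decimal_digits);
  if (d < 2)
    return false;
  unsigned long long lower_bound_bits = (unsigned long long) (d - 1) * 4096 / 1234;
  return lower_bound_bits > ZTYPE_MAX_BITS;
}

int main (void)
{
  return ztype_definitely_overflows ("123456789") ? 1 : 0;   /* small: fits */
}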
The master branch has been updated by Gaius Mulley <gaius@gcc.gnu.org>: https://gcc.gnu.org/g:9693459e030977d6e906ea7eb587ed09ee4fddbd commit r14-5054-g9693459e030977d6e906ea7eb587ed09ee4fddbd Author: Gaius Mulley <gaiusmod2@gmail.com> Date: Wed Nov 1 09:05:10 2023 +0000 PR modula2/102989: reimplement overflow detection in ztype through WIDE_INT_MAX_PRECISION The ZTYPE in ISO Modula-2 is used to denote intermediate ordinal type const expressions and these are always converted into the appropriate language or user ordinal type prior to code generation. The increase of bits supported by _BitInt causes the modula2 largeconst.mod regression failure tests to pass. The largeconst.mod test has been increased to fail, however the char at a time overflow check is now too slow to detect failure. The overflow detection for the ZTYPE has been rewritten to check against exceeding WIDE_INT_MAX_PRECISION (many orders of magnitude faster). gcc/m2/ChangeLog: PR modula2/102989 * gm2-compiler/SymbolTable.mod (OverflowZType): Import from m2expr. (ConstantStringExceedsZType): Remove import. (GetConstLitType): Replace ConstantStringExceedsZType with OverflowZType. * gm2-gcc/m2decl.cc (m2decl_ConstantStringExceedsZType): Remove. (m2decl_BuildConstLiteralNumber): Re-write. * gm2-gcc/m2decl.def (ConstantStringExceedsZType): Remove. * gm2-gcc/m2decl.h (m2decl_ConstantStringExceedsZType): Remove. * gm2-gcc/m2expr.cc (m2expr_StrToWideInt): Rewrite to check overflow. (m2expr_OverflowZType): New function. (ToWideInt): New function. * gm2-gcc/m2expr.def (OverflowZType): New procedure function declaration. * gm2-gcc/m2expr.h (m2expr_OverflowZType): New prototype. gcc/testsuite/ChangeLog: PR modula2/102989 * gm2/pim/fail/largeconst.mod: Updated foo to an outrageous value. * gm2/pim/fail/largeconst2.mod: Duplicate test removed. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>