Since the following binutils revision: commit c34d1cc9200ae24dc7572aaf77d80276c0490e9b Author: Jan Beulich <jbeulich@suse.com> Date: Fri Feb 24 13:56:57 2023 +0100 x86: restrict insn templates accepting negative 8-bit immediates I noticed the following rejected code (reduced from ffmpeg-4 package): $ cat libavformat.i void av_log(); typedef signed char __int8_t; typedef unsigned char __uint8_t; typedef unsigned int __uint32_t; typedef __int8_t int8_t; typedef __uint8_t uint8_t; typedef __uint32_t uint32_t; typedef struct AVCodecParameters { uint8_t *extradata; int extradata_size; } AVCodecParameters; static inline uint32_t NEG_USR32(uint32_t a, int8_t s) { __asm__("shrl %1, %0\n\t" : "+r"(a) : "ic"((uint8_t)(-s))); return a; } typedef struct GetBitContext { } GetBitContext; static inline unsigned int get_bits(GetBitContext *s, int n) { register unsigned int tmp; unsigned int __attribute__((unused)) re_cache; tmp = NEG_USR32(re_cache, n); return tmp; }; typedef struct AVOption { } AVOption; typedef struct AVOutputFormat { int (*init)(struct AVFormatContext *); } AVOutputFormat; typedef struct AVStream { AVCodecParameters *codecpar; } AVStream; typedef struct AVProgram { void *priv_data; AVStream **streams; } AVFormatContext; typedef struct ADTSContext { } ADTSContext; static int adts_decode_extradata(AVFormatContext *s, ADTSContext *adts, const uint8_t *buf, int size) { GetBitContext gb; if (get_bits(&gb, 1)) { av_log(s, 16, "Extension flag is not allowed in ADTS\n"); } } static int adts_init(AVFormatContext *s) { ADTSContext *adts = s->priv_data; AVCodecParameters *par = s->streams[0]->codecpar; return adts_decode_extradata(s, adts, par->extradata, par->extradata_size); } static const AVOption options[] = { }; AVOutputFormat ff_adts_muxer = { .init = adts_init, }; $ gcc libavformat.i -w -S -O2 && grep shr libavformat.s && /home/marxin/Programming/binutils/objdir/gas/as-new libavformat.s shrl $-1, %eax libavformat.i: Assembler messages: libavformat.i:14: Error: operand type mismatch for `shr' So we emit -1 even though the user used cast to uint8_t: ((uint8_t)(-s))
How does that look like a gcc bug? It is either a binutils bug for not accepting it anymore, or ffmpeg-4 bug for relying on the negative shifts. GCC inline asm has always worked like that, the operand is 8-bit and in GCC constants are always sign-extended. If you try just static inline unsigned int foo (unsigned int a, signed char s) { asm volatile ("# %1" : "+r" (a) : "ic" ((unsigned char) -s)); return a; } void bar (void) { foo (0, 1); } I get the same behavior of # $-1 with trunk or GCC 3.2. In the assembly, if you have a spot which accepts 8-bit quantity, one shouldn't care if it is signed or unsigned. If you care about the upper bits, you shouldn't pretend the operand is 8-bit but say 32-bit by adding (int) cast to it.
Well, I just noticed clang transforms it into: $ clang libavformat.i -w -S -O2 -o /dev/stdout | grep shr shrl $255, %eax
(In reply to Martin Liška from comment #2) > Well, I just noticed clang transforms it into: > $ clang libavformat.i -w -S -O2 -o /dev/stdout | grep shr > shrl $255, %eax Well, GNU inline assembly is a GNU extension, how clang implemented the GNU extension doesn't change anything on what GCC should and shouldn't do. That said, if binutils rejects negative immediates and not also too large immediates, it is completely wrong. IMHO it should reject neither, it is a runtime thing what will happen. E.g. int foo (int x) { return x >> -1; } and similar can appear in dead code just fine, the important thing is not to trigger it at runtime. GCC uses for 32-bit shifts on x86 actually "cI" constraint rather than "ci" and so negative or too large shift counts will be reloaded into %cl register and so it itself doesn't care.
(In reply to Jakub Jelinek from comment #1) > GCC inline asm has always worked like that, the operand is 8-bit and in GCC > constants are always sign-extended. But then ... > If you try just > static inline unsigned int > foo (unsigned int a, signed char s) > { > asm volatile ("# %1" : "+r" (a) : "ic" ((unsigned char) -s)); this should be sign-extension of the supplied value, i.e. the (potentially negative) value cast to "unsigned char". And gas will be happy to accept that. > In the assembly, if you have a spot which accepts 8-bit quantity, one > shouldn't care if it is signed or unsigned. I firmly disagree: I assume you did look at the description of the gas change. A negative value for SHL may (by some) be viewed as reasonable (and as said in the submission, I could have been convinced to further relax things, provided fair arguments), but a negative value for RCL firmly is rubbish and hence should have been rejected years ago.
(In reply to jbeulich from comment #4) > > In the assembly, if you have a spot which accepts 8-bit quantity, one > > shouldn't care if it is signed or unsigned. > > I firmly disagree: I assume you did look at the description of the gas > change. A negative value for SHL may (by some) be viewed as reasonable (and > as said in the submission, I could have been convinced to further relax > things, provided fair arguments), but a negative value for RCL firmly is > rubbish and hence should have been rejected years ago. GCC doesn't even have that information at all, at the RTL level it doesn't know if it was signed or unsigned. All it has is the constraint and operand for it, like (reg:QI 126) or (const_int -1). As I said earlier, constants are always sign-extended from their mode. One could e.g. have during expansion (set (reg:QI 126) (const_int -1)) and later on asm_operands with "ic" and (reg:QI 126). Same assignment for int8_t x = -1 or int8_t x = 255 or uint8_t x = -1 or uint8_t x = 255, at GIMPLE one can differentiate that based on types, at RTL one has just mode. So there really is nothing that can be changed on the GCC side even if we wanted to (and IMHO we don't, we want to preserve existing behavior).
(In reply to Jakub Jelinek from comment #5) > GCC doesn't even have that information at all, at the RTL level it doesn't > know > if it was signed or unsigned. > All it has is the constraint and operand for it, like (reg:QI 126) or > (const_int -1). > As I said earlier, constants are always sign-extended from their mode. > One could e.g. have during expansion (set (reg:QI 126) (const_int -1)) > and later on asm_operands with "ic" and (reg:QI 126). Same assignment for > int8_t x = -1 or int8_t x = 255 or uint8_t x = -1 or uint8_t x = 255, at > GIMPLE one can differentiate that based on types, at RTL one has just mode. While for int8_t x = -1 or int8_t x = 255 I can see that the result is as described, for uint8_t x = -1 or uint8_t x = 255 (or, as in the example, a constant the was cast to an unsigned 8-bit type) why is it not (const int 255)?
Maybe what would help is a discussion in the context of the binutils patch, despite it already having been committed. As said there and earlier here, there may be reasons to adjust (back) some of what it did, but there needs to be a sufficiently consistent pattern behind that and at least RCL/RCR, according to whichever pattern, have to remain unsigned-only.
(In reply to jbeulich from comment #6) > (In reply to Jakub Jelinek from comment #5) > > GCC doesn't even have that information at all, at the RTL level it doesn't > > know > > if it was signed or unsigned. > > All it has is the constraint and operand for it, like (reg:QI 126) or > > (const_int -1). > > As I said earlier, constants are always sign-extended from their mode. > > One could e.g. have during expansion (set (reg:QI 126) (const_int -1)) > > and later on asm_operands with "ic" and (reg:QI 126). Same assignment for > > int8_t x = -1 or int8_t x = 255 or uint8_t x = -1 or uint8_t x = 255, at > > GIMPLE one can differentiate that based on types, at RTL one has just mode. > > While for int8_t x = -1 or int8_t x = 255 I can see that the result is as > described, for uint8_t x = -1 or uint8_t x = 255 (or, as in the example, a > constant the was cast to an unsigned 8-bit type) why is it not (const int > 255)? Because RTL doesn't have the notion of signed/unsigned types, only modes (which don't have a sign). For many operations there is no difference in how they behave with signed and unsigned values, say PLUS works the same. And where it matters, the signed vs. unsigned is encoded in the code of the operation (there is say arithmetic right shift and logical right shift). (const_int 255) is invalid where 8-bit quantity is required.
(In reply to Jakub Jelinek from comment #1) > How does that look like a gcc bug? It is either a binutils bug for not > accepting it anymore, or ffmpeg-4 bug for relying on the negative shifts. While I'm not sure in how far reduction from original code has discarded too much context, the impression I'm getting is that they use inline assembly because if the same way expressed in a similar way in C, the compiler would warn. And then, rather than making the expression match C standard requirements, assembly code was used instead to silence that diagnostic.
https://gcc.gnu.org/onlinedocs/gccint/Constants.html#Constants if you want to see it in GCC documentation. "Constants generated for modes with fewer bits than in HOST_WIDE_INT must be sign extended to full width (e.g., with gen_int_mode)." and, in asm_operands the operand mode is only present as GET_MODE on the operand, if it is constant, it has VOIDmode mode and the original mode is lost. Anyway, it is pointless to discuss this further, there is really nothing that can be done on the GCC side.
(In reply to jbeulich from comment #9) > (In reply to Jakub Jelinek from comment #1) > > How does that look like a gcc bug? It is either a binutils bug for not > > accepting it anymore, or ffmpeg-4 bug for relying on the negative shifts. > > While I'm not sure in how far reduction from original code has discarded too > much context, the impression I'm getting is that they use inline assembly > because if the same way expressed in a similar way in C, the compiler would > warn. And then, rather than making the expression match C standard > requirements, assembly code was used instead to silence that diagnostic. Of course the code could use whatever_32bit >> (count & 31) and GCC wouldn't warn and on arches like x86 where the shr instruction does effectively the masking would fold the masking into the actual shift instruction. Plus, it would mask constants at compile time to the right range.
(In reply to jbeulich from comment #9) > (In reply to Jakub Jelinek from comment #1) > > How does that look like a gcc bug? It is either a binutils bug for not > > accepting it anymore, or ffmpeg-4 bug for relying on the negative shifts. > > While I'm not sure in how far reduction from original code has discarded too Hope I haven't reduced it much, the original inline functions is defined here: https://git.ffmpeg.org/gitweb/ffmpeg.git/blob/HEAD:/libavcodec/x86/mathops.h#l124
We could file an x86 backend enhancement request based on those comments, int foo (int x, int y) { return x << (y & 31); } int bar (int x, int y) { return x << (32 + y); } int baz (int x, int y) { return x << (-y & 31); } int qux (int x, int y) { return x << (32 - y); } where we optimize away the & 31 masks, but actually not the additions of 32. It actually doesn't change anything on the number of instructions in this case, but bar needs 1 more bytes than equivalent foo and qux even 3 more bytes than equivalent baz.
(In reply to Jakub Jelinek from comment #5) > GCC doesn't even have that information at all, at the RTL level it doesn't > know > if it was signed or unsigned. > All it has is the constraint and operand for it, like (reg:QI 126) or > (const_int -1). Lack of information at a certain layer doesn't mean this isn't a bug. It merely makes whatever possible bug one that's hard to fix. Furthermore you did refer to gcc internal doc as to justifying the behavior. But can you also point me at non-internal doc making explicit that e.g. static inline unsigned aux(unsigned i, unsigned char j) { #if 1 asm("add %1,%0" : "+rm" (i) : "i" (j)); return i; #else return i + j; #endif } unsigned test(unsigned i, unsigned j) { return aux(i, 255) * aux(j, -1); } does not do (at -O1 or -O2) what one would expect (again on x86)? The example is intentionally over-simplified (and hence won't build at -O0); anything more involved could be taken care of by using assembler macros, where it is not overly difficult to separately deal with immediates and registers. In the absence of such documentation I would also view the "has always been like that" argument as at least questionable.
(In reply to jbeulich from comment #14) > (In reply to Jakub Jelinek from comment #5) > > GCC doesn't even have that information at all, at the RTL level it doesn't > > know > > if it was signed or unsigned. > > All it has is the constraint and operand for it, like (reg:QI 126) or > > (const_int -1). > > Lack of information at a certain layer doesn't mean this isn't a bug. It > merely makes whatever possible bug one that's hard to fix. Well, we are talking about behavior of an extension that behaved since its introduction that way and lots of code in the wild can rely on that behavior. So, why would you all of sudden consider it a bug? > Furthermore you did refer to gcc internal doc as to justifying the behavior. > But can you also point me at non-internal doc making explicit that e.g. > > static inline unsigned aux(unsigned i, unsigned char j) { > #if 1 > asm("add %1,%0" : "+rm" (i) : "i" (j)); This is just misunderstanding on how inline asm works. GCC (intentionally) treats inline asm as a black box, text in which it does replacements according to the rules, it is a very low level interface and needs the author know what they are doing. Above you're mixing a 32-bit argument with 8-bit argument in an instruction which expects probably 2 32-bit arguments or at least both arguments with the same width. Just try to pass 2 variables to it and use "ri" and you'll see assembler errors, add %dl, %eax and the like. There are various modifiers (often target specific) which allows one to say a particular operand should be printed as if in this mode, when you don't use them, it means the operand mode is to be used. So, for add above one should really use "i" ((int) j) if you want it 32-bit. Similarly how ffmpeg should be using "rI" instead of "ri" that it uses... 'I' Integer constant in the range 0 ... 31, for 32-bit shifts. even documents it that way.
(In reply to Jakub Jelinek from comment #15) > Above you're mixing a 32-bit argument with 8-bit argument in an instruction > which > expects probably 2 32-bit arguments or at least both arguments with the same > width. > Just try to pass 2 variables to it and use "ri" and you'll see assembler > errors, > add %dl, %eax and the like. Of course, and I did say the example was over-simplified. If it helps, consider (as also indicated) invoking a macro instead, which then inspects the operands and decides what insn to produce. This could be particularly interesting with the .insn that I'm in the process of preparing to add to x86 gas, where one would then inspect arguments in order to select a suitable major opcode. Since x86 has different possible encodings for "add immediate", the wrongly represented value would then lead to silent bad code generation. And btw - what "size" to assign to e.g. a sign-extended 8-bit immediate is at least ambiguous. I can only repeat: Unless the anomaly is properly called out in non-internal documentation, I continue to think there's a bug here. And the reference to Clang getting it right, which you simply put off, isn't entirely meaningless imo (I agree we're talking about a GNU extension here, but that doesn't imply only GNU tools can get it right).
(In reply to jbeulich from comment #16) > I can only repeat: Unless the anomaly is properly called out in non-internal > documentation, I continue to think there's a bug here. And the reference to For inline asm, the only important things besides the exact values are the mode (for integral scalar modes that determines mostly how many bits it has), constraints (which determine where it can be passed and limit set of valid values) and optional modifiers which change how it is printed. > Clang getting it right, which you simply put off, isn't entirely meaningless > imo (I agree we're talking about a GNU extension here, but that doesn't > imply only GNU tools can get it right). Clang definitely doesn't implement the GNU extension correctly, far from it, because it broke the basic premise that it is a black box for the compiler on which it performs a text replacement. It instead parses and assembles the text, so behaves completely differently from how the extension was defined. So it can't be then safely used for optimization barriers, one can't use there something not really valid in the assembler etc.