The bug is quite simple: when using -masm=intel and a global named "and", the assembler (as) does not accept the output of the compiler.

gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3

preprocessed file is
--- cut begin ---
# 1 "a.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "a.c"
int and = 0;

int main()
{
  return and;
}
--- cut end ---

compiler output is
--- cut begin ---
gcc -v -masm=intel -save-temps a.c
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i686-linux-gnu/4.6/lto-wrapper
Target: i686-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.6.3-1ubuntu5' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --enable-targets=all --disable-werror --with-arch-32=i686 --with-tune=generic --enable-checking=release --build=i686-linux-gnu --host=i686-linux-gnu --target=i686-linux-gnu
Thread model: posix
gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
COLLECT_GCC_OPTIONS='-v' '-masm=intel' '-save-temps' '-mtune=generic' '-march=i686'
 /usr/lib/gcc/i686-linux-gnu/4.6/cc1 -E -quiet -v -imultilib . -imultiarch i386-linux-gnu a.c -masm=intel -mtune=generic -march=i686 -fpch-preprocess -fstack-protector -o a.i
ignoring nonexistent directory "/usr/local/include/i386-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/i686-linux-gnu/4.6/../../../../i686-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/i686-linux-gnu/4.6/include
 /usr/local/include
 /usr/lib/gcc/i686-linux-gnu/4.6/include-fixed
 /usr/include/i386-linux-gnu
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-v' '-masm=intel' '-save-temps' '-mtune=generic' '-march=i686'
 /usr/lib/gcc/i686-linux-gnu/4.6/cc1 -fpreprocessed a.i -quiet -dumpbase a.c -masm=intel -mtune=generic -march=i686 -auxbase a -version -fstack-protector -o a.s
GNU C (Ubuntu/Linaro 4.6.3-1ubuntu5) version 4.6.3 (i686-linux-gnu)
	compiled by GNU C version 4.6.3, GMP version 5.0.2, MPFR version 3.1.0-p3, MPC version 0.9
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C (Ubuntu/Linaro 4.6.3-1ubuntu5) version 4.6.3 (i686-linux-gnu)
	compiled by GNU C version 4.6.3, GMP version 5.0.2, MPFR version 3.1.0-p3, MPC version 0.9
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 09c248eab598b9e2acb117da4cdbd785
COLLECT_GCC_OPTIONS='-v' '-masm=intel' '-save-temps' '-mtune=generic' '-march=i686'
 as --32 -o a.o a.s
a.s: Assembler messages:
a.s:21: Error: invalid use of operator "and"
--- cut end ---
Hello all,

I would like to report that I hit a related issue in GCC 10.0.1. Besides complaining about "and", the assembly pass also complains if I use a symbol which happens to have the same name as a register, e.g. "bx".

$ gcc-10 --version
gcc-10 (Ubuntu 10-20200411-0ubuntu1) 10.0.1 20200411 (experimental) [master revision bb87d5cc77d:75961caccb7:f883c46b4877f637e0fa5025b4d6b5c9040ec566]
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ cat test.c
int bx[16];
int f(unsigned x) { return bx[x]; }

$ gcc-10 -c test.c -O3 -masm=intel
/tmp/ccGtGi2X.s: Assembler messages:
/tmp/ccGtGi2X.s:12: Error: invalid use of register

The offending line in the assembly code says

        lea     rax, bx[rip]

The problem does _not_ go away even if I quote the symbol name by hand in the assembly output, e.g.

        lea     rax, "bx"[rip]

Thank you!
(In reply to tk from comment #1)
> the assembly pass also complains if I use a
> symbol which happens to be the same as register name, e.g. "bx".

It was filed previously as PR87986 and PR95652.

BTW, GCC does not include an assembler.
The problem is that the Intel asm syntax is just badly defined (broken by design). I'm not aware of any compiler that would emit, for such testcases, something that could be assembled correctly with gas.
I have found that if I manually change

        lea     rax, bx[rip]

to something like

        lea     rax, __bx[rip]
        ...
        .weakref __bx, bx

the assembly pass succeeds, with the correct results.

(It seems that the names "bx" and "and" only pose problems when they are used within expressions. If they are used in a context which unequivocally demands a symbol, then gas can parse them.)

Thank you!
It is far easier to use (the default) assembler syntax that is properly designed and doesn't have flaws like this.
(In reply to Jakub Jelinek from comment #3)
> The problem is that the intel asm syntax is just badly defined (broken by
> design). I'm not aware of any compiler that would emit for such testcases
> something that could be assembled correctly with gas.

At the risk of stating the obvious, Intel syntax implies that global symbols have a prefix character prepended, typically an underscore, or that global symbols otherwise avoid the assembler-recognized identifiers. This sadly is a growing set as new register extensions get added. IOW people wanting to avoid having to rename their symbols would eventually also need to restrict the set of recognized register names via suitable .arch directives.

(In reply to Jakub Jelinek from comment #5)
> It is far easier to use (the default) assembler syntax that is properly
> designed and doesn't have flaws like this.

While in the general case I agree, there are downsides when it comes to wanting to make use of macros to "stand in" for instructions, and then wanting to e.g. derive symbols from macro arguments specifying registers. Such macros need to go to some length to get rid of the % character.
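The "avoid the assembler-recognized identifiers" route can also be taken from the C side without renaming anything in the source. The following is a minimal sketch that nobody in this thread proposed, assuming GCC's documented asm-label extension; the assembler-level name `sym_bx` is purely hypothetical. The C code keeps using `bx`, while the emitted assembly refers to `sym_bx`, which gas cannot mistake for a register.

```c
#include <assert.h>

/* Assumption (not from the thread): an asm label gives the C object `bx`
   a different assembler-level name.  `sym_bx` is a made-up name chosen
   only because it does not collide with any register or operator. */
int bx[16] __asm__("sym_bx");

/* Same function as in the earlier test case; with -masm=intel the
   compiler would now reference sym_bx instead of bx. */
int f(unsigned x)
{
    assert(x < 16);   /* stay within the array bounds */
    return bx[x];
}
```

The obvious cost is that the linker-visible name changes too, so every other translation unit (and any hand-written assembly) has to agree on the new name.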
For the problem originally reported here (operator name space collision) a workaround could be introduced (e.g. a new operand to .intel_syntax to allow suppressing the recognition of MASM-like operators). I don't, however, see any option for register names (besides ".intel_syntax prefix" of course, but as I understand it this wouldn't work with the output gcc produces, and of course this mode isn't really "Intel Syntax" anymore anyway).
*** Bug 98488 has been marked as a duplicate of this bug. ***
*** Bug 87986 has been marked as a duplicate of this bug. ***
*** Bug 95652 has been marked as a duplicate of this bug. ***
I have a rough plan on the gas side, but that will then need a gcc side change as well: For a couple of years we have had quoted symbol names there. While this doesn't currently work right in a number of cases (including the one needed here) the plan is to make e.g.

        mov     eax, "ecx"

not be treated the same as

        mov     eax, ecx

but to consider "ecx" a symbol name due to the quotation. Obviously gcc's configure mechanism would then need to detect the assembler's capability of understanding this, and quote symbol names accordingly (perhaps universally rather than special-casing any particular names).

While this isn't MASM-compatible (MASM treats "ecx" in such a case as an immediate), I view this as less of a problem than using e.g. Arm's model of enclosing a register name in parentheses to designate it as a symbol name:

        mov     eax, (ecx)

MASM treats this the same as with no parentheses present, and I consider this form to be more likely to be used in code ported from MASM than double quoted literals used as immediate constants. (Regardless of the choice in the end it may turn out necessary to hide the new behavior behind a new command line option and/or directive extension.)
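To make the intended rule concrete, here is a small sketch of the classification the plan implies: a quoted name is always a symbol, and only a bare name is looked up in the register table. This is an illustration only, not gas code; `classify_operand`, the enum, and the four-entry register table are all hypothetical.

```c
#include <assert.h>
#include <string.h>

enum operand_kind { OPERAND_REGISTER, OPERAND_SYMBOL };

/* Tiny illustrative table; a real assembler recognizes far more names. */
static const char *const register_names[] = { "eax", "ecx", "bx", "rip" };

/* Hypothetical helper: classify one operand token under the proposed rule. */
enum operand_kind classify_operand(const char *token)
{
    assert(token != NULL);
    size_t len = strlen(token);

    /* A token wrapped in double quotes is a symbol, whatever it spells. */
    if (len >= 2 && token[0] == '"' && token[len - 1] == '"')
        return OPERAND_SYMBOL;

    /* A bare token is a register if it matches a known register name. */
    for (size_t i = 0; i < sizeof register_names / sizeof register_names[0]; i++)
        if (strcmp(token, register_names[i]) == 0)
            return OPERAND_REGISTER;

    /* Anything else is an ordinary symbol. */
    return OPERAND_SYMBOL;
}
```

Under this rule `ecx` stays a register and `"ecx"` becomes a symbol, which is why the compiler could quote every symbol it emits rather than special-casing the problematic names.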
*** Bug 109726 has been marked as a duplicate of this bug. ***
Dup notwithstanding, I think I had better copy my recommendation here for reference:

This is how MSVC handles such names: (https://gcc.godbolt.org/z/TonjYaxqj)

```
static int* volatile rip;
static unsigned int volatile eax;

int get_value(void)
{
    return rip[eax];
}
```

MSVC outputs:

```
get_value PROC                                      ; COMDAT
        mov     ecx, DWORD PTR eax
        mov     rax, QWORD PTR rip
        mov     eax, DWORD PTR [rax+rcx*4]
        ret     0
get_value ENDP
```

GCC outputs:

```
get_value:
        mov     rdx, QWORD PTR rip[rip]
        mov     eax, DWORD PTR eax[rip]
        mov     eax, DWORD PTR [rdx+rax*4]
        ret
```

In the case of MSVC, `DWORD PTR eax` is unambiguously parsed as the label `eax` and `DWORD PTR [eax]` is unambiguously parsed as the register `eax`. The addresses of all labels are always relative to RIP, but this is implied, and brackets are not written explicitly.

Maybe GCC can follow MSVC and omit the RIP register and brackets. The x86_64 memory reference syntax matches x86; the only change is in the semantics of the immediate offset (for x86_64 it is relative to the next instruction, while for i686 it is absolute), but the opcode is the same.
(In reply to LIU Hao from comment #13)
> MSVC outputs:
>
> ```
> get_value PROC                                      ; COMDAT
>         mov     ecx, DWORD PTR eax
>         mov     rax, QWORD PTR rip
>         mov     eax, DWORD PTR [rax+rcx*4]
>         ret     0
> get_value ENDP
> ```

Which at least MASM up to 12.x won't assemble. For one, it complains about "rip" being undeclared. And then the load of "ecx" is _not_ a memory access (i.e. the "DWORD PTR" is ignored there). Which is in line with it also objecting to something like "extrn eax:dword".

I say this because I'd be happy to help with this on the gas side, but only without breaking MASM compatibility. My present plan for gas is (as already outlined in #11) to make quoted identifiers unambiguously mean symbols, not registers. But of course that would still require a gcc side change as well. Unfortunately there continue to be inconsistencies in gas with quoted identifiers in general, and it's not entirely clear yet whether those may need addressing first.
> Which at least MASM up to 12.x won't assemble. For one it complains about
> "rip" being undeclared. And then the load of "ecx" is _not_ a memory access
> (i.e. the "DWORD PTR" is ignored there). Which is in line with it also
> objecting to something like "extrn eax:dword".

This is accepted by ML64:

```
PUBLIC main
EXTRN rip:DWORD

_TEXT SEGMENT
main PROC
        mov     eax, DWORD PTR rip
        ret     0
main ENDP
_TEXT ENDS
END
```

Does it make sense to create a kind of compatibility mode for ML, in addition to MASM, if they are deemed to be incompatible?

> I say this because I'd be happy to help this on the gas side, but only
> without breaking MASM compatibility. My present plan for gas is (as already
> outlined in #11) to make quoted identifiers unambiguously mean symbols, not
> registers. But of course that would still require a gcc side change as well.
> Unfortunately there continue to be inconsistencies in gas with quoted
> identifiers in general, and it's not entirely clear yet whether those may
> need addressing first.

That quoting thing will be yet another extension. I think we had better keep extensions as few as possible.
(In reply to LIU Hao from comment #15)
> This is accepted by ML64:
>
> ```
> PUBLIC main
> EXTRN rip:DWORD
>
> _TEXT SEGMENT
> main PROC
>         mov     eax, DWORD PTR rip
>         ret     0
> main ENDP
> _TEXT ENDS
> END
> ```

Which version? And did you try other register names? Unfortunately the newest I have access to right now is 12.x, and as said in #14 register names other than "rip" won't work there when (attempted to be) used as symbols. Clearly there's little point in dealing with "rip" alone.

> Does it make sense to create kinda compatibility mode for ML, in addition to
> MASM, if they are deemed to be incompatible?

ML == MASM, at least for me (ML and ML64 are merely the names of the [non-ancient] executables).
Yeah. It looks to me like the Microsoft compiler doesn't actually use the assembler (like LLVM). Given the C source:

```
extern int rax;

int main()
{
    return rax;
}
```

which compiled without errors:

```
> cl /O2 /c test.c /Fatest.asm
Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30148 for x64
Copyright (C) Microsoft Corporation. All rights reserved.

test.c
```

and produced this assembly file

```
include listing.inc

INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES

PUBLIC main
EXTRN rax:DWORD
_TEXT SEGMENT
main PROC                                           ; COMDAT
        mov     eax, DWORD PTR rax
        ret     0
main ENDP
_TEXT ENDS
END
```

which can't be assembled

```
> ml64 /c test.asm
Microsoft (R) Macro Assembler (x64) Version 14.29.30148.0
Copyright (C) Microsoft Corporation. All rights reserved.

 Assembling: test.asm
test.asm(9) : error A2008:syntax error : rax
test.asm(16) : error A2032:invalid use of register
```
Would it make any sense to have GAS be more permissive about such labels, 1. unconditionally? or 2. when input is from a pipe? or 3. when a special option is in effect e.g. `--output-from-gcc`?
(In reply to jbeulich from comment #11)
> I have a rough plan on the gas side, but that will then need a gcc side
> change as well: For a couple of years we have had quoted symbol names there.
> While this doesn't currently work right in a number of cases (including the
> one needed here) the plan is to make e.g.
>
> mov eax, "ecx"
>
> not be treated the same as
>
> mov eax, ecx
>
> but considering "ecx" a symbol name due to the quotation. Obviously gcc's

I don't like double quotes here, because they make it look like something different, like a string in C. Would it make sense to take the approach used for MIPS and AArch64 [1], so that

        mov     eax, %ecx

or

        mov     eax, :ecx

denotes that `ecx` is the name of a label, and otherwise a register? Also, such a prefix should be optional, so people who write assembly can omit it if they carefully avoid such names.

[1] https://maskray.me/blog/2023-05-08-assemblers
(In reply to LIU Hao from comment #19)
> (In reply to jbeulich from comment #11)
> > I have a rough plan on the gas side, but that will then need a gcc side
> > change as well: For a couple of years we have had quoted symbol names there.
> > While this doesn't currently work right in a number of cases (including the
> > one needed here) the plan is to make e.g.
> >
> > mov eax, "ecx"
> >
> > not be treated the same as
> >
> > mov eax, ecx
> >
> > but considering "ecx" a symbol name due to the quotation. Obviously gcc's
>
> I don't like double quotes here, because it looks something different, like
> in C.

This is assembly; I don't see how (dis)similarity with C would matter. I also don't see how your example is any different in this regard from

        mov     eax, "symbol"

which gas has been supporting for quite some time.

> Would it make some sense if we take the approach for MIPS and AArch64
> [1], so
>
> mov eax, %ecx
>
> or
>
> mov eax, :ecx
>
> denotes `ecx` is the name of a label, and otherwise a register. Also, such a
> prefix should be optional, so people who write assembly can omit it if they
> carefully avoid such names.

I can't find any indication of such syntax being supported by gas for either of these architectures. % on MIPS and : on Arm64 actually are involved in relocation specifiers instead. Are you suggesting to overload them? (Note that % is ruled out here, since it is the register prefix on x86.)
(In reply to jbeulich from comment #20)
> This is assembly; I don't see how (dis)similarity with C would matter. I
> also don't see how your example is any different in this regard from
>
> mov eax, "symbol"
>
> which gas has been supporting for quite some time.

Oh, really? I thought it would have to be implemented. If it's readily available, we can start making use of it right now.
(In reply to LIU Hao from comment #21)
> oh really? I thought it would have to be implemented. If it's readily
> available, we can start making use of it right now.

Well, the general symbol part of it is there (with a few quirks, which I don't think would matter here). The missing part, for quoted symbols matching register names, was posted; see e.g. https://sourceware.org/pipermail/binutils/2023-May/127318.html.
Changes to GCC should look like this, I suspect (I didn't test this):

```
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index fbd33a6bfd1..de80c7a805f 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -14080,7 +14080,11 @@ ix86_print_operand_address_as (FILE *file, rtx addr,
          if (flag_pic)
            output_pic_addr_const (file, disp, 0);
          else if (GET_CODE (disp) == LABEL_REF)
-           output_asm_label (disp);
+           {
+             putc ('\"', file);
+             output_asm_label (disp);
+             putc ('\"', file);
+           }
          else if (CONST_INT_P (disp))
            offset = disp;
          else
```

It's a bit strange that `output_asm_label` writes output via a global `FILE*`.
I've composed a proposal to address this issue: https://github.com/lhmouse/mcfgthread/wiki/Formalized-Intel-Syntax-for-x86#the-proposal

The proposal is to treat names between `ptr` and `[` as symbols, and to treat names between `[` and `]` as registers. This

        lea     rax, bx[rip]

should be rejected as invalid, while

        lea     rax, BYTE PTR bx[rip]

can be parsed as referencing the symbol `bx` with no ambiguity.
Created attachment 57191
Draft patch

This is a draft patch, bootstrapped on {i686,x86_64}-w64-mingw32 successfully. Haven't run tests though.
Created attachment 57199
Draft patch Ver. 2

1. Fix a typo in `ASM_OUTPUT_SYMBOL_REF` (`x` => `SYM`)

2. For Intel syntax, if the name does not start with a `*`, then it is taken as a symbol, and is quoted.

3. If the name starts with a `*`, then it is a request for verbatim output. According to comments in 'dwarf2cfi.cc' which say 'dwarf2out.cc might give us a label expression (e.g. .LVL548-1) as second argument. If so, make it a subexpression, ... ', the name may be a combined expression. In this case parse it for `+` or `-` where the symbol stops, then quote the symbol and print the remaining part verbatim.
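Rules 2 and 3 above can be sketched as a standalone function. This is not the code from the attachment: `quote_intel_name` is a hypothetical helper, and it assumes that a `*` name without any `+` or `-` is printed verbatim with only the `*` stripped, which is one possible reading of rule 3.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the patch): write the Intel-syntax
   spelling of NAME into OUT (at most SIZE bytes) and return OUT. */
char *quote_intel_name(char *out, size_t size, const char *name)
{
    assert(out != NULL && name != NULL && size > 0);

    if (name[0] != '*') {
        /* Rule 2: a plain name is a symbol, so quote all of it. */
        snprintf(out, size, "\"%s\"", name);
        return out;
    }

    /* Rule 3: a leading '*' requests verbatim output; skip the marker. */
    name++;
    size_t symbol_len = strcspn(name, "+-");

    if (symbol_len == 0 || name[symbol_len] == '\0') {
        /* Assumption: no leading symbol or no operator, print as-is. */
        snprintf(out, size, "%s", name);
        return out;
    }

    /* Combined expression such as .LVL548-1: quote the symbol part and
       print the rest (operator and offset) verbatim. */
    snprintf(out, size, "\"%.*s\"%s", (int) symbol_len, name, name + symbol_len);
    return out;
}
```

With this sketch, `bx` becomes `"bx"` and `*.LVL548-1` becomes `".LVL548"-1`. A symbol whose own name contains `+` or `-` would be split in the wrong place, so a real implementation would need a more careful scan.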
Is this really a meta-bug? Normally meta-bugs depend on other bugs...
*** Bug 118268 has been marked as a duplicate of this bug. ***
*** Bug 118389 has been marked as a duplicate of this bug. ***
Created attachment 60542
proposed patch for master

Quote symbols so they are not to be mistaken for registers.

This requires Binutils 2.26, where the fix was originally done for ARM: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=d02603dc201f80cd9d2a1f4b1a16110b1e04222b

Bootstrapped on i686-w64-mingw32 with DWARF2 exception model, and on x86_64-w64-mingw32 with SEH exception model, both patched to use Intel syntax by default. Also bootstrapped on x86_64-linux-gnu with default AT&T syntax, and verified that it produces expected assembly with `-masm=intel`.