Bug 53929 - [meta-bug] -masm=intel with global symbol
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.6.3
Importance: P3 minor
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: assemble-failure, wrong-code
Duplicates: 87986 95652 98488 109726 118268 118389
Depends on:
Blocks:
 
Reported: 2012-07-11 15:39 UTC by Louis
Modified: 2025-02-20 17:12 UTC
CC List: 15 users

See Also:
Host:
Target: x86_64-*-*, i?86
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-12-31 00:00:00


Attachments
Draft patch (677 bytes, patch)
2024-01-23 01:11 UTC, LIU Hao
Draft patch Ver. 2 (917 bytes, patch)
2024-01-24 02:16 UTC, LIU Hao
proposed patch for master (2.01 KB, patch)
2025-02-20 17:12 UTC, LIU Hao

Description Louis 2012-07-11 15:39:29 UTC
The bug is quite simple: when using -masm=intel and a global variable named "and", the assembler (as) does not accept the compiler's output.




gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3

preprocessed file is
--- cut begin ---
# 1 "a.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "a.c"
int and = 0;
int main()
{
  return and;
}
--- cut end ---

compiler output is
--- cut begin ---
gcc -v -masm=intel -save-temps a.c
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i686-linux-gnu/4.6/lto-wrapper
Target: i686-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.6.3-1ubuntu5' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --enable-targets=all --disable-werror --with-arch-32=i686 --with-tune=generic --enable-checking=release --build=i686-linux-gnu --host=i686-linux-gnu --target=i686-linux-gnu
Thread model: posix
gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) 
COLLECT_GCC_OPTIONS='-v' '-masm=intel' '-save-temps' '-mtune=generic' '-march=i686'
 /usr/lib/gcc/i686-linux-gnu/4.6/cc1 -E -quiet -v -imultilib . -imultiarch i386-linux-gnu a.c -masm=intel -mtune=generic -march=i686 -fpch-preprocess -fstack-protector -o a.i
ignoring nonexistent directory "/usr/local/include/i386-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/i686-linux-gnu/4.6/../../../../i686-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/i686-linux-gnu/4.6/include
 /usr/local/include
 /usr/lib/gcc/i686-linux-gnu/4.6/include-fixed
 /usr/include/i386-linux-gnu
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-v' '-masm=intel' '-save-temps' '-mtune=generic' '-march=i686'
 /usr/lib/gcc/i686-linux-gnu/4.6/cc1 -fpreprocessed a.i -quiet -dumpbase a.c -masm=intel -mtune=generic -march=i686 -auxbase a -version -fstack-protector -o a.s
GNU C (Ubuntu/Linaro 4.6.3-1ubuntu5) version 4.6.3 (i686-linux-gnu)
	compiled by GNU C version 4.6.3, GMP version 5.0.2, MPFR version 3.1.0-p3, MPC version 0.9
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C (Ubuntu/Linaro 4.6.3-1ubuntu5) version 4.6.3 (i686-linux-gnu)
	compiled by GNU C version 4.6.3, GMP version 5.0.2, MPFR version 3.1.0-p3, MPC version 0.9
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 09c248eab598b9e2acb117da4cdbd785
COLLECT_GCC_OPTIONS='-v' '-masm=intel' '-save-temps' '-mtune=generic' '-march=i686'
 as --32 -o a.o a.s
a.s: Assembler messages:
a.s:21: Error: invalid use of operator "and"
--- cut end ---
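
A possible source-level workaround, offered only as a sketch: GCC's asm-label extension lets a C variable keep its name in the source while being emitted under a different assembler symbol. The assembler name `user_and` below is an arbitrary assumption; any name that is not an Intel-syntax operator or register should serve. The helper `read_and` is hypothetical and only mirrors the `main` in the testcase.

```c
/* Workaround sketch (GNU C extension, not ISO C): keep the C identifier
   "and" but emit it under the assembler symbol "user_and", which gas does
   not treat as an operator.  "user_and" is an arbitrary, assumed name. */
int and __asm__("user_and") = 0;

/* Hypothetical helper mirroring main() in the testcase above. */
int read_and(void)
{
  return and;
}
```

Note that this changes the symbol's link name, so every translation unit that refers to the variable has to use the same declaration.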
Comment 1 tk 2020-09-05 07:49:26 UTC
Hello all,

I would like to report that I hit upon a related issue in GCC 10.0.1.  Besides complaining about "and", the assembly pass also complains if I use a symbol which happens to be the same as register name, e.g. "bx".

$ gcc-10 --version
gcc-10 (Ubuntu 10-20200411-0ubuntu1) 10.0.1 20200411 (experimental) [master revision bb87d5cc77d:75961caccb7:f883c46b4877f637e0fa5025b4d6b5c9040ec566]
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ cat test.c
int bx[16];

int f(unsigned x)
{
	return bx[x];
}
$ gcc-10 -c test.c -O3 -masm=intel
/tmp/ccGtGi2X.s: Assembler messages:
/tmp/ccGtGi2X.s:12: Error: invalid use of register

The offending line in the assembly code says
	lea	rax, bx[rip]

The problem does _not_ go away even if I quote the symbol name by hand in the assembly output, e.g.
	lea	rax, "bx"[rip]

Thank you!
Comment 2 Arseny Solokha 2020-09-06 18:42:04 UTC
(In reply to tk from comment #1)
> the assembly pass also complains if I use a
> symbol which happens to be the same as register name, e.g. "bx".

It was filed previously as PR87986 and PR95652. BTW, GCC does not include an assembler.
Comment 3 Jakub Jelinek 2020-09-07 10:21:04 UTC
The problem is that the intel asm syntax is just badly defined (broken by design).  I'm not aware of any compiler that would emit for such testcases something that could be assembled correctly with gas.
Comment 4 tk 2020-09-07 10:45:10 UTC
I have found that if I manually change
	lea	rax, bx[rip]
to something like
	lea	rax, __bx[rip]
	...
	.weakref __bx, bx
the assembly pass succeeds, with the correct results.

(It seems that the names "bx" and "and" only pose problems when they are used within expressions.  If they are used in a context which unequivocally demands a symbol, then gas can parse them.)

Thank you!
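
To make the affected names concrete, here is a small sketch of a check that a build script or code generator could run over its global symbol names before emitting Intel-syntax assembly. The table lists only names that come up in this report (`and`, `bx`, `eax`, `ecx`, `rax`, `rip`); it is deliberately incomplete, and the function name `collides_with_intel_syntax_name` is hypothetical. The comparison ignores case on the assumption that gas matches register and operator names case-insensitively.

```c
#include <ctype.h>
#include <stddef.h>

/* Illustrative, incomplete list: only names that come up in this report.
   The real set recognized by gas is much larger and keeps growing. */
static const char *const reserved_names[] = {
  "and", "bx", "eax", "ecx", "rax", "rip"
};

/* Case-insensitive string equality (assumes gas ignores case here). */
static int equal_ignoring_case(const char *a, const char *b)
{
  while (*a != '\0' && *b != '\0')
    {
      if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
        return 0;
      ++a;
      ++b;
    }
  /* Equal only if both strings ended at the same position. */
  return *a == *b;
}

/* Return 1 if NAME matches an entry in the table above, else 0. */
int collides_with_intel_syntax_name(const char *name)
{
  size_t i;
  for (i = 0; i < sizeof reserved_names / sizeof reserved_names[0]; ++i)
    if (equal_ignoring_case(name, reserved_names[i]))
      return 1;
  return 0;
}
```

With this table, `bx` and `AND` are flagged while `__bx` is not, which is consistent with the renaming trick described above.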
Comment 5 Jakub Jelinek 2020-09-07 10:54:04 UTC
It is far easier to use (the default) assembler syntax that is properly designed and doesn't have flaws like this.
Comment 6 jbeulich 2020-10-26 10:17:47 UTC
(In reply to Jakub Jelinek from comment #3)
> The problem is that the intel asm syntax is just badly defined (broken by
> design).  I'm not aware of any compiler that would emit for such testcases
> something that could be assembled correctly with gas.

At the risk of stating the obvious, Intel syntax implies that global symbols would have a prefix character prepended, typically an underscore, or that otherwise global symbols avoid the assembler-recognized identifiers. This sadly is a growing set as new register extensions get added. IOW people wanting to avoid having to rename their symbols eventually would need to also restrict the set of recognized register names via suitable .arch directives.

(In reply to Jakub Jelinek from comment #5)
> It is far easier to use (the default) assembler syntax that is properly
> designed and doesn't have flaws like this.

While in the general case I agree, there are downsides when it comes to wanting to make use of macros to "stand in" for instructions, and then wanting to e.g. derive symbols from macro arguments specifying registers. Such macros need to go to some length to get rid of the % character.
Comment 7 jbeulich 2020-10-26 10:23:26 UTC
For the problem originally reported here (operator name space collision) a workaround could be introduced (e.g. a new operand to .intel_syntax to allow suppressing the recognition of MASM-like operands). I don't, however, see any option for register names (besides ".intel_syntax prefix" of course, but aiui this wouldn't work with the output gcc produces, and of course this mode isn't really "Intel Syntax" anymore anyway).
Comment 8 H.J. Lu 2020-12-31 15:07:06 UTC
*** Bug 98488 has been marked as a duplicate of this bug. ***
Comment 9 H.J. Lu 2020-12-31 15:08:00 UTC
*** Bug 87986 has been marked as a duplicate of this bug. ***
Comment 10 H.J. Lu 2020-12-31 15:08:30 UTC
*** Bug 95652 has been marked as a duplicate of this bug. ***
Comment 11 jbeulich 2021-06-10 07:25:03 UTC
I have a rough plan on the gas side, but that will then need a gcc side change as well: For a couple of years we have had quoted symbol names there. While this doesn't currently work right in a number of cases (including the one needed here) the plan is to make e.g.

    mov eax, "ecx"

not be treated the same as

    mov eax, ecx

but considering "ecx" a symbol name due to the quotation. Obviously gcc's configure mechanism would then need to detect the assembler's capability of understanding this, and quote symbol names accordingly (perhaps universally rather than special-casing any particular names).

While this isn't MASM-compatible (MASM treats "ecx" in such a case as an immediate), I view this as less of a problem than using e.g. Arm's model of enclosing a register name in parentheses to designate it as a symbol name:

    mov eax, (ecx)

MASM treats this the same as with no parentheses present, and I consider this form to be more likely to be used in code ported from MASM than double quoted literals used as immediate constants. (Regardless of the choice in the end it may turn out necessary to hide the new behavior behind a new command line option and/or directive extension.)
Comment 12 Andrew Pinski 2023-05-04 04:20:30 UTC
*** Bug 109726 has been marked as a duplicate of this bug. ***
Comment 13 LIU Hao 2023-05-04 04:37:18 UTC
dup notwithstanding, I think I had better copy my recommendation here for reference:



This is how MSVC handles such names:
(https://gcc.godbolt.org/z/TonjYaxqj)

```
static int* volatile rip;
static unsigned int volatile eax;

int
get_value(void)
  {
    return rip[eax];
  }
```

MSVC outputs:
```
get_value PROC                                      ; COMDAT
        mov     ecx, DWORD PTR eax
        mov     rax, QWORD PTR rip
        mov     eax, DWORD PTR [rax+rcx*4]
        ret     0
get_value ENDP
```

GCC outputs:
```
get_value:
        mov     rdx, QWORD PTR rip[rip]
        mov     eax, DWORD PTR eax[rip]
        mov     eax, DWORD PTR [rdx+rax*4]
        ret
```

In the case of MSVC, `DWORD PTR eax` is unambiguously parsed as the label `eax` and `DWORD PTR [eax]` is unambiguously parsed as the register `eax`. The addresses of all labels are always relative to RIP, but this is implied, and brackets are not written explicitly.


Maybe GCC can follow MSVC and omit the RIP register and brackets. The x86_64 memory reference syntax matches x86; the only change is in the semantics of the immediate offset (for x86_64 it is relative to the next instruction, while for i686 it is absolute), but the opcode is the same.
Comment 14 jbeulich 2023-05-04 06:21:03 UTC
(In reply to LIU Hao from comment #13)
> MSVC outputs:
> ```
> get_value PROC                                      ; COMDAT
>         mov     ecx, DWORD PTR eax
>         mov     rax, QWORD PTR rip
>         mov     eax, DWORD PTR [rax+rcx*4]
>         ret     0
> get_value ENDP
> ```

Which at least MASM up to 12.x won't assemble. For one it complains about "rip" being undeclared. And then the load of "ecx" is _not_ a memory access (i.e. the "DWORD PTR" is ignored there). Which is in line with it also objecting to something like "extrn eax:dword".

I say this because I'd be happy to help this on the gas side, but only without breaking MASM compatibility. My present plan for gas is (as already outlined in #11) to make quoted identifiers unambiguously mean symbols, not registers. But of course that would still require a gcc side change as well. Unfortunately there continue to be inconsistencies in gas with quoted identifiers in general, and it's not entirely clear yet whether those may need addressing first.
Comment 15 LIU Hao 2023-05-04 07:02:37 UTC
> Which as least MASM up to 12.x won't assemble. For one it complains about
> "rip" being undeclared. And then the load of "ecx" is _not_ a memory access
> (i.e. the "DWORD PTR" is ignored there). Which is in line with it also
> objecting to something like "extrn eax:dword".

This is accepted by ML64:

```
PUBLIC	main
EXTRN	rip:DWORD
_TEXT	SEGMENT
main	PROC
	mov	eax, DWORD PTR rip
	ret	0
main	ENDP
_TEXT	ENDS
END
```

Does it make sense to create kinda compatibility mode for ML, in addition to MASM, if they are deemed to be incompatible? 


> I say this because I'd be happy to help this on the gas side, but only
> without breaking MASM compatibility. My present plan for gas is (as already
> outlined in #11) to make quoted identifiers unambiguously mean symbols, not
> registers. But of course that would still require a gcc side change as well.
> Unfortunately there continue to be inconsistencies in gas with quoted
> identifiers in general, and it's not entirely clear yet whether those may
> need addressing first.

That quoting thing will be yet another extension. I think we had better keep extensions as few as possible.
Comment 16 jbeulich 2023-05-04 07:15:43 UTC
(In reply to LIU Hao from comment #15)
> This is accepted by ML64:
> 
> ```
> PUBLIC	main
> EXTRN	rip:DWORD
> _TEXT	SEGMENT
> main	PROC
> 	mov	eax, DWORD PTR rip
> 	ret	0
> main	ENDP
> _TEXT	ENDS
> END
> ```

Which version? And did you try other register names? Unfortunately the newest I have access to right now is 12.x, and as said in #14 register names other than "rip" won't work there when (attempted to be) used as symbols. Clearly there's little point in dealing with "rip" alone.

> Does it make sense to create kinda compatibility mode for ML, in addition to
> MASM, if they are deemed to be incompatible? 

ML == MASM, at least for me (ML and ML64 are merely the names of the [non-ancient] executables).
Comment 17 LIU Hao 2023-05-04 07:24:12 UTC
Yeah. It looks to me like the Microsoft compiler doesn't actually use the assembler (like LLVM).

Given the C source:
```
extern int rax;
int main() { return rax; }
```

which compiled without errors:
```
> cl /O2 /c test.c /Fatest.asm
Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30148 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

test.c
```

and produced this assembly file
```
include listing.inc

INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES

PUBLIC	main
EXTRN	rax:DWORD
_TEXT	SEGMENT
main	PROC						; COMDAT
	mov	eax, DWORD PTR rax
	ret	0
main	ENDP
_TEXT	ENDS
END
```

which can't be assembled
```
> ml64 /c test.asm
Microsoft (R) Macro Assembler (x64) Version 14.29.30148.0
Copyright (C) Microsoft Corporation.  All rights reserved.

 Assembling: test.asm
test.asm(9) : error A2008:syntax error : rax
test.asm(16) : error A2032:invalid use of register
```
Comment 18 LIU Hao 2023-05-04 13:14:47 UTC
Would it make any sense to have GAS be more permissive about such labels,

1. unconditionally? or
2. when input is from a pipe? or
3. when a special option is in effect e.g. `--output-from-gcc`?
Comment 19 LIU Hao 2023-05-11 10:58:55 UTC
(In reply to jbeulich from comment #11)
> I have a rough plan on the gas side, but that will then need a gcc side
> change as well: For a couple of years we have had quoted symbol names there.
> While this doesn't currently work right in a number of cases (including the
> one needed here) the plan is to make e.g.
> 
>     mov eax, "ecx"
> 
> not be treated the same as
> 
>     mov eax, ecx
> 
> but considering "ecx" a symbol name due to the quotation. Obviously gcc's

I don't like double quotes here, because it looks something different, like in C. Would it make some sense if we take the approach for MIPS and AArch64 [1], so

  mov eax, %ecx

or

  mov eax, :ecx

denotes `ecx` is the name of a label, and otherwise a register. Also, such a prefix should be optional, so people who write assembly can omit it if they carefully avoid such names.


[1] https://maskray.me/blog/2023-05-08-assemblers
Comment 20 jbeulich 2023-05-11 11:35:12 UTC
(In reply to LIU Hao from comment #19)
> (In reply to jbeulich from comment #11)
> > I have a rough plan on the gas side, but that will then need a gcc side
> > change as well: For a couple of years we have had quoted symbol names there.
> > While this doesn't currently work right in a number of cases (including the
> > one needed here) the plan is to make e.g.
> > 
> >     mov eax, "ecx"
> > 
> > not be treated the same as
> > 
> >     mov eax, ecx
> > 
> > but considering "ecx" a symbol name due to the quotation. Obviously gcc's
> 
> I don't like double quotes here, because it looks something different, like
> in C.

This is assembly; I don't see how (dis)similarity with C would matter. I also don't see how your example is any different in this regard from

    mov eax, "symbol"

which gas has been supporting for quite some time.

> Would it make some sense if we take the approach for MIPS and AArch64
> [1], so
> 
>   mov eax, %ecx
> 
> or
> 
>   mov eax, :ecx
> 
> denotes `ecx` is the name of a label, and otherwise a register. Also, such a
> prefix should be optional, so people who write assembly can omit it if they
> carefully avoid such names.

I can't find any indication of such syntax being supported by gas for either of these architectures. % on MIPS and : on Arm64 actually are involved in relocation specifiers instead. Are you suggesting to overload them?

(Note that % is out of game here, for being the register prefix on x86.)
Comment 21 LIU Hao 2023-05-11 12:19:44 UTC
(In reply to jbeulich from comment #20)
> This is assembly; I don't see how (dis)similarity with C would matter. I
> also don't see how your example is any different in this regard from
> 
>     mov eax, "symbol"
> 
> which gas has been supporting for quite some time.

oh really? I thought it would have to be implemented. If it's readily available, we can start making use of it right now.
Comment 22 jbeulich 2023-05-11 14:30:13 UTC
(In reply to LIU Hao from comment #21)
> oh really? I thought it would have to be implemented. If it's readily
> available, we can start making use of it right now.

Well, the general symbol part of it is there (with a few quirks, which I don't think would matter here). The missing part for quoted symbols matching register names was posted, see e.g. https://sourceware.org/pipermail/binutils/2023-May/127318.html.
Comment 23 LIU Hao 2023-05-11 14:48:08 UTC
Changes to GCC should look like this I suspect (I didn't test this):

```
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index fbd33a6bfd1..de80c7a805f 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -14080,7 +14080,11 @@ ix86_print_operand_address_as (FILE *file, rtx addr,
              if (flag_pic)
                output_pic_addr_const (file, disp, 0);
              else if (GET_CODE (disp) == LABEL_REF)
-               output_asm_label (disp);
+               {
+                 putc ('\"', file);
+                 output_asm_label (disp);
+                 putc ('\"', file);
+               }
              else if (CONST_INT_P (disp))
                offset = disp;
              else
```

It's a bit strange that `output_asm_label` writes output via a global `FILE*`.
Comment 24 LIU Hao 2024-01-18 05:38:13 UTC
I've composed a proposal to address this issue:

  https://github.com/lhmouse/mcfgthread/wiki/Formalized-Intel-Syntax-for-x86#the-proposal


The proposal is to treat names between `ptr` and `[` as symbols, and to treat names between `[` and `]` as registers. This

   lea	rax, bx[rip]

should be rejected due to invalidity, while

   lea	rax, BYTE PTR bx[rip]

can be parsed as referencing the symbol `bx` with no ambiguity.
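
As a rough illustration of that rule (not the parser from the linked proposal, and not gas code), the sketch below splits an operand into the name between `ptr` and `[` and the text between `[` and `]`. The function names `find_ptr_keyword` and `split_memory_operand` are hypothetical, and the sketch assumes a single bracket pair and no nested expressions.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Find the keyword "ptr" (any case) as a whole word; NULL if absent. */
static const char *find_ptr_keyword(const char *s)
{
  const char *p;
  for (p = s; *p != '\0'; ++p)
    {
      int at_word_start =
        (p == s) || !(isalnum((unsigned char) p[-1]) || p[-1] == '_');
      if (at_word_start
          && tolower((unsigned char) p[0]) == 'p'
          && tolower((unsigned char) p[1]) == 't'
          && tolower((unsigned char) p[2]) == 'r'
          && !(isalnum((unsigned char) p[3]) || p[3] == '_'))
        return p;
    }
  return NULL;
}

/* Split OPERAND into the symbol between "ptr" and '[' and the text between
   '[' and ']'.  Returns 0 on success, or -1 if there is no "ptr" keyword,
   a ']' is missing, or a buffer is too small. */
int split_memory_operand(const char *operand,
                         char *symbol, size_t symbol_size,
                         char *registers, size_t registers_size)
{
  const char *keyword = find_ptr_keyword(operand);
  const char *start, *open, *end;
  size_t length;

  if (keyword == NULL || symbol_size == 0 || registers_size == 0)
    return -1;

  /* Skip the keyword and any blanks after it. */
  start = keyword + 3;
  while (*start == ' ' || *start == '\t')
    ++start;

  /* The symbol runs up to '[' (or end of string), minus trailing blanks. */
  open = strchr(start, '[');
  end = (open != NULL) ? open : start + strlen(start);
  while (end > start && (end[-1] == ' ' || end[-1] == '\t'))
    --end;

  length = (size_t) (end - start);
  if (length >= symbol_size)
    return -1;
  memcpy(symbol, start, length);
  symbol[length] = '\0';

  /* Everything inside the brackets is register territory. */
  registers[0] = '\0';
  if (open != NULL)
    {
      const char *close = strchr(open, ']');
      if (close == NULL)
        return -1;
      length = (size_t) (close - (open + 1));
      if (length >= registers_size)
        return -1;
      memcpy(registers, open + 1, length);
      registers[length] = '\0';
    }
  return 0;
}
```

Under these assumptions `BYTE PTR bx[rip]` splits into the symbol `bx` and the register text `rip`, while a bare `bx[rip]` is rejected because it has no `ptr` keyword.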
Comment 25 LIU Hao 2024-01-23 01:11:36 UTC
Created attachment 57191
Draft patch

This is a draft patch, bootstrapped on {i686,x86_64}-w64-mingw32 successfully. Haven't run tests though.
Comment 26 LIU Hao 2024-01-24 02:16:57 UTC
Created attachment 57199
Draft patch Ver. 2

1. Fix a typo in `ASM_OUTPUT_SYMBOL_REF`  (`x` => `SYM`)
2. For Intel syntax, if the name does not start with a `*`, then it is taken as a symbol,
   and is quoted.
3. If the name starts with a `*`, then it is a request for verbatim output. According to
   comments in 'dwarf2cfi.cc' which say 'dwarf2out.cc might give us a label expression
   (e.g. .LVL548-1) as second argument. If so, make it a subexpression, ... ' so the name
   may be a combined expression. In this case parse it for `+` or `-` where the symbol
   stops, then quote the symbol and print the remaining part verbatim.
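
The quoting rules in points 2 and 3 can be sketched as a standalone function. This is an illustration written from the description above, not code from the attached patch; the name `quote_symbol_for_intel` is hypothetical, and it assumes the symbol itself contains no `+` or `-`.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write NAME to OUT in the quoted form described above.
   - No leading '*': the whole name is a symbol and is quoted.
   - Leading '*': verbatim request; quote only the part before the first
     '+' or '-' and copy the rest of the expression unchanged.
   Returns 0 on success, -1 if OUT is too small. */
int quote_symbol_for_intel(const char *name, char *out, size_t out_size)
{
  int written;

  if (name[0] != '*')
    written = snprintf(out, out_size, "\"%s\"", name);
  else
    {
      const char *expr = name + 1;                 /* drop the '*' marker */
      size_t symbol_length = strcspn(expr, "+-");  /* where the symbol stops */
      written = snprintf(out, out_size, "\"%.*s\"%s",
                         (int) symbol_length, expr, expr + symbol_length);
    }

  return (written >= 0 && (size_t) written < out_size) ? 0 : -1;
}
```

For example, `bx` becomes `"bx"`, and the label expression `*.LVL548-1` becomes `".LVL548"-1`.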
Comment 27 Eric Gallager 2024-01-31 20:07:01 UTC
Is this really a meta-bug? Normally meta-bugs depend on other bugs...
Comment 28 Andrew Pinski 2025-01-01 16:46:45 UTC
*** Bug 118268 has been marked as a duplicate of this bug. ***
Comment 29 Andrew Pinski 2025-01-09 16:02:24 UTC
*** Bug 118389 has been marked as a duplicate of this bug. ***
Comment 30 LIU Hao 2025-02-20 17:12:35 UTC
Created attachment 60542
proposed patch for master

Quote symbols so they are not mistaken for registers. This requires Binutils 2.26, where the fix was originally done for ARM: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=d02603dc201f80cd9d2a1f4b1a16110b1e04222b

Bootstrapped on i686-w64-mingw32 with DWARF2 exception model, and on x86_64-w64-mingw32 with SEH exception model, both patched to use Intel syntax by default.

Also bootstrapped on x86_64-linux-gnu with default AT&T syntax, and verified that it produces expected assembly with `-masm=intel`.