Bug 48385 - x86-64: Tail call recursion optimization with -mcmodel=large can generate invalid assembly (immediate operand illegal with absolute jump)
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.6.1
Importance: P3 major
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: ABI, wrong-code
Depends on:
Blocks:
 
Reported: 2011-03-31 12:58 UTC by Martin Decky
Modified: 2012-11-08 10:24 UTC
CC List: 4 users

See Also:
Host:
Target: x86_64-linux-gnu
Build:
Known to work:
Known to fail: 4.6.1
Last reconfirmed: 2011-06-30 22:59:32


Attachments
Preprocessed file (8.18 KB, text/plain)
2011-03-31 12:58 UTC, Martin Decky
Initial proposed patch (271 bytes, patch)
2011-06-30 16:03 UTC, Martin Decky
Short test case (522 bytes, text/plain)
2011-07-01 02:53 UTC, Martin Decky
Preprocessed file for the short test case (576 bytes, text/plain)
2011-07-01 02:57 UTC, Martin Decky

Description Martin Decky 2011-03-31 12:58:00 UTC
Created attachment 23836
Preprocessed file

When using -mcmodel=large and -O3, a tail call to an extern function can produce invalid assembly, as in the following example:

    jmp *$memsetb

The correct assembly output should perhaps be:

    jmp *memsetb

The problem can be worked around by adding the "-fno-optimize-sibling-calls" option to the compiler command line.
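The -fno-optimize-sibling-calls workaround disables sibling calls for the whole translation unit. A narrower alternative, sketched below under assumptions that are not from this report and have not been tested against 4.6.x, is to keep only the affected call out of tail position: an empty volatile asm statement placed after the call means the call is no longer the last operation, so the compiler should not turn it into a jmp. The memsetb body and the clear_entry caller here are hypothetical stand-ins; in the reporter's kernel, memsetb is an extern function defined in another translation unit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's extern memsetb(); in the
 * original build it lives in a separate translation unit. */
void memsetb(void *dst, size_t cnt, uint8_t x)
{
    uint8_t *p = dst;
    for (size_t i = 0; i < cnt; i++)
        p[i] = x;
}

/* Hypothetical caller whose last action would normally become a
 * sibling (tail) call. The empty volatile asm with a memory clobber
 * follows the call, so the call is not in tail position and should
 * be emitted as a regular call instead of a jmp. */
void clear_entry(void *entry, size_t size)
{
    memsetb(entry, size, 0);
    __asm__ __volatile__("" ::: "memory");
}
```

Calling clear_entry(buf, 7) zeroes the first seven bytes of buf and leaves the rest untouched; the barrier only changes how the call is emitted, not what it does.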


Output of /usr/local/cross/amd64/bin/amd64-linux-gnu-gcc -v:

Using built-in specs.
COLLECT_GCC=/usr/local/cross/amd64/bin/amd64-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/usr/local/cross/amd64/libexec/gcc/amd64-linux-gnu/4.6.0/lto-wrapper
Target: amd64-linux-gnu
Configured with: /root/install/cross/amd64/gcc-4.6.0/configure --target=amd64-linux-gnu --prefix=/usr/local/cross/amd64 --program-prefix=amd64-linux-gnu- --with-gnu-as --with-gnu-ld --disable-nls --disable-threads --enable-languages=c,objc,c++,obj-c++ --disable-multilib --disable-libgcj --without-headers --disable-shared --enable-lto
Thread model: single
gcc version 4.6.0 (GCC)


Command line that triggered the bug:

/usr/local/cross/amd64/bin/amd64-linux-gnu-gcc -DKERNEL -DRELEASE=0.4.3 "-DNAME=Sashimi" -D__64_BITS__ -D__LE__ -Igeneric/include -O3 -imacros ../config.h -fexec-charset=UTF-8 -fwide-exec-charset=UTF-32LE -finput-charset=UTF-8 -ffreestanding -fno-builtin -nostdlib -nostdinc -Wall -Wextra -Wno-unused-parameter -Wmissing-prototypes -Werror-implicit-function-declaration -Wwrite-strings -pipe -Werror -m64 -mcmodel=large -mno-red-zone -fno-unwind-tables -fno-omit-frame-pointer -march=opteron -Itest/  -mno-sse -mno-sse2  -c -o genarch/src/mm/page_pt.o genarch/src/mm/page_pt.c


Compiler output:

{standard input}: Assembler messages:
{standard input}:722: Error: immediate operand illegal with absolute jump
Comment 1 Martin Decky 2011-06-29 15:01:46 UTC
The bug is still present in GCC 4.6.1.
Comment 2 Martin Decky 2011-06-30 16:03:08 UTC
Created attachment 24646
Initial proposed patch

The attached patch works as a temporary workaround and might also hint at where exactly the problem is.

Now, guys, please don't crucify me for this patch. I am well aware that this patch is no more than a dirty hack and probably breaks other things. I present it here only to provoke some reaction from somebody who knows the GCC sources well enough to propose a real solution.

I have just spent some 4 hours browsing the sources, analysing relevant functions such as output_asm_insn(), ix86_print_operand(), print_reg() and similar to figure out how to change the way the tail call instruction is generated for this particular case. But I would really appreciate a little help from a senior GCC developer who not only knows what and how, but also why.

Thanks in advance!
Comment 3 H.J. Lu 2011-06-30 22:59:32 UTC
Please provide a small testcase.
Comment 4 H.J. Lu 2011-06-30 23:01:37 UTC
Confirmed.
Comment 5 H.J. Lu 2011-06-30 23:11:09 UTC
[hjl@gnu-33 delta]$ cat testcase.c   
typedef unsigned char uint8_t;
typedef unsigned long int uint64_t;
typedef uint64_t size_t;
typedef uint64_t uintptr_t;
typedef uint8_t bool;
typedef struct {
  unsigned int unused : 1;
  unsigned int addr_12_31 : 30;
  unsigned int addr_32_51 : 21;
} __attribute__ ((packed)) pte_t;
typedef struct {
  pte_t *page_table;
} as_genarch_t;
typedef struct as {
  as_genarch_t genarch;
} as_t;
void pt_mapping_remove(as_t *as, uintptr_t page) {
  pte_t *ptl0 = (pte_t *) (((uintptr_t) ((uintptr_t) as->genarch.page_table)) + 0xffff800000000000UL);
  pte_t *ptl1 = (pte_t *) (((uintptr_t) (((pte_t *) ((((uint64_t) ((pte_t *) (ptl0))[((((page) >> 39) & 0x1ffU))].addr_12_31) << 12) | (((uint64_t) ((pte_t *) (ptl0))[((((page) >> 39) & 0x1ffU))].addr_32_51) << 32))))) + 0xffff800000000000UL);
  pte_t *ptl2 = (pte_t *) (((uintptr_t) (((pte_t *) ((((uint64_t) ((pte_t *) (ptl1))[((((page) >> 30) & 0x1ffU))].addr_12_31) << 12) | (((uint64_t) ((pte_t *) (ptl1))[((((page) >> 30) & 0x1ffU))].addr_32_51) << 32))))) + 0xffff800000000000UL);
  pte_t *ptl3 = (pte_t *) (((uintptr_t) (((pte_t *) ((((uint64_t) ((pte_t *) (ptl2))[((((page) >> 21) & 0x1ffU))].addr_12_31) << 12) | (((uint64_t) ((pte_t *) (ptl2))[((((page) >> 21) & 0x1ffU))].addr_32_51) << 32))))) + 0xffff800000000000UL);
  memsetb(&ptl3[(((page) >> 12) & 0x1ffU)], sizeof(pte_t), 0);
  bool empty = 1;
  unsigned int i;
  for (i = 0; i < 512; i++) {
    if ((*((uint64_t *) ((&ptl3[i]))) != 0)) {
      empty = 0;
    }
  }
  for (i = 0; i < 512; i++) {
    if ((*((uint64_t *) ((&ptl2[i]))) != 0)) {
      empty = 0;
    }
  }
  if (empty) {
    frame_free((((uintptr_t) ((uintptr_t) ptl1)) - 0xffff800000000000UL));
    memsetb(&ptl0[(((page) >> 39) & 0x1ffU)], sizeof(pte_t), 0);
  }
}
[hjl@gnu-33 delta]$
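In this reduced testcase the final memsetb() call, inside if (empty), is the one that becomes a sibling call. Its size argument is sizeof(pte_t): the struct packs 1 + 30 + 21 = 52 bits of bit-fields with __attribute__((packed)), so it occupies 7 bytes. That is the constant loaded by movl $7, %esi just before the jump in the r176841 assembly diff in comment #8; under the x86-64 System V calling convention the second argument is passed in %rsi. The check below is a small sketch assuming a GCC-compatible compiler that honors the packed attribute; pte_size is a hypothetical helper.

```c
#include <stddef.h>

/* Same bit-field layout as pte_t in the reduced testcase:
 * 1 + 30 + 21 = 52 bits, packed with no padding between fields. */
typedef struct {
    unsigned int unused : 1;
    unsigned int addr_12_31 : 30;
    unsigned int addr_32_51 : 21;
} __attribute__ ((packed)) pte_t;

/* Hypothetical helper: 52 bits rounded up to whole bytes gives 7,
 * the value passed as memsetb's size argument. */
size_t pte_size(void)
{
    return sizeof(pte_t);
}
```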
Comment 6 Martin Decky 2011-07-01 02:53:18 UTC
Created attachment 24650
Short test case

Thanks, H.J. Lu, for providing the short test case. I have just added three extern declarations to make it compile in my freestanding setup and I confirm that it demonstrates the bug in my environment.

I'll post the save-temps output shortly.
Comment 7 Martin Decky 2011-07-01 02:57:21 UTC
Created attachment 24651
Preprocessed file for the short test case

Output of /usr/local/cross/amd64/bin/amd64-linux-gnu-gcc -v:

Using built-in specs.
COLLECT_GCC=/usr/local/cross/amd64/bin/amd64-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/usr/local/cross/amd64/libexec/gcc/amd64-linux-gnu/4.6.1/lto-wrapper
Target: amd64-linux-gnu
Configured with: /root/install/cross/amd64/gcc-4.6.1/configure --target=amd64-linux-gnu --prefix=/usr/local/cross/amd64 --program-prefix=amd64-linux-gnu- --with-gnu-as --with-gnu-ld --disable-nls --disable-threads --enable-languages=c,objc,c++,obj-c++ --disable-multilib --disable-libgcj --without-headers --disable-shared --enable-lto
Thread model: single
gcc version 4.6.1 (GCC) 

Command line that triggered the bug:

/usr/local/cross/amd64/bin/amd64-linux-gnu-gcc -DKERNEL -DRELEASE=0.4.3 "-DNAME=Sashimi" -D__64_BITS__ -D__LE__ -Igeneric/include -O3 -imacros ../config.h -fexec-charset=UTF-8 -fwide-exec-charset=UTF-32LE -finput-charset=UTF-8 -ffreestanding -fno-builtin -nostdlib -nostdinc -std=gnu99 -Wall -Wextra -Wno-unused-parameter -Wmissing-prototypes -Werror-implicit-function-declaration -Wwrite-strings -pipe -Werror -m64 -mcmodel=large -mno-red-zone -fno-unwind-tables -fno-omit-frame-pointer -march=opteron -Itest/  -mno-sse -mno-sse2  -c -o genarch/src/mm/page_pt.o genarch/src/mm/page_pt.c

Compiler output:

{standard input}: Assembler messages:
{standard input}:284: Error: immediate operand illegal with absolute jump
Comment 8 Mikael Pettersson 2012-11-07 19:47:58 UTC
This was fixed for gcc-4.6.2 in r176841, the 4.6 fix for the essentially identical issue reported as PR49866.  The generated assembly for the test case in comment #7 changed as follows in r176841:

--- pr48385.s-r176840   2012-11-07 20:33:29.000000000 +0100
+++ pr48385.s-r176841   2012-11-07 20:37:54.000000000 +0100
@@ -205,7 +205,8 @@
        popq    %r15
        .cfi_def_cfa_offset 8
        movl    $7, %esi
-       jmp     *$memsetb
+       movabsq $memsetb, %rax
+       jmp     *%rax
        .cfi_endproc
 .LFE0:
        .size   pt_mapping_remove, .-pt_mapping_remove

The test case also works fine with gcc-4.7.2 (contrary to what the known-to-fail line states). I think this should be closed as a duplicate of PR49866.
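The r176841 output replaces the single invalid instruction with a two-instruction sequence. x86-64 has no indirect jmp that accepts an immediate operand, which is what the assembler rejects in jmp *$memsetb. So the full 64-bit address that -mcmodel=large requires is first materialized in a register with movabsq, and the jump then goes through that register. The helper below is hypothetical (emit_large_model_sibcall is not a GCC function and does not reflect how GCC's i386 backend is written). It only builds that AT&T-syntax text for a given symbol and register, to make the shape of the fixed output concrete.

```c
#include <stdio.h>

/* Hypothetical helper, not part of GCC: formats a sibling call to
 * `sym` in the shape the fixed compiler emits for -mcmodel=large.
 * movabsq loads the 64-bit absolute address of sym into `reg`;
 * "jmp *%reg" then jumps to the address held in that register.
 * Returns the snprintf() result (length of the full text). */
int emit_large_model_sibcall(char *buf, size_t len,
                             const char *sym, const char *reg)
{
    return snprintf(buf, len, "movabsq $%s, %%%s\n\tjmp *%%%s\n",
                    sym, reg, reg);
}
```

With sym set to "memsetb" and reg set to "rax", the helper produces the same two instructions that were added in the diff above.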
Comment 9 Martin Decky 2012-11-08 10:24:06 UTC
I can confirm that the bug is no longer present in 4.7.2, as noted in comment #8. Therefore I am closing this bug as resolved/fixed. I am not marking it as a duplicate of PR49866 since this bug was reported earlier and against a different version (but feel free to change this).

Thanks for fixing this!