Bug 92904 - varargs for __int128 is placed at an unaligned location and uses movdqa for the load
Summary: varargs for __int128 is placed at an unaligned location and uses movdqa for t...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 9.2.0
: P3 normal
Target Milestone: 8.4
Assignee: Jakub Jelinek
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2019-12-11 00:47 UTC by Jesse Huard
Modified: 2020-02-14 17:26 UTC (History)
4 users (show)

See Also:
Host:
Target: x86_64-linux-gnu
Build:
Known to work: 10.0, 9.2.1
Known to fail: 9.2.0
Last reconfirmed: 2019-12-12 00:00:00


Attachments
Preprocessed test source file for reproducing the issue. (753 bytes, text/plain)
2019-12-11 00:47 UTC, Jesse Huard
Details
Original test source file. (209 bytes, text/x-csrc)
2019-12-11 00:49 UTC, Jesse Huard
Details
gcc10-pr92904.patch (2.29 KB, patch)
2019-12-12 14:47 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jesse Huard 2019-12-11 00:47:55 UTC
Created attachment 47467 [details]
Preprocessed test source file for reproducing the issue.

Compiling the attached test program on x86_64 with optimisations enabled produces code which segfaults in a movdqa instruction accessing unaligned memory. This was observed in code which passes an unsigned __int128 as a variadic function argument when we attempt to retrieve the argument with va_arg().

movdqa is expecting a 16-byte aligned memory location, but instead we get an unaligned address on the stack pointing to our variadic argument within the va_list's reg_save_area. This results in a segmentation fault.
	
12	    b = va_arg(args, unsigned __int128);
   0x000055555555517b <+50>:	movl   $0x18,(%rsp)
=> 0x0000555555555182 <+57>:	movdqa 0x28(%rsp),%xmm0          # segfault here!
   0x0000555555555188 <+63>:	movaps %xmm0,0x2ec1(%rip)        # 0x555555558050 <b>

(gdb) p $rsp+0x28
$1 = (void *) 0x7fffffffe2d8

(gdb) p args
$2 = {{gp_offset = 24, fp_offset = 0, overflow_arg_area = 0x7fffffffe310, reg_save_area = 0x7fffffffe2d0}}

Compiler invocation:

gcc -g -O1 test.c -o test

GCC information:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/9.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --enable-shared --enable-threads=posix --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --enable-multilib --disable-werror --enable-checking=release --enable-default-pie --enable-default-ssp --enable-cet=auto gdc_include_dir=/usr/include/dlang/gdc
Thread model: posix
gcc version 9.2.0 (GCC)

libc information:

$ /usr/lib/libc.so.6
GNU C Library (GNU libc) stable release version 2.30.
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 9.2.0.
libc ABIs: UNIQUE IFUNC ABSOLUTE
Comment 1 Jesse Huard 2019-12-11 00:49:09 UTC
Created attachment 47468 [details]
Original test source file.
Comment 2 Jakub Jelinek 2019-12-12 11:51:44 UTC
I've started by checking the argument passing against the psABI:
struct S { long long a, b; };
struct __attribute__((aligned (16))) T { long long a, b; };
void f1 (__int128);
void f2 (int, __int128);
void f3 (int, int, __int128);
void f4 (int, int, int, __int128);
void f5 (int, int, int, int, __int128);
void f6 (int, int, int, int, int, __int128);
void f7 (int, int, int, int, int, int, __int128);
void f8 (int, int, int, int, int, int, int, __int128);
void f9 (int, ...);
void f10 (struct S);
void f11 (int, struct S);
void f12 (int, int, struct S);
void f13 (int, int, int, struct S);
void f14 (int, int, int, int, struct S);
void f15 (int, int, int, int, int, struct S);
void f16 (int, int, int, int, int, int, struct S);
void f17 (int, int, int, int, int, int, int, struct S);
void f18 (struct T);
void f19 (int, struct T);
void f20 (int, int, struct T);
void f21 (int, int, int, struct T);
void f22 (int, int, int, int, struct T);
void f23 (int, int, int, int, int, struct T);
void f24 (int, int, int, int, int, int, struct T);
void f25 (int, int, int, int, int, int, int, struct T);

void
foo ()
{
  __int128 i = -1;
  struct S s = { -1, -1 };
  struct T t = { -1, -1 };
  f1 (-1);
  f2 (0, -1);
  f3 (0, 0, -1);
  f4 (0, 0, 0, -1);
  f5 (0, 0, 0, 0, -1);
  f6 (0, 0, 0, 0, 0, -1);
  f7 (0, 0, 0, 0, 0, 0, -1);
  f8 (0, 0, 0, 0, 0, 0, 0, -1);
  f9 (0, i);
  f9 (0, 0, i);
  f9 (0, 0, 0, i);
  f9 (0, 0, 0, 0, i);
  f9 (0, 0, 0, 0, 0, i);
  f9 (0, 0, 0, 0, 0, 0, i);
  f9 (0, 0, 0, 0, 0, 0, 0, i);
  f10 (s);
  f11 (0, s);
  f12 (0, 0, s);
  f13 (0, 0, 0, s);
  f14 (0, 0, 0, 0, s);
  f15 (0, 0, 0, 0, 0, s);
  f16 (0, 0, 0, 0, 0, 0, s);
  f17 (0, 0, 0, 0, 0, 0, 0, s);
  f9 (0, s);
  f9 (0, 0, s);
  f9 (0, 0, 0, s);
  f9 (0, 0, 0, 0, s);
  f9 (0, 0, 0, 0, 0, s);
  f9 (0, 0, 0, 0, 0, 0, s);
  f9 (0, 0, 0, 0, 0, 0, 0, s);
  f18 (t);
  f19 (0, t);
  f20 (0, 0, t);
  f21 (0, 0, 0, t);
  f22 (0, 0, 0, 0, t);
  f23 (0, 0, 0, 0, 0, t);
  f24 (0, 0, 0, 0, 0, 0, t);
  f25 (0, 0, 0, 0, 0, 0, 0, t);
  f9 (0, t);
  f9 (0, 0, t);
  f9 (0, 0, 0, t);
  f9 (0, 0, 0, 0, t);
  f9 (0, 0, 0, 0, 0, t);
  f9 (0, 0, 0, 0, 0, 0, t);
  f9 (0, 0, 0, 0, 0, 0, 0, t);
}

I believe GCC follows the psABI here:
"For classification purposes __int128 is treated as if it were implemented
as:
typedef struct {
long low, high;
} __int128;
with the exception that arguments of type __int128 that are stored in
memory must be aligned on a 16-byte boundary."
and __int128 is passed the same as the aligned(16) structure containing two long/long long fields, in particular, if both halves fit in general purpose registers, they are passed there, without any gaps, if only half fits into gpr and the other would need to go onto stack, it is passed on stack, and when passed on stack, it is passed aligned on 16-byte boundary.
So, I'd say the bug is in the x86_64 va_arg expansion that when reading the __int128 (and maybe > 8 bytes aligned struct too) from the area of spilled gprs it should use 8 byte alignment (i.e. lower than the type has), will keep looking into that.
Also checked with clang and that one seems to ignore the psABI completely,
hapilly passing the __int128 partially in gpr (%r9) and on the stack (that is the f6 call and f9 (0, 0, 0, 0, 0, i);), which violates the psABI:
"If there are no registers available for any eightbyte of an argument, the whole
argument is passed on the stack."
or that the __int128 should be passed 16-byte aligned on the stack (that is the f8 call and f9 (0, 0, 0, 0, 0, 0, 0, -1);), which violates the psABI:
"with the exception that arguments of type __int128 that are stored in
memory must be aligned on a 16-byte boundary."
Note, the passing of struct S and struct T seems to be correct in both compilers.
Comment 3 Jakub Jelinek 2019-12-12 14:47:39 UTC
Created attachment 47481 [details]
gcc10-pr92904.patch

Untested fix.
Comment 4 Jakub Jelinek 2019-12-13 00:10:05 UTC
Author: jakub
Date: Fri Dec 13 00:09:34 2019
New Revision: 279327

URL: https://gcc.gnu.org/viewcvs?rev=279327&root=gcc&view=rev
Log:
	PR target/92904
	* config/i386/i386.c (ix86_gimplify_va_arg): If need_intregs and
	not need_temp, decrease alignment of the read because the GPR save
	area only guarantees 8-byte alignment.

	* gcc.c-torture/execute/pr92904.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr92904.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/testsuite/ChangeLog
Comment 5 Jakub Jelinek 2019-12-13 10:29:18 UTC
Fixed on the trunk so far.
Comment 6 Jakub Jelinek 2019-12-20 17:43:54 UTC
Author: jakub
Date: Fri Dec 20 17:43:23 2019
New Revision: 279673

URL: https://gcc.gnu.org/viewcvs?rev=279673&root=gcc&view=rev
Log:
	Backported from mainline
	2019-12-12  Jakub Jelinek  <jakub@redhat.com>

	PR target/92904
	* config/i386/i386.c (ix86_gimplify_va_arg): If need_intregs and
	not need_temp, decrease alignment of the read because the GPR save
	area only guarantees 8-byte alignment.

	* gcc.c-torture/execute/pr92904.c: New test.

Added:
    branches/gcc-9-branch/gcc/testsuite/gcc.c-torture/execute/pr92904.c
Modified:
    branches/gcc-9-branch/gcc/ChangeLog
    branches/gcc-9-branch/gcc/config/i386/i386.c
    branches/gcc-9-branch/gcc/testsuite/ChangeLog
Comment 7 GCC Commits 2020-02-14 16:35:55 UTC
The releases/gcc-8 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:ffb5cc9a5599b1936c5ebea153ca52a0aa2c785d

commit r8-9995-gffb5cc9a5599b1936c5ebea153ca52a0aa2c785d
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Feb 14 15:24:48 2020 +0100

    backport: re PR target/92904 (varargs for __int128 is placed at an unaligned location and uses movdqa for the load)
    
    	Backported from mainline
    	2019-12-12  Jakub Jelinek  <jakub@redhat.com>
    
    	PR target/92904
    	* config/i386/i386.c (ix86_gimplify_va_arg): If need_intregs and
    	not need_temp, decrease alignment of the read because the GPR save
    	area only guarantees 8-byte alignment.
    
    	* gcc.c-torture/execute/pr92904.c: New test.
Comment 8 Jakub Jelinek 2020-02-14 17:26:57 UTC
Fixed for 8.4+ and 9.3+ too.