Created attachment 47467 [details] Preprocessed test source file for reproducing the issue. Compiling the attached test program on x86_64 with optimisations enabled produces code which segfaults in a movdqa instruction accessing unaligned memory. This was observed in code which passes an unsigned __int128 as a variadic function argument when we attempt to retrieve the argument with va_arg(). movdqa is expecting a 16-byte aligned memory location, but instead we get an unaligned address on the stack pointing to our variadic argument within the va_list's reg_save_area. This results in a segmentation fault. 12 b = va_arg(args, unsigned __int128); 0x000055555555517b <+50>: movl $0x18,(%rsp) => 0x0000555555555182 <+57>: movdqa 0x28(%rsp),%xmm0 # segfault here! 0x0000555555555188 <+63>: movaps %xmm0,0x2ec1(%rip) # 0x555555558050 <b> (gdb) p $rsp+0x28 $1 = (void *) 0x7fffffffe2d8 (gdb) p args $2 = {{gp_offset = 24, fp_offset = 0, overflow_arg_area = 0x7fffffffe310, reg_save_area = 0x7fffffffe2d0}} Compiler invocation: gcc -g -O1 test.c -o test GCC information: $ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/9.2.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --enable-shared --enable-threads=posix --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --enable-multilib --disable-werror --enable-checking=release --enable-default-pie --enable-default-ssp --enable-cet=auto gdc_include_dir=/usr/include/dlang/gdc Thread model: posix gcc version 9.2.0 (GCC) libc information: $ /usr/lib/libc.so.6 GNU C Library (GNU libc) stable release version 2.30. Copyright (C) 2019 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Compiled by GNU CC version 9.2.0. libc ABIs: UNIQUE IFUNC ABSOLUTE
Created attachment 47468 [details] Original test source file.
I've started by checking the argument passing against the psABI: struct S { long long a, b; }; struct __attribute__((aligned (16))) T { long long a, b; }; void f1 (__int128); void f2 (int, __int128); void f3 (int, int, __int128); void f4 (int, int, int, __int128); void f5 (int, int, int, int, __int128); void f6 (int, int, int, int, int, __int128); void f7 (int, int, int, int, int, int, __int128); void f8 (int, int, int, int, int, int, int, __int128); void f9 (int, ...); void f10 (struct S); void f11 (int, struct S); void f12 (int, int, struct S); void f13 (int, int, int, struct S); void f14 (int, int, int, int, struct S); void f15 (int, int, int, int, int, struct S); void f16 (int, int, int, int, int, int, struct S); void f17 (int, int, int, int, int, int, int, struct S); void f18 (struct T); void f19 (int, struct T); void f20 (int, int, struct T); void f21 (int, int, int, struct T); void f22 (int, int, int, int, struct T); void f23 (int, int, int, int, int, struct T); void f24 (int, int, int, int, int, int, struct T); void f25 (int, int, int, int, int, int, int, struct T); void foo () { __int128 i = -1; struct S s = { -1, -1 }; struct T t = { -1, -1 }; f1 (-1); f2 (0, -1); f3 (0, 0, -1); f4 (0, 0, 0, -1); f5 (0, 0, 0, 0, -1); f6 (0, 0, 0, 0, 0, -1); f7 (0, 0, 0, 0, 0, 0, -1); f8 (0, 0, 0, 0, 0, 0, 0, -1); f9 (0, i); f9 (0, 0, i); f9 (0, 0, 0, i); f9 (0, 0, 0, 0, i); f9 (0, 0, 0, 0, 0, i); f9 (0, 0, 0, 0, 0, 0, i); f9 (0, 0, 0, 0, 0, 0, 0, i); f10 (s); f11 (0, s); f12 (0, 0, s); f13 (0, 0, 0, s); f14 (0, 0, 0, 0, s); f15 (0, 0, 0, 0, 0, s); f16 (0, 0, 0, 0, 0, 0, s); f17 (0, 0, 0, 0, 0, 0, 0, s); f9 (0, s); f9 (0, 0, s); f9 (0, 0, 0, s); f9 (0, 0, 0, 0, s); f9 (0, 0, 0, 0, 0, s); f9 (0, 0, 0, 0, 0, 0, s); f9 (0, 0, 0, 0, 0, 0, 0, s); f18 (t); f19 (0, t); f20 (0, 0, t); f21 (0, 0, 0, t); f22 (0, 0, 0, 0, t); f23 (0, 0, 0, 0, 0, t); f24 (0, 0, 0, 0, 0, 0, t); f25 (0, 0, 0, 0, 0, 0, 0, t); f9 (0, t); f9 (0, 0, t); f9 (0, 0, 0, t); f9 (0, 0, 0, 0, t); f9 (0, 0, 0, 0, 0, t); f9 (0, 0, 0, 0, 0, 0, t); f9 (0, 0, 0, 0, 0, 0, 0, t); } I believe GCC follows the psABI here: "For classification purposes __int128 is treated as if it were implemented as: typedef struct { long low, high; } __int128; with the exception that arguments of type __int128 that are stored in memory must be aligned on a 16-byte boundary." and __int128 is passed the same as the aligned(16) structure containing two long/long long fields, in particular, if both halves fit in general purpose registers, they are passed there, without any gaps, if only half fits into gpr and the other would need to go onto stack, it is passed on stack, and when passed on stack, it is passed aligned on 16-byte boundary. So, I'd say the bug is in the x86_64 va_arg expansion that when reading the __int128 (and maybe > 8 bytes aligned struct too) from the area of spilled gprs it should use 8 byte alignment (i.e. lower than the type has), will keep looking into that. Also checked with clang and that one seems to ignore the psABI completely, hapilly passing the __int128 partially in gpr (%r9) and on the stack (that is the f6 call and f9 (0, 0, 0, 0, 0, i);), which violates the psABI: "If there are no registers available for any eightbyte of an argument, the whole argument is passed on the stack." or that the __int128 should be passed 16-byte aligned on the stack (that is the f8 call and f9 (0, 0, 0, 0, 0, 0, 0, -1);), which violates the psABI: "with the exception that arguments of type __int128 that are stored in memory must be aligned on a 16-byte boundary." Note, the passing of struct S and struct T seems to be correct in both compilers.
Created attachment 47481 [details] gcc10-pr92904.patch Untested fix.
Author: jakub Date: Fri Dec 13 00:09:34 2019 New Revision: 279327 URL: https://gcc.gnu.org/viewcvs?rev=279327&root=gcc&view=rev Log: PR target/92904 * config/i386/i386.c (ix86_gimplify_va_arg): If need_intregs and not need_temp, decrease alignment of the read because the GPR save area only guarantees 8-byte alignment. * gcc.c-torture/execute/pr92904.c: New test. Added: trunk/gcc/testsuite/gcc.c-torture/execute/pr92904.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/testsuite/ChangeLog
Fixed on the trunk so far.
Author: jakub Date: Fri Dec 20 17:43:23 2019 New Revision: 279673 URL: https://gcc.gnu.org/viewcvs?rev=279673&root=gcc&view=rev Log: Backported from mainline 2019-12-12 Jakub Jelinek <jakub@redhat.com> PR target/92904 * config/i386/i386.c (ix86_gimplify_va_arg): If need_intregs and not need_temp, decrease alignment of the read because the GPR save area only guarantees 8-byte alignment. * gcc.c-torture/execute/pr92904.c: New test. Added: branches/gcc-9-branch/gcc/testsuite/gcc.c-torture/execute/pr92904.c Modified: branches/gcc-9-branch/gcc/ChangeLog branches/gcc-9-branch/gcc/config/i386/i386.c branches/gcc-9-branch/gcc/testsuite/ChangeLog
The releases/gcc-8 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:ffb5cc9a5599b1936c5ebea153ca52a0aa2c785d commit r8-9995-gffb5cc9a5599b1936c5ebea153ca52a0aa2c785d Author: Jakub Jelinek <jakub@redhat.com> Date: Fri Feb 14 15:24:48 2020 +0100 backport: re PR target/92904 (varargs for __int128 is placed at an unaligned location and uses movdqa for the load) Backported from mainline 2019-12-12 Jakub Jelinek <jakub@redhat.com> PR target/92904 * config/i386/i386.c (ix86_gimplify_va_arg): If need_intregs and not need_temp, decrease alignment of the read because the GPR save area only guarantees 8-byte alignment. * gcc.c-torture/execute/pr92904.c: New test.
Fixed for 8.4+ and 9.3+ too.