Created attachment 57270 [details] reduced testcase Output: $ x86_64-pc-linux-gnu-gcc -O2 -fprofile -mcmodel=large -mavx testcase.c $ ./a.out Aborted "b" (and then "x") seems to be garbage. $ x86_64-pc-linux-gnu-gcc -v Using built-in specs. COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r14-8665-20240131161256-g3fed1609f61-checking-yes-rtl-df-extra-nobootstrap-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/14.0.1/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --disable-bootstrap --with-cloog --with-ppl --with-isl --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld --with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-r14-8665-20240131161256-g3fed1609f61-checking-yes-rtl-df-extra-nobootstrap-amd64 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 14.0.1 20240131 (experimental) (GCC)
Confirmed. s/511/(256-63)/ is the min to reproduce this issue. Basically just enough to auto-vectorize. I get the feeling that mcount causes alignment differences ...
Seems like a backend bug to me, collision between function profiler after prologue and drap. I see foo: leaq 8(%rsp), %r10 andq $-32, %rsp pushq -8(%r10) pushq %rbp movq %rsp, %rbp pushq %r14 pushq %r13 pushq %r12 pushq %r10 pushq %rbx subq $200, %rsp 1: movabsq $mcount, %r10 call *%r10 xorl %eax, %eax xorl %edx, %edx movl $-511, %r9d addb $-1, %dl movq %rax, %rdx sbbq (%r10), %rdx This function is stack_realign_drap, find_drap_reg returns R10_REG and so ix86_get_drap_rtx uses %r10 as drap register. Later on pro_and_epilogue initializes the drap register in the prologue. And, final.cc when seeing NOTE_INSN_PROLOGUE_END emits the late FUNCTION_PROFILER, which seems to have clobbering of %r10 (and/or %r11) hardcoded in it, so it overwrites the drap value. One doesn't need _BitInt to reproduce: /* { dg-do run { target lp64 } } */ /* { dg-options "-O2 -fprofile -mcmodel=large" } */ __attribute__((noipa)) void bar (char *x, char *y, int *z) { x[0] = 42; y[0] = 42; if (z[0] != 16) __builtin_abort (); } __attribute__((noipa)) void foo (int c, int d, int e, int f, int g, int h, int z) { typedef char B[32]; B b __attribute__((aligned (32))); bar (&b[0], __builtin_alloca (z), &z); } int main () { foo (0, 0, 0, 0, 0, 0, 16); } Started with r11-6548-g1b885264a48dcd71b7aeb26c0abeb91246724897
Anyway, if flag_fentry == 0, it doesn't seem to be safe to clobber any registers to me, sure, the code could test if %r10 or %r11 are ever live in the current function (if that information is up to date at that point, or remember it during prologue generation when we know we need function profiler with flag_fentry == 0), but if they are live, I think it needs to do something different. Or, can it use say %r12 instead of %r10 if %r10 is drap?
A patch is at https://patchwork.sourceware.org/project/gcc/list/?series=30482
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>: https://gcc.gnu.org/g:51f8ac3341078303e81e72d9013698a31c5ddd29 commit r14-8808-g51f8ac3341078303e81e72d9013698a31c5ddd29 Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Feb 1 08:02:27 2024 -0800 x86-64: Find a scratch register for large model profiling 2 scratch registers, %r10 and %r11, are available at function entry for large model profiling. But %r10 may be used by stack realignment and we can't use %r10 in this case. Add x86_64_select_profile_regnum to find a caller-saved register which isn't live or a callee-saved register which has been saved on stack in the prologue at entry for large model profiling and sorry if we can't find one. gcc/ PR target/113689 * config/i386/i386.cc (x86_64_select_profile_regnum): New. (x86_function_profiler): Call x86_64_select_profile_regnum to get a scratch register for large model profiling. gcc/testsuite/ PR target/113689 * gcc.target/i386/pr113689-1.c: New file. * gcc.target/i386/pr113689-2.c: Likewise. * gcc.target/i386/pr113689-3.c: Likewise.
Fixed for GCC 14 so far.
This patch broke Solaris/x86 (i386-pc-solaris2.11) bootstrap: /vol/gcc/src/hg/master/local/gcc/config/i386/i386.cc: In function 'void x86_function_profiler(std::FILE*, int)': /vol/gcc/src/hg/master/local/gcc/config/i386/i386.cc:22838:40: error: array subscript -1 is below array bounds of 'const char* const [92]' [-Werror=array-bounds=] 22838 | reg = hi_reg_name[scratch]; | ~~~~~~~~~~~~~~~~~~~^c-include=/vol/gcc/include /vol/gcc/src/hg/master/local/gcc/config/i386/i386.cc:138:26: note: while referencing 'hi_reg_name' --with-target-bdw-gc-lib=/vol/gcc/lib,amd64=/vo 138 | static const char *const hi_reg_name[] = HI_REGISTER_NAMES; | ^~~~~~~~~~~
(In reply to Rainer Orth from comment #7) > This patch broke Solaris/x86 (i386-pc-solaris2.11) bootstrap: > > /vol/gcc/src/hg/master/local/gcc/config/i386/i386.cc: In function 'void > x86_function_profiler(std::FILE*, int)': > /vol/gcc/src/hg/master/local/gcc/config/i386/i386.cc:22838:40: error: array > subscript -1 is below array bounds of 'const char* const [92]' > [-Werror=array-bounds=] > 22838 | reg = hi_reg_name[scratch]; > | ~~~~~~~~~~~~~~~~~~~^c-include=/vol/gcc/include > /vol/gcc/src/hg/master/local/gcc/config/i386/i386.cc:138:26: note: while > referencing 'hi_reg_name' > --with-target-bdw-gc-lib=/vol/gcc/lib,amd64=/vo 138 | static const char > *const hi_reg_name[] = HI_REGISTER_NAMES; > | ^~~~~~~~~~~ Guess for the case where we issue sorry we shouldn't return INVALID_REGNUM, but R10_REG or any other, we acknowledged we aren't emitting correct assembly already. And, as I wrote earlier, I think if we did similar discovery at pro_and_epilogue time and find we otherwise wouldn't have any registers for it and would sorry at final time, it might be better to just pick some call used register and forcibly save it even when it isn't strictly needed. Then it wouldn't sorry at final time anymore.
Like this? diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index f02c6c02ac6..ed0b0e19985 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -22785,10 +22785,10 @@ x86_64_select_profile_regnum (bool r11_ok ATTRIBUTE_UNUSED) && !REGNO_REG_SET_P (reg_live, i)))) return i; - sorry ("no register available for profiling %<-mcmodel=large%s%>", + sorry ("no register available for profiling %<-mcmodel=large%s%>, use r10", ix86_cmodel == CM_LARGE_PIC ? " -fPIC" : ""); - return INVALID_REGNUM; + return R10_REG; } /* Output assembler code to FILE to increment profiler label # LABELNO
(In reply to H.J. Lu from comment #9) > Like this? > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > index f02c6c02ac6..ed0b0e19985 100644 > --- a/gcc/config/i386/i386.cc > +++ b/gcc/config/i386/i386.cc > @@ -22785,10 +22785,10 @@ x86_64_select_profile_regnum (bool r11_ok > ATTRIBUTE_UNUSED) > && !REGNO_REG_SET_P (reg_live, i)))) > return i; > > - sorry ("no register available for profiling %<-mcmodel=large%s%>", > + sorry ("no register available for profiling %<-mcmodel=large%s%>, use > r10", > ix86_cmodel == CM_LARGE_PIC ? " -fPIC" : ""); > > - return INVALID_REGNUM; > + return R10_REG; > } > > /* Output assembler code to FILE to increment profiler label # LABELNO Just the second hunk. I think with sorry call the compilation fails, so what you actually emit doesn't matter (one can see it with -pipe, sure).
(In reply to Jakub Jelinek from comment #10) > > Just the second hunk. I think with sorry call the compilation fails, so what > you actually emit doesn't matter (one can see it with -pipe, sure). Done.
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>: https://gcc.gnu.org/g:f2a060820c24724bb48ee006d257c449e4d94b72 commit r14-8831-gf2a060820c24724bb48ee006d257c449e4d94b72 Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Feb 6 10:57:24 2024 -0800 x86-64: Return 10_REG if there is no scratch register If we can't find a scratch register for large model profiling, return R10_REG. PR target/113689 * config/i386/i386.cc (x86_64_select_profile_regnum): Return R10_REG after sorry.