Bug 113689 - [11/12/13 Regression] wrong code with -fprofile -mcmodel=large when needing drap register since r11-6548
Summary: [11/12/13 Regression] wrong code with -fprofile -mcmodel=large when needing d...
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 14.0
: P2 normal
Target Milestone: 11.5
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2024-01-31 20:32 UTC by Zdenek Sojka
Modified: 2024-02-28 14:38 UTC (History)
6 users (show)

See Also:
Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-01-31 00:00:00


Attachments
reduced testcase (159 bytes, text/plain)
2024-01-31 20:32 UTC, Zdenek Sojka
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Zdenek Sojka 2024-01-31 20:32:59 UTC
Created attachment 57270 [details]
reduced testcase

Output:
$ x86_64-pc-linux-gnu-gcc -O2 -fprofile -mcmodel=large -mavx testcase.c
$ ./a.out 
Aborted

"b" (and then "x") seems to be garbage.




$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r14-8665-20240131161256-g3fed1609f61-checking-yes-rtl-df-extra-nobootstrap-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/14.0.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --disable-bootstrap --with-cloog --with-ppl --with-isl --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld --with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-r14-8665-20240131161256-g3fed1609f61-checking-yes-rtl-df-extra-nobootstrap-amd64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.1 20240131 (experimental) (GCC)
Comment 1 Andrew Pinski 2024-01-31 23:04:38 UTC
Confirmed. s/511/(256-63)/ is the min to reproduce this issue. Basically just enough to auto-vectorize.

I get the feeling that mcount causes alignment differences ...
Comment 2 Jakub Jelinek 2024-02-01 12:27:21 UTC
Seems like a backend bug to me, collision between function profiler after prologue and drap.
I see
foo:
        leaq    8(%rsp), %r10
        andq    $-32, %rsp
        pushq   -8(%r10)
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %r14
        pushq   %r13
        pushq   %r12
        pushq   %r10
        pushq   %rbx
        subq    $200, %rsp
1:      movabsq $mcount, %r10
        call    *%r10
        xorl    %eax, %eax
        xorl    %edx, %edx
        movl    $-511, %r9d
        addb    $-1, %dl
        movq    %rax, %rdx
        sbbq    (%r10), %rdx

This function is stack_realign_drap, find_drap_reg returns R10_REG and so ix86_get_drap_rtx uses %r10 as drap register.  Later on pro_and_epilogue initializes
the drap register in the prologue.  And, final.cc when seeing NOTE_INSN_PROLOGUE_END emits the late FUNCTION_PROFILER, which seems to have clobbering of %r10 (and/or %r11) hardcoded in it, so it overwrites the drap value.

One doesn't need _BitInt to reproduce:

/* { dg-do run { target lp64 } } */
/* { dg-options "-O2 -fprofile -mcmodel=large" } */

__attribute__((noipa)) void
bar (char *x, char *y, int *z)
{
  x[0] = 42;
  y[0] = 42;
  if (z[0] != 16)
    __builtin_abort ();
}

__attribute__((noipa)) void 
foo (int c, int d, int e, int f, int g, int h, int z)
{
  typedef char B[32];
  B b __attribute__((aligned (32)));
  bar (&b[0], __builtin_alloca (z), &z);
}

int
main ()
{
  foo (0, 0, 0, 0, 0, 0, 16);
}

Started with r11-6548-g1b885264a48dcd71b7aeb26c0abeb91246724897
Comment 3 Jakub Jelinek 2024-02-01 12:39:23 UTC
Anyway, if flag_fentry == 0, it doesn't seem to be safe to clobber any registers to me,
sure, the code could test if %r10 or %r11 are ever live in the current function (if that information is up to date at that point, or remember it during prologue generation when we know we need function profiler with flag_fentry == 0), but if they are live, I think it needs to do something different.
Or, can it use say %r12 instead of %r10 if %r10 is drap?
Comment 4 H.J. Lu 2024-02-01 18:23:56 UTC
A patch is at

https://patchwork.sourceware.org/project/gcc/list/?series=30482
Comment 5 GCC Commits 2024-02-05 19:08:53 UTC
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:51f8ac3341078303e81e72d9013698a31c5ddd29

commit r14-8808-g51f8ac3341078303e81e72d9013698a31c5ddd29
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Feb 1 08:02:27 2024 -0800

    x86-64: Find a scratch register for large model profiling
    
    2 scratch registers, %r10 and %r11, are available at function entry for
    large model profiling.  But %r10 may be used by stack realignment and we
    can't use %r10 in this case.  Add x86_64_select_profile_regnum to find
    a caller-saved register which isn't live or a callee-saved register
    which has been saved on stack in the prologue at entry for large model
    profiling and sorry if we can't find one.
    
    gcc/
    
            PR target/113689
            * config/i386/i386.cc (x86_64_select_profile_regnum): New.
            (x86_function_profiler): Call x86_64_select_profile_regnum to
            get a scratch register for large model profiling.
    
    gcc/testsuite/
    
            PR target/113689
            * gcc.target/i386/pr113689-1.c: New file.
            * gcc.target/i386/pr113689-2.c: Likewise.
            * gcc.target/i386/pr113689-3.c: Likewise.
Comment 6 H.J. Lu 2024-02-05 19:09:48 UTC
Fixed for GCC 14 so far.
Comment 7 Rainer Orth 2024-02-06 18:30:13 UTC
This patch broke Solaris/x86 (i386-pc-solaris2.11) bootstrap:

/vol/gcc/src/hg/master/local/gcc/config/i386/i386.cc: In function 'void x86_function_profiler(std::FILE*, int)':
/vol/gcc/src/hg/master/local/gcc/config/i386/i386.cc:22838:40: error: array subscript -1 is below array bounds of 'const char* const [92]' [-Werror=array-bounds=]
22838 |               reg = hi_reg_name[scratch];
      |                     ~~~~~~~~~~~~~~~~~~~^c-include=/vol/gcc/include
/vol/gcc/src/hg/master/local/gcc/config/i386/i386.cc:138:26: note: while referencing 'hi_reg_name'               --with-target-bdw-gc-lib=/vol/gcc/lib,amd64=/vo  138 | static const char *const hi_reg_name[] = HI_REGISTER_NAMES;
      |                          ^~~~~~~~~~~
Comment 8 Jakub Jelinek 2024-02-06 18:37:47 UTC
(In reply to Rainer Orth from comment #7)
> This patch broke Solaris/x86 (i386-pc-solaris2.11) bootstrap:
> 
> /vol/gcc/src/hg/master/local/gcc/config/i386/i386.cc: In function 'void
> x86_function_profiler(std::FILE*, int)':
> /vol/gcc/src/hg/master/local/gcc/config/i386/i386.cc:22838:40: error: array
> subscript -1 is below array bounds of 'const char* const [92]'
> [-Werror=array-bounds=]
> 22838 |               reg = hi_reg_name[scratch];
>       |                     ~~~~~~~~~~~~~~~~~~~^c-include=/vol/gcc/include
> /vol/gcc/src/hg/master/local/gcc/config/i386/i386.cc:138:26: note: while
> referencing 'hi_reg_name'              
> --with-target-bdw-gc-lib=/vol/gcc/lib,amd64=/vo  138 | static const char
> *const hi_reg_name[] = HI_REGISTER_NAMES;
>       |                          ^~~~~~~~~~~

Guess for the case where we issue sorry we shouldn't return INVALID_REGNUM, but R10_REG or any other, we acknowledged we aren't emitting correct assembly already.

And, as I wrote earlier, I think if we did similar discovery at pro_and_epilogue time and find we otherwise wouldn't have any registers for it and would sorry at final time, it might be better to just pick some call used register and forcibly save it even when it isn't strictly needed.  Then it wouldn't sorry at final time anymore.
Comment 9 H.J. Lu 2024-02-06 18:42:33 UTC
Like this?

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index f02c6c02ac6..ed0b0e19985 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -22785,10 +22785,10 @@ x86_64_select_profile_regnum (bool r11_ok ATTRIBUTE_UNUSED)
      && !REGNO_REG_SET_P (reg_live, i))))
       return i;
 
-  sorry ("no register available for profiling %<-mcmodel=large%s%>",
+  sorry ("no register available for profiling %<-mcmodel=large%s%>, use r10",
    ix86_cmodel == CM_LARGE_PIC ? " -fPIC" : "");
 
-  return INVALID_REGNUM;
+  return R10_REG;
 }
 
 /* Output assembler code to FILE to increment profiler label # LABELNO
Comment 10 Jakub Jelinek 2024-02-06 18:50:15 UTC
(In reply to H.J. Lu from comment #9)
> Like this?
> 
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index f02c6c02ac6..ed0b0e19985 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22785,10 +22785,10 @@ x86_64_select_profile_regnum (bool r11_ok
> ATTRIBUTE_UNUSED)
>       && !REGNO_REG_SET_P (reg_live, i))))
>        return i;
>  
> -  sorry ("no register available for profiling %<-mcmodel=large%s%>",
> +  sorry ("no register available for profiling %<-mcmodel=large%s%>, use
> r10",
>     ix86_cmodel == CM_LARGE_PIC ? " -fPIC" : "");
>  
> -  return INVALID_REGNUM;
> +  return R10_REG;
>  }
>  
>  /* Output assembler code to FILE to increment profiler label # LABELNO

Just the second hunk.  I think with sorry call the compilation fails, so what
you actually emit doesn't matter (one can see it with -pipe, sure).
Comment 11 H.J. Lu 2024-02-06 18:58:00 UTC
(In reply to Jakub Jelinek from comment #10)
> 
> Just the second hunk.  I think with sorry call the compilation fails, so what
> you actually emit doesn't matter (one can see it with -pipe, sure).

Done.
Comment 12 GCC Commits 2024-02-06 19:03:01 UTC
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:f2a060820c24724bb48ee006d257c449e4d94b72

commit r14-8831-gf2a060820c24724bb48ee006d257c449e4d94b72
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Feb 6 10:57:24 2024 -0800

    x86-64: Return 10_REG if there is no scratch register
    
    If we can't find a scratch register for large model profiling, return
    R10_REG.
    
            PR target/113689
            * config/i386/i386.cc (x86_64_select_profile_regnum): Return
            R10_REG after sorry.