Bug 114872 - [13/14/15 Regression] Miscompilation with -O2 after commit r13-8037
Summary: [13/14/15 Regression] Miscompilation with -O2 after commit r13-8037
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 13.2.1
: P3 normal
Target Milestone: 13.3
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2024-04-27 18:57 UTC by Antonio Rojas
Modified: 2024-05-12 03:10 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work: 13.2.0
Known to fail: 13.2.1
Last reconfirmed:


Attachments
output of gcc -save-temps (247.98 KB, application/gzip)
2024-04-27 18:59 UTC, Antonio Rojas
Details
data for comment 12 - decompiled things (797.73 KB, application/octet-stream)
2024-05-03 11:26 UTC, Dmitrii Pasechnik
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Antonio Rojas 2024-04-27 18:57:53 UTC
Commit 049ec9b981d1f4f97736061d5cf7d0ae990b57d7 is causing some runtime crashes in sagemath when compiled with -O2 or higher (-O1 is fine)

The specific affected source file is element.c obtained from cythonizing element.pyx in https://github.com/sagemath/sage/blob/10.4.beta4/src/sage/libs/gap/element.pyx

Manual compilation command and output is:

> LANG=C gcc -O2 -fPIC -I/usr/lib/python3.12/site-packages/cysignals -I/usr/lib/python3.12/site-packages/sage/cpython/ -I/usr/include/python3.12 -c element.c -o element.o -v -save-temps 
Using built-in specs.
COLLECT_GCC=gcc
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++ --enable-bootstrap --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://gitlab.archlinux.org/archlinux/packaging/packages/gcc/-/issues --with-build-config=bootstrap-lto --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-werror
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.2.1 20231110 (GCC) 
COLLECT_GCC_OPTIONS='-O2' '-fPIC' '-I' '/usr/lib/python3.12/site-packages/cysignals' '-I' '/usr/lib/python3.12/site-packages/sage/cpython/' '-I' '/usr/include/python3.12' '-c' '-o' 'element.o' '-v' '-save-temps' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/cc1 -E -quiet -v -I /usr/lib/python3.12/site-packages/cysignals -I /usr/lib/python3.12/site-packages/sage/cpython/ -I /usr/include/python3.12 element.c -mtune=generic -march=x86-64 -fPIC -O2 -fpch-preprocess -o element.i
ignoring nonexistent directory "/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../x86_64-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/python3.12/site-packages/cysignals
 /usr/lib/python3.12/site-packages/sage/cpython/
 /usr/include/python3.12
 /usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/include
 /usr/local/include
 /usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/include-fixed
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-O2' '-fPIC' '-I' '/usr/lib/python3.12/site-packages/cysignals' '-I' '/usr/lib/python3.12/site-packages/sage/cpython/' '-I' '/usr/include/python3.12' '-c' '-o' 'element.o' '-v' '-save-temps' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/cc1 -fpreprocessed element.i -quiet -dumpbase element.c -dumpbase-ext .c -mtune=generic -march=x86-64 -O2 -version -fPIC -o element.s
GNU C17 (GCC) version 13.2.1 20231110 (x86_64-pc-linux-gnu)
        compiled by GNU C version 13.2.1 20231110, GMP version 6.3.0, MPFR version 4.2.1, MPC version 1.3.1, isl version isl-0.26-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 45f88f2c66186490486270a091963f0f
element.c: In function '__pyx_f_4sage_4libs_3gap_7element_10GapElement__type_number':
element.c:12020:16: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
12020 |   __pyx_v_name = TNAM_OBJ(__pyx_v_self->value);
      |                ^
COLLECT_GCC_OPTIONS='-O2' '-fPIC' '-I' '/usr/lib/python3.12/site-packages/cysignals' '-I' '/usr/lib/python3.12/site-packages/sage/cpython/' '-I' '/usr/include/python3.12' '-c' '-o' 'element.o' '-v' '-save-temps' '-mtune=generic' '-march=x86-64'
 as -v -I /usr/lib/python3.12/site-packages/cysignals -I /usr/lib/python3.12/site-packages/sage/cpython/ -I /usr/include/python3.12 --64 -o element.o element.s
GNU assembler version 2.42.0 (x86_64-pc-linux-gnu) using BFD version (GNU Binutils) 2.42.0
COMPILER_PATH=/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/:/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/:/usr/lib/gcc/x86_64-pc-linux-gnu/:/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/:/usr/lib/gcc/x86_64-pc-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/:/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../lib/:/lib/../lib/:/usr/lib/../lib/:/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-O2' '-fPIC' '-I' '/usr/lib/python3.12/site-packages/cysignals' '-I' '/usr/lib/python3.12/site-packages/sage/cpython/' '-I' '/usr/include/python3.12' '-c' '-o' 'element.o' '-v' '-save-temps' '-mtune=generic' '-march=x86-64' '-dumpdir' 'element.'

At runtime, the crash has the following backtrace:

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=11, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007aea958ab393 in __pthread_kill_internal (signo=11, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007aea9585a6c8 in __GI_raise (sig=sig@entry=11) at ../sysdeps/posix/raise.c:26
#3  0x00007aea955be2f9 in sigdie (sig=sig@entry=11, s=s@entry=0x7aea955c6f78 "Unhandled SIGSEGV: A segmentation fault occurred.")
    at build/src/cysignals/implementation.c:713
#4  0x00007aea955c267d in sigdie_for_sig (inside=0, sig=11) at build/src/cysignals/implementation.c:256
#5  cysigs_signal_handler (sig=11) at build/src/cysignals/implementation.c:361
#6  <signal handler called>
#7  0x00007ae9ba6a5d60 in _Py_IsImmortal (op=0x0) at /usr/include/python3.12/object.h:243
#8  Py_DECREF (op=0x0) at /usr/include/python3.12/object.h:701
#9  Py_XDECREF (op=0x0) at /usr/include/python3.12/object.h:799
#10 __pyx_pf_4sage_4libs_3gap_7element_19GapElement_Function_2__call__ (__pyx_v_self=__pyx_v_self@entry=0x7ae9b3e90a00, 
    __pyx_v_args=__pyx_v_args@entry=0x7ae9b57a0400) at build/cythonized/sage/libs/gap/element.c:26540
#11 0x00007ae9ba6a6bab in __pyx_pw_4sage_4libs_3gap_7element_19GapElement_Function_3__call__ (__pyx_v_self=0x7ae9b3e90a00, __pyx_args=0x7ae9b57a0400, 
    __pyx_kwds=<optimized out>) at build/cythonized/sage/libs/gap/element.c:26110
#12 0x00007aea95bc1949 in _PyObject_Call (tstate=0x7aea96031788 <_PyRuntime+459656>, callable=0x7ae9b3e90a00, args=0x7ae9b57a0400, 
    kwargs=<optimized out>) at Objects/call.c:367
#13 0x00007aea95a8d591 in PyCFunction_Call (kwargs=0x0, args=0x7ae9b57a0400, callable=0x7ae9b3e90a00) at Objects/call.c:387

which suggests that the null pointer check in Py_XDECREF is being optimized away.

Attached is the produced element.i
Comment 1 Antonio Rojas 2024-04-27 18:59:34 UTC
Created attachment 58055 [details]
output of gcc -save-temps
Comment 2 Andrew Pinski 2024-04-27 19:03:04 UTC
g:049ec9b981d1f4f97736061d5cf7d0ae990b57d7
Comment 3 Andrew Pinski 2024-04-27 19:05:21 UTC
Does -fno-strict-aliasing help?
Comment 4 Antonio Rojas 2024-04-27 19:06:54 UTC
(In reply to Andrew Pinski from comment #3)
> Does -fno-strict-aliasing help?

No, same result.
Comment 5 Sam James 2024-04-29 15:53:54 UTC
Some ideas:
* Could you maybe give a reproducer for the runtime crash?
* Any chance you'd be willing to try bisect element.i with pragmas to disable/enable optimisation for chunks of it, to find the miscompiled function?
Comment 6 Jakub Jelinek 2024-04-29 16:00:25 UTC
I have looked at the IL and I don't see how it could crash that way based on the -fdump-tree-optimized-lineno dump, neither on branch nor on the trunk.
On the branch, I see
  <bb 286> [local count: 462566023]:
  [element.c:26539:57 discrim 1] __pyx_t_5_522(ab) = 0B;
  [/usr/include/python3.12/object.h:797:8] if (__pyx_t_6_211(ab) != 0B)
    goto <bb 291>; [70.00%]
  else
    goto <bb 290>; [30.00%]

  <bb 290> [local count: 462566023]:
  [element.c:26540:57 discrim 1] __pyx_t_6_524(ab) = 0B;
  [/usr/include/python3.12/object.h:797:8] if (__pyx_t_8_1031(ab) != 0B)
    goto <bb 295>; [70.00%]
  else
    goto <bb 294>; [30.00%]

  <bb 291> [local count: 323796219]:
  [/usr/include/python3.12/object.h:242:25] _1105 = [/usr/include/python3.12/object.h:242:25] [/usr/include/python3.12/object.h:242:25] __pyx_t_6_211(ab)->D.10046.ob_refcnt;
  [/usr/include/python3.12/object.h:242:13] _1106 = (int) _1105;
  [/usr/include/python3.12/object.h:700:8 discrim 1] if (_1106 < 0)
    goto <bb 290>; [26.36%]
  else
    goto <bb 292>; [73.64%]

  <bb 292> [local count: 238443538]:
  [/usr/include/python3.12/object.h:704:9] _1107 = _1105 + -1;
  [/usr/include/python3.12/object.h:704:8] [/usr/include/python3.12/object.h:704:13] [/usr/include/python3.12/object.h:704:13] __pyx_t_6_211(ab)->D.10046.ob_refcnt = _1107;
  [/usr/include/python3.12/object.h:704:8] if (_1107 == 0)
    goto <bb 293>; [33.00%]
  else
    goto <bb 290>; [67.00%]
  
  <bb 293> [local count: 78686364]:
  [/usr/include/python3.12/object.h:705:9] _Py_Dealloc (__pyx_t_6_211(ab));
  goto <bb 290>; [100.00%]

so the 26540 line case checks if the pointer later passed to the _Py_Dealloc is non-NULL and doesn't let it be called in that case.
Comment 7 Antonio Rojas 2024-04-29 20:05:52 UTC
(In reply to Sam James from comment #5)
> Some ideas:
> * Could you maybe give a reproducer for the runtime crash?

In sage, calling any libgap function in exactly 3 parameters triggers this. For instance:

sage: libgap.AbelianGroup(0,0,0)

Note that you need Python 3.12 to get the crash (which happens when trying to derefence the null pointer in the newly introduced _Py_IsImmortal function)

> * Any chance you'd be willing to try bisect element.i with pragmas to
> disable/enable optimisation for chunks of it, to find the miscompiled
> function?

I have done that. As expected, the problem is in Py_XDECREF. This makes the problem disappear:

#pragma GCC push_options
#pragma GCC optimize ("O0")

static inline void Py_XDECREF(PyObject *op)
{
    if (op != 
# 797 "/usr/include/python3.12/object.h" 3 4
             ((void *)0)
# 797 "/usr/include/python3.12/object.h"
                     ) {
        Py_DECREF(((PyObject*)((op))));
    }
}

#pragma GCC pop_options
Comment 8 Sam James 2024-05-01 01:00:39 UTC
element.i is unfortunatley huge. It's hard to analyse things without a standalone testcase, but it's even harder without _something_ one can run.

I'd suggest:
1) giving instructions to reproduce the crash assuming someone knows nothing about sage and cython;
2) taking sage and cannibalising it (first to directly call `libgap.AbelianGroup(0,0,0)`, then you can cut things down with lots of removals + gdb so that the important caller of Py_XDECREF gets passed with the same args; it's harder if there's a lot of state involved though, of course)
3) build element.i with -fdump-tree-all -fdump-unnumbered -fdump-noaddr (can also try e.g. -da, see https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html) at the bad commit and before, and diff the produced dumps, and show us the *first* dump which differs between the two
Comment 9 Sam James 2024-05-01 01:01:10 UTC
unfortunately*
Comment 10 Sam James 2024-05-01 01:01:47 UTC
also maybe obvious, but if you can find something from the cython testsuite, or at least some other heavy use of cython, which fails, that may be a good direction.
Comment 11 Sam James 2024-05-01 01:03:17 UTC
also, do asan and ubsan give anything?
Comment 12 Dmitrii Pasechnik 2024-05-03 11:07:44 UTC
A colleague disassembled, using ghidra (https://ghidra-sre.org/), the results of the compilations with, respectively, -O2 and with -O0 flags.
Comparing the results, in the broken (built with -O2) case one sees a miscompilation of a call to GAP_CallFunc3Args - it is called with one argument less than it should, three instead of four!

broken (-O2):

>               plVar9 = (long *)GAP_CallFunc3Args(*(undefined8 *)(param_1 + 0x20),local_a0[4],
>                                                  local_a8[4]);

vs. good (-O0):

< LAB_0013cbd9:
<             plVar10 = (long *)GAP_CallFunc3Args(*(undefined8 *)(param_1 + 0x20),local_a8[4],
<                                                 local_a0[4],plVar16[4]);

And this is despite the prototype for calling GAP_CallFunc3Args() is
found in "gap/libgap-api.h", which is included in example.c as #include "gap/libgap-api.h",  meant to be respected during the compilation. 

I hope this helps in chasing down the obvious compiler bug. Perhaps it can be also seen without disassembling, simply on the intermediate data generated by the compiler.
Comment 13 Dmitrii Pasechnik 2024-05-03 11:26:17 UTC
Created attachment 58099 [details]
data for comment 12 - decompiled things

data for comment 12 - decompiled .so's, .so's themselves, original C file.

BROKEN* are for -O2 option, and FIXED* are for -O0 option.
Comment 14 Sergei Trofimovich 2024-05-06 13:53:32 UTC
I reproduced the `SIGSEGV` on Gentoo ~amd64 and ::sage-on-gentoo overlay against sci-mathematics/sagemath-standard package.

One of the unusual properties of __pyx_pf_4sage_4libs_3gap_7element_19GapElement_Function_2__call__() is that it raises 2 signals while it gets executed:

- SIGABRT handler uses longjmp() to return to the ~beginning of a function
- and then SIGSEGV happens at cleanup when an attempt to dereference the pointer happens.

I see no `volatile` annotations anywhere in the __pyx_pf_4sage_4libs_3gap_7element_19GapElement_Function_2__call__().

My wild guess would be that:
1. `PyObject *__pyx_t_4 = ((void *)0);` gets saved in setjmp() with one value (probably NULL)
2. updated at some point later in the same function to non-NULL that `gcc` can infer and throw away all later `NULL` checks
3. then SIGABRT returns with longjmp() by accidentally resetting

I would expect `__pyx_t_4` to require volatile annotation for such an `element.i` definition. Or `longjmp()` should be called from a `((noipa))` function to force register spill/reload on stack.

To cite `man setjmp`:

"""
CAVEATS
       The  compiler  may  optimize  variables  into registers, and longjmp() may restore the values of other registers in addition to the stack pointer and program counter.  Consequently, the values of automatic
       variables are unspecified after a call to longjmp() if they meet all the following criteria:
       •  they are local to the function that made the corresponding setjmp() call;
       •  their values are changed between the calls to setjmp() and longjmp(); and
       •  they are not declared as volatile.
       Analogous remarks apply for siglongjmp().
"""

Sounds plausible?
Comment 15 Jakub Jelinek 2024-05-06 17:29:07 UTC
(In reply to Dmitrii Pasechnik from comment #12)
> A colleague disassembled, using ghidra (https://ghidra-sre.org/), the
> results of the compilations with, respectively, -O2 and with -O0 flags.
> Comparing the results, in the broken (built with -O2) case one sees a
> miscompilation of a call to GAP_CallFunc3Args - it is called with one
> argument less than it should, three instead of four!
> 
> broken (-O2):
> 
> >               plVar9 = (long *)GAP_CallFunc3Args(*(undefined8 *)(param_1 + 0x20),local_a0[4],
> >                                                  local_a8[4]);
> 
> vs. good (-O0):
> 
> < LAB_0013cbd9:
> <             plVar10 = (long *)GAP_CallFunc3Args(*(undefined8 *)(param_1 +
> 0x20),local_a8[4],
> <                                                 local_a0[4],plVar16[4]);
> 
> And this is despite the prototype for calling GAP_CallFunc3Args() is
> found in "gap/libgap-api.h", which is included in example.c as #include
> "gap/libgap-api.h",  meant to be respected during the compilation. 
> 
> I hope this helps in chasing down the obvious compiler bug. Perhaps it can
> be also seen without disassembling, simply on the intermediate data
> generated by the compiler.

Both calls pass 4 arguments, both in optimized -O2 -fPIC GCC 13.2.1 dump and in assembly:
        movq    -88(%rbp), %rdx
        movq    %r12, %rcx
        movq    %r13, %rsi
        movq    %rax, %rdi
        call    GAP_CallFunc3Args@PLT
...
        movq    -88(%rbp), %rdx
        movq    %r12, %rcx
        movq    %r13, %rsi
        movq    %rax, %rdi
        call    GAP_CallFunc3Args@PLT
and
  GAP_CallFunc3Args (_53, _47, __pyx_v_func_54(D), _49);
...
  _441 = GAP_CallFunc3Args (_98, _97, _96, _95);
Comment 16 Jakub Jelinek 2024-05-06 17:59:50 UTC
(In reply to Sergei Trofimovich from comment #14)
> I reproduced the `SIGSEGV` on Gentoo ~amd64 and ::sage-on-gentoo overlay
> against sci-mathematics/sagemath-standard package.
> 
> One of the unusual properties of
> __pyx_pf_4sage_4libs_3gap_7element_19GapElement_Function_2__call__() is that
> it raises 2 signals while it gets executed:
> 
> - SIGABRT handler uses longjmp() to return to the ~beginning of a function
> - and then SIGSEGV happens at cleanup when an attempt to dereference the
> pointer happens.
> 
> I see no `volatile` annotations anywhere in the
> __pyx_pf_4sage_4libs_3gap_7element_19GapElement_Function_2__call__().
> 
> My wild guess would be that:
> 1. `PyObject *__pyx_t_4 = ((void *)0);` gets saved in setjmp() with one
> value (probably NULL)
> 2. updated at some point later in the same function to non-NULL that `gcc`
> can infer and throw away all later `NULL` checks
> 3. then SIGABRT returns with longjmp() by accidentally resetting
> 
> I would expect `__pyx_t_4` to require volatile annotation for such an
> `element.i` definition. Or `longjmp()` should be called from a `((noipa))`
> function to force register spill/reload on stack.
> 
> To cite `man setjmp`:
> 
> """
> CAVEATS
>        The  compiler  may  optimize  variables  into registers, and
> longjmp() may restore the values of other registers in addition to the stack
> pointer and program counter.  Consequently, the values of automatic
>        variables are unspecified after a call to longjmp() if they meet all
> the following criteria:
>        •  they are local to the function that made the corresponding
> setjmp() call;
>        •  their values are changed between the calls to setjmp() and
> longjmp(); and
>        •  they are not declared as volatile.
>        Analogous remarks apply for siglongjmp().
> """
> 
> Sounds plausible?

So, if you can reproduce it, can you:
1) attach your *.s file and state which exact compiler you used (revision)
2) ideally show a gdb session with the important events, which setjmp was it (I see
_setjmp and __sigsetjmp calls in the function), which exact function called from the function ended up aborting/doing longjmp in the signal handler and where is the crash
3) is it __pyx_t_6, __pyx_t_4 or some other pointer that triggers it (from the line numbers in #c0 my guess was __pyx_t_6, but you talk about __pyx_t_4)

Yes, there are no volatile keywords on any of the vars, but without knowing which setjmp call it is and from where longjmp jumps to it, it is hard to know if the variables have been modified in between (then volatile would be required) or if they
are only modified before the setjmp call or after the call that calls longjmp (then volatile might not be required).
Comment 17 Sergei Trofimovich 2024-05-06 21:41:40 UTC
> 1) attach your *.s file and state which exact compiler you used (revision)

Generate code first:

https://slyfox.uni.cx/b/gcc/PR114872/d.tar.gz (4MB, does not fit on bugzilla's 1MB limit)

is the archive of source files from sage-10.3:
- element.c: original unpreprocessed file
- element.c.c: preprocessed file
- element.S: assembly file (-S)
- element-verbose.S: assembly file (-S -fverbose-asm, has compiler options)

Compiler:

# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/13/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /dev/shm/portage/sys-devel/gcc-13.2.1_p20240503/work/gcc-13-20240503/configure --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/13 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/13/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/13 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/13/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/13/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13 --disable-silent-rules --disable-dependency-tracking --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/13/python --enable-languages=c,c++,fortran --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --disable-libunwind-exceptions --enable-checking=release --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 13.2.1_p20240503 p15' --with-gcc-major-version-only --enable-libstdcxx-time --enable-lto --disable-libstdcxx-pch --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-multilib --with-multilib-list=m32,m64 --disable-fixed-point --enable-targets=all --enable-libgomp --disable-libssp --disable-libada --disable-cet --disable-systemtap --disable-valgrind-annotations --disable-vtable-verify --disable-libvtv --without-zstd --without-isl --enable-default-pie --enable-default-ssp --disable-fixincludes
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.2.1 20240503 (Gentoo 13.2.1_p20240503 p15)

Note: it's a weekly snapshot of 20240503 with --enable-default-ssp. I think it's a r13-8684-g704b15e277a879.

The compiler also has a few distro-specific patches on top: https://gitweb.gentoo.org/proj/gcc-patches.git/tree/13.2.0/gentoo. Most of them should not affect code generation too much, but some like 22_all_default_ssp-buffer-size.patch and 84_all_x86_PR110792-Early-clobber-issues-with-rot32di2.patch might.
Comment 18 Sergei Trofimovich 2024-05-06 22:08:34 UTC
> 2) ideally show a gdb session with the important events, which setjmp was it (I see _setjmp and __sigsetjmp calls in the function), which exact function called from the function ended up aborting/doing longjmp in the signal handler and where is the crash

# gdb --quiet -p 1180766

Attaching to a running `sage` interactive process.
In sage repl typing:

  libgap.AbelianGroup(0,0,0)

Breakpoint happens. SIGABRT (immediate longjmp trigger) backtrace:

Thread 1 "sage-ipython" received signal SIGABRT, Aborted.
0x00007f53f8e617a7 in __GI_kill () at ../sysdeps/unix/syscall-template.S:120
120     T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x00007f53f8e617a7 in __GI_kill () at ../sysdeps/unix/syscall-template.S:120
#1  0x00007f539c581edf in sig_error () at /usr/lib/python3.12/site-packages/cysignals/macros.h:298
#2  __pyx_pf_4sage_4libs_3gap_7element_19GapElement_Function_2__call__ (__pyx_v_self=__pyx_v_self@entry=0x7f539635bc40,
    __pyx_v_args=__pyx_v_args@entry=(<sage.rings.integer.Integer at remote 0x7f539b17b6c0>, <sage.rings.integer.Integer at remote 0x7f5396574390>, <sage.rings.integer.Integer at remote 0x7f5396f957d0>))
    at /usr/src/debug/sci-mathematics/sagemath-standard-10.3/sagemath-standard-10.3-python3_12/build/cythonized/sage/libs/gap/element.c:26315
#3  0x00007f539c5834e7 in __pyx_pw_4sage_4libs_3gap_7element_19GapElement_Function_3__call__ (__pyx_v_self=<sage.libs.gap.element.GapElement_Function at remote 0x7f539635bc40>,
    __pyx_args=(<sage.rings.integer.Integer at remote 0x7f539b17b6c0>, <sage.rings.integer.Integer at remote 0x7f5396574390>, <sage.rings.integer.Integer at remote 0x7f5396f957d0>), __pyx_kwds=<optimized out>)
    at /usr/src/debug/sci-mathematics/sagemath-standard-10.3/sagemath-standard-10.3-python3_12/build/cythonized/sage/libs/gap/element.c:26105
#4  0x00007f53f916496b in _PyObject_MakeTpCall (tstate=0x7f53f9670d08 <_PyRuntime+459656>, callable=callable@entry=<sage.libs.gap.element.GapElement_Function at remote 0x7f539635bc40>,
    args=args@entry=0x7f53f96b5480, nargs=3, keywords=0x0) at Objects/call.c:240
...

SIGSEGV backtrace (for completeness):

Thread 1 "sage-ipython" received signal SIGSEGV, Segmentation fault.
0x00007f539c58256f in _Py_IsImmortal (op=0x0) at /usr/include/python3.12/object.h:242
242         return _Py_CAST(PY_INT32_T, op->ob_refcnt) < 0;
(gdb) bt
#0  0x00007f539c58256f in _Py_IsImmortal (op=0x0) at /usr/include/python3.12/object.h:242
#1  Py_DECREF (op=0x0) at /usr/include/python3.12/object.h:700
#2  Py_XDECREF (op=0x0) at /usr/include/python3.12/object.h:798
#3  __pyx_pf_4sage_4libs_3gap_7element_19GapElement_Function_2__call__ (__pyx_v_self=__pyx_v_self@entry=0x7f539635bc40,
    __pyx_v_args=__pyx_v_args@entry=(<sage.rings.integer.Integer at remote 0x7f539b17b6c0>, <sage.rings.integer.Integer at remote 0x7f5396574390>, <sage.rings.integer.Integer at remote 0x7f5396f957d0>))
    at /usr/src/debug/sci-mathematics/sagemath-standard-10.3/sagemath-standard-10.3-python3_12/build/cythonized/sage/libs/gap/element.c:26535
#4  0x00007f539c5834e7 in __pyx_pw_4sage_4libs_3gap_7element_19GapElement_Function_3__call__ (__pyx_v_self=<sage.libs.gap.element.GapElement_Function at remote 0x7f539635bc40>,
    __pyx_args=(<sage.rings.integer.Integer at remote 0x7f539b17b6c0>, <sage.rings.integer.Integer at remote 0x7f5396574390>, <sage.rings.integer.Integer at remote 0x7f5396f957d0>), __pyx_kwds=<optimized out>)
    at /usr/src/debug/sci-mathematics/sagemath-standard-10.3/sagemath-standard-10.3-python3_12/build/cythonized/sage/libs/gap/element.c:26105
#5  0x00007f53f916496b in _PyObject_MakeTpCall (tstate=0x7f53f9670d08 <_PyRuntime+459656>, callable=callable@entry=<sage.libs.gap.element.GapElement_Function at remote 0x7f539635bc40>,
    args=args@entry=0x7f53f96b5480, nargs=3, keywords=0x0) at Objects/call.c:240


Catching `*jmp` flavours:

(gdb) break __pyx_pf_4sage_4libs_3gap_7element_19GapElement_Function_2__call__
(gdb) continue
...
(gdb) break __GI___sigsetjmp
(gdb) break longjmp
(gdb) break siglongjmp
(gdb) continue # a lot of them

(gdb) bt
#0  __GI___sigsetjmp () at ../sysdeps/x86_64/setjmp.S:33
#1  0x00007fc83e6e6ea4 in __pyx_pf_4sage_4libs_3gap_7element_19GapElement_Function_2__call__ (__pyx_v_self=__pyx_v_self@entry=0x7fc83844f2c0,
    __pyx_v_args=__pyx_v_args@entry=(<sage.rings.integer.Integer at remote 0x7fc83e878de0>, <sage.rings.integer.Integer at remote 0x7fc8392b2790>, <sage.rings.integer.Integer at remote 0x7fc8392b2730>))
    at /usr/src/debug/sci-mathematics/sagemath-standard-10.3/sagemath-standard-10.3-python3_12/build/cythonized/sage/libs/gap/element.c:26315

(gdb) fr 1
#1  0x00007fc83e6e6ea4 in __pyx_pf_4sage_4libs_3gap_7element_19GapElement_Function_2__call__ (__pyx_v_self=__pyx_v_self@entry=0x7fc83844f2c0,
    __pyx_v_args=__pyx_v_args@entry=(<sage.rings.integer.Integer at remote 0x7fc83e878de0>, <sage.rings.integer.Integer at remote 0x7fc8392b2790>, <sage.rings.integer.Integer at remote 0x7fc8392b2730>))
    at /usr/src/debug/sci-mathematics/sagemath-standard-10.3/sagemath-standard-10.3-python3_12/build/cythonized/sage/libs/gap/element.c:26315
26315       sig_GAP_Enter();
(gdb) disassemble
Dump of assembler code for function __pyx_pf_4sage_4libs_3gap_7element_19GapElement_Function_2__call__:

   0x00007fc83e6e6e9f <+831>:   call   0x7fc83e6b54e0 <_setjmp@plt>
=> 0x00007fc83e6e6ea4 <+836>:   mov    $0x1,%ebx

That is a _setjmp at element.c:26315. ABORT happens after it.
Comment 19 Dmitrii Pasechnik 2024-05-06 22:45:58 UTC
Declaring the last argument in the call to GAP_CallFunc3Args() volatile appears to fix the issue. Namely, apply

diff --git a/src/sage/libs/gap/element.pyx b/src/sage/libs/gap/element.pyx
index f1482997b8..7ca4a666ab 100644
--- a/src/sage/libs/gap/element.pyx
+++ b/src/sage/libs/gap/element.pyx
@@ -2504,6 +2504,7 @@ cdef class GapElement_Function(GapElement):
         cdef Obj result = NULL
         cdef Obj arg_list
         cdef int n = len(args)
+        cdef volatile Obj v2
 
         if n > 0 and n <= 3:
             libgap = self.parent()
@@ -2522,10 +2523,11 @@ cdef class GapElement_Function(GapElement):
                                            (<GapElement>a[0]).value,
                                            (<GapElement>a[1]).value)
             elif n == 3:
+                v2 = (<GapElement>a[2]).value
                 result = GAP_CallFunc3Args(self.value,
                                            (<GapElement>a[0]).value,
                                            (<GapElement>a[1]).value,
-                                           (<GapElement>a[2]).value)
+                                           v2)
             else:
                 arg_list = make_gap_list(args)
                 result = GAP_CallFuncList(self.value, arg_list)
Comment 20 Jakub Jelinek 2024-05-07 10:45:05 UTC
(In reply to Sergei Trofimovich from comment #18)
> > 2) ideally show a gdb session with the important events, which setjmp was it (I see _setjmp and __sigsetjmp calls in the function), which exact function called from the function ended up aborting/doing longjmp in the signal handler and where is the crash

Strange.  If it is the _setjmp call on line 26315 followed by kill SIGABRT from sig_error(); on the same line, then I don't see any local vars modified in between, except that int t on line 26315 inside of sig_GAP_Enter macro.
There is then __sigsetjmp on line 26324, but no further kill calls and I think this isn't in a loop.
Comment 21 Sergei Trofimovich 2024-05-07 11:05:12 UTC
Good point! I wonder if I'm looking at the backtrace too late (or at the wrong one). I'll retry again this evening and will extract more context.
Comment 22 Sergei Trofimovich 2024-05-07 22:13:06 UTC
Trying again to catch more precise place for SIGABRT.

Beginning at the start of the possibly aborting function:

(gdb) break __pyx_pf_4sage_4libs_3gap_7element_19GapElement_Function_2__call__
(gdb) continue

Running step by step to get the rough location:

(gdb) continue # many times

(gdb)
26459         __pyx_t_6 = __Pyx_GetItemInt_List(__pyx_v_a, 2, long, 1, __Pyx_PyInt_From_long, 1, 0, 1); if (unlikely(!__pyx_t_6)) __PYX_ERR(0, 2528, __pyx_L14_error)
(gdb)
26469         __pyx_v_result = GAP_CallFunc3Args(__pyx_v_self->__pyx_base.value, ((struct __pyx_obj_4sage_4libs_3gap_7element_GapElement *)__pyx_t_5)->value, ((struct __pyx_obj_4sage_4libs_3gap_7element_GapElement *)__pyx_t_4)->value, ((struct __pyx_obj_4sage_4libs_3gap_7element_GapElement *)__pyx_t_6)->value);
(gdb)

Thread 1 "sage-ipython" received signal SIGABRT, Aborted.
0x00007f73e4a617a7 in __GI_kill () at ../sysdeps/unix/syscall-template.S:120
120     T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)

Don't know why backtrace does not point at element.c:26459. It is this snippet:

      /* "sage/libs/gap/element.pyx":2525
 *                                            (<GapElement>a[1]).value)
 *             elif n == 3:
 *                 result = GAP_CallFunc3Args(self.value,             # <<<<<<<<<<<<<<
 *                                            (<GapElement>a[0]).value,
 *                                            (<GapElement>a[1]).value,
 */
      __pyx_v_result = GAP_CallFunc3Args(__pyx_v_self->__pyx_base.value, ((struct __pyx_obj_4sage_4libs_3gap_7element_GapElement *)__pyx_t_5)->value, ((struct __pyx_obj_4sage_4libs_3gap_7element_GapElement *)__pyx_t_4)->value, ((struct __pyx_obj_4sage_4libs_3gap_7element_GapElement *)__pyx_t_6)->value);
Comment 23 Sergei Trofimovich 2024-05-08 08:18:49 UTC
At SIGSEGV site the code is an unconditional NULL dereference due to dereference of `xor %esi,%esi` result from `gdb`.

797         if (op != _Py_NULL) {
   0x00007f940c871563 <+2563>:  cmpq   $0x0,-0xc8(%rbp)
   0x00007f940c87156b <+2571>:  je     0x7f940c871583 <__pyx_pf_4sage_4libs_3gap_7element_19GapElement_Function_2__call__+2595>

242         return _Py_CAST(PY_INT32_T, op->ob_refcnt) < 0;
   0x00007f940c87156d <+2573>:  xor    %esi,%esi
=> 0x00007f940c87156f <+2575>:  mov    (%rsi),%rax

In `element-verbose.S` it is:

# /usr/include/python3.12/object.h:797:     if (op != _Py_NULL) {
    .loc 5 797 8 is_stmt 0 view .LVU65876
    cmpq $0, -200(%rbp)<>#, %sfp
    je<---->.L12727>#,
    .loc 5 798 9 is_stmt 1 view .LVU65877
.LVL15705:
.LBB49946:
.LBI49946:
    .loc 5 696 37 view .LVU65878
.LBB49947:
    .loc 5 700 5 view .LVU65879
.LBB49948:
.LBI49948:
    .loc 5 239 36 view .LVU65880
.LBB49949:
    .loc 5 242 5 view .LVU65881
# /usr/include/python3.12/object.h:242:     return _Py_CAST(PY_INT32_T, op->ob_refcnt) < 0;
    .loc 5 242 12 is_stmt 0 view .LVU65882
    xorl %esi, %esi # r
    movq (%rsi), %rax # __pyx_t_6_208(ab)->D.11083.ob_refcnt, _991

Looking at other sites in `element-verbose.S` for comparison do try to use `-0xc8(%rbp)` contents:

# /usr/include/python3.12/object.h:797:     if (op != _Py_NULL) {
    .loc 5 797 8 is_stmt 0 view .LVU66162
    cmpq $0, -200(%rbp) #, %sfp
    je .L12782>#,
    .loc 5 798 9 is_stmt 1 view .LVU66163
.LVL15760:
.LBB50093:
.LBI50093:
    .loc 5 696 37 view .LVU66164
.LBB50094:
    .loc 5 700 5 view .LVU66165
.LBB50095:
.LBI50095:
    .loc 5 239 36 view .LVU66166
.LBB50096:
    .loc 5 242 5 view .LVU66167
# /usr/include/python3.12/object.h:242:     return _Py_CAST(PY_INT32_T, op->ob_refcnt) < 0;
    .loc 5 242 12 is_stmt 0 view .LVU66168
    movq -200(%rbp), %rdx # %sfp, r
    movq (%rdx), %rax # __pyx_t_6_10(ab)->D.11083.ob_refcnt, _1070

Thus my guess is that something clobbered `-200(%rbp)` value across setjmp()/longjmp().

Trying to trace:

$ gdb -p `pgrep sage-ipython`
(gdb) break __pyx_pf_4sage_4libs_3gap_7element_19GapElement_Function_2__call__
(gdb) continue

    # trigger break with with ` libgap.AbelianGroup(0,0,0)`

(gdb) disassemble
Dump of assembler code for function __pyx_pf_4sage_4libs_3gap_7element_19GapElement_Function_2__call__:
=> 0x00007f4ed9981b60 <+0>:     push   %rbp
   0x00007f4ed9981b61 <+1>:     mov    %rsp,%rbp

    # Populating `%rbp`:

(gdb) nexti
(gdb) nexti
(gdb) disassemble
Dump of assembler code for function __pyx_pf_4sage_4libs_3gap_7element_19GapElement_Function_2__call__:
   0x00007f4ed9981b60 <+0>:     push   %rbp
   0x00007f4ed9981b61 <+1>:     mov    %rsp,%rbp
=> 0x00007f4ed9981b64 <+4>:     push   %r15

(gdb) print $rbp-200
$2 = (void *) 0x7ffd2824c5e8

(gdb) watch *(int*)(void *) 0x7ffd2824c5e8
Hardware watchpoint 2: *(int*)(void *) 0x7ffd2824c5e8

(gdb) continue
Continuing.

Thread 1 "sage-ipython" hit Hardware watchpoint 2: *(int*)(void *) 0x7ffd2824c5e8

Old value = 673498624
New value = 0
0x00007f98e609d2a8 in __pyx_pf_4sage_4libs_3gap_7element_19GapElement_Function_2__call__ (
    __pyx_v_self=__pyx_v_self@entry=0x7f98dfe70dc0,
    __pyx_v_args=__pyx_v_args@entry=(<sage.rings.integer.Integer at remote 0x7f98e4722c40>, <sage.rings.integer.Integer at remote 0x7f98dfe7afd0>, <sage.rings.integer.Integer at remote 0x7f98e01dd5c0>))
    at /usr/src/debug/sci-mathematics/sagemath-standard-10.3/sagemath-standard-10.3-python3_12/build/cythonized/sage/libs/gap/element.c:26192
26192       __pyx_t_6 = NULL;

NULL store.

(gdb) continue
Continuing.

Thread 1 "sage-ipython" hit Hardware watchpoint 2: *(int*)(void *) 0x7ffd2824c5e8

Old value = 0
New value = -538669696
__Pyx_GetItemInt_List_Fast (wraparound=0, boundscheck=1, i=2,
    o=[<sage.libs.gap.element.GapElement_Integer at remote 0x7f98e0ac5c00>, <sage.libs.gap.element.GapElement_Integer at remote 0x7f98dfe4b500>, <sage.libs.gap.element.GapElement_Integer at remote 0x7f98dfe48d80>])
    at /usr/src/debug/sci-mathematics/sagemath-standard-10.3/sagemath-standard-10.3-python3_12/build/cythonized/sage/libs/gap/element.c:38070
38070           Py_INCREF(r);

Create an object?

(gdb) continue
Continuing.

Thread 1 "sage-ipython" received signal SIGABRT, Aborted.
0x00007f99428617a7 in __GI_kill () at ../sysdeps/unix/syscall-template.S:120
120     T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)

Abort.

(gdb) continue
Continuing.

Thread 1 "sage-ipython" received signal SIGSEGV, Segmentation fault.
0x00007f98e609c56f in _Py_IsImmortal (op=0x0) at /usr/include/python3.12/object.h:242
242         return _Py_CAST(PY_INT32_T, op->ob_refcnt) < 0;

SIGSEGV.

Note that all two memory references happen before longjmp() (the ABORT).

Why did `gcc` generate unconditional NULL dereference here? I suspect it somehow inferred that `__pyx_t_6 = NULL;` in that branch, but not before comparison.
Comment 24 Richard Biener 2024-05-08 08:27:35 UTC
(In reply to Sergei Trofimovich from comment #23)
[...]
> Why did `gcc` generate unconditional NULL dereference here? I suspect it
> somehow inferred that `__pyx_t_6 = NULL;` in that branch, but not before
> comparison.

That's what happens if we isolate an unreachable path because of a NULL
dereference (like if exposed by jump-threading).  We make the NULL
dereference volatile so it stays but DCE/DSE can cleanup code on the path
leading to it.

If you run into such path the this might suggest that jump-threading triggered
a problem with the setjmp/longjmp, so it's then likely some condition that's
evaluated in a wrong way after the longjmp, either because a dependent
value wasn't properly preserved or by GCC breaking that.  Seeing stack memory
arguments used on a call in a previous comment I wondered if POSIX suggests
that even non-register variables need to be made volatile and thus whether
SRA or FRE might impose problems with code using setjmp/longjmp.
Comment 25 Sergei Trofimovich 2024-05-08 18:10:22 UTC
(In reply to Richard Biener from comment #24)
> (In reply to Sergei Trofimovich from comment #23)
> [...]
> > Why did `gcc` generate unconditional NULL dereference here? I suspect it
> > somehow inferred that `__pyx_t_6 = NULL;` in that branch, but not before
> > comparison.
> 
> That's what happens if we isolate an unreachable path because of a NULL
> dereference (like if exposed by jump-threading).  We make the NULL
> dereference volatile so it stays but DCE/DSE can cleanup code on the path
> leading to it.
> 
> If you run into such path the this might suggest that jump-threading
> triggered
> a problem with the setjmp/longjmp, so it's then likely some condition that's
> evaluated in a wrong way after the longjmp, either because a dependent
> value wasn't properly preserved or by GCC breaking that.  Seeing stack memory
> arguments used on a call in a previous comment

Yeah, that makes sense. Having stared a bit more at
__pyx_pf_4sage_4libs_3gap_7element_19GapElement_Function_2__call__() I
think I get the problem now. We deal with the code similar to the following:

__pyx_pf_4sage_4libs_3gap_7element_19GapElement_Function_2__call__() {
    __pyx_t_6 = NULL;

    // the loop is not very important, but it forces `__pyx_t_6` initialization before `setjmp()`
    for (;;) {
        __pyx_t_6 = something();
        int done = use_pre_setjmp(__pyx_t_6);
        __pyx_t_6 = NULL;
        if (done) break;
    }

    int mode = setjmp(jb);

    switch (mode) {
      case 1: // longjmp() case
        break;
      case 0: // regular case (`case 3:` in real code)
        __pyx_t_6 = something_else(); // set __pyx_t_6 to non-zero
        int done = use_post_setjmp(__pyx_t_6); // call longjmp(jb, 1) here
        __pyx_t_6 = NULL;
        break;
    }

    // get here via longjmp()
    if (__pyx_t_6 != NULL) deref(__pyx_t_6);
}

AFAIU `gcc` is smart enough to see that all paths to `deref()` reach with
`__pyx_t_6 = NULL`, but it does not eliminate the `deref()` entirely
and uses `if (__pyx_t_6 != NULL) deref(NULL);` as Richard explained above.

Now due to `longjmp()` `__pyx_t_6 = NULL;` does not get executed (even
though it's present in assembly code as `movq $0, -200(%rbp)` in all the
places where it's present in C code.

As a result after the `longjmp()` `__pyx_t_6` is not `NULL` and we get
to `deref(NNULL)` and SIGSEGV.

Thus it's a matter of missing `volatile __pyx_t_6`. Sounds about right?

> I wondered if POSIX suggests
> that even non-register variables need to be made volatile and thus whether
> SRA or FRE might impose problems with code using setjmp/longjmp.

That matches my understanding as well. Would it be fair to say that sprinkling
`volatile` has to be done for every single local variable in the function to prevent
possible stack reuse? And that rule should extend to the functions that could
host an inline variant of the function using setjmp()/longjmp() and not just
immediate caller of setjmp()/longjmp()?
Comment 26 Dmitrii Pasechnik 2024-05-08 19:17:49 UTC
We have megabytes of code calling libraries where setjmp/longjmp is used,
in github.com/sagemath/sage/ (most of it in Cython).

It looks like a huge hassle to put volatiles there. 
Looks like we would need ways to disable particular optimisations which lead to these sorts of errors:-(
Comment 27 Andrew Pinski 2024-05-08 19:35:13 UTC
https://pubs.opengroup.org/onlinepubs/7908799/xsh/longjmp.html


The requirement of using volatile has been there a long time before your code was written ...
Comment 28 Jakub Jelinek 2024-05-08 19:58:26 UTC
(In reply to Dmitrii Pasechnik from comment #26)
> We have megabytes of code calling libraries where setjmp/longjmp is used,
> in github.com/sagemath/sage/ (most of it in Cython).
> 
> It looks like a huge hassle to put volatiles there. 
> Looks like we would need ways to disable particular optimisations which lead
> to these sorts of errors:-(

You mean turn off all optimizations then?  -O0.
Even that doesn't guarantee it will work if some such variables are declared with register keyword.

It isn't just POSIX which says this, e.g. C99 also says:
"All accessible objects have values, and all other components of the abstract machine
have state, as of the time the longjmp function was called, except that the values of
objects of automatic storage duration that are local to the function containing the
invocation of the corresponding setjmp macro that do not have volatile-qualified type
and have been changed between the setjmp invocation and longjmp call are
indeterminate."
C89 said pretty much the same.