[Bug fortran/100799] New: Stackoverflow in optimized code on PPC

alexander.grund@tu-dresden.de gcc-bugzilla@gcc.gnu.org
Thu May 27 11:20:38 GMT 2021


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100799

            Bug ID: 100799
           Summary: Stackoverflow in optimized code on PPC
           Product: gcc
           Version: 10.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: alexander.grund@tu-dresden.de
  Target Milestone: ---

Created attachment 50879
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50879&action=edit
Disassembly of dbgebal_ in debug and release modes

Quick summary of the use case: when using FlexiBLAS with OpenBLAS, I noticed
corruption of the parameters passed to OpenBLAS functions. FlexiBLAS basically
provides a BLAS interface in which each function is a stub that forwards its
arguments to a real BLAS library such as OpenBLAS.

Example:
void FC_GLOBAL(dgebal,DGEBAL)(char* job, blasint* n, double* a, blasint* lda,
                              blasint* ilo, blasint* ihi, double* scale,
                              blasint* info)
{
        void (*fn) (void* job, void* n, void* a, void* lda,
                    void* ilo, void* ihi, void* scale, void* info);

        fn = current_backend->lapack.dgebal.f77_blas_function;

        fn((void*) job, (void*) n, (void*) a, (void*) lda,
           (void*) ilo, (void*) ihi, (void*) scale, (void*) info);

        return;
}
void dgebal(char* job, blasint* n, double* a, blasint* lda, blasint* ilo,
            blasint* ihi, double* scale, blasint* info)
        __attribute__((alias(MTS(FC_GLOBAL(dgebal,DGEBAL)))));

Because of the alias, and because the real BLAS library is loaded after
FlexiBLAS, calls from one OpenBLAS function to another OpenBLAS function are
also routed through FlexiBLAS.

I then noticed that the parameter "N" used at
https://github.com/xianyi/OpenBLAS/blob/v0.3.15/lapack-netlib/SRC/dgeev.f#L369
is corrupted during the call at
https://github.com/xianyi/OpenBLAS/blob/v0.3.15/lapack-netlib/SRC/dgeev.f#L363
I traced this to FlexiBLAS saving the register that holds it on the stack,
calling the OpenBLAS DGEBAL and restoring the register afterwards, while
DGEBAL changes the stack entry it is restored from.

So the actual bug here is that GCC generates code for DGEBAL that writes
outside of its allocated stack frame.

The disassembly of the dgebal_ function shows "stdu    r1,-368(r1)" in the
prologue and "std     r25,440(r1)" later, which is the instruction that
overwrites the saved register from the calling function.
As far as I can tell, an offset of 440 from r1, which is larger than the 368
bytes "allocated" by the stdu, is invalid.
The line reported by GDB for the overwriting instruction is
https://github.com/xianyi/OpenBLAS/blob/v0.3.15/lapack-netlib/SRC/dgebal.f#L328
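
The arithmetic behind that observation can be written down as a small C
sketch. It assumes, as is usual for a "stdu r1,-N(r1)" prologue, that the
caller's stack pointer equals the new r1 plus N. The function name
caller_frame_offset is made up for illustration. The sketch only does the
subtraction; it does not decide whether the ABI permits such a store.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC, FlexiBLAS or OpenBLAS.
 *
 * frame_size:   N from the prologue "stdu r1,-N(r1)"
 * store_offset: M from a later "std rX,M(r1)"
 *
 * Assumption: after the stdu, the caller's stack pointer is r1 + frame_size.
 * Returns the store's offset relative to the caller's stack pointer.
 * A negative result means the store stays inside the callee's own frame;
 * zero or a positive result means it lands in the caller's frame.
 */
long caller_frame_offset(long frame_size, long store_offset)
{
    assert(frame_size > 0);
    return store_offset - frame_size;
}
```

With the values above, caller_frame_offset(368, 440) is 72, so the store
lands 72 bytes above the caller's stack pointer.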

The command used to compile the file is: gfortran -fno-math-errno -Wall
-frecursive -fno-optimize-sibling-calls -m64 -fopenmp -fPIC -O2 -fno-fast-math
-mcpu=power9 -mtune=power9  -DUSE_OPENMP -fopenmp -fno-optimize-sibling-calls
-g  -c -o dgebal.o dgebal.f

Replacing "-O2" with "-Og" changes the prologue to "stdu    r1,-336(r1)", and
the maximum offset used by a std relative to r1 is 328. This build works with
FlexiBLAS, so I suspect an optimization issue that leads to more spills
without updating the stack size.
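
A comparison like this could be automated with a rough C sketch that scans
disassembly text for the two instruction forms quoted in this report. The
names frame_report and scan_disassembly are hypothetical, and the sketch
assumes the text spells the instructions as "stdu r1,-N(r1)" and
"std rX,M(r1)". Other store forms (stw, stfd, indexed stores and so on) are
ignored, so this is a heuristic and not a full frame analysis.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of one function's disassembly. */
struct frame_report {
    long frame_size;  /* N from the first "stdu r1,-N(r1)", 0 if none found */
    long max_store;   /* largest non-negative M over "std rX,M(r1)", -1 if none */
};

/* Scan text for the two instruction forms and summarise them. */
struct frame_report scan_disassembly(const char *text)
{
    struct frame_report rep = { 0, -1 };
    const char *p = text;

    assert(text != NULL);
    while ((p = strstr(p, "std")) != NULL) {
        long value = 0;
        int reg = 0;
        int end = 0;  /* set by %n only if the whole pattern matched */

        if (sscanf(p, "stdu r1,-%ld(r1)%n", &value, &end) == 1 && end > 0) {
            /* Keep only the first stdu, which is taken to be the prologue. */
            if (rep.frame_size == 0)
                rep.frame_size = value;
        } else {
            end = 0;
            if (sscanf(p, "std r%d,%ld(r1)%n", &reg, &value, &end) == 2
                && end > 0 && value > rep.max_store)
                rep.max_store = value;
        }
        p += 3;  /* continue searching after this "std" */
    }
    return rep;
}
```

A result with max_store greater than or equal to frame_size flags a store
that reaches beyond the function's own frame, as in the -O2 build described
above.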

Reproduced with GCC 10.2.0, 10.3.0 and 11.1.0.

