CP2K (see PR 29975) compiles fine with current trunk using '-O3 -ffast-math -ftree-vectorize -march=native -funroll-loops' but the code stops with a bogus error message using '-O3 -ffast-math -ftree-vectorize -march=native -funroll-loops -pg' to reproduce see comment 112 in PR 29975 to obtain the code and the example inputs http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29975#c112 I've tested this using: Target: x86_64-unknown-linux-gnu Configured with: /data/vondele/gcc_trunk/gcc/configure --prefix=/data/vondele/gcc_trunk/build --enable-languages=c,fortran --with-mpfr=/data/programs/mpfr/ Thread model: posix gcc version 4.3.0 20070620 (experimental)
Most likely -pg is changing the alignment of the stack which is incorrect. Oh -pg code is emitted by the target specific code so this is a target issue.
(In reply to comment #1) > Most likely -pg is changing the alignment of the stack which is incorrect. Oh > -pg code is emitted by the target specific code so this is a target issue. > Is there an easy way for me to check this ? Meanwhile, I find that '-O1 -march=native -pg' compiles/executes OK.
this is the list of options I have tested, the comment indicates if this yields a failure or not, basically, you need -O2 and -march=native to trigger the bug using '-O1 -march=native -pg' or '-O2 -pg' are not sufficient # OK FCFLAGS="-O3 -ffast-math -ftree-vectorize -funroll-loops -march=native" # wrong FCFLAGS="-O3 -ffast-math -ftree-vectorize -march=native -funroll-loops -pg" # OK FCFLAGS="-O1 -march=native -pg" # OK FCFLAGS="-O1 -march=native -ftree-vectorize -pg" # wrong FCFLAGS="-O2 -march=native -ftree-vectorize -pg" # wrong FCFLAGS="-O2 -march=native -pg" # OK FCFLAGS="-O2 -pg"
(In reply to comment #3) > basically, you need -O2 and -march=native to trigger the bug I can't reproduce that, what is your processor exactly? (ie what is "native" for you)
(In reply to comment #4) > (In reply to comment #3) > > basically, you need -O2 and -march=native to trigger the bug > > I can't reproduce that, what is your processor exactly? (ie what is "native" > for you) > ... here is a suggestion for the gcc crew ... what about having gfortran -v also print the value fo -march=native. Honestly, I often don't now the precise CPU I'm running on (that's why I find -march=native useful in the first place). In this case, it is either on a intel core 2 duo or on an opteron. I'll see if I can figure out where I did these runs ....
> ... here is a suggestion for the gcc crew ... what about having gfortran -v When you invoke gfortran with -v march=native and with a source file, it will show the values. This is the recommended way of showing how you involved gcc/gfortran anyways.
(In reply to comment #6) > When you invoke gfortran with -v march=native and with a source file, it will > show the values. This is the recommended way of showing how you involved > gcc/gfortran anyways. Yes, that works fine: $ gcc -c -v -march=native a.c [...] /path/to/cc1 -quiet -v a.c -march=k8 -msahf --param l1-cache-size=1024 --param l1-cache-line-size=64 -mtune=k8 -quiet -dumpbase a.c -auxbase a -version -o /tmp/ccYMbEr2.s
(In reply to comment #7) > (In reply to comment #6) > > When you invoke gfortran with -v march=native and with a source file, it will right.. that shows: gfortran --verbose -O2 -march=native -pg all.f90 Driving: gfortran -v -O2 -march=native -pg all.f90 -lgfortranbegin -lgfortran -lm -shared-libgcc Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: /data/vondele/gcc_trunk/gcc/configure --prefix=/data/vondele/gcc_trunk/build --enable-languages=c,fortran --with-mpfr=/data/programs/mpfr/ Thread model: posix gcc version 4.3.0 20070626 (experimental) /data/vondele/gcc_trunk/build/libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/f951 all.f90 -march=core2 -mcx16 -msahf --param l1-cache-size=512 --param l1-cache-line-size=64 -mtune=core2 -quiet -dumpbase all.f90 -auxbase all -O2 -version -p -fintrinsic-modules-path /data/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.3.0/finclude -o /tmp/ccLxqkOP.s GNU F95 version 4.3.0 20070626 (experimental) (x86_64-unknown-linux-gnu) compiled by GNU C version 4.3.0 20070626 (experimental), GMP version 4.1.4, MPFR version 2.2.1. GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 and that also leads to the executable which fails as: bench02> ../src/cp2k.sopt JAC_gen.inp CP2K| Stopped by processor number 0 CP2K| cp_log_handling:cp_add_default_loggertoo many default loggers, increase max_stack_pointer in cp_log_handling CP2K| Error number was 100 STOP mp_stop
(In reply to comment #1) > Most likely -pg is changing the alignment of the stack which is incorrect. Oh > -pg code is emitted by the target specific code so this is a target issue. Hm, on x86_64 pg inserts: fprintf (file, "\tleaq\t%sP%d@(%%rip),%%r11\n", LPREFIX, labelno); fprintf (file, "\tcall\t*%s@GOTPCREL(%%rip)\n", MCOUNT_NAME); So, it loads %r11 and calls mcount. The only thing that can go wrong is, that some value in %r11 gets rewritten. Could you look what happens with %r11 around the point of failure?
(In reply to comment #9) > So, it loads %r11 and calls mcount. The only thing that can go wrong is, that > some value in %r11 gets rewritten. Not even that because x86_64 is a NO_PROFILE_COUNTERS by default.
Hm... --cut here-- // just a stupid testcase, don't bother with source long long test(long long a, long long b) { return a / b; } --cut here-- cc1 -O2: test: .LFB2: movq %rdi, %rdx movq %rdi, %rax sarq $63, %rdx idivq %rsi ret cc1 -O1 -p: test: .LFB2: pushq %rbp .LCFI0: movq %rsp, %rbp .LCFI1: call mcount movq %rdi, %rdx movq %rdi, %rax sarq $63, %rdx idivq %rsi leave ret cc1 -O2 -p test: .LFB2: pushq %rbp .LCFI0: movq %rsp, %rbp .LCFI1: call mcount leave <<<< movq %rdi, %rdx movq %rdi, %rax sarq $63, %rdx idivq %rsi ret Just a wild guess, could this depend on PR32450? Could you check if there is an access to stack after leave insn?
(In reply to comment #11) > > Just a wild guess, could this depend on PR32450? Could you check if there is an > access to stack after leave insn? > Hi Uros, thanks for looking into this, but I'm afraid I don't really understand what you're asking for. Also the PR mentioned in the above comment is a circular reference.
just checked that current trunk (Wed Jul 4 17:21:37 UTC 2007 (revision 126328)) still exhibits the same problem. I don't see the same problem on an opteron, only on a core2 (both using -march=native), so it could be something specific to core2?
Created attachment 13847 [details] bad assembly
Created attachment 13848 [details] good assembly
I've added the assembly as obtained from 1044 gfortran -O2 -pg -S cp_log_handling.f90 1045 mv cp_log_handling.s g.s 1046 gfortran -O2 -pg -march=native -S cp_log_handling.f90 1047 mv cp_log_handling.s b.s (i.e. the good and the bad case) for the file that triggers the stop. The stop is triggered by a check on : INTEGER, PRIVATE :: stack_pointer=0 INTEGER, PARAMETER, PRIVATE :: max_stack_pointer=10 [...] IF (stack_pointer+1>max_stack_pointer) THEN CALL mp_stop(100,routineP//& "too many default loggers, increase max_stack_pointer in "//moduleN) ENDIF it must be stack_pointer that gets corrupted somehow.
actually, in gdb, I find that the variable is not corrupted: (gdb) bt #0 0x000000000089dc94 in __message_passing_MOD_mp_stop () #1 0x000000000050ed2e in __cp_log_handling_MOD_cp_add_default_logger () #2 0x000000000059a897 in __f77_interface_MOD_init_cp2k () #3 0x0000000000d49638 in MAIN__ () #4 0x0000000000e8786c in main (argc=2, argv=0x7fff1e9b6768) at /data/vondele/gcc_trunk/gcc/libgfortran/fmain.c:22 (gdb) up #1 0x000000000050ed2e in __cp_log_handling_MOD_cp_add_default_logger () (gdb) print __cp_log_handling_MOD_stack_pointer $1 = 0 (the variable should be 0), but it looks like the assembly might be wrong: g.s: .LCFI15: call mcount cmpl $9, __cp_log_handling_MOD_stack_pointer(%rip) movq %rdi, %rbx jle .L21 movl $108, %edx movl $.LC8, %esi movl $.LC9, %edi call __message_passing_MOD_mp_stop b.s: .LCFI15: cmpl $9, __cp_log_handling_MOD_stack_pointer(%rip) call mcount movq %rdi, %rbx jle .L21 movl $108, %edx movl $.LC8, %esi movl $.LC9, %edi call __message_passing_MOD_mp_stop
(In reply to comment #12) > thanks for looking into this, but I'm afraid I don't really understand what > you're asking for. Also the PR mentioned in the above comment is a circular > reference. Sorry for the confusion, I was thinking on PR 32475, "function with asm() does not setup stack frame".
Subject: Re: -pg seemingly causes miscompilation On 4 Jul 2007 19:17:22 -0000, jv244 at cam dot ac dot uk <gcc-bugzilla@gcc.gnu.org> wrote: > b.s: > > .LCFI15: > cmpl $9, __cp_log_handling_MOD_stack_pointer(%rip) > call mcount > movq %rdi, %rbx > jle .L21 This is obviosuly wrong as the call will most likely clobber the flags register. -- Pinski
(In reply to comment #17) > (the variable should be 0), but it looks like the assembly might be wrong: > b.s: > (1) cmpl $9, __cp_log_handling_MOD_stack_pointer(%rip) > call mcount > movq %rdi, %rbx > (2) jle .L21 This _is_ wrong assembly. The call to mcount is clobbering flags reg between CC setting insn (1) and CC receiving insn (2). From there, the problem becomes trivial to track down, but I _really_ need a good sleep now.
This is a small testcase (that aborts if miscompiled): > cat test.f90 MODULE cp_log_handling INTEGER, PRIVATE :: stack_pointer=0 INTEGER, PARAMETER, PRIVATE :: max_stack_pointer=10 CONTAINS SUBROUTINE cp_add_default_logger() IF (stack_pointer+1>max_stack_pointer) THEN CALL mp_stop() ENDIF stack_pointer=stack_pointer+1 END SUBROUTINE cp_add_default_logger SUBROUTINE mp_stop() CALL ABORT() END SUBROUTINE END MODULE USE cp_log_handling CALL cp_add_default_logger() END gfortran -v -O2 -pg -march=native test.f90 Driving: gfortran -v -O2 -pg -march=native test.f90 -lgfortranbegin -lgfortran -lm -shared-libgcc Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: /data/vondele/gcc_trunk/gcc/configure --prefix=/data/vondele/gcc_trunk/build --enable-languages=c,fortran --with-mpfr=/data/programs/mpfr/ Thread model: posix gcc version 4.3.0 20070704 (experimental) /data/vondele/gcc_trunk/build/libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/f951 test.f90 -march=core2 -mcx16 -msahf --param l1-cache-size=512 --param l1-cache-line-size=64 -mtune=core2 -quiet -dumpbase test.f90 -auxbase test -O2 -version -p -fintrinsic-modules-path /data/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.3.0/finclude -o /tmp/cckJso3I.s GNU F95 version 4.3.0 20070704 (experimental) (x86_64-unknown-linux-gnu) compiled by GNU C version 4.3.0 20070704 (experimental), GMP version 4.1.4, MPFR version 2.2.1. GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 as -V -Qy -o /tmp/ccA9EOak.o /tmp/cckJso3I.s GNU assembler version 2.16.91.0.5 (x86_64-suse-linux) using BFD version 2.16.91.0.5 20051219 (SUSE Linux) /data/vondele/gcc_trunk/build/libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/collect2 --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib/../lib64/gcrt1.o /usr/lib/../lib64/crti.o /data/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.3.0/crtbegin.o -L/data/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.3.0 -L/data/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.3.0/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/data/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.3.0/../../.. /tmp/ccA9EOak.o -lgfortranbegin -lgfortran -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /data/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.3.0/crtend.o /usr/lib/../lib64/crtn.o > ./a.out Aborted
It really seems core2 specific, -march=nocona doesn't fail.
This is generic sched2 problem and affects all targets. The problem is that the call to mcount is emitted in final pass, so scheduler can't see it, and is free to move insns before (note 26 25 2 NOTE_INSN_PROLOGUE_END) that marks "call mcount" insertion point. It happens that on core2, scheduler moves compare before the note, and we got: .LCFI5: cmpl $9, __cp_log_handling_MOD_stack_pointer(%rip) call mcount jle .L6 Using -fno-schedule-insns2 fixes this problem. There are two ways to fix this problem: - The gcc way is to insert clobber of CC reg after the prologue end note. - The libc way is to fix _mcount to really do what it claims to do: /* We need a special version of the `mcount' function since for ix86 it must not clobber any register... and also save/restore flags reg to the stack.
glibc bug report at: http://sources.redhat.com/bugzilla/show_bug.cgi?id=4744
Also fails for 32bit builds on core2 target: __cp_log_handling_MOD_cp_add_default_logger: pushl %ebp movl %esp, %ebp subl $8, %esp cmpl $9, __cp_log_handling_MOD_stack_pointer call mcount jle .L6 call __cp_log_handling_MOD_mp_stop
adding a c testcase: int stack_pointer=0; void mystop() { abort(); } void add() { if (stack_pointer+1>10) { mystop(); }; stack_pointer=stack_pointer+1; } int main() { add(); return stack_pointer-1 ; } > gcc -O2 -pg -march=native test.c > ./a.out Aborted
Patch at http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00460.html.
(In reply to comment #27) > Patch at http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00460.html. > without knowing C to well, don't you need to initialize stack_pointer in the testcase you've added to the patch?
(In reply to comment #28) > without knowing C to well, don't you need to initialize stack_pointer in the > testcase you've added to the patch? No, written this way it is zero by default.
(In reply to comment #29) > (In reply to comment #28) > > > without knowing C to well, don't you need to initialize stack_pointer in the > > testcase you've added to the patch? > > No, written this way it is zero by default. > :-) you might have noticed from my bug reports I'm not a real C programmer... thanks for fixing this bug quickly.
Author: uros Date: Fri Jul 6 08:53:15 2007 New Revision: 126403 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126403 Log: PR rtl_optimization/32450 * function.c (thread_prologue_and_epilogue_insns): Emit blockage insn to ensure that instructions are not moved into the prologue when profiling is on. Remove unused prologue_end variable. (expand_function_end): Emit blockage insn instead of ASM_INPUT rtx as a scheduling barrier. testsuite/ChangeLog: PR rtl_optimization/32450 * gcc.dg/pr32450.c: New runtime test. Added: trunk/gcc/testsuite/gcc.dg/pr32450.c Modified: trunk/gcc/ChangeLog trunk/gcc/function.c trunk/gcc/testsuite/ChangeLog
Subject: Bug 32450 Author: uros Date: Fri Jul 6 09:49:52 2007 New Revision: 126407 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126407 Log: PR rtl-optimization/32450 * function.c (thread_prologue_and_epilogue_insns): Emit blockage insn to ensure that instructions are not moved into the prologue when profiling is on. testsuite/ChangeLog: PR rtl-optimization/32450 * gcc.dg/pr32450.c: New runtime test. Added: branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/pr32450.c - copied, changed from r126403, trunk/gcc/testsuite/gcc.dg/pr32450.c Modified: branches/gcc-4_2-branch/gcc/ChangeLog branches/gcc-4_2-branch/gcc/function.c branches/gcc-4_2-branch/gcc/testsuite/ChangeLog
Subject: Bug 32450 Author: uros Date: Fri Jul 6 10:54:03 2007 New Revision: 126411 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126411 Log: PR rtl-optimization/32450 * function.c (thread_prologue_and_epilogue_insns): Emit blockage insn to ensure that instructions are not moved into the prologue when profiling is on. testsuite/ChangeLog: PR rtl-optimization/32450 * gcc.dg/pr32450.c: New runtime test. Added: branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/pr32450.c - copied unchanged from r126407, branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/pr32450.c Modified: branches/gcc-4_1-branch/gcc/ChangeLog branches/gcc-4_1-branch/gcc/function.c branches/gcc-4_1-branch/gcc/testsuite/ChangeLog
Fixed everywhere.