Bug 32450 - -pg causes miscompilation
Summary: -pg causes miscompilation
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.3.0
: P3 normal
Target Milestone: ---
Assignee: Uroš Bizjak
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Keywords: patch, wrong-code
Depends on:
Blocks: 29975
  Show dependency treegraph
 
Reported: 2007-06-21 09:03 UTC by Joost VandeVondele
Modified: 2007-07-06 11:16 UTC (History)
2 users (show)

See Also:
Host:
Target: i686-unknown-linux-gnu
Build:
Known to work:
Known to fail: 4.3.0
Last reconfirmed: 2007-07-04 20:10:22


Attachments
bad assembly (10.06 KB, text/plain)
2007-07-04 19:02 UTC, Joost VandeVondele
Details
good assembly (10.05 KB, text/plain)
2007-07-04 19:03 UTC, Joost VandeVondele
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Joost VandeVondele 2007-06-21 09:03:03 UTC
CP2K (see PR 29975) compiles fine with current trunk using 

'-O3 -ffast-math -ftree-vectorize -march=native -funroll-loops'

but the code stops with a bogus error message using

'-O3 -ffast-math -ftree-vectorize -march=native -funroll-loops -pg'

to reproduce see comment 112 in PR 29975 to obtain the code and the example inputs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29975#c112

I've tested this using:

Target: x86_64-unknown-linux-gnu
Configured with: /data/vondele/gcc_trunk/gcc/configure --prefix=/data/vondele/gcc_trunk/build --enable-languages=c,fortran --with-mpfr=/data/programs/mpfr/
Thread model: posix
gcc version 4.3.0 20070620 (experimental)
Comment 1 Andrew Pinski 2007-06-21 09:07:16 UTC
Most likely -pg is changing the alignment of the stack which is incorrect.  Oh -pg code is emitted by the target specific code so this is a target issue.
Comment 2 Joost VandeVondele 2007-06-21 10:16:18 UTC
(In reply to comment #1)
> Most likely -pg is changing the alignment of the stack which is incorrect.  Oh
> -pg code is emitted by the target specific code so this is a target issue.
> 

Is there an easy way for me to check this ?

Meanwhile, I find that '-O1 -march=native -pg' compiles/executes OK.
Comment 3 Joost VandeVondele 2007-06-21 14:28:09 UTC
this is the list of options I have tested, the comment indicates if this yields a failure or not, basically, you need -O2 and -march=native to trigger the bug using '-O1 -march=native -pg' or '-O2 -pg' are not sufficient 

# OK
FCFLAGS="-O3 -ffast-math -ftree-vectorize -funroll-loops -march=native"
# wrong
FCFLAGS="-O3 -ffast-math -ftree-vectorize -march=native -funroll-loops -pg"
# OK
FCFLAGS="-O1 -march=native -pg"
# OK
FCFLAGS="-O1 -march=native -ftree-vectorize -pg"
# wrong
FCFLAGS="-O2 -march=native -ftree-vectorize -pg"
# wrong
FCFLAGS="-O2 -march=native -pg"
# OK
FCFLAGS="-O2 -pg"
Comment 4 Francois-Xavier Coudert 2007-06-27 08:58:10 UTC
(In reply to comment #3)
> basically, you need -O2 and -march=native to trigger the bug

I can't reproduce that, what is your processor exactly? (ie what is "native" for you)
Comment 5 Joost VandeVondele 2007-06-27 10:34:44 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > basically, you need -O2 and -march=native to trigger the bug
> 
> I can't reproduce that, what is your processor exactly? (ie what is "native"
> for you)
> 
... here is a suggestion for the gcc crew ... what about having gfortran -v also print the value fo -march=native. Honestly, I often don't now the precise CPU I'm running on (that's why I find -march=native useful in the first place). In this case, it is either on a intel core 2 duo or on an opteron. I'll see if I can figure out where I did these runs ....
Comment 6 Andrew Pinski 2007-06-27 11:25:55 UTC
> ... here is a suggestion for the gcc crew ... what about having gfortran -v

When you invoke gfortran with -v march=native and with a source file, it will show the values.  This is the recommended way of showing how you involved gcc/gfortran anyways.
Comment 7 Francois-Xavier Coudert 2007-06-27 11:38:19 UTC
(In reply to comment #6)
> When you invoke gfortran with -v march=native and with a source file, it will
> show the values.  This is the recommended way of showing how you involved
> gcc/gfortran anyways.

Yes, that works fine:

$ gcc -c -v -march=native a.c
[...]
/path/to/cc1 -quiet -v a.c -march=k8 -msahf --param l1-cache-size=1024 --param l1-cache-line-size=64 -mtune=k8 -quiet -dumpbase a.c -auxbase a -version -o /tmp/ccYMbEr2.s
Comment 8 Joost VandeVondele 2007-06-27 12:15:57 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > When you invoke gfortran with -v march=native and with a source file, it will

right.. that shows:

gfortran --verbose -O2 -march=native -pg all.f90
Driving: gfortran -v -O2 -march=native -pg all.f90 -lgfortranbegin -lgfortran -lm -shared-libgcc
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /data/vondele/gcc_trunk/gcc/configure --prefix=/data/vondele/gcc_trunk/build --enable-languages=c,fortran --with-mpfr=/data/programs/mpfr/
Thread model: posix
gcc version 4.3.0 20070626 (experimental)
 /data/vondele/gcc_trunk/build/libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/f951 all.f90 -march=core2 -mcx16 -msahf --param l1-cache-size=512 --param l1-cache-line-size=64 -mtune=core2 -quiet -dumpbase all.f90 -auxbase all -O2 -version -p -fintrinsic-modules-path /data/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.3.0/finclude -o /tmp/ccLxqkOP.s
GNU F95 version 4.3.0 20070626 (experimental) (x86_64-unknown-linux-gnu)
        compiled by GNU C version 4.3.0 20070626 (experimental), GMP version 4.1.4, MPFR version 2.2.1.
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096


and that also leads to the executable which fails as:
bench02> ../src/cp2k.sopt JAC_gen.inp

 CP2K| Stopped by processor number                                             0
 CP2K|  cp_log_handling:cp_add_default_loggertoo many default loggers, increase max_stack_pointer in cp_log_handling
 CP2K| Error number was                                                      100

STOP mp_stop

Comment 9 Uroš Bizjak 2007-07-04 14:11:26 UTC
(In reply to comment #1)
> Most likely -pg is changing the alignment of the stack which is incorrect.  Oh
> -pg code is emitted by the target specific code so this is a target issue.

Hm, on x86_64 pg inserts:

fprintf (file, "\tleaq\t%sP%d@(%%rip),%%r11\n", LPREFIX, labelno);
fprintf (file, "\tcall\t*%s@GOTPCREL(%%rip)\n", MCOUNT_NAME);

So, it loads %r11 and calls mcount. The only thing that can go wrong is, that some value in %r11 gets rewritten. Could you look what happens with %r11 around the point of failure?

Comment 10 Uroš Bizjak 2007-07-04 14:16:35 UTC
(In reply to comment #9)

> So, it loads %r11 and calls mcount. The only thing that can go wrong is, that
> some value in %r11 gets rewritten.

Not even that because x86_64 is a NO_PROFILE_COUNTERS by default.
Comment 11 Uroš Bizjak 2007-07-04 14:26:44 UTC
Hm...

--cut here--
// just a stupid testcase, don't bother with source
long long test(long long a, long long b)
{
  return a / b;
}
--cut here--

cc1 -O2:

test:
.LFB2:
        movq    %rdi, %rdx
        movq    %rdi, %rax
        sarq    $63, %rdx
        idivq   %rsi
        ret

cc1 -O1 -p:
test:
.LFB2:
        pushq   %rbp
.LCFI0:
        movq    %rsp, %rbp
.LCFI1:
        call    mcount
        movq    %rdi, %rdx
        movq    %rdi, %rax
        sarq    $63, %rdx
        idivq   %rsi
        leave
        ret


cc1 -O2 -p
test:
.LFB2:
        pushq   %rbp
.LCFI0:
        movq    %rsp, %rbp
.LCFI1:
        call    mcount
        leave                       <<<<
        movq    %rdi, %rdx
        movq    %rdi, %rax
        sarq    $63, %rdx
        idivq   %rsi
        ret

Just a wild guess, could this depend on PR32450? Could you check if there is an access to stack after leave insn?
Comment 12 Joost VandeVondele 2007-07-04 14:51:19 UTC
(In reply to comment #11)
>
> Just a wild guess, could this depend on PR32450? Could you check if there is an
> access to stack after leave insn?
>
Hi Uros,

thanks for looking into this, but I'm afraid I don't really understand what you're asking for. Also the PR mentioned in the above comment is a circular reference. 

Comment 13 Joost VandeVondele 2007-07-04 18:51:02 UTC
just checked that current trunk (Wed Jul  4 17:21:37 UTC 2007 (revision 126328)) still exhibits the same problem. I don't see the same problem on an opteron, only on a core2 (both using -march=native), so it could be something specific to core2?
Comment 14 Joost VandeVondele 2007-07-04 19:02:47 UTC
Created attachment 13847 [details]
bad assembly
Comment 15 Joost VandeVondele 2007-07-04 19:03:21 UTC
Created attachment 13848 [details]
good assembly
Comment 16 Joost VandeVondele 2007-07-04 19:09:49 UTC
I've added the assembly as obtained from

 1044  gfortran -O2 -pg -S cp_log_handling.f90
 1045  mv cp_log_handling.s g.s
 1046  gfortran -O2 -pg -march=native -S cp_log_handling.f90
 1047  mv cp_log_handling.s b.s

(i.e. the good and the bad case) for the file that triggers the stop. The stop is triggered by a check on :

  INTEGER, PRIVATE            :: stack_pointer=0
  INTEGER, PARAMETER, PRIVATE :: max_stack_pointer=10

[...]
    IF (stack_pointer+1>max_stack_pointer) THEN
       CALL mp_stop(100,routineP//&
            "too many default loggers, increase max_stack_pointer in "//moduleN)
    ENDIF

it must be stack_pointer that gets corrupted somehow.
Comment 17 Joost VandeVondele 2007-07-04 19:17:22 UTC
actually, in gdb, I find that the variable is not corrupted: 
(gdb) bt
#0  0x000000000089dc94 in __message_passing_MOD_mp_stop ()
#1  0x000000000050ed2e in __cp_log_handling_MOD_cp_add_default_logger ()
#2  0x000000000059a897 in __f77_interface_MOD_init_cp2k ()
#3  0x0000000000d49638 in MAIN__ ()
#4  0x0000000000e8786c in main (argc=2, argv=0x7fff1e9b6768) at /data/vondele/gcc_trunk/gcc/libgfortran/fmain.c:22
(gdb) up
#1  0x000000000050ed2e in __cp_log_handling_MOD_cp_add_default_logger ()
(gdb) print __cp_log_handling_MOD_stack_pointer
$1 = 0

(the variable should be 0), but it looks like the assembly might be wrong:

g.s:

.LCFI15:
        call    mcount
        cmpl    $9, __cp_log_handling_MOD_stack_pointer(%rip)
        movq    %rdi, %rbx
        jle     .L21
        movl    $108, %edx
        movl    $.LC8, %esi
        movl    $.LC9, %edi
        call    __message_passing_MOD_mp_stop

b.s:

.LCFI15:
        cmpl    $9, __cp_log_handling_MOD_stack_pointer(%rip)
        call    mcount
        movq    %rdi, %rbx
        jle     .L21
        movl    $108, %edx
        movl    $.LC8, %esi
        movl    $.LC9, %edi
        call    __message_passing_MOD_mp_stop
Comment 18 Uroš Bizjak 2007-07-04 19:58:43 UTC
(In reply to comment #12)

> thanks for looking into this, but I'm afraid I don't really understand what
> you're asking for. Also the PR mentioned in the above comment is a circular
> reference. 

Sorry for the confusion, I was thinking on PR 32475, "function with asm() does not setup stack frame".
Comment 19 pinskia@gmail.com 2007-07-04 20:07:02 UTC
Subject: Re:  -pg seemingly causes miscompilation

On 4 Jul 2007 19:17:22 -0000, jv244 at cam dot ac dot uk
<gcc-bugzilla@gcc.gnu.org> wrote:
> b.s:
>
> .LCFI15:
>         cmpl    $9, __cp_log_handling_MOD_stack_pointer(%rip)
>         call    mcount
>         movq    %rdi, %rbx
>         jle     .L21

This is obviosuly wrong as the call will most likely clobber the flags register.

-- Pinski
Comment 20 Uroš Bizjak 2007-07-04 20:10:21 UTC
(In reply to comment #17)

> (the variable should be 0), but it looks like the assembly might be wrong:

> b.s:

>  (1)    cmpl    $9, __cp_log_handling_MOD_stack_pointer(%rip)
>         call    mcount
>         movq    %rdi, %rbx
>  (2)    jle     .L21

This _is_ wrong assembly. The call to mcount is clobbering flags reg between CC setting insn (1) and CC receiving insn (2). From there, the problem becomes trivial to track down, but I _really_ need a good sleep now.
Comment 21 Joost VandeVondele 2007-07-05 05:01:02 UTC
This is a small testcase (that aborts if miscompiled):

> cat test.f90
MODULE cp_log_handling
  INTEGER, PRIVATE            :: stack_pointer=0
  INTEGER, PARAMETER, PRIVATE :: max_stack_pointer=10
CONTAINS
  SUBROUTINE cp_add_default_logger()
    IF (stack_pointer+1>max_stack_pointer) THEN
       CALL mp_stop()
    ENDIF
    stack_pointer=stack_pointer+1
  END SUBROUTINE cp_add_default_logger
  SUBROUTINE mp_stop()
     CALL ABORT()
  END SUBROUTINE
END MODULE
USE cp_log_handling
CALL cp_add_default_logger()
END

gfortran -v -O2 -pg -march=native test.f90
Driving: gfortran -v -O2 -pg -march=native test.f90 -lgfortranbegin -lgfortran -lm -shared-libgcc
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /data/vondele/gcc_trunk/gcc/configure --prefix=/data/vondele/gcc_trunk/build --enable-languages=c,fortran --with-mpfr=/data/programs/mpfr/
Thread model: posix
gcc version 4.3.0 20070704 (experimental)
 /data/vondele/gcc_trunk/build/libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/f951 test.f90 -march=core2 -mcx16 -msahf --param l1-cache-size=512 --param l1-cache-line-size=64 -mtune=core2 -quiet -dumpbase test.f90 -auxbase test -O2 -version -p -fintrinsic-modules-path /data/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.3.0/finclude -o /tmp/cckJso3I.s
GNU F95 version 4.3.0 20070704 (experimental) (x86_64-unknown-linux-gnu)
        compiled by GNU C version 4.3.0 20070704 (experimental), GMP version 4.1.4, MPFR version 2.2.1.
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
 as -V -Qy -o /tmp/ccA9EOak.o /tmp/cckJso3I.s
GNU assembler version 2.16.91.0.5 (x86_64-suse-linux) using BFD version 2.16.91.0.5 20051219 (SUSE Linux)
 /data/vondele/gcc_trunk/build/libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/collect2 --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib/../lib64/gcrt1.o /usr/lib/../lib64/crti.o /data/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.3.0/crtbegin.o -L/data/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.3.0 -L/data/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.3.0/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/data/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.3.0/../../.. /tmp/ccA9EOak.o -lgfortranbegin -lgfortran -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /data/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.3.0/crtend.o /usr/lib/../lib64/crtn.o
> ./a.out
Aborted
Comment 22 Joost VandeVondele 2007-07-05 09:27:52 UTC
It really seems core2 specific, -march=nocona doesn't fail.
Comment 23 Uroš Bizjak 2007-07-05 09:29:43 UTC
This is generic sched2 problem and affects all targets. The problem is that the call to mcount is emitted in final pass, so scheduler can't see it, and is free to move insns before

(note 26 25 2 NOTE_INSN_PROLOGUE_END)

that marks "call mcount" insertion point.

It happens that on core2, scheduler moves compare before the note, and we got:

.LCFI5:
        cmpl    $9, __cp_log_handling_MOD_stack_pointer(%rip)
        call    mcount
        jle     .L6

Using -fno-schedule-insns2 fixes this problem.

There are two ways to fix this problem:

- The gcc way is to insert clobber of CC reg after the prologue end note.

- The libc way is to fix _mcount to really do what it claims to do:

/* We need a special version of the `mcount' function since for ix86 it
   must not clobber any register...

and also save/restore flags reg to the stack.
Comment 24 Uroš Bizjak 2007-07-05 09:41:52 UTC
glibc bug report at:

http://sources.redhat.com/bugzilla/show_bug.cgi?id=4744
Comment 25 Uroš Bizjak 2007-07-05 09:44:04 UTC
Also fails for 32bit builds on core2 target:

__cp_log_handling_MOD_cp_add_default_logger:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        cmpl    $9, __cp_log_handling_MOD_stack_pointer
        call    mcount
        jle     .L6
        call    __cp_log_handling_MOD_mp_stop

Comment 26 Joost VandeVondele 2007-07-05 10:50:18 UTC
adding a c testcase:

int stack_pointer=0;
void mystop()
{
 abort();
}
void add()
{
 if (stack_pointer+1>10)
 {
   mystop();
 };
 stack_pointer=stack_pointer+1;
}
int main()
{
 add();
 return stack_pointer-1 ;
}

> gcc -O2  -pg -march=native test.c
> ./a.out
Aborted
Comment 27 Uroš Bizjak 2007-07-05 17:11:30 UTC
Patch at http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00460.html.
Comment 28 Joost VandeVondele 2007-07-05 17:35:55 UTC
(In reply to comment #27)
> Patch at http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00460.html.
> 
without knowing C to well, don't you need to initialize stack_pointer in the testcase you've added to the patch?
Comment 29 Uroš Bizjak 2007-07-05 17:42:26 UTC
(In reply to comment #28)

> without knowing C to well, don't you need to initialize stack_pointer in the
> testcase you've added to the patch?

No, written this way it is zero by default.
Comment 30 Joost VandeVondele 2007-07-05 17:52:21 UTC
(In reply to comment #29)
> (In reply to comment #28)
> 
> > without knowing C to well, don't you need to initialize stack_pointer in the
> > testcase you've added to the patch?
> 
> No, written this way it is zero by default.
> 

:-) you might have noticed from my bug reports I'm not a real C programmer...

thanks for fixing this bug quickly.
Comment 31 Uroš Bizjak 2007-07-06 09:24:26 UTC
Author: uros
Date: Fri Jul  6 08:53:15 2007
New Revision: 126403

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126403
Log:
	PR rtl_optimization/32450
	* function.c (thread_prologue_and_epilogue_insns): Emit blockage insn
	to ensure that instructions are not moved into the prologue when
	profiling is on.  Remove unused prologue_end variable.
	(expand_function_end): Emit blockage insn instead of ASM_INPUT rtx
	as a scheduling barrier.

testsuite/ChangeLog:

	PR rtl_optimization/32450
	* gcc.dg/pr32450.c: New runtime test.


Added:
    trunk/gcc/testsuite/gcc.dg/pr32450.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/function.c
    trunk/gcc/testsuite/ChangeLog
Comment 32 uros 2007-07-06 09:50:06 UTC
Subject: Bug 32450

Author: uros
Date: Fri Jul  6 09:49:52 2007
New Revision: 126407

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126407
Log:
        PR rtl-optimization/32450
        * function.c (thread_prologue_and_epilogue_insns): Emit blockage insn
        to ensure that instructions are not moved into the prologue when
        profiling is on.

testsuite/ChangeLog:

        PR rtl-optimization/32450
        * gcc.dg/pr32450.c: New runtime test.


Added:
    branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/pr32450.c
      - copied, changed from r126403, trunk/gcc/testsuite/gcc.dg/pr32450.c
Modified:
    branches/gcc-4_2-branch/gcc/ChangeLog
    branches/gcc-4_2-branch/gcc/function.c
    branches/gcc-4_2-branch/gcc/testsuite/ChangeLog

Comment 33 uros 2007-07-06 10:54:18 UTC
Subject: Bug 32450

Author: uros
Date: Fri Jul  6 10:54:03 2007
New Revision: 126411

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126411
Log:
        PR rtl-optimization/32450
        * function.c (thread_prologue_and_epilogue_insns): Emit blockage insn
        to ensure that instructions are not moved into the prologue when
        profiling is on.

testsuite/ChangeLog:

        PR rtl-optimization/32450
        * gcc.dg/pr32450.c: New runtime test.


Added:
    branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/pr32450.c
      - copied unchanged from r126407, branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/pr32450.c
Modified:
    branches/gcc-4_1-branch/gcc/ChangeLog
    branches/gcc-4_1-branch/gcc/function.c
    branches/gcc-4_1-branch/gcc/testsuite/ChangeLog

Comment 34 Uroš Bizjak 2007-07-06 11:16:01 UTC
Fixed everywhere.