A customer reported that compiling SPEC95 using FDO (i.e. -fprofile-generate and -fprofile-use) and -mabi=32 causes 146.wave5 to fail. I don't think I can distribute SPEC95 sources, but I have managed to create a small C testcase which I will attach. The problem here is a bit involved, and depends on several aspects of the MIPS ABI. The value profiling code is enabled by -fprofile-generate. This code instruments divides by registers, to check for cases where we are always dividing by the same value. Such cases can be optimized to use constant divides instead. This code emits DImode comparisons. For a 32-bit MIPS target, this means a cmpdi2 libcall. If you have a divide inside a leaf function, it will no longer be a leaf function after profile.c runs. This happens after virtual register instantiation, and before reload. On the MIPS, the ABI traditionally makes the FP equal to the SP immediately after the prologue. Thus STARTING_FRAME_OFFSET includes the size of the outgoing args and space for the gp save area. In this case, during virtual register elimination, current_function_outgoing_args_size is 0, so the local variables are placed at offset 8 (4 bytes for gp save area, 4 bytes for stack alignment). When we get to reload, current_function_outgoing_args_size is now 16. This is because cmpdi2 needs 16 bytes of arguments, and ABI32 allocates stack space for arguments in registers. INITIAL_ELIMINATION_OFFSET defines the offset between the fp and sp as zero. The FP gets eliminated, and now the local args start at sp+8. Meanwhile, compute_frame_size sees that we have 16 bytes of outgoing args, so the gp save area is placed at sp+16. We now have the problem that the local args area overlaps the gp save area. If you store to a local arg that overlaps the gp save area, call a function, and then restore gp after the function returns, then the gp is clobbered. The next use of the gp results in a bus error/segmentation fault or similar problem. If we want the ability to make libcalls in leaf functions, then it seems to me that this is a flaw in the MIPS ABI. We can't eliminate the FP to be the same as the SP. At best, we can make the FP be SP+outgoing args size. This implies that we remove current_function_outgoing_args_size from STARTING_FRAME_OFFSET, and add it to INITIAL_ELIMINATION_OFFSET. However, this is effectively an ABI change. This will perhaps effect EH info, debugging info, debuggers (that disassemble prologues), asms that make assumptions about FP values, and perhaps other things. It isn't clear that this change is safe or desirable. An alternative solution is to eliminate the cmpdi2 libcalls. One way to do this is to modify mips.md to add a 32-bit cmpdi2 pattern. I believe this can be done in 8 instructions, 4 for a subtract, and 4 to reduce the result to -1/0/1. This is a bit large for a define_expand/define_insn though. It isn't clear if this is desirable. Another way to eliminate the libcalls would be to change gcov_type from DImode to SImode. However, this potentially causes compatibility problems, as it will change the format of some profiling files. There are potential cross compilation problems. This overrides a deliberate decision by the profile maintainers to use DImode to avoid overflowing SImode counters. It isn't clear that this is desirable. Another solution is to just add extra bytes to leaf function stack frames when profiling. This isn't very elegant, but seems like the simplest and safest solution I can come up with. Since profiled code will already be slow, wasting a few bytes of stack space should not be an issue. If we always assume that a leaf function has 16 bytes of stack space, then the resulting code will still work if a cmpdi2 libcall is emitted. This is potentially fragile, as it assumes the profiling code will never emit a libcall that uses more than 16 bytes of stack space, but then I already said this was inelegant. I will attach a patch for this. This works for the testcase. I will be testing this against SPEC95 to make sure it works for all cases.
Created attachment 6672 [details] Reduced C language testcase Compile this with -O -fprofile-generate -mabi=32 on any mips*-linux-gnu system, and run it. It will generate a bus error because the gp reg was clobbered. This fails with both gcc-3.4.x and mainline.
Created attachment 6673 [details] pad STARTING_FRAME_OFFSET when profiling This is an inelegant untested patch that makes the testcase work.
Subject: Re: value profiling clobbers gp on mips At Fri, 2 Jul 2004 03:22:10 +0000 (UTC), "wilson at gcc dot gnu dot org" wrote: > This is an inelegant untested patch that makes the testcase work. Cute bug. As for the workaround: o64 will suffer the same problem that o32 does, since it's basically o32 w/ 64-bit GPRs (and therefore, like o32, is required to leave space for a0..a3 on the stack for use by called functions). I don't know if EABI does the same thing. Anyway, I think i'd advise either explicitly checking for o32 or o64, or maybe checking for !TARGET_NEWABI (if EABI does have the same arg space save requirement), instead of checking !TARGET_64BIT. chris
Subject: Re: value profiling clobbers gp on mips At 01 Jul 2004 20:58:36 -0700, Chris G. Demetriou wrote: > As for the workaround: gack, should have thought harder about this. if target_64bit, cmpdi2 won't be a function call, eh? well, then, there we go. ignore that comment. 8-)
This this the case the work STARTING_FRAME_OFFSET done too ealy?
Subject: Re: value profiling clobbers gp on mips On Thu, 2004-07-01 at 21:09, pinskia at gcc dot gnu dot org wrote: > This this the case the work STARTING_FRAME_OFFSET done too ealy? Or in the same line of thought, maybe we are doing the profiling instrumentation too late. Moving either one of these across the other would solve the problem. I doubt that we want to move the virtual register instantiation later, but moving the profiling support earlier might be feasible.
I expect this instrumentation will be done at the tree level soon enough. It was my understanding that this is happening on tree-profiling-branch.
It is not happening on the t-p-branch yet, but Zdenek has patches for it and I hope he'll finish them when he gets back from his holiday.
Subject: Bug 16325 CVSROOT: /cvs/gcc Module name: gcc Changes by: wilson@gcc.gnu.org 2004-07-15 00:35:29 Modified files: gcc : ChangeLog gcc/config/mips: mips.h gcc/testsuite : ChangeLog Added files: gcc/testsuite/gcc.dg: profile-generate-1.c Log message: Fix MIPS SPEC95 FP 146.wave5 -fprofile-generate failure. PR target/16325 * config/mips/mips.h (STARTING_FRAME_OFFSET): When flag_profile_value and ! TARGET_64BIT, include REG_PARM_STACK_SPACE. * gcc.dg/profile-generate-1.c: New. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.4536&r2=2.4537 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/mips/mips.h.diff?cvsroot=gcc&r1=1.352&r2=1.353 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/ChangeLog.diff?cvsroot=gcc&r1=1.4001&r2=1.4002 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/gcc.dg/profile-generate-1.c.diff?cvsroot=gcc&r1=NONE&r2=1.1
Subject: Bug 16325 CVSROOT: /cvs/gcc Module name: gcc Branch: gcc-3_4-branch Changes by: wilson@gcc.gnu.org 2004-07-15 00:42:49 Modified files: gcc : ChangeLog gcc/config/mips: mips.h gcc/testsuite : ChangeLog Added files: gcc/testsuite/gcc.dg: profile-generate-1.c Log message: Fix MIPS SPEC95 FP 146.wave5 FDO miscompilation. PR target/16325 * config/mips/mips.h (STARTING_FRAME_OFFSET): When flag_profile_value and ! TARGET_64BIT, include REG_PARM_STACK_SPACE. * gcc.dg/profile-generate-1.c: New. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=2.2326.2.557&r2=2.2326.2.558 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/mips/mips.h.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.310.4.7&r2=1.310.4.8 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.3389.2.228&r2=1.3389.2.229 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/gcc.dg/profile-generate-1.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=NONE&r2=1.1.2.1
Mine.
Fixed.