Bug 50251 - [4.7 Regression] Revision 178353 caused many test failures
Summary: [4.7 Regression] Revision 178353 caused many test failures
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.7.0
: P3 normal
Target Milestone: 4.7.0
Assignee: Tom de Vries
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-08-31 12:44 UTC by H.J. Lu
Modified: 2011-09-14 14:37 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2011-08-31 00:00:00


Attachments
testcase for -O2 (221 bytes, text/x-csrc)
2011-09-01 12:43 UTC, Tom de Vries
Details
optimized dump (840 bytes, text/plain)
2011-09-01 13:06 UTC, Tom de Vries
Details
dump before ira (4.01 KB, text/plain)
2011-09-01 13:25 UTC, Tom de Vries
Details
ira dump (7.95 KB, text/plain)
2011-09-01 16:29 UTC, Tom de Vries
Details

Note You need to log in before you can comment on or make changes to this bug.
Description H.J. Lu 2011-08-31 12:44:53 UTC
On Linux/x86, revision 178353:

http://gcc.gnu.org/ml/gcc-cvs/2011-08/msg01372.html

caused:

FAIL: gcc.c-torture/execute/20010209-1.c compilation,  -O2 -flto  (internal compiler error)
FAIL: gcc.c-torture/execute/20010209-1.c compilation,  -O2 -flto -flto-partition=none  (internal compiler error)
FAIL: gfortran.dg/actual_array_constructor_1.f90  -O1  (internal compiler error)
FAIL: gfortran.dg/actual_array_constructor_1.f90  -O1  (test for excess errors)
FAIL: gfortran.dg/actual_array_constructor_1.f90  -O2  (internal compiler error)
FAIL: gfortran.dg/actual_array_constructor_1.f90  -O2  (test for excess errors)
FAIL: gfortran.dg/actual_array_constructor_1.f90  -O3 -fomit-frame-pointer  (internal compiler error)
FAIL: gfortran.dg/actual_array_constructor_1.f90  -O3 -fomit-frame-pointer  (test for excess errors)
FAIL: gfortran.dg/actual_array_constructor_1.f90  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  (internal compiler error)
FAIL: gfortran.dg/actual_array_constructor_1.f90  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  (test for excess errors)
FAIL: gfortran.dg/actual_array_constructor_1.f90  -O3 -fomit-frame-pointer -funroll-loops  (internal compiler error)
FAIL: gfortran.dg/actual_array_constructor_1.f90  -O3 -fomit-frame-pointer -funroll-loops  (test for excess errors)
FAIL: gfortran.dg/actual_array_constructor_1.f90  -O3 -g  (internal compiler error)
FAIL: gfortran.dg/actual_array_constructor_1.f90  -O3 -g  (test for excess errors)
Comment 1 Tom de Vries 2011-08-31 14:56:24 UTC
reproduced first failure:
...
/home/vries/local/i686/src/gcc-mainline/gcc/testsuite/gcc.c-torture/execute/20010209-1.c: In function 'main':
/home/vries/local/i686/src/gcc-mainline/gcc/testsuite/gcc.c-torture/execute/20010209-1.c:21:1: internal compiler error: in print_reg, at config/i386/i386.c:13309
...

back-trace lto1 at failure:
...
(gdb) bt
#0  0xf7e6b4d4 in exit () from /lib32/libc.so.6
#1  0x08eab8af in diagnostic_action_after_output (context=0x965f240, diagnostic=0xffffd0f4) at /home/vries/local/i686/src/gcc-mainline/gcc/diagnostic.c:243
#2  0x08eac366 in diagnostic_report_diagnostic (context=0x965f240, diagnostic=0xffffd0f4) at /home/vries/local/i686/src/gcc-mainline/gcc/diagnostic.c:546
#3  0x08eac9c8 in internal_error (gmsgid=0x9358ca3 "in %s, at %s:%d") at /home/vries/local/i686/src/gcc-mainline/gcc/diagnostic.c:839
#4  0x08eaca90 in fancy_abort (file=0x91fdb24 "/home/vries/local/i686/src/gcc-mainline/gcc/config/i386/i386.c", line=13309, function=0x9213ccf "print_reg")
    at /home/vries/local/i686/src/gcc-mainline/gcc/diagnostic.c:893
#5  0x08a2d4ba in print_reg (x=0xf7d4d090, code=0, file=0x9692b68) at /home/vries/local/i686/src/gcc-mainline/gcc/config/i386/i386.c:13304
#6  0x08a2f44f in ix86_print_operand_address (file=0x9692b68, addr=0xf7e04cb4) at /home/vries/local/i686/src/gcc-mainline/gcc/config/i386/i386.c:14224
#7  0x083b6a8a in output_address (x=0xf7e04cb4) at /home/vries/local/i686/src/gcc-mainline/gcc/final.c:3540
#8  0x08a2ecd5 in ix86_print_operand (file=0x9692b68, x=0xf7e04cb4, code=0) at /home/vries/local/i686/src/gcc-mainline/gcc/config/i386/i386.c:14038
#9  0x083b6a2c in output_operand (x=0xf7e04cc0, code=0) at /home/vries/local/i686/src/gcc-mainline/gcc/final.c:3524
#10 0x083b6752 in output_asm_insn (templ=0x92d1e13 "mov{l}\t{%1, %0|%0, %1}", operands=0x962a4e0) at /home/vries/local/i686/src/gcc-mainline/gcc/final.c:3422
#11 0x083b56e8 in final_scan_insn (insn=0xf7d47f9c, file=0x9692b68, optimize_p=2, nopeepholes=0, seen=0xffffd5a0) at /home/vries/local/i686/src/gcc-mainline/gcc/final.c:2745
#12 0x083b3e39 in final (first=0xf7db4d20, file=0x9692b68, optimize_p=2) at /home/vries/local/i686/src/gcc-mainline/gcc/final.c:1786
#13 0x083b7b5f in rest_of_handle_final () at /home/vries/local/i686/src/gcc-mainline/gcc/final.c:4221
#14 0x08611dac in execute_one_pass (pass=0x9528da0) at /home/vries/local/i686/src/gcc-mainline/gcc/passes.c:2063
#15 0x08611f8d in execute_pass_list (pass=0x9528da0) at /home/vries/local/i686/src/gcc-mainline/gcc/passes.c:2118
#16 0x08611fb9 in execute_pass_list (pass=0x9529700) at /home/vries/local/i686/src/gcc-mainline/gcc/passes.c:2119
#17 0x08611fb9 in execute_pass_list (pass=0x95296c0) at /home/vries/local/i686/src/gcc-mainline/gcc/passes.c:2119
#18 0x087998b1 in tree_rest_of_compilation (fndecl=0xf7df2800) at /home/vries/local/i686/src/gcc-mainline/gcc/tree-optimize.c:420
#19 0x082b1573 in cgraph_expand_function (node=0xf7d493d8) at /home/vries/local/i686/src/gcc-mainline/gcc/cgraphunit.c:1797
#20 0x082b1727 in cgraph_expand_all_functions () at /home/vries/local/i686/src/gcc-mainline/gcc/cgraphunit.c:1856
#21 0x082b1e30 in cgraph_optimize () at /home/vries/local/i686/src/gcc-mainline/gcc/cgraphunit.c:2126
#22 0x081eec20 in lto_main () at /home/vries/local/i686/src/gcc-mainline/gcc/lto/lto.c:2803
#23 0x086ff03c in compile_file () at /home/vries/local/i686/src/gcc-mainline/gcc/toplev.c:548
#24 0x087010bf in do_compile () at /home/vries/local/i686/src/gcc-mainline/gcc/toplev.c:1886
#25 0x0870123d in toplev_main (argc=18, argv=0x9668ed8) at /home/vries/local/i686/src/gcc-mainline/gcc/toplev.c:1962
#26 0x081f17ab in main (argc=18, argv=0xffffd964) at /home/vries/local/i686/src/gcc-mainline/gcc/main.c:36
...

we're trying to print the following instruction using output template 'mov{l}\t{%1, %0|%0, %1}':
...
(gdb) call debug_rtx (insn)
(insn:TI 76 22 77 3 (set (reg:SI 2 cx [74])
        (mem:SI (plus:SI (plus:SI (mult:SI (reg:SI 0 ax [orig:62 ivtmp.5 ] [62])
                        (const_int 4 [0x4]))
                    (reg/f:SI 20 frame))
                (const_int -36 [0xffffffffffffffdc])) [2 MEM[symbol: D.2280, index: ivtmp.5_9, step: 4, offset: 4294967292B]+0 S4 A32])) /home/vries/local/i686/src/gcc-mainline/gcc/testsuite/gcc.c-torture/execute/20010209-1.c:9 64 {*movsi_internal}
     (nil))
...

The compiler asserts when trying to print '(reg/f:SI 20 frame)' using print_reg at:
...
13299	print_reg (rtx x, int code, FILE *file)
13300	{
13301	  const char *reg;
13302	  bool duplicated = code == 'd' && TARGET_AVX;
13303	
13304	  gcc_assert (x == pc_rtx
13305		      || (REGNO (x) != ARG_POINTER_REGNUM
13306			  && REGNO (x) != FRAME_POINTER_REGNUM
13307			  && REGNO (x) != FLAGS_REG
13308			  && REGNO (x) != FPSR_REG
13309			  && REGNO (x) != FPCR_REG));
...
Comment 2 Tom de Vries 2011-09-01 12:43:40 UTC
Created attachment 25161 [details]
testcase for -O2

To reproduce with O2:
i686-pc-linux-gnu-gcc 20010209-1.c -O2 -c
Comment 3 Tom de Vries 2011-09-01 13:06:45 UTC
Created attachment 25162 [details]
optimized dump

1. The alloca in main is transformed into this declaration:

  <unnamed-unsigned:8> D.2129[24];

and accessed like this:

  iftmp.6_13 = MEM[symbol: D.2129, index: ivtmp.30_31, step: 4, 
                   offset: 4294967292B];

  MEM[symbol: D.2129, index: ivtmp.30_31, step: 4, offset: 0B] = D.2105_15;

  D.2099_18 = MEM[(int *)&D.2129][5]{lb: 0 sz: 4};

  MEM[symbol: D.2129, index: ivtmp.30_31, step: 4, offset: 0B] = 0;


2. __builtin_stack_restore/__builtin_stack_restore in main are not optimized away, because the restored value is potentially used. The restored value is the same as the current value, so the restore is redundant, but that's not checked in optimize_stack_restore.

   saved_stack.3_5 = __builtin_stack_save ();

  __builtin_stack_restore (saved_stack.3_5);
Comment 4 Tom de Vries 2011-09-01 13:25:56 UTC
Created attachment 25163 [details]
dump before ira

main contains references to frame:

(insn 29 28 30 6 (set (reg:SI 63 [ D.2099 ])
        (mem/s:SI (plus:SI (reg/f:SI 20 frame)
                (const_int -12 [0xfffffffffffffff4])) [2 D.2129+20 S4 A32])) 20010209-1.c:16 64 {*movsi_internal}
     (nil))
Comment 5 Tom de Vries 2011-09-01 16:29:16 UTC
Created attachment 25166 [details]
ira dump

and after ira, the reference to framereg is still there:
...
(insn 29 28 30 6 (set (reg:SI 0 ax [orig:63 D.2099 ] [63])
        (mem/s:SI (plus:SI (reg/f:SI 20 frame)
                (const_int -12 [0xfffffffffffffff4])) [2 D.2129+20 S4 A32])) 20010209-1.c:16 64 {*movsi_internal}
     (nil))
...

Normally, framereg gets eliminated in ira using either of the following 2 rules:
...
 { FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM},		\
 { FRAME_POINTER_REGNUM, HARD_FRAME_POINTER_REGNUM}}	\
...

For main, ix86_can_eliminate limits this to only the first rule:
...
static bool
ix86_can_eliminate (const int from, const int to)
{
  if (stack_realign_fp)
    return ((from == ARG_POINTER_REGNUM
	     && to == HARD_FRAME_POINTER_REGNUM)
	    || (from == FRAME_POINTER_REGNUM
		&& to == STACK_POINTER_REGNUM));
  else
    return to == STACK_POINTER_REGNUM ? !frame_pointer_needed : true;
}
...
stack_realign_fp is true for main, so for 'from == FRAME_POINTER_REGNUM', only
'to == STACK_POINTER_REGNUM' is allowed.

reload then itself switches off the first rule here:
...
  /* If we have some registers we think can be eliminated, scan all insns to
     see if there is an insn that sets one of these registers to something
     other than itself plus a constant.  If so, the register cannot be
     eliminated.  Doing this scan here eliminates an extra pass through the
     main reload loop in the most common case where register elimination
     cannot be done.  */
  for (insn = first; insn && num_eliminable; insn = NEXT_INSN (insn))
    if (INSN_P (insn))
      note_stores (PATTERN (insn), mark_not_eliminable, NULL);
...

due to the stack restore:
...
(insn 32 31 33 6 (set (reg/f:SI 7 sp)
        (reg/f:SI 59 [ saved_stack.3D.2111 ])) 64 {*movsi_internal}
     (expr_list:REG_DEAD (reg/f:SI 59 [ saved_stack.3D.2111 ])
        (expr_list:REG_ARGS_SIZE (const_int 0 [0])
            (nil))))
...

reload then notes that there is no way to eliminate framereg and sets framereg in bad_spill_regs_global
...
      if (! can_eliminate)
	spill_hard_reg (from, 1);
...

The reloads for insn 29 are this, and framereg is still used:
...
Reloads for insn # 29
Reload 0: reload_in (SI) = (mem/s:SI (plus:SI (reg/f:SI 20 frame)
                                              (const_int -12))
                                     [2 D.2129+20 S4 A32])
	GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1), optional
	reload_in_reg: (mem/s:SI (plus:SI (reg/f:SI 20 frame)
                                          (const_int -12))
                                 [2 D.2129+20 S4 A32])
...

The documentation for HARD_FRAME_POINTER_REGNUM says the following:
...
When this macro is defined, you must also indicate in your definition of ELIMINABLE_REGS how to eliminate FRAME_POINTER_REGNUM into either HARD_FRAME_POINTER_REGNUM or STACK_POINTER_REGNUM.
...

I don't know whether ix86_can_eliminate in the current state violates this rule, but modifying ix86_can_eliminate to allow the 
'FRAME_POINTER_REGNUM ->  HARD_FRAME_POINTER_REGNUM' elimination allows the
compilation to succeed and the executable to run successfully.
Comment 6 H.J. Lu 2011-09-01 18:03:29 UTC
fold_builtin_alloca_for_var should record stack alignment
change due to

+  /* Declare array.  */
+  elem_type = build_nonstandard_integer_type (BITS_PER_UNIT, 1);
+  n_elem = size * 8 / BITS_PER_UNIT;
+  align = MIN (size * 8, BIGGEST_ALIGNMENT);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  array_type = build_array_type_nelts (elem_type, n_elem);
+  var = create_tmp_var (array_type, NULL);
+  DECL_ALIGN (var) = align;
+
Comment 7 Tom de Vries 2011-09-01 19:26:48 UTC
> fold_builtin_alloca_for_var should record stack alignment change

After expand_var for D.2129 with 128-bit alignment, x_rtl looks like this:
...
(gdb) p x_rtl
$20 = {expr = {x_pending_stack_adjust = 0, x_inhibit_defer_pop = 0, x_stack_pointer_delta = 0, x_saveregs_value = 0x0, x_apply_args_value = 0x0, x_forced_labels = 0x0}, emit = {x_reg_rtx_no = 70, 
    x_first_label_num = 11, x_first_insn = 0x0, x_last_insn = 0x0, sequence_stack = 0xf7d5557c, x_cur_insn_uid = 2, x_cur_debug_insn_uid = 1, x_last_location = 0, regno_pointer_align_length = 159, 
    regno_pointer_align = 0x981a428 ""}, varasm = {pool = 0xf7df9294, deferred_constants = 0}, args = {pops_args = 0, size = 0, pretend_args_size = 0, arg_offset_rtx = 0x0, info = {words = 0, nregs = 0, 
      regno = 0, fastcall = 0, sse_words = 0, sse_nregs = 0, warn_avx = 0, warn_sse = 0, warn_mmx = 0, sse_regno = 0, mmx_words = 0, mmx_nregs = 0, mmx_regno = 0, maybe_vaarg = 0, caller = 0, 
      float_in_sse = 0, call_abi = SYSV_ABI}, internal_arg_pointer = 0x0}, subsections = {hot_section_label = 0x0, cold_section_label = 0x0, hot_section_end_label = 0x0, cold_section_end_label = 0x0}, 
  eh = {ehr_stackadj = 0x0, ehr_handler = 0x0, ehr_label = 0x0, sjlj_fc = 0x0, sjlj_exit_after = 0x0, action_record_data = 0x0, call_site_record = {0x0, 0x0}}, outgoing_args_size = 0, return_rtx = 0x0, 
  hard_reg_initial_vals = 0x0, stack_protect_guard = 0x0, x_nonlocal_goto_handler_labels = 0x0, x_return_label = 0x0, x_naked_return_label = 0x0, x_stack_slot_list = 0x0, frame_space_list = 0x0, 
  x_stack_check_probe_note = 0x0, x_arg_pointer_save_area = 0x0, drap_reg = 0x0, x_frame_offset = 0, x_parm_birth_insn = 0x0, x_used_temp_slots = 0x0, x_avail_temp_slots = 0x0, x_temp_slot_level = 0, 
  stack_alignment_needed = 128, preferred_stack_boundary = 32, parm_stack_boundary = 0, max_used_stack_slot_alignment = 128, stack_alignment_estimated = 128, epilogue_delay_list = 0x0, 
  accesses_prior_frames = false, calls_eh_return = false, saves_all_registers = false, has_nonlocal_goto = false, has_asm_statement = false, all_throwers_are_sibcalls = false, limit_stack = false, 
  profile = false, uses_const_pool = false, uses_pic_offset_table = false, uses_eh_lsda = false, tail_call_emit = false, arg_pointer_save_area_init = false, frame_pointer_needed = false, 
  maybe_hot_insn_p = true, stack_realign_needed = false, stack_realign_tried = false, need_drap = false, stack_realign_processed = false, stack_realign_finalized = false, dbr_scheduled_p = false, 
  nothrow = false, asm_clobbers = {0, 0}}
...

These are the relevant values set by expand_one_var:
...
stack_alignment_estimated = 128
stack_alignment_needed = 128
max_used_stack_slot_alignment = 128
...

Is there anything needed that expand_var doesn't do?
Comment 8 H.J. Lu 2011-09-01 19:32:54 UTC
alloca is special with stack alignment. We may need to take
the new change into account.
Comment 9 Tom de Vries 2011-09-02 09:03:54 UTC
The following testcase reproduces the same failure without alloca, vla, or the 178353 patch.

stack-stave-restore.c:
...
extern void bar (int*);

char *p;

int
main ()
{
  int q;
  p = __builtin_stack_save ();
  bar (&q);
  __builtin_stack_restore (p);
  bar (&q);
  return 0;
}
...

Using &q makes sure we use virtual-stack-vars, which expands into something using FRAME_POINTER_REG.

Main has an incoming stack boundary of 32, and a preferred one of 128, so a realign is needed. Nothing sets crtl->need_drap, so we have stack_realign_fp rather than stack_realign_drap.
This forbids elimination from FRAME_POINTER_REG to HARD_FRAME_POINTER_REG.

The __builtin_stack_restore stays until ira (if we wouldn't by declaring p global), and this forbids elimination from FRAME_POINTER_REG to STACK_POINTER_REGNUM.

FRAME_POINTER_REG cannot be eliminated, and we assert.

This patch sets need_drap when a stack_restore is present at expand, which allows both the 20010209-1.c and the stack-stave-restore.c testcase to succeed:
...
Index: src/gcc-mainline/gcc/explow.c
===================================================================
--- src/gcc-mainline/gcc/explow.c (revision 178379)
+++ src/gcc-mainline/gcc/explow.c (working copy)
@@ -1062,6 +1062,9 @@ emit_stack_restore (enum save_level save
   /* The default is that we use a move insn.  */
   rtx (*fcn) (rtx, rtx) = gen_move_insn;
 
+  if (SUPPORTS_STACK_ALIGNMENT)
+    crtl->need_drap = true;
+
   /* See if this machine has anything special to do for this kind of save.  */
   switch (save_level)
     {
...
Comment 10 Tom de Vries 2011-09-02 09:24:33 UTC
> The __builtin_stack_restore stays until ira (if we wouldn't by declaring p
> global),

The __builtin_stack_restore stays until ira (if we wouldn't declare p
global, it would be eliminated),
Comment 11 Tom de Vries 2011-09-02 09:37:42 UTC
The problems for testcases 20010209-1.c and stack-stave-restore.c can be reproduced on x86_64 using -mpreferred-stack-boundary=12.
Comment 12 Richard Biener 2011-09-02 09:40:33 UTC
(In reply to comment #3)
> Created attachment 25162 [details]
> optimized dump
> 
> 1. The alloca in main is transformed into this declaration:
> 
>   <unnamed-unsigned:8> D.2129[24];
> 
> and accessed like this:
> 
>   iftmp.6_13 = MEM[symbol: D.2129, index: ivtmp.30_31, step: 4, 
>                    offset: 4294967292B];
> 
>   MEM[symbol: D.2129, index: ivtmp.30_31, step: 4, offset: 0B] = D.2105_15;
> 
>   D.2099_18 = MEM[(int *)&D.2129][5]{lb: 0 sz: 4};
> 
>   MEM[symbol: D.2129, index: ivtmp.30_31, step: 4, offset: 0B] = 0;
> 
> 
> 2. __builtin_stack_restore/__builtin_stack_restore in main are not optimized
> away, because the restored value is potentially used. The restored value is the
> same as the current value, so the restore is redundant, but that's not checked
> in optimize_stack_restore.
> 
>    saved_stack.3_5 = __builtin_stack_save ();
> 
>   __builtin_stack_restore (saved_stack.3_5);

It does try it here:

 second_stack_restore:

  /* If there's exactly one use, then zap the call to __builtin_stack_save.
     If there are multiple uses, then the last one should remove the call.
     In any case, whether the call to __builtin_stack_save can be removed
     or not is irrelevant to removing the call to __builtin_stack_restore.  */
  if (has_single_use (gimple_call_arg (call, 0)))

but for some reason it doesn't trigger?
Comment 13 Richard Biener 2011-09-02 09:42:13 UTC
(In reply to comment #6)
> fold_builtin_alloca_for_var should record stack alignment
> change due to
> 
> +  /* Declare array.  */
> +  elem_type = build_nonstandard_integer_type (BITS_PER_UNIT, 1);
> +  n_elem = size * 8 / BITS_PER_UNIT;
> +  align = MIN (size * 8, BIGGEST_ALIGNMENT);
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +  array_type = build_array_type_nelts (elem_type, n_elem);
> +  var = create_tmp_var (array_type, NULL);
> +  DECL_ALIGN (var) = align;
> +

If the stack alignment code cannot handle decls with any DECL_ALIGN
it is broken.  We for sure do not need to do anything here nor
in the vectorizer, which does

      if (vect_can_force_dr_alignment_p (decl, alignment))
        {
          DECL_ALIGN (decl) = TYPE_ALIGN (vectype);
          DECL_USER_ALIGN (decl) = 1;
Comment 14 Tom de Vries 2011-09-02 10:28:52 UTC
> but for some reason it doesn't trigger?

The bb containing the __builtin_stack_restore has 2 successors:
...
<bb 6>:
  D.2099_18 = MEM[(int *)&D.2129][5]{lb: 0 sz: 4};
  __builtin_stack_restore (saved_stack.3_5);
  if (D.2099_18 != 15)
    goto <bb 7>;
  else
    goto <bb 8>;
...

This case doesn't fall into the cases described in the header comment.
...
/* Try to optimize out __builtin_stack_restore.  Optimize it out
   if there is another __builtin_stack_restore in the same basic
   block and no calls or ASM_EXPRs are in between, or if this block's
   only outgoing edge is to EXIT_BLOCK and there are no calls or
   ASM_EXPRs after this __builtin_stack_restore.  */
...

It hits this piece of code in optimize_stack_restore and returns, so it doesn't reach second_stack_restore:
...
  /* Allow one successor of the exit block, or zero successors.  */
  switch (EDGE_COUNT (bb->succs))
    {
    case 0:
      break;
    case 1:
      if (single_succ_edge (bb)->dest != EXIT_BLOCK_PTR)
	return NULL_TREE;
      break;
    default:
      return NULL_TREE;
    }
 second_stack_restore:
...


For the stack-save-restore.c, if I declare p inside the function, cse creates a (set (sp) (sp)), which is eliminated.
But for the 20010209-1.c example, that doesn't happen.
Comment 15 Richard Biener 2011-09-02 11:02:22 UTC
(In reply to comment #14)
> > but for some reason it doesn't trigger?
> 
> The bb containing the __builtin_stack_restore has 2 successors:
> ...
> <bb 6>:
>   D.2099_18 = MEM[(int *)&D.2129][5]{lb: 0 sz: 4};
>   __builtin_stack_restore (saved_stack.3_5);
>   if (D.2099_18 != 15)
>     goto <bb 7>;
>   else
>     goto <bb 8>;
> ...
> 
> This case doesn't fall into the cases described in the header comment.
> ...
> /* Try to optimize out __builtin_stack_restore.  Optimize it out
>    if there is another __builtin_stack_restore in the same basic
>    block and no calls or ASM_EXPRs are in between, or if this block's
>    only outgoing edge is to EXIT_BLOCK and there are no calls or
>    ASM_EXPRs after this __builtin_stack_restore.  */
> ...
> 
> It hits this piece of code in optimize_stack_restore and returns, so it doesn't
> reach second_stack_restore:
> ...
>   /* Allow one successor of the exit block, or zero successors.  */
>   switch (EDGE_COUNT (bb->succs))
>     {
>     case 0:
>       break;
>     case 1:
>       if (single_succ_edge (bb)->dest != EXIT_BLOCK_PTR)
>     return NULL_TREE;
>       break;
>     default:
>       return NULL_TREE;
>     }
>  second_stack_restore:
> ...
> 
> 
> For the stack-save-restore.c, if I declare p inside the function, cse creates a
> (set (sp) (sp)), which is eliminated.
> But for the 20010209-1.c example, that doesn't happen.

Ok, the code really tries to optimize the case of __builtin_stack_restore
being called right before exiting the function only.

That's because it doesn't look for alloca() calls inbetween the
stack-save/restore pair at all (just for ones following a restore).
Comment 16 Tom de Vries 2011-09-02 22:47:42 UTC
Started testing patch from comment 9, augmented with comments:
...
Index: explow.c
===================================================================
--- explow.c (revision 178145)
+++ explow.c (working copy)
@@ -1062,6 +1062,20 @@ emit_stack_restore (enum save_level save
   /* The default is that we use a move insn.  */
   rtx (*fcn) (rtx, rtx) = gen_move_insn;
 
+  /* If stack_realign_drap, the x86 backend emits a prologue that aligns both
+     STACK_POINTER and HARD_FRAME_POINTER.
+     If stack_realign_fp, the x86 backend emits a prologue that aligns only
+     STACK_POINTER. This renders the HARD_FRAME_POINTER unusable for accessing
+     aligned variables, which is reflected in ix86_can_eliminate.
+     We normally still have the realigned STACK_POINTER that we can use.
+     But if there is a stack restore still present at reload, it can trigger 
+     mark_not_eliminable for the STACK_POINTER, leaving no way to eliminate
+     FRAME_POINTER into a hard reg.
+     To prevent this situation, we force need_drap if we emit a stack
+     restore.  */
+  if (SUPPORTS_STACK_ALIGNMENT)
+    crtl->need_drap = true;
+
   /* See if this machine has anything special to do for this kind of save.  */
   switch (save_level)
     {
...
Comment 17 Eric Botcazou 2011-09-03 09:47:08 UTC
> This patch sets need_drap when a stack_restore is present at expand, which
> allows both the 20010209-1.c and the stack-stave-restore.c testcase to succeed:
> ...
> Index: src/gcc-mainline/gcc/explow.c
> ===================================================================
> --- src/gcc-mainline/gcc/explow.c (revision 178379)
> +++ src/gcc-mainline/gcc/explow.c (working copy)
> @@ -1062,6 +1062,9 @@ emit_stack_restore (enum save_level save
>    /* The default is that we use a move insn.  */
>    rtx (*fcn) (rtx, rtx) = gen_move_insn;
> 
> +  if (SUPPORTS_STACK_ALIGNMENT)
> +    crtl->need_drap = true;
> +
>    /* See if this machine has anything special to do for this kind of save.  */
>    switch (save_level)
>      {
> ...

Does this force stack realignment, or only the use of the DRAP if we already do stack realignment?
Comment 18 Tom de Vries 2011-09-03 12:40:12 UTC
> Does this force stack realignment, or only the use of the DRAP if we already 
> do stack realignment?

only the use of the DRAP if we already do stack realignment.
Comment 19 Tom de Vries 2011-09-14 14:32:12 UTC
Author: vries
Date: Wed Sep 14 14:32:07 2011
New Revision: 178853

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=178853
Log:
2011-09-14  Tom de Vries  <tom@codesourcery.com>

	PR middle-end/50251
	* explow.c (emit_stack_restore): Set crtl->need_drap if
	stack_restore is emitted.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/explow.c
Comment 20 Tom de Vries 2011-09-14 14:33:40 UTC
Author: vries
Date: Wed Sep 14 14:33:35 2011
New Revision: 178854

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=178854
Log:
2011-09-14  Tom de Vries  <tom@codesourcery.com>

	PR middle-end/50251
	* gcc.dg/pr50251.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/pr50251.c
Modified:
    trunk/gcc/testsuite/ChangeLog
Comment 21 Tom de Vries 2011-09-14 14:37:46 UTC
patch and test-case checked in.