This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[RFA:] More compact EH frame info for synchronous EH


Some time ago I mentioned here that one drawback of an RTL
prologue compared to a "text" one, would be that RTL would lead
to redundant information in .eh_frame for synchronous EH (that
is, where only call insns and throw (or __builtin_eh_return) can
cause exceptions; like with ISO C++, unlike Ada and Java).  When
I mentioned that I was looking at non-current sources, I was
told that some changes had been done to make it more compact, so
that wasn't an issue anymore.  I see I was right however; the
CRIS EH frame info grew unnecessarily when moving to an RTL
prologue.  The frame info is indeed right for GDB use (for
stepping individual insns in the prologue, at least by visual
inspection) but for purposes of synchronous EH, it contains
redundant info.  I see this isn't really a target-specific
issue.  Take this C example function and -fexceptions -O2 -dA
for cris-elf:

 int x (int y)
 {
   int a = c (y);
   b ();
   return a + c (a);
 }

This code and EH info was generated with a text prologue and
carefully handcrafted calls to the dwarf2 machinery (unrelated
and unimportant code and .eh_frame info edited out):

_x:
.LFB2:
	Push $srp
	subq 8,$sp
	movem $r1,[$sp]
.LCFIT0:
	move.d _c,$r1
	...
	.dword	.LFE2-.LFB2	;# FDE address range
	.byte	0x4	;# DW_CFA_advance_loc4
	.dword	.LCFIT0-.LFB2
	.byte	0xe	;# DW_CFA_def_cfa_offset
	.uleb128 0xc
	.byte	0x90	;# DW_CFA_offset, column 0x10
	.uleb128 0x4
	.byte	0x80	;# DW_CFA_offset, column 0x0
	.uleb128 0x8
	.byte	0x81	;# DW_CFA_offset, column 0x1
	.uleb128 0xc

While this is generated with an RTL prologue (wherein "push" is
replaced with equivalent parts):

.LFB2:
	;# basic block 0
	subq 4,$sp
.LCFI0:
	move $srp,[$sp]
.LCFI1:
	subq 8,$sp
.LCFI2:
	movem $r1,[$sp]
.LCFI3:
	move.d _c,$r1
	...
	.dword	.LFE2-.LFB2	;# FDE address range
	.byte	0x4	;# DW_CFA_advance_loc4
	.dword	.LCFI0-.LFB2
	.byte	0xe	;# DW_CFA_def_cfa_offset
	.uleb128 0x4
	.byte	0x4	;# DW_CFA_advance_loc4
	.dword	.LCFI2-.LCFI0
	.byte	0xe	;# DW_CFA_def_cfa_offset
	.uleb128 0xc
	.byte	0x4	;# DW_CFA_advance_loc4
	.dword	.LCFI3-.LCFI2
	.byte	0x80	;# DW_CFA_offset, column 0x0
	.uleb128 0x8
	.byte	0x81	;# DW_CFA_offset, column 0x1
	.uleb128 0xc
	.byte	0x90	;# DW_CFA_offset, column 0x10
	.uleb128 0x4

So, we have two redundant CFI insns (corresponding to 3 bytes
with the help of an optimization in GNU as that changes the
DW_CFA_advance_loc4 into a one-byte DW_CFA_advance_loc)
appearing in read-only storage.  Disk storage may be cheap, but
flash memory still isn't, and people use C++ even for embedded
systems.

There *is* code to keep the CFI compact for synchronous EH info,
but only for non-frame-related insns and only for
!ACCUMULATE_OUTGOING_ARGS targets (usually non-optimal codewise
anyway), and the compaction is active at the time the CFI is
generated; when the same info is common to debug and EH frame
info.  It would only be right to omit any CFI insns in the
example above for EH info for synchronous exceptions.

To fix this, it seemed best to omit redundant EH info as it is
being output, and only for specific combinations and of course
only between call insns.  I needed some kind of mark for the
call insns.  In absence of a "marker" DW_CFA_* value, I had to
invent one.  Because it's not to be output, I just used the
obviously invalid (but for case-tables adjacent) -1.

If pruning info as it is being output (in output_cfi) is frowned
upon as being an unrelated activity, the cfi_replacement call
can be moved to a separate CFI-list-traversing function that
(for instance) changes replaceable opcodes into DW_CFI_nop and
output_cfi be changed to skip those at time of output.  I just
thought that would be somewhat unnecessarily complicating.

I fear that some may be overly concerned that that traversion of
the CFI lists is now O(n*n) with this patch, where O(n) before.
For the record I'll say that please remember that these lists
are normally short (I expect in the 100s in extreme cases), much
shorter than the number of insns if a long function, one per
function, and we only traverse it for CFI-insns that may be
redundant and usually only to the next redundant insn or call
insn.  Compare this to the time spent in the now avoided calls
to output the CFI insns.  The recursion is tail-recursion and
recent GCC should be able to transform it to a jump.

This patch can naturally be improved for targets that emit EH
info that uses other CFI opcodes than the ones mentioned in
cfi_replacement.  I tried to keep it as reasonably simple, and
to deliberately punt as not being replaceable, all other CFI
opcodes and combinations than the ones I noted in the libstdc++
test mentioned below.  With this patch, for the code above I now
get (the same assembly code and) this EH info, equivalent to the
handcrafted EH info:

	.dword	.LFE2-.LFB2	;# FDE address range
	.byte	0xe	;# DW_CFA_def_cfa_offset
	.uleb128 0xc
	.byte	0x4	;# DW_CFA_advance_loc4
	.dword	.LCFI3-.LFB2
	.byte	0x80	;# DW_CFA_offset, column 0x0
	.uleb128 0x8
	.byte	0x81	;# DW_CFA_offset, column 0x1
	.uleb128 0xc
	.byte	0x90	;# DW_CFA_offset, column 0x10
	.uleb128 0x4

Tested cross to cris-elf and cris-axis-linux-gnu (with RTL
prologue patch; to be committed), mmix-knuth-mmixware, arm-elf
and sh-elf as well as native i686-pc-linux-gnu.

I inspected the generated EH info for
libstdc++-v3/testsuite/22_locale/numpunct/members/pod/2.cc (a
test which found a EH info bug in a CRIS RTL-prologue candidate
patch) for cris-axis-linux-gnu, arm-elf and native to verify the
expected EH info size decrease.  No change on arm-elf, since
that uses SJLJ exceptions (still!)  Slightly more savings on
native i686-pc-linux-gnu than CRIS; it seems this patch catches
all trivial redundancy as for CRIS.

Some numbers for i686-pc-linux-gnu: libstdc++.so .eh_frame size
shrunk from 57352 to 53500 bytes (93.3%) and the test-program
.eh_frame from 5340 to 5044 bytes (94.5%).

I also stepped through the cfi_replacement function for
cris-axis-linux-gnu for that testcase to verify correct behavior
of all execution paths (including DW_CFA_def_cfa_register which
happens when a frame-pointer is present).
(The cases head=DW_CFA_def_cfa, tail={DW_CFA_def_cfa,
DW_CFA_def_cfa_offset, DW_CFA_def_cfa_register} did not trig,
though.)

Ok to commit?

	* dwarf2out.c (dwarf2out_frame_debug): Add DW_CFA_GNU_BARRIER
	placeholder CFI entries when seeing a call insn and emitting dwarf 2
	for synchronous EH.
	(cfi_oprnd1_desc): Handle DW_CFA_GNU_BARRIER.
	(cfi_replacement): New function.
	(output_cfi): Return early for a DW_CFA_GNU_BARRIER.  Also return
	with no action if emitting synchronous-only EH and cfi_replacement
	indicates the current one is redundant.
	* dwarf2.h (enum dwarf_call_frame_info): New member
	DW_CFA_GNU_BARRIER.

Next: Migrate CFI:s from the FDEs to the CIE.

Index: dwarf2.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/dwarf2.h,v
retrieving revision 1.29
diff -c -p -r1.29 dwarf2.h
*** dwarf2.h	6 Oct 2004 20:27:15 -0000	1.29
--- dwarf2.h	12 Apr 2005 15:13:02 -0000
*************** enum dwarf_line_number_x_ops
*** 669,674 ****
--- 669,677 ----
  /* Call frame information.  */
  enum dwarf_call_frame_info
    {
+     /* Use only internally, as a marker.  */
+     DW_CFA_GNU_BARRIER = -1,
+ 
      DW_CFA_advance_loc = 0x40,
      DW_CFA_offset = 0x80,
      DW_CFA_restore = 0xc0,
Index: dwarf2out.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/dwarf2out.c,v
retrieving revision 1.581
diff -c -p -r1.581 dwarf2out.c
*** dwarf2out.c	2 Apr 2005 17:08:01 -0000	1.581
--- dwarf2out.c	12 Apr 2005 15:13:03 -0000
*************** static void dwarf2out_stack_adjust (rtx,
*** 372,377 ****
--- 372,378 ----
  static void flush_queued_reg_saves (void);
  static bool clobbers_queued_reg_save (rtx);
  static void dwarf2out_frame_debug_expr (rtx, const char *);
+ static bool cfi_replacement (dw_cfi_ref, dw_cfi_ref);
  
  /* Support for complex CFA locations.  */
  static void output_cfa_loc (dw_cfi_ref);
*************** dwarf2out_frame_debug (rtx insn, bool af
*** 1837,1842 ****
--- 1838,1860 ----
    if (!NONJUMP_INSN_P (insn) || clobbers_queued_reg_save (insn))
      flush_queued_reg_saves ();
  
+   /* If we'll want compact EH frame tables, annotate call insns so we
+      can accumulate the info between them.  */
+   if (CALL_P (insn)
+       && !after_p
+       && (flag_unwind_tables || (flag_exceptions && ! USING_SJLJ_EXCEPTIONS))
+       && !flag_asynchronous_unwind_tables)
+     {
+       dw_cfi_ref call_cfi = new_cfi ();
+       call_cfi->dw_cfi_opc = DW_CFA_GNU_BARRIER;
+ 
+       /* Don't emit a new label (e.g. by using ""), as that'd get us a
+ 	 new DW_CFA_advance_loc4 that we'd have to eliminate as a
+ 	 special case between two DW_CFA_GNU_BARRIERs.  */
+       add_fde_cfi (fde_table[fde_table_in_use - 1].dw_fde_current_label,
+ 		   call_cfi);
+     }
+ 
    if (! RTX_FRAME_RELATED_P (insn))
      {
        if (!ACCUMULATE_OUTGOING_ARGS)
*************** dw_cfi_oprnd1_desc (enum dwarf_call_fram
*** 1865,1870 ****
--- 1883,1889 ----
  {
    switch (cfi)
      {
+     case DW_CFA_GNU_BARRIER:
      case DW_CFA_nop:
      case DW_CFA_GNU_window_save:
        return dw_cfi_oprnd_unused;
*************** dw_cfi_oprnd2_desc (enum dwarf_call_fram
*** 1935,1946 ****
--- 1954,2097 ----
  #define DWARF2_FRAME_REG_OUT(REGNO, FOR_EH) (REGNO)
  #endif
  
+ /* See if we can replace the CFI in HEAD with one in TAIL before we hit
+    a DW_CFA_GNU_BARRIER or the end of the list.  Items in the TAIL chain
+    can be modified to complete the replacement.  This depends on EH
+    frame info being output after debug info, else the size of the debug
+    info will increase.  */
+ 
+ static bool
+ cfi_replacement (dw_cfi_ref head, dw_cfi_ref tail)
+ {
+   if (tail == NULL || tail->dw_cfi_opc == DW_CFA_GNU_BARRIER)
+     return false;
+ 
+   switch (head->dw_cfi_opc)
+     {
+     case DW_CFA_advance_loc1:
+     case DW_CFA_advance_loc2:
+     case DW_CFA_advance_loc4:
+       switch (tail->dw_cfi_opc)
+ 	{
+ 	case DW_CFA_advance_loc1:
+ 	case DW_CFA_advance_loc2:
+ 	case DW_CFA_advance_loc4:
+ 	  /* If there's a DW_CFA_advance_loc[1-4] in the tail, then it's
+ 	     a complete replacement with no need to modify it.  */
+ 	  return true;
+ 
+ 	case DW_CFA_offset:
+ 	case DW_CFA_def_cfa:
+ 	case DW_CFA_def_cfa_offset:
+ 	case DW_CFA_def_cfa_register:
+ 	  /* Independent opcodes.  */
+ 	  return cfi_replacement (head, tail->dw_cfi_next);
+ 
+ 	default:
+ 	  /* For anything else in TAIL we don't know; no replacement.  */
+ 	  return false;
+ 	}
+       gcc_unreachable ();
+ 
+     case DW_CFA_def_cfa_offset:
+       switch (tail->dw_cfi_opc)
+ 	{
+ 	case DW_CFA_def_cfa:
+ 	case DW_CFA_def_cfa_offset:
+ 	  /* These are complete replacements.  */
+ 	  return true;
+ 
+ 	case DW_CFA_def_cfa_register:
+ 	  /* Replace with DW_CFA_def_cfa, to include the current offset.  */
+ 	  tail->dw_cfi_opc = DW_CFA_def_cfa;
+ 	  tail->dw_cfi_oprnd2.dw_cfi_offset
+ 	    = head->dw_cfi_oprnd1.dw_cfi_offset;
+ 	  return true;
+ 
+ 	case DW_CFA_offset:
+ 	  /* Passing a *use* of the CFA between two reference-point
+ 	     definitions doesn't matter; the CFA is constant.  */
+ 	case DW_CFA_advance_loc1:
+ 	case DW_CFA_advance_loc2:
+ 	case DW_CFA_advance_loc4:
+ 	  /* Independent opcodes, try the next.  */
+ 	  return cfi_replacement (head, tail->dw_cfi_next);
+ 
+ 	default:
+ 	  /* For anything else in TAIL we don't know; no replacement.  */
+ 	  return false;
+ 	}
+       gcc_unreachable ();
+ 
+     case DW_CFA_def_cfa:
+       switch (tail->dw_cfi_opc)
+ 	{
+ 	case DW_CFA_def_cfa:
+ 	  /* A complete replacement.  */
+ 	  return true;
+ 
+ 	case DW_CFA_def_cfa_offset:
+ 	  /* Replace with DW_CFA_def_cfa, to include the current
+ 	     register.  */
+ 	  tail->dw_cfi_opc = DW_CFA_def_cfa;
+ 	  tail->dw_cfi_oprnd2.dw_cfi_offset
+ 	    = tail->dw_cfi_oprnd1.dw_cfi_offset;
+ 	  tail->dw_cfi_oprnd1.dw_cfi_reg_num
+ 	    = head->dw_cfi_oprnd1.dw_cfi_reg_num;
+ 	  return true;
+ 
+ 	case DW_CFA_def_cfa_register:
+ 	  /* Replace with DW_CFA_def_cfa, to include the current offset.  */
+ 	  tail->dw_cfi_opc = DW_CFA_def_cfa;
+ 	  tail->dw_cfi_oprnd2.dw_cfi_offset
+ 	    = head->dw_cfi_oprnd2.dw_cfi_offset;
+ 	  return true;
+ 
+ 	case DW_CFA_offset:
+ 	  /* Passing a *use* of the CFA between two reference-point
+ 	     definitions doesn't matter; the CFA is constant.  */
+ 	case DW_CFA_advance_loc1:
+ 	case DW_CFA_advance_loc2:
+ 	case DW_CFA_advance_loc4:
+ 	  /* Independent opcodes, try the next.  */
+ 	  return cfi_replacement (head, tail->dw_cfi_next);
+ 
+ 	default:
+ 	  /* For anything else in TAIL we don't know; no replacement.  */
+ 	  return false;
+ 	}
+       gcc_unreachable ();
+ 
+       /* Usually, we'll have changed a DW_CFA_def_cfa_register into a
+ 	 DW_CFA_def_cfa when we reach it, so don't bother trying to
+ 	 eliminate it separately.  */
+ 
+     default:
+       /* For anything else in HEAD, we don't know; no replacement.  */
+       return false;
+     }
+ 
+   return false;
+ }
+ 
  /* Output a Call Frame Information opcode and its operand(s).  */
  
  static void
  output_cfi (dw_cfi_ref cfi, dw_fde_ref fde, int for_eh)
  {
    unsigned long r;
+ 
+   /* Not a real DW_CFA opcode, just a temporary marker.  */
+   if (cfi->dw_cfi_opc == DW_CFA_GNU_BARRIER)
+     return;
+ 
+   /* If we can replace this CFI info with some subsequent info, we're
+      done with this one.  */
+   if (for_eh
+       && !flag_asynchronous_unwind_tables
+       && cfi_replacement (cfi, cfi->dw_cfi_next))
+     return;
+ 
    if (cfi->dw_cfi_opc == DW_CFA_advance_loc)
      dw2_asm_output_data (1, (cfi->dw_cfi_opc
  			     | (cfi->dw_cfi_oprnd1.dw_cfi_offset & 0x3f)),

brgds, H-P


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]