This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

LTO plugin and comdat symbols


Hi,
this patch fixes three problems:
  1) Mozilla dynsym section is about 3 times bigger when mozilla is built
     with LTO than when it is built without.  This cause noticeable
     binary size growth and startup time problems.
     Looking into the objdump -T, the extra symbols are all of the form:

     0000000000000000  w   D  *UND*  0000000000000000  Base        _ZNSaIPN7mozilla6layers15ShadowableLayerEED1Ev

     Normal Mozilla binary has no dynsym entry matching UND.*Base pattern.
     Those are stale entries not really used anywhere in the binary later.

  2) When bisecting a problem, I noticed that linking part of mozilla
     without LTO with other part with LTO leads to undefined symbols.
     Again comdat inlines

  3) Mozilla -O3 performance with LTO is worse than one without LTO this is
     due to bad inlining decisions.

The problem can be demonstreated at:

#include <stdio.h>
extern void big (void);
inline int
test ()
{
  printf ("big\n");
  printf ("big\n");
}

int
main (void)
{
  while (true)
    {
      test ();
      test ();
    }
}

Now we have COMDAT function TEST that is not inlined during early optimization.
At streaming time, we thus include it in symtab as comdat function (correctly).
This is used by linker and plugin pass us resolution file as:

 2                                                                                                                                                                                  
85 faa3fa3d PREVAILING_DEF _Z4testv                                                                                                                                                 
91 faa3fa3d PREVAILING_DEF main

it says that both main and the comdat function are prevalining definition. According
to linker plugin specification:

PREVAILING_DEF (this is the prevailing definition of the symbol, with references from regular objects) 

....

Any symbol marked PREVAILING_DEF must be defined in one object file added to
the link after WPA is done, or an undefined symbol error will result. Any
symbol marked PREVAILING_DEF_IRONLY may be left undefined (provided all
references to the symbol have been removed), and the linker will not issue an
error. 

So GCC is now required to define test even if it inline it to main() leading to
extra unnecesary function.

GCC does not exactly behave this way:
  1) for hidden symbols it does honnor it with -flto.  It believes that external
     symbol is binding to it and thus it is not optimizing section away.
  2) for non-hidden symbols it looks if address was taken. As an optimization
     at LTO we bring comdats without address taken as static.  This is becuase
     we know we will end up at worst case with two copies of the same COMDAT.
     This is different from single unit compilation where possibly many copies
     would result from same transfrom.
  3) In whopr mode, GCC might just forget about the function as it is not
     partitioning COMDATs.

As a result we either get unnecesary function in binary (with -flto) or we get
that aforementioned stale dynamic symbol table entry by violating plugin
specification.

Now this patch attempts to improve situation by first making GCC to correctly
obey PREVAILING_DEF and also by actually not inserting COMDAT symbols into
the symbol table when their address is not taken.

It may seem odd to do so, but just obeying PREVAILING_DEF further increase
sizes of binaries of C++ programs (28% in Mozilla) so it is not really acceptable.
I believe not inserting the symbols is safe as we have two options

 1) address of the symbol is never taken in any of LTO modules.

    Than none of LTO modules exports the symbol and GCC will bring it static
    because of the rule I described earlier. Thus linker will never care

 2) address of the symbol is taken in one of LTO modules.

    Then linker will see its definition and will either mark it as PREVAILING_DEF
    or PREEMTED_DEF (it is defined in earlier non-lTO object file).

    Now GCC will avoid removing the symbol believing it is used from object files.
    This lose when we manage out all uses of the symbol and also for PREEMTED_DEF
    we don't really need to output the definition, but doing so is harmful.

As an followup I would like to make GCC to handle PREEMTED_DEF correctly too.

Now it would be nice if we solved the case when address of symbol is taken,
the symbol is not used by actual object files but it is externally visible
and it is optimized out.

The patch also fixes handling symbols with internal visibility that is equivalent
to hidden here.

I think in this case it would make sense to change gold's behaviour by marking
externally visible symbol that are not explicitely used by other object files
at PREVAILING_IRONLY instead of PREVAILING_DEF. I always assumed gold behaving
thisw way and it would give GCC freedom to take the symbol again.
GCC by itself has all the info it needs to know if the symbol is exported from DSO
so it will not remove non-COMDATs. Would that be possible?

Bootstrapped/regtested x86_64-linux, OK?


Index: lto-streamer-out.c
===================================================================
--- lto-streamer-out.c	(revision 164995)
+++ lto-streamer-out.c	(working copy)
@@ -2477,8 +2487,11 @@
   for (i = 0; i < lto_cgraph_encoder_size (encoder); i++)
     {
       node = lto_cgraph_encoder_deref (encoder, i);
-      if (node->alias)
+      if (DECL_COMDAT (node->decl)
+	  && cgraph_can_remove_if_no_direct_calls_p (node))
 	continue;
+      if (node->alias || node->global.inlined_to)
+	continue;
       write_symbol (cache, &stream, node->decl, seen, false);
       for (alias = node->same_body; alias; alias = alias->next)
         write_symbol (cache, &stream, alias->decl, seen, true);
Index: ipa.c
===================================================================
--- ipa.c	(revision 164995)
+++ ipa.c	(working copy)
@@ -238,15 +238,20 @@
 #endif
   varpool_reset_queue ();
   for (node = cgraph_nodes; node; node = node->next)
-    if ((!cgraph_can_remove_if_no_direct_calls_and_refs_p (node)
-	 /* Keep around virtual functions for possible devirtualization.  */
-	 || (!before_inlining_p
-	     && !node->global.inlined_to
-	     && DECL_VIRTUAL_P (node->decl)
-	     && (DECL_COMDAT (node->decl) || DECL_EXTERNAL (node->decl))))
-	&& ((!DECL_EXTERNAL (node->decl))
-            || before_inlining_p))
+    if (!node->analyzed)
       {
+        gcc_assert (!node->aux);
+	node->reachable = false;
+      }
+    else if ((!cgraph_can_remove_if_no_direct_calls_and_refs_p (node)
+	      /* Keep around virtual functions for possible devirtualization.  */
+	      || (!before_inlining_p
+		  && !node->global.inlined_to
+		  && DECL_VIRTUAL_P (node->decl)
+		  && (DECL_COMDAT (node->decl) || DECL_EXTERNAL (node->decl))))
+	     && ((!DECL_EXTERNAL (node->decl))
+		 || before_inlining_p))
+      {
         gcc_assert (!node->global.inlined_to);
 	enqueue_cgraph_node (node, &first);
 	node->reachable = true;
@@ -592,8 +597,13 @@
   if (aliased)
     return true;
 
+  /* If linker counts on us, we must preserve the function.  */
+  if (cgraph_used_from_object_file_p (node))
+    return true;
   /* When doing link time optimizations, hidden symbols become local.  */
-  if (in_lto_p && DECL_VISIBILITY (node->decl) == VISIBILITY_HIDDEN
+  if (in_lto_p
+      && (DECL_VISIBILITY (node->decl) == VISIBILITY_HIDDEN
+	  || DECL_VISIBILITY (node->decl) == VISIBILITY_INTERNAL)
       /* Be sure that node is defined in IR file, not in other object
 	 file.  In that case we don't set used_from_other_object_file.  */
       && node->analyzed)
@@ -621,8 +631,6 @@
 	      return true;
 	}
     }
-  if (cgraph_used_from_object_file_p (node))
-    return true;
   if (DECL_PRESERVE_P (node->decl))
     return true;
   if (MAIN_NAME_P (DECL_NAME (node->decl)))
@@ -794,7 +802,8 @@
 	       /* When doing linktime optimizations, all hidden symbols will
 		  become local.  */
 	       && (!in_lto_p
-		   || DECL_VISIBILITY (vnode->decl) != VISIBILITY_HIDDEN
+		   || (DECL_VISIBILITY (vnode->decl) != VISIBILITY_HIDDEN
+		       && DECL_VISIBILITY (vnode->decl) != VISIBILITY_INTERNAL)
 		   /* We can get prevailing decision in other object file.
 		      In this case we do not sed used_from_object_file.  */
 		   || !vnode->finalized))
Index: lto/lto.c
===================================================================
--- lto/lto.c	(revision 164995)
+++ lto/lto.c	(working copy)
@@ -839,7 +839,8 @@
     return false;
   /* Extern inlines and comdat are always only in partitions they are needed.  */
   if (DECL_EXTERNAL (node->decl)
-      || DECL_COMDAT (node->decl))
+      || (DECL_COMDAT (node->decl)
+	  && !cgraph_used_from_object_file_p (node)))
     return false;
   return true;
 }
@@ -854,7 +855,8 @@
     return false;
   /* Constant pool and comdat are always only in partitions they are needed.  */
   if (DECL_IN_CONSTANT_POOL (vnode->decl)
-      || DECL_COMDAT (vnode->decl))
+      || (DECL_COMDAT (vnode->decl)
+	  && !varpool_used_from_object_file_p (vnode)))
     return false;
   return true;
 }


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]