Attachment 'removing-global-state-from-gcc.txt'

Removal of Global State from GCC
================================

A proposal for major internal changes to GCC.

This is just a summary.

See http://gcc.gnu.org/ml/gcc/2013-06/msg00215.html
for the extended version.

Why?
----

Support embedding GCC as a shared library:

* Thread safety: the state of each GCC instance within the process is
  completely independent of every other GCC instance.

* Just-In-Time compilation (JIT)
** language runtimes (Python, Ruby, Java, etc)
** spam filters
** OpenGL shaders
** etc.

* Static code analysis

* Documentation generators

* etc


Non-plans
---------

* Outwardly-visible behavior changes

* Changing the license

* Changes to the requirements of the classic "monolithic binaries" use case
** e.g. needing LTO
** e.g. needing TLS

* Changes to the (measurable) performance of said use case


What else would we need to support JIT-compilation?
---------------------------------------------------

The following are out of scope for my state-removal plan:

* Providing an API with an ABI that can offer a useful stability guarantee
* Generating actual machine code rather than just assembler
  (e.g. embedding of binutils)
* Picking an appropriate subset of passes for JIT
* Providing an example for people to follow.


Scale of Problem
----------------

* 3500 global variables
* 100000 sites in the code directly using them


High-level Summary
------------------

* Multiple "parallel universes" of state within one GCC process

* Move all global variables and functions into classes
** these classes will be "singletons" in the normal build
** they will have multiple instances in a shared library build

* Minimal disturbance to existing code: "just add classes" (minimizing
  merge risk and preserving the ability to grok the project history)

* Various tricks to:
** maintain the performance of the standard "monolithic binaries" use case
** minimize the patching and backporting pain relative to older GCC source trees


"Universe" vs "context"
-----------------------

   class universe
   {
   public:
       /* Instance of the garbage collector.  */
       gc_heap *heap_;
       ...
       /* Instance of the callgraph.  */
       callgraph *cgraph_;
       ...
       /* Pass management.  */
       pipeline *passes_;
       ...
       /* Important objects.  */
       struct gcc_options global_options_;
       frontend *frontend_;
       backend *backend_;
       FILE * dump_file_;
       int dump_flags_;
       // etc
       ...
       location_t input_location_;
       ...
       /* State shared by many passes. */
       struct df_d *df_;
       redirect_edge_var_state *edge_vars_;
       ...
       /* Passes that have special state-handling needs.  */
       mudflap_state *mudflap_;
   }; // class universe

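To make the "parallel universes" idea concrete, here is a minimal sketch of two universes living side by side in one process. The types are hypothetical, heavily simplified stand-ins (gcc_options is reduced to a single member, and only two fields of the class above are modelled); the helper name second_universe_unaffected is invented for illustration.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins: only two of the fields
// from the slide are modelled, and gcc_options is reduced to one member.
struct gcc_options
{
  int optimize = 0;
};

class universe
{
public:
  gcc_options global_options_;
  int dump_flags_ = 0;
};

// Mutate one universe and report whether a second one, living in the
// same process, kept its default state.
bool
second_universe_unaffected ()
{
  universe a;
  universe b;
  a.global_options_.optimize = 2;
  a.dump_flags_ = 1;
  assert (a.global_options_.optimize == 2);
  return b.global_options_.optimize == 0 && b.dump_flags_ == 0;
}
```

Because every piece of state hangs off an instance rather than a global, nothing done to one universe can leak into another.
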
Passes become C++ classes
-------------------------

  static const pass_data pass_data_vrp =
  {
    GIMPLE_PASS, /* type */
    "vrp", /* name */
    OPTGROUP_NONE, /* optinfo_flags */
    true, /* has_gate */
    true, /* has_execute */
    TV_TREE_VRP, /* tv_id */
    PROP_ssa, /* properties_required */
    0, /* properties_provided */
    0, /* properties_destroyed */
    0, /* todo_flags_start */
    TODO_cleanup_cfg | TODO_update_ssa
      | TODO_verify_ssa | TODO_verify_flow, /* todo_flags_finish */
  };

Passes (2)
----------

  class pass_vrp : public gimple_opt_pass
  {
  public:
    pass_vrp(universe &uni)
      : gimple_opt_pass(pass_data_vrp, uni)
    {}
    /* opt_pass methods: */
    opt_pass * clone () { return new pass_vrp (uni_); }
    bool gate () { return gate_vrp (); }
    unsigned int execute () { return execute_vrp (); }
  }; // class pass_vrp

  gimple_opt_pass *
  make_pass_vrp (universe &uni)
  {
    return new pass_vrp (uni);
  }


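The pattern on the last two slides can be tried out in isolation. The following is a sketch only: universe, pass_data and opt_pass are hypothetical cut-down stand-ins rather than the real GCC types, and pass_demo, make_pass_demo, get_universe and demo_gates_differ are names invented for illustration. It shows a pass holding a reference to its universe, a clone that stays in the same universe, and a gate that consults per-universe state.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; the real GCC types carry far more state.
struct universe
{
  int optimize = 2;
};

struct pass_data
{
  const char *name;
};

class opt_pass
{
public:
  opt_pass (const pass_data &data, universe &uni) : data_ (data), uni_ (uni) {}
  virtual ~opt_pass () {}
  virtual opt_pass *clone () = 0;
  virtual bool gate () { return true; }
  virtual unsigned int execute () { return 0; }
  universe &get_universe () const { return uni_; }

protected:
  const pass_data &data_;
  universe &uni_;   // the "ref back to the universe"
};

static const pass_data pass_data_demo = { "demo" };

class pass_demo : public opt_pass
{
public:
  pass_demo (universe &uni) : opt_pass (pass_data_demo, uni) {}
  opt_pass *clone () { return new pass_demo (uni_); }
  bool gate () { return uni_.optimize > 0; }
  unsigned int execute () { return static_cast<unsigned int> (uni_.optimize); }
};

opt_pass *
make_pass_demo (universe &uni)
{
  return new pass_demo (uni);
}

// Usage sketch: one pass instance per universe, each consulting its own state.
bool
demo_gates_differ ()
{
  universe optimizing;        // optimize == 2 by default
  universe not_optimizing;
  not_optimizing.optimize = 0;
  std::unique_ptr<opt_pass> p1 (make_pass_demo (optimizing));
  std::unique_ptr<opt_pass> p2 (make_pass_demo (not_optimizing));
  return p1->gate () && !p2->gate ();
}
```

The factory function takes the universe explicitly, so a pass pipeline can be built per universe without any pass reaching for a global.
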
Pass state
----------
There are various types of per-pass state, each of which can be moved:

* onto the stack
* inside the pass instance
* into a private object shared by all instances of a pass
* into a semi-private object "owned" by the universe


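As a rough illustration of the four destinations listed above, the sketch below places one counter in each. Everything here is hypothetical: the universe stand-in, demo_shared_state, and pass_demo are invented names, and using a std::shared_ptr for the object shared between instances is just one possible way to model it, not something the proposal specifies.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the universe; edges_seen_ plays the role of
// semi-private state "owned" by the universe.
struct universe
{
  int edges_seen_ = 0;
};

// Private object shared by all instances (clones) of one pass.
struct demo_shared_state
{
  int total_runs = 0;
};

class pass_demo
{
public:
  explicit pass_demo (universe &uni)
    : uni_ (uni), shared_ (std::make_shared<demo_shared_state> ()), runs_ (0)
  {}

  // A clone starts with fresh per-instance state but shares shared_.
  pass_demo clone () const { return pass_demo (uni_, shared_); }

  int execute (int n_edges)
  {
    int visited = 0;                 // state on the stack
    for (int i = 0; i < n_edges; i++)
      visited++;
    runs_++;                         // state inside the pass instance
    shared_->total_runs++;           // state shared by all instances
    uni_.edges_seen_ += visited;     // state owned by the universe
    return visited;
  }

  int runs () const { return runs_; }
  int total_runs () const { return shared_->total_runs; }

private:
  pass_demo (universe &uni, std::shared_ptr<demo_shared_state> shared)
    : uni_ (uni), shared_ (shared), runs_ (0)
  {}

  universe &uni_;
  std::shared_ptr<demo_shared_state> shared_;
  int runs_;
};
```

None of the four placements needs a file-scope variable, which is the point of the exercise.
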
Which universe am I in?
-----------------------
* Passes become C++ classes, with a reference back to their universe
  (usable from the execute hook)

* A "universe *" is also available in thread-local storage, for use
  in macros:

  #if SHARED_BUILD
     extern __thread universe *uni_ptr;
  #else
     extern universe g;
  #endif

  /* Macro for getting a (universe &) */
  #if SHARED_BUILD
    /* Read a thread-local pointer: */
    #define GET_UNIVERSE()  (*uni_ptr)
  #else
    /* Access the global singleton: */
    #define GET_UNIVERSE()  (g)
  #endif


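A self-contained sketch of the macro scheme follows. It makes several assumptions that are not in the slides: it uses C++11 thread_local in place of the __thread extension, it defaults SHARED_BUILD to 1 so the thread-local path is exercised, and it adds a hypothetical RAII helper (universe_scope) for selecting the current thread's universe. The universe type is a one-field stand-in.

```cpp
#include <cassert>

// Hypothetical stand-in with a single field from the slide.
struct universe
{
  int dump_flags_ = 0;
};

#ifndef SHARED_BUILD
# define SHARED_BUILD 1   /* assumption: default to the shared-library flavour */
#endif

#if SHARED_BUILD
/* C++11 thread_local is used here in place of the __thread extension.  */
thread_local universe *uni_ptr = nullptr;
# define GET_UNIVERSE()  (*uni_ptr)

/* Hypothetical RAII helper: make UNI the calling thread's current
   universe for the lifetime of the guard, then restore the old one.  */
class universe_scope
{
public:
  explicit universe_scope (universe &uni) : saved_ (uni_ptr) { uni_ptr = &uni; }
  ~universe_scope () { uni_ptr = saved_; }

private:
  universe_scope (const universe_scope &);              // not copyable
  universe_scope &operator= (const universe_scope &);   // not assignable
  universe *saved_;
};
#else
universe g;
# define GET_UNIVERSE()  (g)
#endif

/* Legacy-style code that never names a universe explicitly.  */
int
current_dump_flags ()
{
  return GET_UNIVERSE ().dump_flags_;
}
```

In the shared build, GET_UNIVERSE() dereferences a null pointer if no universe has been selected on the calling thread, so a real implementation would need some discipline (such as the guard above) around entry points.
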
Minimizing merge pain vs "doing it properly"
--------------------------------------------

Consider:

  #define timevar_push(TV)  GET_UNIVERSE().timevars_->push (TV)
  #define timevar_pop(TV)   GET_UNIVERSE().timevars_->pop (TV)
  #define timevar_start(TV) GET_UNIVERSE().timevars_->start (TV)
  #define timevar_stop(TV)  GET_UNIVERSE().timevars_->stop (TV)

vs a patch that touches all 200+ sites that use the timevar API:

   void
   jump_labels::
   rebuild_jump_labels_1 (rtx f, bool count_forced)
   {
     rtx insn;
  -  timevar_push (TV_REBUILD_JUMP);
  +  uni_.timevar_push (TV_REBUILD_JUMP);
     init_label_info (f);

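The macro approach can be sketched end to end as follows. This is an illustration under stated assumptions: the timevars class here is a hypothetical toy that only tracks a stack of ids, the enum has just two invented-for-brevity entries, GET_UNIVERSE() is hard-wired to a single global universe, and rebuild_jump_labels_demo is a made-up call site. The point is that the call site compiles unchanged while the state it touches now lives inside the universe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical two-entry stand-in for GCC's timevar ids.
enum timevar_id { TV_NONE, TV_REBUILD_JUMP };

// Toy timevar stack: records which timers are active, nothing more.
class timevars
{
public:
  void push (timevar_id tv) { stack_.push_back (tv); }
  void pop (timevar_id tv)
  {
    assert (!stack_.empty () && stack_.back () == tv);
    stack_.pop_back ();
  }
  std::size_t depth () const { return stack_.size (); }

private:
  std::vector<timevar_id> stack_;
};

struct universe
{
  timevars *timevars_;
};

// Single-universe flavour, for brevity.
timevars the_timevars;
universe g = { &the_timevars };
#define GET_UNIVERSE()  (g)

// Compatibility macros: existing call sites keep their spelling.
#define timevar_push(TV)  GET_UNIVERSE().timevars_->push (TV)
#define timevar_pop(TV)   GET_UNIVERSE().timevars_->pop (TV)

// An "unpatched" call site; returns the stack depth while the timer runs.
std::size_t
rebuild_jump_labels_demo ()
{
  timevar_push (TV_REBUILD_JUMP);
  std::size_t depth = GET_UNIVERSE ().timevars_->depth ();
  timevar_pop (TV_REBUILD_JUMP);
  return depth;
}
```

The trade-off is the one the slide names: the macros keep diffs against older trees small, at the cost of hiding the universe lookup from the reader of each call site.
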
The universe sits below GTY/GGC
-------------------------------

* Each universe gets its own GC heap
** Needs special-case handling as its own root (not a pointer).
** Gradually becomes the only root, as global GTY roots are removed.

Status:

* I have this working for GC
* Not yet working with PCH (but I think this is doable)

* Assumption: the universe instance is the single thing that:
** can own refs on GC objects AND
** isn't itself in the GC heap


Performance
-----------

* I won't be adding fields to any major types, so memory usage shouldn't
  noticeably change.

* We know there'll be a hit of a few % for adding -fPIC/-fpic (so this will
  be a configure-time opt-in).

* We can't yet know what the impact of passing around context will
  be (register pressure etc).

* How expensive is TLS on various archs?


What should my benchmark suite look like?
-----------------------------------------

Benchmark 1: compile time of Linux kernel

Benchmark 2: building Firefox with LTO

I have a systemtap script that watches all process invocations and gathers
various timings, so we can track per-TU timings "from outside".


Ways of avoiding performance hit
--------------------------------

* Configure-time opt-in to shared library

* Ways of eliminating context pointers


Eliminating context ptrs (1)
----------------------------

  #if GLOBAL_STATE
  /* When using global state, all methods and fields of state classes
     become "static", so that there is effectively a single global
     instance of the state, and there is no implicit "this->" being passed
     around.  */
  # define MAYBE_STATIC static
  #else
  /* When using on-stack state, all methods and fields of state classes
     lose the "static", so that there can be multiple instances of the
     state with an implicit "this->" everywhere the state is used.  */
  # define MAYBE_STATIC
  #endif

Example of MAYBE_STATIC
-----------------------

cgraph.h

   class GTY((user)) callgraph
   {
   public:
      callgraph(universe &uni);
      MAYBE_STATIC  void dump (FILE *) const;
      MAYBE_STATIC  void dump_cgraph_node (FILE *, struct cgraph_node *) const;
      MAYBE_STATIC  void remove_edge (struct cgraph_edge *);
      MAYBE_STATIC  void remove_node (struct cgraph_node *);
      MAYBE_STATIC  struct cgraph_edge *
                    create_edge (struct cgraph_node *,
                                 struct cgraph_node *,
                                 gimple, gcov_type, int);
      /* etc */


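The MAYBE_STATIC trick can be tried on a toy class. The sketch below is an assumption-laden illustration, not the proposed callgraph: node_counter and count_delta_after_two_adds are invented names, and GLOBAL_STATE defaults to 1 here purely so that the static flavour is the one compiled. One detail worth noting is that C++ does not allow a static member function to be const-qualified, so the sketch leaves "const" off its methods; a real implementation would presumably need a companion macro for that.

```cpp
#include <cassert>

#ifndef GLOBAL_STATE
# define GLOBAL_STATE 1   /* assumption: default to the single-instance build */
#endif

#if GLOBAL_STATE
# define MAYBE_STATIC static
#else
# define MAYBE_STATIC
#endif

// Toy state class: every member is static in the GLOBAL_STATE build and
// per-instance otherwise.
class node_counter
{
public:
#if !GLOBAL_STATE
  node_counter () : n_nodes_ (0) {}
#endif
  MAYBE_STATIC void add_node () { n_nodes_++; }
  MAYBE_STATIC int count () { return n_nodes_; }

private:
  MAYBE_STATIC int n_nodes_;
};

#if GLOBAL_STATE
// Static data members need exactly one out-of-class definition.
int node_counter::n_nodes_ = 0;
#endif

// The same call-site source works in both builds: calling a static
// member function through an object is valid C++.
int
count_delta_after_two_adds ()
{
  node_counter c;
  int before = c.count ();
  c.add_node ();
  c.add_node ();
  return c.count () - before;
}
```

In the GLOBAL_STATE build no "this" pointer is passed to add_node or count, which is how the scheme aims to keep the monolithic binary's performance unchanged.
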
Eliminating context ptrs (2)
----------------------------

   #if USING_IMPLICIT_STATIC
   #define SINGLETON_IN_STATIC_BUILD __attribute__((force_static))
   #else
   #define SINGLETON_IN_STATIC_BUILD
   #endif

   class GTY((user)) SINGLETON_IN_STATIC_BUILD callgraph
   {
   public:
      callgraph(universe &uni);
      void dump (FILE *) const;
      void dump_cgraph_node (FILE *, struct cgraph_node *) const;
      void remove_edge (struct cgraph_edge *);
      void remove_node (struct cgraph_node *);
      struct cgraph_edge *
      create_edge (struct cgraph_node *,
                   struct cgraph_node *,
                   gimple, gcov_type, int);
      /* etc */


Eliminating context ptrs (3)
----------------------------

   #if USING_SINGLETON_ATTRIBUTE
   #define SINGLETON_IN_STATIC_BUILD(INSTANCE) \
      __attribute__((singleton(INSTANCE)))
   #else
   #define SINGLETON_IN_STATIC_BUILD(INSTANCE)
   #endif

   #if USING_SINGLETON_ATTRIBUTE
   /* Declared "extern" because callgraph is still incomplete here.  */
   extern class callgraph the_cgraph;
   #endif

   class GTY((user)) SINGLETON_IN_STATIC_BUILD(the_cgraph) callgraph
   {
   public:
      callgraph(universe &uni);
      void dump (FILE *) const;
      void dump_cgraph_node (FILE *, struct cgraph_node *) const;
      void remove_edge (struct cgraph_edge *);
      void remove_node (struct cgraph_node *);
      struct cgraph_edge *
      create_edge (struct cgraph_node *,
                   struct cgraph_node *,
                   gimple, gcov_type, int);
      /* etc */


Branch management
-----------------
Given the performance concerns, my thinking is:

* do it on a (git) branch, merging from trunk regularly
* measure performance relative to 4.8 and to trunk regularly
* tactical patches to trunk to minimize merge pain
* by when would the merge into trunk need to happen for 4.9/4.10?
* autogenerate burndown charts measuring # of globals and # of usage sites


What I'm hoping for from Cauldron
---------------------------------

* Consensus that this is desirable
* Consensus that my work could be mergeable
* Branch management plans
* Performance guidelines


Discussion
----------
What nasty problems have I missed?
