WPA Implementation

This document outlines two approaches for implementing WPA and discusses their pros and cons. For a full description of WPA, see the WHOPR design document.

Cherry-Picking

Under this proposal, the WPA phase leaves its input files unmodified. Its output is one optimization plan per input file. LTRANS reads each plan and its associated object file. Then, following the plan's instructions, it cherry-picks specific inlinable functions from other object files. This approach is roughly equivalent to the 1-to-1 mapping approach described in the WHOPR design document.
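As a rough illustration of what a per-file plan might carry, the sketch below models WPA output as one plan per input file, each listing the functions to cherry-pick from other object files. The names `InlinePlan` and `build_plans` and the dictionary-based call graph are hypothetical and chosen for illustration only; they do not reflect actual GCC data structures.

```python
from dataclasses import dataclass, field


@dataclass
class InlinePlan:
    """Hypothetical per-file plan: which foreign functions LTRANS should pull in."""
    primary: str
    # Maps a non-primary object file to the functions to cherry-pick from it.
    cherry_picks: dict = field(default_factory=dict)


def build_plans(defined_in, inline_edges):
    """Build one plan per input file; the input files themselves are not touched.

    defined_in:   function name -> object file that defines it
    inline_edges: (caller, callee) pairs that WPA decided to inline
    """
    plans = {obj: InlinePlan(primary=obj) for obj in set(defined_in.values())}
    for caller, callee in inline_edges:
        caller_file = defined_in[caller]
        callee_file = defined_in[callee]
        # Only cross-file edges require cherry-picking. This sketch records
        # direct edges only; a fuller version would follow callees transitively.
        if caller_file != callee_file:
            picks = plans[caller_file].cherry_picks
            picks.setdefault(callee_file, set()).add(callee)
    return plans


defined_in = {"main": "a.o", "helper": "a.o", "parse": "b.o", "emit": "c.o"}
inline_edges = [("main", "helper"), ("main", "parse"), ("parse", "emit")]
plans = build_plans(defined_in, inline_edges)
```

Here the plan for a.o names `parse` from b.o, the plan for b.o names `emit` from c.o, and the plan for c.o is empty, which matches the 1-to-1 shape: one plan per input file.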

Implementation Plan:

  1. Disable deserialization of function bodies during WPA.
  2. Disable non-IPA_PASS optimizations during WPA.
  3. Add serialization/deserialization of inlining decisions.
  4. Modify LTRANS to cherry-pick function bodies from non-primary files. Until we are able to disentangle type/object dependencies, this will likely require reading in all DECL's from those files. Flag non-primary functions and DECL's to prevent duplicate assembly output.
  5. Add LTRANS driver (so a single gcc invocation runs WPA followed by LTRANS).
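Step 4 of the plan above can be sketched as follows: LTRANS reads every DECL from the primary file and from each cherry-picked file, flags the non-primary ones, and emits assembly only for the unflagged ones. This is a toy model under stated assumptions; `Decl`, `load_for_ltrans`, `emit_assembly`, and the `non_primary` flag are hypothetical names, not GCC identifiers.

```python
from dataclasses import dataclass


@dataclass
class Decl:
    name: str
    origin: str        # object file the DECL was read from
    non_primary: bool  # hypothetical flag: True means "do not emit assembly"


def load_for_ltrans(primary, plan_files, decls_by_file):
    """Read every DECL from the primary file and from each cherry-picked file."""
    loaded = []
    for obj in [primary] + sorted(plan_files):
        for name in decls_by_file[obj]:
            loaded.append(Decl(name, obj, non_primary=(obj != primary)))
    return loaded


def emit_assembly(decls):
    """Emit only primary DECLs; flagged ones serve solely as inlining sources."""
    return [d.name for d in decls if not d.non_primary]


decls_by_file = {"a.o": ["main", "helper"], "b.o": ["parse", "b_table"]}
loaded = load_for_ltrans("a.o", {"b.o"}, decls_by_file)
emitted = emit_assembly(loaded)
```

All four DECLs are loaded, but only `main` and `helper` are emitted, so the LTRANS run for b.o remains the single place where `parse` and `b_table` produce assembly.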

Pros:

  1. No direct-to-ELF serialization! That's one less feature to implement.
  2. No need to index/repackage DECL's. We just load everything from the cherry-picked files.
  3. Probably easier to implement than the repackaging scheme.

Cons:

  1. We'll probably need to implement repackaging later. Several parallel build tools, like distcc, are stateless on the remote side and don't have access to locally-mounted network filesystems. The cherry-picking approach will require transmission of multiple object files per LTRANS process invocation. For example, if a.o uses inlined functions from b.o, c.o, and d.o, all four files must be transmitted to re-compile a.o.
  2. If we pursue repackaging later, LTRANS cherry-picking is throw-away code.
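To make the transmission cost in the first con concrete, the sketch below computes which object files a stateless remote host would need in order to re-compile one primary file. The function name `files_to_transmit` and the dependency map are hypothetical; following dependencies transitively is an assumption of this sketch, since a cherry-picked function may itself inline code from a further file.

```python
def files_to_transmit(primary, inlines_from):
    """Return the object files needed remotely to re-compile `primary`.

    inlines_from: object file -> set of object files it inlines functions from
    """
    needed = {primary}
    worklist = [primary]
    while worklist:
        for dep in inlines_from.get(worklist.pop(), ()):
            if dep not in needed:
                needed.add(dep)
                worklist.append(dep)
    return sorted(needed)


# The example from the text: a.o inlines functions from b.o, c.o, and d.o.
inlines_from = {"a.o": {"b.o", "c.o", "d.o"}}
transmit = files_to_transmit("a.o", inlines_from)
```

For the a.o example this yields all four files, whereas a file that inlines nothing from elsewhere needs only itself.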

Repackaging

Under this proposal, WPA repackages its input files. Each output file consists of the contents of a primary input file plus the additional DECL's and functions required for inlining. ELF data is output directly so that function bodies don't need to be deserialized. LTRANS reads each output file without reference to any other file. Initially, only inlining will be supported. Because inlining decisions can also be made at the LTRANS phase, IPA serialization may be deferred to phase 2. This approach is roughly equivalent to the many-to-1/many-to-many/1-to-many mapping approaches described in the WHOPR design document.
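A minimal sketch of the repackaging idea, assuming serialized function bodies can be treated as opaque byte strings: the output for a primary file is its own contents plus copies of the foreign functions needed for inlining, and nothing is deserialized along the way. The function `repackage` and the dictionary layout are hypothetical stand-ins for ELF sections.

```python
def repackage(primary, sections_by_file, extra_functions):
    """Build one self-contained output for `primary`.

    sections_by_file: object file -> {function name: serialized body (bytes)}
    extra_functions:  (object file, function name) pairs needed for inlining
    """
    output = dict(sections_by_file[primary])  # primary contents, copied as-is
    for obj, func in extra_functions:
        # Bodies are copied as opaque bytes; nothing is deserialized here.
        output.setdefault(func, sections_by_file[obj][func])
    return output


sections = {
    "a.o": {"main": b"\x01"},
    "b.o": {"parse": b"\x02", "unused": b"\x03"},
}
repackaged = repackage("a.o", sections, [("b.o", "parse")])
```

The result holds `main` and `parse` but not `unused`, so LTRANS can process it without opening b.o.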

Implementation Plan:

  1. Disable deserialization of function bodies during WPA.
  2. Disable non-IPA_PASS optimizations during WPA.
  3. Add support for outputting ELF directly.
  4. Add support for identifying and serializing subsets of DECL's based on the collection of functions being output. This probably means adding a DECL index to each serialized function body.
  5. Add LTRANS driver (so a single gcc invocation runs WPA followed by LTRANS).
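Step 4 above amounts to a reachability computation. The sketch below assumes each serialized function body carries a DECL index (the list of DECL's it refers to) and that DECL's may refer to other DECL's, such as a variable referring to its type. The names `decl_subset`, `decl_index`, and `decl_refs` are hypothetical.

```python
def decl_subset(functions, decl_index, decl_refs):
    """Collect the DECLs needed by the functions being output.

    decl_index: function name -> DECLs its serialized body refers to
    decl_refs:  DECL -> DECLs it depends on (for example, a variable's type)
    """
    needed = set()
    worklist = [d for f in functions for d in decl_index[f]]
    while worklist:
        decl = worklist.pop()
        if decl not in needed:
            needed.add(decl)
            worklist.extend(decl_refs.get(decl, ()))
    return needed


decl_index = {
    "main": ["counter"],
    "parse": ["buf", "token_t"],
    "unused": ["scratch"],
}
decl_refs = {"buf": ["token_t"], "counter": ["int"]}
subset = decl_subset(["main", "parse"], decl_index, decl_refs)
```

Only the DECL's reachable from `main` and `parse` are selected; `scratch`, referenced solely by a function that is not being output, is left out.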

Pros:

  1. Closer to the approach we'll probably use in production. It will integrate more easily into parallel build tools while limiting excess network transmission.
  2. Initially, we don't need to implement IPA serialization. Because each output file already contains the function bodies needed for inlining, LTRANS can make inlining decisions that would not otherwise be available to it.

Cons:

  1. Requires implementing direct-to-ELF serialization.
  2. Requires (at least partial) re-serialization of DECL's and per-function DECL indexes.
  3. Probably harder to implement than cherry-picking.

whopr/wpa (last edited 2008-06-03 16:28:43 by DiegoNovillo)