This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Plan for removing global state from GCC's internals


On Thu, 2013-06-27 at 20:23 +0000, Joseph S. Myers wrote:
> On Thu, 27 Jun 2013, David Malcolm wrote:
> 
> > I want to focus on "removal of global state", and I want that to be
> > separate from "cleanups of internal APIs".

There are several interpretations of the word "global" in this
conversation, and I think I was unclear what I meant; sorry about that.

The word "global" can refer both to visibility, and to lifetime.

I'm interested in *lifetime*: when do variables get written to?  Where
does the value of a variable get used?

For example, consider the three variable declarations in tracer.c:

  static int probability_cutoff;
  static int branch_ratio_cutoff;
  sbitmap bb_seen;

It turns out that "bb_seen" is only used in that file, so it can be made
"static" there.  With that, these variables have file-local visibility,
but currently have "global lifetime": they live in the .bss section of
the built code, and further study of the code would be needed before a
new reader (be they human or a compiler) can say when these variables
change state, and where their state is used, beyond just saying
"sometime during the lifetime of the process".

As it happens, these variables follow a very common read/write pattern:
they are initialized near the start of the "execute" hook of a pass, and
cleaned up at the end of the hook (in this case, within the
"tail_duplicate" function called within the "tracer" execute hook of
pass_tracer).

Also, none of the variables are GTY-marked.

This is one of the common state-management patterns in GCC's passes:
http://dmalcolm.fedorapeople.org/gcc/global-state/pass-patterns.html#per-invocation-state-with-no-gty-markings
The plan there gives a way of moving this state to the stack for the
shared-library case (to allow thread-safe usage), whilst keeping it in
the .bss section for the traditional build case (for maximum
performance, or, at least, consistent performance), with relatively
little patching.

By contrast, consider this declaration from tree-ssa.c:

  static struct pointer_map_t *edge_var_maps;

Although this is marked as "static", it's used in the implementation of
an internal API "redirect_edge_var_map_*" used in 5 other source files.
So although this has file-local visibility, the "lifetime" of the
underlying state is considerably more complicated.

> Whereas I'm thinking of global state as being a symptom of a problem - 
> messy interfaces that have accreted over time - rather than the problem in 
> itself.  And moving things into "universe" allows a proof-of-concept of a 
> shared library build (much like Joern's multi-target patches with 
> namespaces three years ago provided a proof-of-concept of a multi-target 
> build) without really addressing the real problem (basically, I think of 
> state in "universe" as effectively being global state, and moving state 
> into something passed down to the places needing it - only the relevant 
> bits of state, not a whole universe pointer if there's a smaller logical 
> unit - rather than just accessed through TLS, as being the point where 
> global state is *really* eliminated).

>From a "lifetime" meaning of "global", the universe ceases to be "global
state" in a shared-library build: there are zero or more "parallel
universes" within one process, all independent of each other; such
parallel universes can be created and destroyed by client code.

You raise a concern about restricting where state can be used: the
universe object will indeed contain a grab-bag of pointers to various
other objects, and so it's possible to write code that pokes at one
aspect of state from another unrelated aspect of state, using the
universe object as a nexus.

I wrote about this in:
http://dmalcolm.fedorapeople.org/gcc/global-state/plan.html#parallel-universes-vs-modularity
where my view is that for an initial iteration of this work we *need* to
have such a nexus: we have a spaghetti of interactions already; I'm
merely trying to support having multiple, independent plates of
spaghetti, if you will, prior to distentangling.  I think Andrew
MacLeod's proposal is really the answer here for these concerns, and I
see our proposals as compatible.

There are various places in my plan where I use classes to restrict
access: for example, I have a "class frontend" and "class backend";
presumably stuff could be placed there in an effort to hide things
(either a "ravioli" or "lasagna" model, if that's not stretching the
metaphor too far: encapsulation and layering respectively).


> Now, the bulk conversion to universes seems a lot more maintainable than 
> Joern's multi-target patches, and a lot more plausibly an incremental step 
> to a proper fix, and so a lot more reasonable to go in as an incremental 
> step, but I'd still think of it as one of the infamous partial transitions 
> in the absence of a reason to believe, for each formerly-global object 
> being accessed via the universe (or some other piece of context), that 
> it's being accessed via the right piece of context being passed down to 
> functions that need it, rather than from the global universe for something 
> that doesn't need to be.

Viewing the top 40 globals, many of these are implicitly used by macros,
which makes it hard to provide a suitable piece of context (without
rewriting all the macros).

But if one looks at the *passes*, the picture is much more promising:
the plan breaks up the state by pass, and much of this state can only
ever be accessed from that pass.  For the state that needs to have a
lifetime on a par with that of the universe, that state can be opaque;
the universe has a pointer to say:

   redirect_edge_var_state *edge_vars_;

but the class declaration itself is only made visible in a few source
files, so that if someone starts using it other places, they at least
need to include a new header, and we can catch that at patch review
time.

> As for accessing globals directly versus via APIs: yes, I suppose you do 
> still have an access to a global class instance in each place you formerly 
> had a global variable (that's now a member of that class), so by itself 
> such a conversion to a better API doesn't reduce the number of global 
> variable accesses, just improves the interface in other ways - and it's 
> the changes to pass a pointer to an instance around that reduce the global 
> state usage.  In the case of dump files, pass-local state may be a better 
> place than the universe to keep the instance - it is after all passes.c 
> that calls dump_start / dump_finish.

So a pass instance should have its own dump_flags, and various dump
methods?  Perhaps, but as before, I'd prefer to fix the state issue
first, then provide better APIs.

> (Indeed, it would arguably be an improvement, in cases where an API uses 
> global state completely implicitly as opposed to via a class instance that 
> contains that state, to pull out that global state to the call sites, 
> reflecting that eventually those call sites should be passing in proper 
> non-global values.  For example, converting those diagnostic function uses 
> that implicitly use input_location to use it explicitly instead - each 
> should eventually move to an explicit location that isn't input_location.  
> Implicit use of global_dc matters rather less and it seems reasonable 
> enough for that context to continue to come from the universe in many 
> cases.  Cf. how a couple of years ago I made a lot more option-handling 
> code pass around struct gcc_options *, location_t and diagnostic_context * 
> values, rather than directly using the globals - although at some point, a 
> global value tends to get passed into these functions.)

This sounds similar to the "cfun" patch I posted here:
  http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01564.html
I'll try to finish up that work this week, based on this:
  http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00780.html

Would you be in favor killing off these macros:
  #define input_line LOCATION_LINE (input_location)
  #define input_filename LOCATION_FILE (input_location)
  #define in_system_header (in_system_header_at (input_location))
with patches that make the usage of "input_location" explicit?  (by
replacing all uses of these macros with their expansions, cleaning up
line-wraps as needed).

The only other macro that implicitly uses input_location is
EXPR_LOC_OR_HERE; that could be removed in favor of:
  EXPR_LOC_OR_LOC(expr, input_location)
again making input_location explicit.

(I'd be happy to work on that).

> (Similarly, for target macro to hook conversions (a) I consider a function 
> in targhooks.c that uses the target macro just to be an interim step and a 
> macro only really to be addressed once all the definitions in different 
> targets have been converted to define functions and the macro has been 
> poisoned, and (b) for each such conversion we at least consider what the 
> hook ought to look like rather than presuming a direct conversion of the 
> macro semantics, one macro to one hook, is appropriate.)




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]