This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



RFC: gimplify - create a temp that is set at outermost block?


For UPC code generation, we're building an alternate
method of accessing thread-local data that does not depend upon
operating system support of the __thread qualifier.

The motivation for this change is that we've noticed that
__thread has varying levels of support across operating
system/hardware platforms, and that when used extensively,
we've seen capacity limitations on some target systems.
UPC programs, when compiled in "pthreads mode", implicitly
define all ordinary file-scope or static variables as
thread-local, which can lead to many TLS variables
or to a TLS section that is quite large.

The alternate implementation of TLS begins by targeting
all TLS variables to a special named section.  As an example,
the declaration,
  __thread int x;
can be thought of as being re-written into:
  int x __attribute__ ((section("tls_section")));
The runtime will allocate a per-thread block of memory
that is the size of "tls_section" and is initialized with the
contents of that dummy section.  The base address of this
per-thread block will be maintained in an OS-dependent fashion
and returned by a function called __get_tls() (which may in
turn call an OS-supplied function, for example,
pthread_getspecific()).
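
As a rough illustration of the runtime side described above, here is
a minimal sketch, assuming a POSIX threads target.  The names
get_tls(), tls_template, tls_key and tls_make_key() are hypothetical
stand-ins (the real runtime function would be __get_tls(), and the
template would be the linker-generated "tls_section" rather than an
ordinary array):

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the contents of "tls_section".  In the
   real scheme the linker lays this out; here it is a plain array
   whose first byte has a non-zero initial value.  */
static const char tls_template[16] = { 42 };

static pthread_key_t tls_key;
static pthread_once_t tls_once = PTHREAD_ONCE_INIT;

/* Create the key once per process.  Passing free() as the destructor
   releases each thread's block when that thread exits.  */
static void
tls_make_key (void)
{
  int rc = pthread_key_create (&tls_key, free);
  assert (rc == 0);
  (void) rc;
}

/* Sketch of __get_tls(): return the calling thread's private copy of
   the section, allocating and initializing it on first use.  */
char *
get_tls (void)
{
  char *base;

  pthread_once (&tls_once, tls_make_key);
  base = pthread_getspecific (tls_key);
  if (base == NULL)
    {
      base = malloc (sizeof tls_template);
      if (base == NULL)
        abort ();
      memcpy (base, tls_template, sizeof tls_template);
      pthread_setspecific (tls_key, base);
    }
  return base;
}
```

Each thread that calls get_tls() gets its own block, so a store
through the returned pointer is not visible to other threads and
never modifies the template.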

All references to 'x' will be rewritten by the UPC-specific
gimplify pass into:
  *((&x - __tls_section_start) + __get_tls())
Above, "&x" is the address of 'x' derived in the conventional
fashion as its address inside the TLS dummy section, which
starts at the address given by "__tls_section_start".
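
The address arithmetic can be tried out in ordinary C.  The following
is only a sketch under simplifying assumptions: the dummy section is
modelled as a struct (tls_section), a single static copy
(thread_block) stands in for one thread's block, and get_tls() and
tls_addr_of_y() are hypothetical names, not part of the UPC runtime.
The subtraction is done on char pointers so that the offset is in
bytes:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct tls_image { int x; int y; };

/* Stand-in for the dummy section holding the initial values.  */
static struct tls_image tls_section = { 10, 20 };
#define TLS_SECTION_START ((char *) &tls_section)

/* Stand-in for one thread's private copy of the section.  */
static struct tls_image thread_block;
static int thread_block_ready;

/* Stand-in for __get_tls(): copy the section on first use and
   return the base address of the private copy.  */
static char *
get_tls (void)
{
  if (!thread_block_ready)
    {
      memcpy (&thread_block, &tls_section, sizeof tls_section);
      thread_block_ready = 1;
    }
  return (char *) &thread_block;
}

/* Rewritten form of "&y": (&y - section start) + per-thread base.  */
int *
tls_addr_of_y (void)
{
  ptrdiff_t offset = (char *) &tls_section.y - TLS_SECTION_START;

  assert (offset >= 0 && (size_t) offset < sizeof tls_section);
  return (int *) (get_tls () + offset);
}
```

A store through tls_addr_of_y() changes only the private copy; the
value of 'y' inside the dummy section keeps its initial value.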

The gimplify code that currently implements this calculation
looks like this:

  tls_base = lookup_name (get_identifier (UPC_TLS_BEGIN_NAME_STR));
  if (!tls_base)
    fatal_error ("UPC thread-local section start address not found.  "
                 "Cannot find a definition for " UPC_TLS_BEGIN_NAME_STR);
  tls_base = build1 (ADDR_EXPR, char_ptr_type, tls_base);
  /* Refer to a shadow variable so that we don't try to re-gimplify
   * this TLS variable reference.  */
  var_addr = shadow_var_addr (var_decl);
  tls_offset = build_binary_op (MINUS_EXPR,
                                convert (ptrdiff_type_node, var_addr),
                                convert (ptrdiff_type_node, tls_base), 0);
  if (!useless_type_conversion_p (sizetype, TREE_TYPE (tls_offset)))
    tls_offset = convert (sizetype, tls_offset);
  tls_var_addr = build2 (POINTER_PLUS_EXPR, char_ptr_type,
                         cfun->upc_thread_ctx_tmp, tls_offset);
  tls_ref = build_fold_indirect_ref (tls_var_addr);
  *expr_p = tls_ref;
  return GS_OK;

(If you see any opportunities to improve/correct this code,
please feel free to comment.)

Above, you'll see a reference to "cfun->upc_thread_ctx_tmp";
this is a temporary variable that holds the value returned from
__get_tls().  The idea is to call __get_tls() only once
upon entry to the current function being compiled, and to re-use
its value where needed.
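
To make the intended shape of the generated code concrete, here is a
hand-written C sketch of what a function with two TLS references
should look like after the rewrite, next to the naive form that calls
the runtime for every reference.  All names (get_tls(),
get_tls_calls, sum_cached(), sum_uncached(), struct tls_image) are
hypothetical and exist only for this illustration; the call counter
is there just to make the difference observable:

```c
#include <assert.h>
#include <stddef.h>

struct tls_image { int a; int b; };

/* Stand-in for one thread's TLS block.  */
static struct tls_image thread_block = { 1, 2 };

/* Counts how often the runtime function is entered.  */
static int get_tls_calls;

/* Stand-in for __get_tls().  */
static char *
get_tls (void)
{
  get_tls_calls++;
  return (char *) &thread_block;
}

/* Desired form: fetch the base once on entry (the analogue of
   cfun->upc_thread_ctx_tmp) and reuse it for every reference.  */
int
sum_cached (void)
{
  char *const tls = get_tls ();
  int *a, *b;

  assert (tls != NULL);
  a = (int *) (tls + offsetof (struct tls_image, a));
  b = (int *) (tls + offsetof (struct tls_image, b));
  return *a + *b;
}

/* Naive form: one runtime call per TLS reference.  */
int
sum_uncached (void)
{
  int *a = (int *) (get_tls () + offsetof (struct tls_image, a));
  int *b = (int *) (get_tls () + offsetof (struct tls_image, b));

  return *a + *b;
}
```

Both functions return the same value, but sum_cached() enters
get_tls() once while sum_uncached() enters it twice.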

I made a first attempt at implementing this caching of the
__get_tls() value, but have so far been unsuccessful.  Here's
the current implementation:

  if (!cfun->upc_thread_ctx_tmp)
    {
      const char *libfunc_name = UPC_GET_TLS_LIBCALL;
      tree libfunc, lib_call, tmp;
      libfunc = lookup_name (get_identifier (libfunc_name));
      if (!libfunc)
        internal_error ("runtime function %s not found", libfunc_name);
      lib_call = build_function_call (libfunc, NULL_TREE);
      if (!lang_hooks.types_compatible_p (char_ptr_type, TREE_TYPE (lib_call)))
        lib_call = build1 (NOP_EXPR, char_ptr_type, lib_call);
      tmp = create_tmp_var_raw (char_ptr_type, "TLS");
      TREE_READONLY (tmp) = 1;
      DECL_INITIAL (tmp) = lib_call;
      /* Record the TLS base address at the outermost level of
       * this function.  */
      DECL_CONTEXT (tmp) = current_function_decl;
      DECL_SEEN_IN_BIND_EXPR_P (tmp) = 1;
      declare_vars (tmp, DECL_SAVED_TREE (current_function_decl), false);
      cfun->upc_thread_ctx_tmp = tmp;
    }

(The code from "TREE_READONLY" to "DECL_SEEN_IN_BIND_EXPR_P" above
is cribbed from "gimple_add_tmp_var()" and
"gimplify_init_constructor()".)

The idea above is to initialize a temporary variable at
the outer scope of the current function.  I had assumed that
setting its initial value to the value returned by calling
__get_tls(), and then calling "declare_vars()" to declare the
temporary at the outermost scope of the function, would
do the job, but this code isn't having the intended effect.

My sense is that the DECL_INITIAL() value above is being
ignored and that code isn't being generated for it, and
it seems possible that it won't be properly rescanned
for gimplification.

I'd appreciate any observations that you might have on
why the implementation above doesn't work, and how to
re-implement this section of code so that it has the
desired effect.  Perhaps there is code in GCC that
currently does something like this that I can refer to?

There are some workarounds that I can think of, including
just calling __get_tls() every time it's needed and
letting the optimizer commonize calls to that function
(on the assumption that the function is declared with
__attribute__((const))), but I'd rather find a way
that generates reasonable code without the need for an
optimization pass to fix things up.

Thanks in advance for your help/suggestions.

