This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 4/4] define ASM_OUTPUT_LABEL to the name of a function


Trevor Saunders <tbsaunde@tbsaunde.org> writes:
> On Thu, Aug 06, 2015 at 08:36:36PM +0100, Richard Sandiford wrote:
>> An integrated assembler or tighter asm output would be nice, but when
>> I last checked LLVM was usually faster than GCC even when compiling to asm,
>> even though LLVM does use indirection (in the form of virtual functions)
>> for its output routines.  I don't think indirect function calls themselves
>> are the problem -- as long as we get the abstraction right :-)
>
> yeah, last time I looked (tbf a while ago) the C++ front end took up by
> far the largest part of the time.  So it may not be terribly important,
> but it would still be nice to figure out what a good design looks like.

I tried getting final to output the code a large number of times.
Obviously just sticking "for (i = 0; i < n; ++i)" around something
isn't the best way of measuring performance (for all the usual reasons)
but it was interesting even so.  A lot of the time is taken in calls to
strlen and in assemble_name itself (called by ASM_OUTPUT_LABEL).
Each time we call assemble_name we do:

  real_name = targetm.strip_name_encoding (name);

  id = maybe_get_identifier (real_name);
  if (id)
    {
      tree id_orig = id;

      mark_referenced (id);
      ultimate_transparent_alias_target (&id);
      if (id != id_orig)
	name = IDENTIFIER_POINTER (id);
      gcc_assert (! TREE_CHAIN (id));
    }

Doing an identifier lookup every time we output a reference to a label
is pretty expensive.  Especially when many of the labels we're dealing
with are internal ones (basic block labels, debug labels, etc.) for which
the lookup is bound to fail.

So if compile-time for asm output is a concern, that seems like a good
place to start.  We should try harder to keep track of the identifier
behind a name (when there is one) and avoid this overhead for
internal labels.

Converting ASM_OUTPUT_LABEL to an indirect function call was in the
noise even with my for-loop hack.  The execution time of the hook is
dominated by assemble_name itself.  I hope patches like yours aren't
held up simply because they have the equivalent of a virtual function.

Also, although we seem to be paranoid about virtual functions and
indirect calls, it's worth remembering that on most targets every
call to fputs(_unlocked), fwrite(_unlocked) and strlen is a PLT call.
Our current code calls fputs several times for one line of assembly,
including for short strings like register names.  This is doubly
inefficient because:

(a) we could reduce the number of PLT calls by doing the buffering
    ourselves and

(b) the names of those registers are known at compile time (or at least
    at start-up time) and are short, but we call strlen() on them
    each time we write them out.

E.g. for the attached microbenchmark I get:

  Time taken, normalised to VERSION==1

  VERSION==1:  1.000
  VERSION==2:  1.377
  VERSION==3:  3.202 (1.638 with -minline-all-stringops)
  VERSION==4:  4.242 (2.921 with -minline-all-stringops)
  VERSION==5:  4.526
  VERSION==6:  4.543
  VERSION==7: 10.884

where the results for 5 vs. 6 are in the noise.

The 5->4 gain is by doing the buffering ourselves.  The 4->3 gain is for
keeping track of the string length rather than recomputing it each time.

This suggests that if we're serious about trying to speed up the asm output,
it would be worth adding an equivalent of LLVM's StringRef that pairs a
const char * string with its length.

Thanks,
Richard

#define _GNU_SOURCE 1

#include <stdio.h>
#include <string.h>
#include <iostream>

struct S
{
  S () : end (buffer) {}

  ~S ()
  {
    fwrite_unlocked (buffer, end - buffer, 1, stdout);
  }

#if VERSION == 3
  void __attribute__((noinline))
#else
  void
#endif
  write (const char *x, size_t len)
  {
    if (__builtin_expect (buffer + sizeof (buffer) - end < len, 0))
      {
	fwrite_unlocked (buffer, end - buffer, 1, stdout);
	end = buffer;
      }
    memcpy (end, x, len);
    end += len;
  }

#if VERSION == 1 || VERSION == 3
  template <size_t N>
  void
  write (const char (&x)[N])
  {
    write (x, N - 1);
  }
#elif VERSION == 2
  template <size_t N>
  void __attribute__((noinline))
  write (const char (&x)[N])
  {
    write (x, N - 1);
  }
#else
  void __attribute__((noinline))
  write (const char *x)
  {
    write (x, strlen (x));
  }
#endif
  char buffer[4096];
  char *end;
};

int
main ()
{
  S s;
  for (int i = 0; i < 100000000; ++i)
    {
#if VERSION <= 4
      s.write ("Hello!");
#elif VERSION == 5
      fputs_unlocked ("Hello!", stdout);
#elif VERSION == 6
      fwrite_unlocked ("Hello!", 6, 1, stdout);
#elif VERSION == 7
      std::cout << "Hello!";
#else
#error Please define VERSION
#endif
    }
  return 0;
}

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]