[PATCH 4/4] define ASM_OUTPUT_LABEL to the name of a function

Trevor Saunders tbsaunde@tbsaunde.org
Fri Aug 7 13:50:00 GMT 2015


On Fri, Aug 07, 2015 at 10:45:57AM +0100, Richard Sandiford wrote:
> Trevor Saunders <tbsaunde@tbsaunde.org> writes:
> > On Thu, Aug 06, 2015 at 08:36:36PM +0100, Richard Sandiford wrote:
> >> An integrated assembler or tighter asm output would be nice, but when
> >> I last checked LLVM was usually faster than GCC even when compiling to asm,
> >> even though LLVM does use indirection (in the form of virtual functions)
> >> for its output routines.  I don't think indirect function calls themselves
> >> are the problem -- as long as we get the abstraction right :-)
> >
> > yeah, last time I looked (tbf a while ago) the C++ front end took up by
> > far the largest part of the time.  So it may not be terribly important,
> > but it would still be nice to figure out what a good design looks like.
> 
> I tried getting final to output the code a large number of times.
> Obviously just sticking "for (i = 0; i < n; ++i)" around something
> isn't the best way of measuring performance (for all the usual reasons)
> but it was interesting even so.  A lot of the time is taken in calls to
> strlen and in assemble_name itself (called by ASM_OUTPUT_LABEL).

yeah, this data looks great.  I find it interesting that you say we
spend so much time outputting labels as opposed to instructions.

> Each time we call assemble_name we do:
> 
>   real_name = targetm.strip_name_encoding (name);
> 
>   id = maybe_get_identifier (real_name);
>   if (id)
>     {
>       tree id_orig = id;
> 
>       mark_referenced (id);
>       ultimate_transparent_alias_target (&id);
>       if (id != id_orig)
> 	name = IDENTIFIER_POINTER (id);
>       gcc_assert (! TREE_CHAIN (id));
>     }
> 
> Doing an identifier lookup every time we output a reference to a label
> is pretty expensive.  Especially when many of the labels we're dealing
> with are internal ones (basic block labels, debug labels, etc.) for which
> the lookup is bound to fail.

well, there's ASm_OUTPUT_INTERNAL_LABEL, and I think something similar
for debug labels.  I guess we don't always use those where we could.  Or
maybe the problem is we have places where we need to look at data to
find out.  Maybe it would make sense to have the generally used
output_label routine take a tree / rtx, and check if its a internal or
debug label and dispatch appropriately.

> So if compile-time for asm output is a concern, that seems like a good
> place to start.  We should try harder to keep track of the identifier
> behind a name (when there is one) and avoid this overhead for
> internal labels.
> 
> Converting ASM_OUTPUT_LABEL to an indirect function call was in the
> noise even with my for-loop hack.  The execution time of the hook is
> dominated by assemble_name itself.  I hope patches like yours aren't
> held up simply because they have the equivalent of a virtual function.

Well, I think it makes sense to reroll this series, but I think I'll
keep working on trying to replace these macros with something else.

> Also, although we seem to be paranoid about virtual functions and
> indirect calls, it's worth remembering that on most targets every
> call to fputs(_unlocked), fwrite(_unlocked) and strlen is a PLT call.
> Our current code calls fputs several times for one line of assembly,
> including for short strings like register names.  This is doubly
> inefficient because:
> 
> (a) we could reduce the number of PLT calls by doing the buffering
>     ourselves and

yeah, I mentioned that earlier, but its great to have data showing its a
win!  I think its also probably important to enabling the other
optimizations below.

> (b) the names of those registers are known at compile time (or at least
>     at start-up time) and are short, but we call strlen() on them
>     each time we write them out.

yeah, that seems like something that should be fixed, but I'm not sure
off hand where to look for the code doing this.

> E.g. for the attached microbenchmark I get:
> 
>   Time taken, normalised to VERSION==1
> 
>   VERSION==1:  1.000
>   VERSION==2:  1.377
>   VERSION==3:  3.202 (1.638 with -minline-all-stringops)
>   VERSION==4:  4.242 (2.921 with -minline-all-stringops)
>   VERSION==5:  4.526
>   VERSION==6:  4.543
>   VERSION==7: 10.884
> 
> where the results for 5 vs. 6 are in the noise.
> 
> The 5->4 gain is by doing the buffering ourselves.  The 4->3 gain is for
> keeping track of the string length rather than recomputing it each time.
> 
> This suggests that if we're serious about trying to speed up the asm output,
> it would be worth adding an equivalent of LLVM's StringRef that pairs a
> const char * string with its length.

I've thought a tiny bit about working on that, so its nice to have data.

Trev

> 
> Thanks,
> Richard
> 

> #define _GNU_SOURCE 1
> 
> #include <stdio.h>
> #include <string.h>
> #include <iostream>
> 
> struct S
> {
>   S () : end (buffer) {}
> 
>   ~S ()
>   {
>     fwrite_unlocked (buffer, end - buffer, 1, stdout);
>   }
> 
> #if VERSION == 3
>   void __attribute__((noinline))
> #else
>   void
> #endif
>   write (const char *x, size_t len)
>   {
>     if (__builtin_expect (buffer + sizeof (buffer) - end < len, 0))
>       {
> 	fwrite_unlocked (buffer, end - buffer, 1, stdout);
> 	end = buffer;
>       }
>     memcpy (end, x, len);
>     end += len;
>   }
> 
> #if VERSION == 1 || VERSION == 3
>   template <size_t N>
>   void
>   write (const char (&x)[N])
>   {
>     write (x, N - 1);
>   }
> #elif VERSION == 2
>   template <size_t N>
>   void __attribute__((noinline))
>   write (const char (&x)[N])
>   {
>     write (x, N - 1);
>   }
> #else
>   void __attribute__((noinline))
>   write (const char *x)
>   {
>     write (x, strlen (x));
>   }
> #endif
>   char buffer[4096];
>   char *end;
> };
> 
> int
> main ()
> {
>   S s;
>   for (int i = 0; i < 100000000; ++i)
>     {
> #if VERSION <= 4
>       s.write ("Hello!");
> #elif VERSION == 5
>       fputs_unlocked ("Hello!", stdout);
> #elif VERSION == 6
>       fwrite_unlocked ("Hello!", 6, 1, stdout);
> #elif VERSION == 7
>       std::cout << "Hello!";
> #else
> #error Please define VERSION
> #endif
>     }
>   return 0;
> }



More information about the Gcc-patches mailing list