This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH 4/4] define ASM_OUTPUT_LABEL to the name of a function
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Trevor Saunders <tbsaunde at tbsaunde dot org>,David Malcolm <dmalcolm at redhat dot com>,tbsaunde+gcc at tbsaunde dot org,GCC Patches <gcc-patches at gcc dot gnu dot org>,richard dot sandiford at arm dot com
- Date: Fri, 07 Aug 2015 22:24:07 +0200
- Subject: Re: [PATCH 4/4] define ASM_OUTPUT_LABEL to the name of a function
- Authentication-results: sourceware.org; auth=none
- References: <CAFiYyc2eP1e=yBUVwVQbCRHn1jsB3oE3b8exV6qfEYZm9F9WVQ at mail dot gmail dot com> <20150805105650 dot GA27755 at tsaunders-iceball dot corp dot tor1 dot mozilla dot com> <CAFiYyc0UQXp=X__=gynjd7ibBOOebAapJzvyeP+mUEn7Nc3XhA at mail dot gmail dot com> <1438788499 dot 21752 dot 39 dot camel at surprise> <1438788868 dot 21752 dot 45 dot camel at surprise> <20150805202212 dot GA6847 at tsaunders-iceball dot corp dot tor1 dot mozilla dot com> <1438886869 dot 21752 dot 62 dot camel at surprise> <87io8sp5tn dot fsf at googlemail dot com> <20150807043113 dot GA24176 at tsaunders-iceball dot corp dot tor1 dot mozilla dot com> <87k2t7ig8a dot fsf at e105548-lin dot cambridge dot arm dot com> <20150807135033 dot GA26612 at tsaunders-iceball dot corp dot tor1 dot mozilla dot com>
On August 7, 2015 3:50:33 PM GMT+02:00, Trevor Saunders <tbsaunde@tbsaunde.org> wrote:
>On Fri, Aug 07, 2015 at 10:45:57AM +0100, Richard Sandiford wrote:
>> Trevor Saunders <tbsaunde@tbsaunde.org> writes:
>> > On Thu, Aug 06, 2015 at 08:36:36PM +0100, Richard Sandiford wrote:
>> >> An integrated assembler or tighter asm output would be nice, but
>when
>> >> I last checked LLVM was usually faster than GCC even when
>compiling to asm,
>> >> even though LLVM does use indirection (in the form of virtual
>functions)
>> >> for its output routines. I don't think indirect function calls
>themselves
>> >> are the problem -- as long as we get the abstraction right :-)
>> >
>> > yeah, last time I looked (tbf a while ago) the C++ front end took
>up by
>> > far the largest part of the time. So it may not be terribly
>important,
>> > but it would still be nice to figure out what a good design looks
>like.
>>
>> I tried getting final to output the code a large number of times.
>> Obviously just sticking "for (i = 0; i < n; ++i)" around something
>> isn't the best way of measuring performance (for all the usual
>reasons)
>> but it was interesting even so. A lot of the time is taken in calls
>to
>> strlen and in assemble_name itself (called by ASM_OUTPUT_LABEL).
>
>yeah, this data looks great. I find it interesting that you say we
>spend so much time outputting labels as opposed to instructions.
>
>> Each time we call assemble_name we do:
>>
>> real_name = targetm.strip_name_encoding (name);
>>
>> id = maybe_get_identifier (real_name);
>> if (id)
>> {
>> tree id_orig = id;
>>
>> mark_referenced (id);
>> ultimate_transparent_alias_target (&id);
>> if (id != id_orig)
>> name = IDENTIFIER_POINTER (id);
>> gcc_assert (! TREE_CHAIN (id));
>> }
>>
>> Doing an identifier lookup every time we output a reference to a
>label
>> is pretty expensive. Especially when many of the labels we're
>dealing
>> with are internal ones (basic block labels, debug labels, etc.) for
>which
>> the lookup is bound to fail.
>
>well, there's ASm_OUTPUT_INTERNAL_LABEL, and I think something similar
>for debug labels. I guess we don't always use those where we could.
>Or
>maybe the problem is we have places where we need to look at data to
>find out. Maybe it would make sense to have the generally used
>output_label routine take a tree / rtx, and check if its a internal or
>debug label and dispatch appropriately.
>
>> So if compile-time for asm output is a concern, that seems like a
>good
>> place to start. We should try harder to keep track of the identifier
>> behind a name (when there is one) and avoid this overhead for
>> internal labels.
>>
>> Converting ASM_OUTPUT_LABEL to an indirect function call was in the
>> noise even with my for-loop hack. The execution time of the hook is
>> dominated by assemble_name itself. I hope patches like yours aren't
>> held up simply because they have the equivalent of a virtual
>function.
>
>Well, I think it makes sense to reroll this series, but I think I'll
>keep working on trying to replace these macros with something else.
>
>> Also, although we seem to be paranoid about virtual functions and
>> indirect calls, it's worth remembering that on most targets every
>> call to fputs(_unlocked), fwrite(_unlocked) and strlen is a PLT call.
>> Our current code calls fputs several times for one line of assembly,
>> including for short strings like register names. This is doubly
>> inefficient because:
>>
>> (a) we could reduce the number of PLT calls by doing the buffering
>> ourselves and
>
>yeah, I mentioned that earlier, but its great to have data showing its
>a
>win! I think its also probably important to enabling the other
>optimizations below.
>
>> (b) the names of those registers are known at compile time (or at
>least
>> at start-up time) and are short, but we call strlen() on them
>> each time we write them out.
>
>yeah, that seems like something that should be fixed, but I'm not sure
>off hand where to look for the code doing this.
>
>> E.g. for the attached microbenchmark I get:
>>
>> Time taken, normalised to VERSION==1
>>
>> VERSION==1: 1.000
>> VERSION==2: 1.377
>> VERSION==3: 3.202 (1.638 with -minline-all-stringops)
>> VERSION==4: 4.242 (2.921 with -minline-all-stringops)
>> VERSION==5: 4.526
>> VERSION==6: 4.543
>> VERSION==7: 10.884
>>
>> where the results for 5 vs. 6 are in the noise.
>>
>> The 5->4 gain is by doing the buffering ourselves. The 4->3 gain is
>for
>> keeping track of the string length rather than recomputing it each
>time.
>>
>> This suggests that if we're serious about trying to speed up the asm
>output,
>> it would be worth adding an equivalent of LLVM's StringRef that pairs
>a
>> const char * string with its length.
>
>I've thought a tiny bit about working on that, so its nice to have
>data.
Tree identifiers have an embedded length.
So its all about avoidibg this target hook mangling the labels.
Richard.
>Trev
>
>>
>> Thanks,
>> Richard
>>
>
>> #define _GNU_SOURCE 1
>>
>> #include <stdio.h>
>> #include <string.h>
>> #include <iostream>
>>
>> struct S
>> {
>> S () : end (buffer) {}
>>
>> ~S ()
>> {
>> fwrite_unlocked (buffer, end - buffer, 1, stdout);
>> }
>>
>> #if VERSION == 3
>> void __attribute__((noinline))
>> #else
>> void
>> #endif
>> write (const char *x, size_t len)
>> {
>> if (__builtin_expect (buffer + sizeof (buffer) - end < len, 0))
>> {
>> fwrite_unlocked (buffer, end - buffer, 1, stdout);
>> end = buffer;
>> }
>> memcpy (end, x, len);
>> end += len;
>> }
>>
>> #if VERSION == 1 || VERSION == 3
>> template <size_t N>
>> void
>> write (const char (&x)[N])
>> {
>> write (x, N - 1);
>> }
>> #elif VERSION == 2
>> template <size_t N>
>> void __attribute__((noinline))
>> write (const char (&x)[N])
>> {
>> write (x, N - 1);
>> }
>> #else
>> void __attribute__((noinline))
>> write (const char *x)
>> {
>> write (x, strlen (x));
>> }
>> #endif
>> char buffer[4096];
>> char *end;
>> };
>>
>> int
>> main ()
>> {
>> S s;
>> for (int i = 0; i < 100000000; ++i)
>> {
>> #if VERSION <= 4
>> s.write ("Hello!");
>> #elif VERSION == 5
>> fputs_unlocked ("Hello!", stdout);
>> #elif VERSION == 6
>> fwrite_unlocked ("Hello!", 6, 1, stdout);
>> #elif VERSION == 7
>> std::cout << "Hello!";
>> #else
>> #error Please define VERSION
>> #endif
>> }
>> return 0;
>> }