This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Assembly output optimisations (was: PR 51094 - fprint_w() in output_addr_const() reinstated)
- From: Dimitrios Apostolou <jimis at gmx dot net>
- To: gcc-patches at gcc dot gnu dot org
- Cc: Andrey Belevantsev <abel at ispras dot ru>, jason at gcc dot gnu dot org, Hans-Peter Nilsson <hp at bitrange dot com>, Mike Stump <mikestump at comcast dot net>, Andreas Schwab <schwab at linux-m68k dot org>
- Date: Tue, 7 Aug 2012 03:18:45 +0300 (EEST)
- Subject: Assembly output optimisations (was: PR 51094 - fprint_w() in output_addr_const() reinstated)
Hello list,
these clean-ups and minor speedups complete some TODOs and semi-finished
changes I have gathered in the ELF backend. In a nutshell:
Fixed comment style; used INT_BITS_STRLEN_BOUND from gnulib to be
future-proof about the string length of integer representations; replaced
the long arguments of the fast printing functions with HOST_WIDE_INT, which
is always at least as wide as long (also asserted that); and converted some
never-negative ints to unsigned.
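
To illustrate the buffer-sizing point, here is a minimal sketch, not the code from the attached patch. BITS_STRLEN_BOUND is a local macro that follows the same idea as gnulib's INT_BITS_STRLEN_BOUND (log10(2) < 146/485); uhwi_t and sprint_unsigned are hypothetical names, and uhwi_t is only assumed to stand in for an unsigned HOST_WIDE_INT.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for an unsigned HOST_WIDE_INT.  */
typedef unsigned long long uhwi_t;

/* Upper bound on the decimal digits of an unsigned value that fits in
   B bits.  Relies on log10 (2) < 146/485.  */
#define BITS_STRLEN_BOUND(b) (((b) * 146 + 484) / 485)

/* Buffer size that holds any uhwi_t in decimal, plus the NUL.  */
#define UHWI_BUFSIZE (BITS_STRLEN_BOUND (sizeof (uhwi_t) * CHAR_BIT) + 1)

/* Write VALUE in decimal into BUF (at least UHWI_BUFSIZE bytes) and
   return the number of characters written, excluding the NUL.  */
static size_t
sprint_unsigned (char *buf, uhwi_t value)
{
  char tmp[UHWI_BUFSIZE];
  size_t n = 0;

  /* Produce the digits least significant first.  */
  do
    {
      tmp[n++] = (char) ('0' + value % 10);
      value /= 10;
    }
  while (value != 0);
  assert (n < UHWI_BUFSIZE);

  /* Reverse them into the caller's buffer.  */
  for (size_t i = 0; i < n; i++)
    buf[i] = tmp[n - 1 - i];
  buf[n] = '\0';
  return n;
}
```

With a 64-bit uhwi_t the bound evaluates to 20 digits, so the buffer is 21 bytes and needs no change if the type is widened later.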
Guarded the output.h:default_elf_asm_output_* declarations, mimicking
varasm.c (I'm not exactly sure why this is guarded in the first place).
Changed default_elf_asm_output_* to be clearer and faster: they now
fwrite() line by line instead of calling putc() character by character.
Implemented fast octal output in default_elf_asm_output_*; this should give
a good boost to -flto, but I haven't measured a big testcase for this one.
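
The line-by-line idea can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patch itself: escape_byte, format_ascii_line, output_ascii_line and CHUNK_MAX are hypothetical names, the host character set is assumed to be ASCII, and every non-printable byte is written as a three-digit octal escape (the real routines also handle escapes such as \n and \t specially).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum { CHUNK_MAX = 64 };  /* Hypothetical limit on input bytes per line.  */

/* Append the escaped form of byte C at OUT and return the new write
   position.  Printable ASCII other than '"' and '\\' is copied as-is;
   everything else becomes a three-digit octal escape.  */
static char *
escape_byte (char *out, unsigned char c)
{
  if (c >= 0x20 && c < 0x7f && c != '"' && c != '\\')
    *out++ = (char) c;
  else
    {
      *out++ = '\\';
      *out++ = (char) ('0' + ((c >> 6) & 7));
      *out++ = (char) ('0' + ((c >> 3) & 7));
      *out++ = (char) ('0' + (c & 7));
    }
  return out;
}

/* Build one .ascii directive for LEN bytes of DATA in LINE, which must
   hold at least CHUNK_MAX * 4 + 16 bytes.  Returns the line length.  */
static size_t
format_ascii_line (char *line, const unsigned char *data, size_t len)
{
  static const char prefix[] = "\t.ascii\t\"";
  char *p = line;

  assert (len <= CHUNK_MAX);
  for (size_t i = 0; i < sizeof prefix - 1; i++)
    *p++ = prefix[i];
  for (size_t i = 0; i < len; i++)
    p = escape_byte (p, data[i]);
  *p++ = '"';
  *p++ = '\n';
  return (size_t) (p - line);
}

/* Emit the whole line with a single fwrite instead of one putc per
   character.  Returns the number of bytes written.  */
static size_t
output_ascii_line (FILE *f, const unsigned char *data, size_t len)
{
  char line[CHUNK_MAX * 4 + 16];
  size_t n = format_ascii_line (line, data, len);
  return fwrite (line, 1, n, f);
}
```

Always emitting three octal digits keeps the escape unambiguous when the next input byte happens to be a digit.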
All in all I get a speed-up of ~30 M instr out of ~2 G instr, for -g3
compilation of reload.c. Actually saving all the putc() calls gives a more
significant gain, but I lost a tiny bit because of converting the
[sf]print_* functions from long to HOST_WIDE_INT, for PR 51094. On i586,
where HOST_WIDE_INT is 8 bytes wide, I can see slow calls to
__u{div,mod}di3 taking place. I don't know whether there is any point in
writing LEB128 values greater than 2^31, but I could change all that to
HOST_WIDEST_FAST_INT if you think so.
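
For reference, a generic unsigned LEB128 encoder looks roughly like the sketch below. It is the textbook encoding, not GCC's output routine; uhwi_t and encode_uleb128 are hypothetical names, with uhwi_t assumed to be 64 bits wide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an unsigned HOST_WIDE_INT.  */
typedef unsigned long long uhwi_t;

/* Encode VALUE as unsigned LEB128 into BUF and return the number of
   bytes used.  A 64-bit value needs at most 10 bytes (ceil (64 / 7)).  */
static size_t
encode_uleb128 (unsigned char *buf, uhwi_t value)
{
  size_t n = 0;

  do
    {
      /* Take the low 7 bits; set the top bit if more groups follow.  */
      unsigned char byte = (unsigned char) (value & 0x7f);
      value >>= 7;
      if (value != 0)
        byte |= 0x80;
      buf[n++] = byte;
    }
  while (value != 0);
  return n;
}
```

A value of 2^31 already takes five bytes in this encoding (0x80 0x80 0x80 0x80 0x08). The encoding itself needs only shifts and masks; the 64-bit division cost mentioned above comes from printing such values as text.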
Time savings are minor too, about 10 ms out of 0.85 s. Memory usage is the
same. Bootstrapped on x86, no regressions for C,C++ testsuite.
Thanks Andreas, hp, Mike, for your comments. Mike, I'd appreciate it if you
elaborated on how to speed up sprint_uw_rev(); I don't think I understood
what you have in mind.
Thanks,
Dimitris
Attachment: Changelog-dwarf2out-3 (text document)
Attachment: dwarf2out-3.diff (text document)