This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
genattrtab speedup 4/4
- From: Michael Matz <matz at suse dot de>
- To: gcc-patches at gcc dot gnu dot org
- Date: Mon, 1 Aug 2005 01:10:10 +0200 (CEST)
- Subject: genattrtab speedup 4/4
Hi,
this last patch concludes the series (mostly), and gives back even more
performance by reviving an old idea of mine. Instead of calling the
get_attr_* functions each time an individual attribute is needed, the
attributes are cached at function start and accessed from there.
I.e. instead of this:
if (get_attr_length (insn) == 1 || get_attr_type (insn) == TYPE_BLA)
...
else if (get_attr_type (insn) == TYPE_OTHER)
...
this is done:
// function start
attr_length = get_attr_length (insn);
attr_type = get_attr_type (insn);
...
switch ...
...
if (attr_length == 1 || attr_type == TYPE_BLA)
...
else if (attr_type == TYPE_OTHER)
...
This again reduces the size of insn-attrtab.o and the time needed to
compile it. It also reduces run time. There might be the concern that
attributes are now looked up even when they are not needed for the insn at
hand. I haven't found this to be an issue at all. (If this led to
compiler errors or the like, then the .md file is written incorrectly anyway.)
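To make the trade-off concrete, here is a minimal, self-contained sketch of the two shapes. It is not taken from the generated insn-attrtab.c: struct insn, the enum values, TYPE_MISC and the n_lookups counter are hypothetical stand-ins, and the accessors merely read fields instead of running a real decision tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the generated attribute accessors.  */
enum attr_type { TYPE_BLA, TYPE_OTHER, TYPE_MISC };

struct insn { int length; enum attr_type type; };

/* Counts accessor calls so the two variants can be compared.  */
static int n_lookups;

static int
get_attr_length (const struct insn *insn)
{
  assert (insn != NULL);
  n_lookups++;
  return insn->length;
}

static enum attr_type
get_attr_type (const struct insn *insn)
{
  assert (insn != NULL);
  n_lookups++;
  return insn->type;
}

/* Old shape: every test calls the accessor again.  */
static int
classify_uncached (const struct insn *insn)
{
  if (get_attr_length (insn) == 1 || get_attr_type (insn) == TYPE_BLA)
    return 1;
  else if (get_attr_type (insn) == TYPE_OTHER)
    return 2;
  return 0;
}

/* New shape: each attribute is looked up once at function start.  */
static int
classify_cached (const struct insn *insn)
{
  int attr_length = get_attr_length (insn);
  enum attr_type attr_type = get_attr_type (insn);

  if (attr_length == 1 || attr_type == TYPE_BLA)
    return 1;
  else if (attr_type == TYPE_OTHER)
    return 2;
  return 0;
}
```

In this toy model an insn with length 4 and TYPE_OTHER costs three lookups uncached and two cached, while an insn with length 1 costs one lookup uncached (the || short-circuits) and still two cached. The second case is the unnecessary lookup mentioned above.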
So, now for the results. First, I have bootstrapped and regtested on
x86-64-linux with all patches applied (all languages), including a C++ fix
I shall post shortly.
Second, I have run small correctness and performance tests with
cross compilers from x86-64 to 11 different architectures. I have tested
correctness by compiling a file of C code that uses the preprocessor to
generate large functions using various types and features. I also compiled
a C++ source file representing the whole libkdecore of KDE. Each time I
compared the resulting assembler file with the one produced by the cross
compiler from CVS. I did this for each of the four patches, i.e. five
tests for each of the 11 architectures. There were zero differences
(except on sparc on kdecore.cc, which is fixed by the mentioned C++
patch). So much for correctness. For performance and other statistics I
have some tables. For brevity I will only show the comparison between CVS
(no patches) and nocall (all patches applied).
The cross compilers were created by
% ../gcc/configure --enable-languages=c,c++ $arch-linux
% make CFLAGS=-g -j7 all-build all-libcpp configure-gcc
% make CFLAGS=-g -j7 -C gcc cc1 cc1plus
I.e. the cc1, cc1plus compilers are compiled without optimizations. The
rows have the following meaning:
gen_u    User time of running genattrtab (compiled by the system gcc)
c_tab_u  User time of compiling insn-attrtab.c with the system gcc (!)
big_u    User time of compiling big-code.c with resulting cross cc1 -O2
itab_s   Size of insn-attrtab.c in KB
itab_l   Number of lines of insn-attrtab.c
itab_o   Size of insn-attrtab.o in KB (compiled with system gcc)
st1_u    User time of running cc1 on the generated insn-attrtab.c with -O2
kde_u    User time of compiling kdecore.cc with resulting cross cc1 -O2
So, st1_u is the time the stage1 compiler would need to compile the
generated insn-attrtab.c (but remember this is a cross compiler, so only
the relative performance is of interest). All times are in seconds.
           alpha   alpha     arm     arm    hppa    hppa    i386    i386
             cvs  nocall     cvs  nocall     cvs  nocall     cvs  nocall
gen_u          0       0       4       0       0       0      17       6
c_tab_u        0       0       1       0       0       0       4       0
big_u         53      53       5       5      51      51      47      47
itab_s       557     428    1009     482     536     342    2986     777
itab_l     14160    8535   32293   12530   15208    7365   90949   19118
itab_o       192     121     635     309     316     138    1530     297
st1_u          4       2       1       1       5       3     136      10
kde_u     437.90  446.60       -       -       -       -  450.76  428.41
(the arm and hppa cross cc1plus compilers had an ICE with kdecore.cc, even
without patches, so no results for them).
            ia64    ia64     ppc     ppc    s390    s390      sh      sh
             cvs  nocall     cvs  nocall     cvs  nocall     cvs  nocall
gen_u        143     140      26      17       3       3       1       0
c_tab_u        5       4       4       3       0       0       0       0
big_u         63      63      46      46      55      56      64      64
itab_s      8917    8548    6847    6004     283     119     478     303
itab_l    144953  130543  127301   95033   11476    4576   15298    8149
itab_o      1624    1344    1372    1109     211     106     425     212
st1_u         24      14      33      17       4       2       7       4
kde_u     522.39  523.36  436.67  438.66  451.05  451.24  550.03  539.58
            mips    mips   sparc   sparc  x86_64  x86_64
             cvs  nocall     cvs  nocall     cvs  nocall
gen_u          6       1       0       0      22       6
c_tab_u        1       0       0       0       5       1
big_u         53      54      48      49      48      49
itab_s      1283     665     496     292    3484     811
itab_l     43830   16938   16063    7885  106123   20330
itab_o       656     310     307     176    1830     321
st1_u         25      11       6       4     156      10
kde_u          *       *       *       *  471.74  448.22
I need to rerun the kdecore.cc compilation for mips and sparc, as the
machine had a hiccup when they were run, and the timing is wrong (but the
.s file is correct).
Note that e.g. for x86_64 we go from 2:36 minutes to 10 seconds when
compiling insn-attrtab.c in stage1, insn-attrtab.o becomes 1.5 MB smaller,
and the resulting compiler is even a bit faster on kdecore.cc.
These patches remove the serialization point for a parallel bootstrap very
well, except on ia64 unfortunately. There genattrtab itself is relatively
slow, not because of the decision tree for the attributes but because
of the construction of the scheduling automata. This area is not touched
by these patches, so the speedup there is relatively small.
Okay (all four patches)?
Ciao,
Michael.
--
* genattrtab.c (write_cache_used_attributes): New.
(write_attr_switch, write_attr_get): Use it.
(write_attr_set): Make write_test_expr use cached values.
--- genattrtab.icode.c 2005-07-31 22:53:26.104092740 +0200
+++ genattrtab.nocall.c 2005-07-31 22:53:49.262701813 +0200
@@ -3677,6 +3677,23 @@ walk_attr_value (rtx exp)
}
static void
+write_cache_used_attributes (struct attr_desc *attr)
+{
+  struct attr_value *av;
+  struct attr_desc *attr2;
+  int i;
+
+  for (i = 0; i < MAX_ATTRS_INDEX; ++i)
+    for (attr2 = attrs[i]; attr2; attr2 = attr2->next)
+      if (!attr2->is_const)
+        for (av = attr->first_value; av; av = av->next)
+          if (av->num_insns != 0)
+            if (write_expr_attr_cache (av->value, attr2))
+              break;
+}
+
+
+static void
write_attr_switch (struct attr_desc *attr, int indent, const char *prefix,
const char *suffix)
{
@@ -3690,6 +3707,7 @@ write_attr_switch (struct attr_desc *att
printf ("{\n");
write_indent (indent);
printf (" int insn_code = recog_memoized (insn);\n");
+ write_cache_used_attributes (attr);
write_indent (indent);
printf (" switch (insn_code)\n");
write_indent (indent);
@@ -3733,6 +3751,7 @@ write_attr_get (struct attr_desc *attr)
printf ("get_attr_%s (void)\n", attr->name);
printf ("{\n");
+ write_cache_used_attributes (attr);
for (av = attr->first_value; av; av = av->next)
if (av->num_insns == 1)
write_attr_set (attr, 2, av->value, "return", ";",
@@ -3832,7 +3851,7 @@ write_attr_set (struct attr_desc *attr,
write_indent (indent);
printf ("%sif ", first_if ? "" : "else ");
first_if = 0;
- write_test_expr (testexp, 0);
+ write_test_expr (testexp, 2);
printf ("\n");
write_indent (indent + 2);
printf ("{\n");
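For readers who want to see the shape of the loop in write_cache_used_attributes in isolation, here is a simplified, self-contained sketch. Everything in it is an assumption made for illustration: struct test_expr, expr_uses_attr and write_cache_decls are hypothetical names, a test expression is modelled as a plain list of attribute names rather than an rtx tree, and the emitted declaration text is a guess, since write_expr_attr_cache is not part of this diff.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model: a test expression is just the list of
   attribute names it reads.  The real genattrtab walks rtx trees.  */
struct test_expr
{
  const char *attrs[4];
  int n_attrs;
};

/* Return 1 if expression E reads the attribute called NAME.  */
static int
expr_uses_attr (const struct test_expr *e, const char *name)
{
  int i;

  for (i = 0; i < e->n_attrs; i++)
    if (strcmp (e->attrs[i], name) == 0)
      return 1;
  return 0;
}

/* For every known attribute, scan the expressions and emit one cache
   declaration as soon as some expression uses it.  Returns the number
   of declarations written to OUT.  The emitted text is an assumed
   format, not the one genattrtab really prints.  */
static int
write_cache_decls (FILE *out, const char *const *all_attrs, int n_all,
                   const struct test_expr *exprs, int n_exprs)
{
  int a, e, written = 0;

  assert (out != NULL);
  for (a = 0; a < n_all; a++)
    for (e = 0; e < n_exprs; e++)
      if (expr_uses_attr (&exprs[e], all_attrs[a]))
        {
          fprintf (out, "  enum attr_%s attr_%s = get_attr_%s (insn);\n",
                   all_attrs[a], all_attrs[a], all_attrs[a]);
          written++;
          break;  /* One declaration per attribute is enough.  */
        }
  return written;
}
```

As in the patch, the break leaves only the innermost loop, so each attribute is declared at most once and attributes that no expression reads get no declaration.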