This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



genattrtab speedup 4/4


Hi,

this last patch concludes the series (mostly), and gives back even more 
performance by reviving an old idea of mine.  Instead of calling the 
get_attr_* functions each time an individual attribute is needed, the 
attribute values are cached at function start and read from there.

I.e. instead of this:
   if (get_attr_length (insn) == 1 || get_attr_type (insn) == TYPE_BLA)
     ...
   else if (get_attr_type (insn) == TYPE_OTHER)
     ...

this is generated:
  // function start
  attr_length = get_attr_length (insn);
  attr_type = get_attr_type (insn);
  ...
  switch ...
    ...
    if (attr_length == 1 || attr_type == TYPE_BLA)
      ...
    else if (attr_type == TYPE_OTHER)
      ...

This again reduces the size of insn-attrtab.o and the time needed to 
compile it.  It also reduces run time.  One possible concern is that 
attributes are now looked up even when they are not needed for the insn at 
hand.  I haven't found this to be an issue at all.  (If this led to 
compiler errors or the like, then the .md file would be written incorrectly 
anyway).
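
As a rough illustration of the trade-off described above, here is a minimal 
sketch in C.  It is not the generated code: the accessors are hypothetical 
stand-ins that take an int instead of an rtx, return made-up values, and 
count how often they are called.

```c
/* Hypothetical stand-ins for the generated accessors.  The real
   get_attr_* functions take an rtx insn; an int is used here so the
   sketch is self-contained.  The counters only make calls visible.  */
enum attr_type { TYPE_BLA, TYPE_OTHER, TYPE_NONE };

static int length_calls, type_calls;

static int
get_attr_length (int insn)
{
  ++length_calls;
  return insn % 4;		/* made-up value */
}

static enum attr_type
get_attr_type (int insn)
{
  ++type_calls;
  return (enum attr_type) (insn % 3);	/* made-up value */
}

/* Old scheme: every test calls the accessor again.  */
static int
classify_uncached (int insn)
{
  if (get_attr_length (insn) == 1 || get_attr_type (insn) == TYPE_BLA)
    return 1;
  else if (get_attr_type (insn) == TYPE_OTHER)
    return 2;
  return 0;
}

/* New scheme: look the attributes up once at function start.  */
static int
classify_cached (int insn)
{
  int attr_length = get_attr_length (insn);
  enum attr_type attr_type = get_attr_type (insn);

  if (attr_length == 1 || attr_type == TYPE_BLA)
    return 1;
  else if (attr_type == TYPE_OTHER)
    return 2;
  return 0;
}

/* Number of accessor calls FN makes for one insn.  */
static int
count_calls (int (*fn) (int), int insn)
{
  length_calls = type_calls = 0;
  fn (insn);
  return length_calls + type_calls;
}
```

With these stand-ins, count_calls (classify_uncached, 4) is 3 while the 
cached variant always needs 2.  For insn 5 the uncached variant gets away 
with a single call because the first test short-circuits, which is exactly 
the "looked up although not needed" case mentioned above.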

So, now for the results.  First, I have bootstrapped and regtested on 
x86-64-linux with all patches applied (all languages), including a C++ fix 
I shall post shortly.

Second, I have run small correctness and performance tests with cross 
compilers from x86-64 to 11 different architectures.  I tested correctness 
by compiling a file of C code that uses the preprocessor to generate large 
functions using various types and features, and a C++ source file 
representing the whole libkdecore of KDE.  Each time I compared the 
resulting assembler file with that produced by the cross compiler from 
CVS.  I did this for each of the four patches, i.e. five tests for each of 
the 11 architectures.  There were zero differences (except on sparc on 
kdecore.cc, which is fixed by the mentioned C++ patch).  So much for 
correctness.  For performance and other statistics I have some tables.  For 
brevity I will only show the comparison between CVS (no patches) and 
nocall (all patches applied).

The cross compilers were created by
% ../gcc/configure --enable-languages=c,c++ $arch-linux 
% make CFLAGS=-g -j7 all-build all-libcpp configure-gcc
% make CFLAGS=-g -j7 -C gcc cc1 cc1plus

I.e. the cc1, cc1plus compilers are compiled without optimizations.  The 
rows have the following meaning:

gen_u   User time of running genattrtab (compiled by the system gcc)
c_tab_u User time of compiling insn-attrtab.c with the system gcc (!)
big_u   User time of compiling big-code.c with resulting cross cc1 -O2
itab_s  Size of insn-attrtab.c in KB
itab_l  Number of lines of insn-attrtab.c
itab_o  Size of insn-attrtab.o in KB (compiled with system gcc)
st1_u   User time of running cc1 on the generated insn-attrtab.c with -O2
kde_u   User time of compiling kdecore.cc with resulting cross cc1 -O2

So, st1_u is the time the stage1 compiler would need to compile the 
generated insn-attrtab.c (but remember this is a cross compiler, so only 
the relative performance is of interest).

        alpha   alpha   arm     arm     hppa    hppa    i386    i386
        cvs     nocall  cvs     nocall  cvs     nocall  cvs     nocall
gen_u   0       0       4       0       0       0       17      6
c_tab_u 0       0       1       0       0       0       4       0
big_u   53      53      5       5       51      51      47      47
itab_s  557     428     1009    482     536     342     2986    777
itab_l  14160   8535    32293   12530   15208   7365    90949   19118
itab_o  192     121     635     309     316     138     1530    297
st1_u   4       2       1       1       5       3       136     10
kde_u   437.90  446.60                                  450.76  428.41

(the arm and hppa cross cc1plus compilers had an ICE with kdecore.cc, even 
without patches, so no results for them).

        ia64    ia64    ppc     ppc     s390    s390    sh      sh
        cvs     nocall  cvs     nocall  cvs     nocall  cvs     nocall
gen_u   143     140     26      17      3       3       1       0
c_tab_u 5       4       4       3       0       0       0       0
big_u   63      63      46      46      55      56      64      64
itab_s  8917    8548    6847    6004    283     119     478     303
itab_l  144953  130543  127301  95033   11476   4576    15298   8149
itab_o  1624    1344    1372    1109    211     106     425     212
st1_u   24      14      33      17      4       2       7       4
kde_u   522.39  523.36  436.67  438.66  451.05  451.24  550.03  539.58

        mips    mips    sparc   sparc   x86_64  x86_64
        cvs     nocall  cvs     nocall  cvs     nocall
gen_u   6       1       0       0       22      6
c_tab_u 1       0       0       0       5       1
big_u   53      54      48      49      48      49
itab_s  1283    665     496     292     3484    811
itab_l  43830   16938   16063   7885    106123  20330
itab_o  656     310     307     176     1830    321
st1_u   25      11      6       4       156     10
kde_u   *       *       *       *       471.74  448.22

I need to rerun the kdecore.cc compilation for mips and sparc, as the 
machine had a hiccup when they were run, so the timing is wrong (but the 
.s file is correct).

Note that e.g. for x86_64 we go from 2:36 minutes to 10 seconds for 
compiling insn-attrtab.c in stage1, insn-attrtab.o becomes 1.5 MB smaller, 
and the resulting compiler is even a bit faster on kdecore.cc.
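
The arithmetic behind these numbers can be rechecked with a small sketch in 
C.  The figures are copied from the x86_64 columns of the table; the struct 
and the helper names are made up for this illustration.

```c
#include <stdio.h>

/* One table row for a single architecture: value with plain CVS and
   with all patches applied (nocall).  */
struct row
{
  const char *name;
  double cvs;
  double nocall;
};

/* x86_64 figures copied from the table: seconds, KB, seconds.  */
static const struct row x86_64_rows[] = {
  { "st1_u", 156, 10 },
  { "itab_o", 1830, 321 },
  { "kde_u", 471.74, 448.22 },
};

/* Absolute saving, in the unit of the row.  */
static double
saved (const struct row *r)
{
  return r->cvs - r->nocall;
}

/* Factor by which the value shrank.  */
static double
speedup (const struct row *r)
{
  return r->cvs / r->nocall;
}

/* Print saving and factor for every row.  */
static void
print_rows (void)
{
  size_t i;

  for (i = 0; i < sizeof x86_64_rows / sizeof x86_64_rows[0]; ++i)
    printf ("%-7s saved %8.2f  factor %5.2f\n", x86_64_rows[i].name,
	    saved (&x86_64_rows[i]), speedup (&x86_64_rows[i]));
}
```

Calling print_rows () from a main function lists the three rows; 156 
seconds is the 2:36 minutes quoted above, and the itab_o saving of 1509 KB 
is the roughly 1.5 MB.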

These patches remove the serialization point for a parallel bootstrap very 
well, except on ia64 unfortunately.  There genattrtab itself is relatively 
slow, not because of the decision tree for the attributes but because of 
the construction of the scheduling automata.  This area is not touched by 
these patches, so the speedup there is relatively small.

Okay (all four patches)?


Ciao,
Michael.
-- 
	* genattrtab.c (write_cache_used_attributes): New.
	(write_attr_switch, write_attr_get): Use it.
	(write_attr_set): Make write_test_expr use cached values.

--- genattrtab.icode.c	2005-07-31 22:53:26.104092740 +0200
+++ genattrtab.nocall.c	2005-07-31 22:53:49.262701813 +0200
@@ -3677,6 +3677,23 @@ walk_attr_value (rtx exp)
 }
 
 static void
+write_cache_used_attributes (struct attr_desc *attr)
+{
+  struct attr_value *av;
+  struct attr_desc *attr2;
+  int i;
+
+  for (i = 0; i < MAX_ATTRS_INDEX; ++i)
+    for (attr2 = attrs[i]; attr2; attr2 = attr2->next)
+      if (!attr2->is_const)
+        for (av = attr->first_value; av; av = av->next)
+          if (av->num_insns != 0)
+            if (write_expr_attr_cache (av->value, attr2))
+              break;
+}
+
+
+static void
 write_attr_switch (struct attr_desc *attr, int indent, const char *prefix,
 		   const char *suffix)
 {
@@ -3690,6 +3707,7 @@ write_attr_switch (struct attr_desc *att
   printf ("{\n");
   write_indent (indent);
   printf ("  int insn_code = recog_memoized (insn);\n");
+  write_cache_used_attributes (attr);
   write_indent (indent);
   printf ("  switch (insn_code)\n");
   write_indent (indent);
@@ -3733,6 +3751,7 @@ write_attr_get (struct attr_desc *attr)
       printf ("get_attr_%s (void)\n", attr->name);
       printf ("{\n");
 
+      write_cache_used_attributes (attr);
       for (av = attr->first_value; av; av = av->next)
 	if (av->num_insns == 1)
 	  write_attr_set (attr, 2, av->value, "return", ";",
@@ -3832,7 +3851,7 @@ write_attr_set (struct attr_desc *attr, 
 	  write_indent (indent);
 	  printf ("%sif ", first_if ? "" : "else ");
 	  first_if = 0;
-	  write_test_expr (testexp, 0);
+	  write_test_expr (testexp, 2);
 	  printf ("\n");
 	  write_indent (indent + 2);
 	  printf ("{\n");

