PATCH: Improve Java build times

Mark Mitchell mark@codesourcery.com
Sun May 19 10:02:00 GMT 2002


I analyzed why building libjava (and presumably any other Java
program) took so long to build.

Here is some background:

1. When you need to load a class in Java, you search the "class path"
   (which is like the PATH environment variable) for the class.  You
   go through each of the directories and use the first ".class" or
   ".java" file that matches the name of the class for which you're
   looking.

2. Often, lookup fails -- you don't find the class at all.  That
   happens when you're not quite sure what the name of the class is; 
   you have to try several different possibilities until you get the
   right one.

3. GCJ was caching the result of a successful lookup, so that once you
   find the class you odn't ever go back to the filesystem to find it
   again for that particular class.  However, it was not caching
   *failures*, with the result that it *would* go back to the
   filesystem again to reconfirm negative results.

Compiling a randomly chosen file from libjava, therefore, took 86,101
system calls, of which 82,099 were calls to stat.  (Measurements on 
GNU/Linux using strace.)  On a GNU/Linux box, "time" reported:

  real	0m1.854s
  user	0m1.430s
  sys	0m0.400s

After the change, attached, the figures were 5,589 system calls, of
which 681 were calls to stat.   "time" now reports:

  real	0m1.037s
  user	0m0.940s
  sys	0m0.090s

This is a very significant win (44% total speedup, and a dramatic 77%
reduction in system time).

On GNU/Linux, the total time taken to build libgcj (including
configure) drops from:

  real	33m58.371s
  user	21m58.910s
  sys	9m7.340s

to:

  real	29m31.282s
  user	20m54.540s
  sys	5m53.200s

On systems with filesystems that do not perform as well -- either
because the OS is bad or because you do not have a fast local disk on
which to work -- the win is absolutely vast.  My IRIX 6.5 builds *of
the entire compiler* -- configuring, bootstrapping C, C++, and Java,
the C++ and Java runtime libraries -- went from 15 hours to less than
two as a result of this patch.

Here are techniques I used:

1. Cache negative lookups of class names so that we do not do them 
   again.  (This is the part of the patch responsible for the
   reduction in user time; we avoid running through the linked list
   of directories.)

2. Cache the contents of directories in the class path (using scandir
   to read the directory) and avoid calling stat for files we know do
   not exist.

Both techniques are necessary to get the full win, although even (1)
alone helps considerably.  (2) is conditional on the system having
"scandir" and "alphasort"; if you don't have that, configure will not
enable that portion of the new code.  (Someone could implement these
functions in terms of readdir and put them in libiberty, and then they
could always be on.)

Bootstrapped and tested on i686-pc-linux-gnu and mips-sgi-irix6.5.
Installed on the mainline.

-- 
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com

2002-05-18  Mark Mitchell  <mark@codesourcery.com>

	* configure.in (AC_CHECK_FUNCS): Add checks for scandir and
	alphasort.
	* config.in: Regenerated.
	* configure: Regenerated.

2002-05-18  Mark Mitchell  <mark@codesourcery.com>

	* gjavah.c (throwable_p): Do not free the name of the class after
	passing it to find_class.
	* java-tree.h (CLASS_BEING_LAIDOUT): Remove duplicate definition.
	* jcf-io.c (dirent.h): Include it.
	(fnmatch.h): Likewise.
	(compare_path): New function.
	(java_or_class_file): Likewise.
	(memoized_dirlist_entry): New type.
	(memoized_dirlist_lookup_eq): New function.
	(memoized_dirlists): New variable.
	(caching_stat): New function.
	(memoized_class_lookup_eq): New function.
	(memoized_class_lookups): Likewise.
	(find_class): Use memoized_class_lookups and caching_stat.
	* jcf.h (JCF_USE_SCANDIR): Define.
	* parse.y (java_expand_classes): Write the class files in reverse
	order.
	
Index: gcc/configure.in
===================================================================
RCS file: /cvs/gcc/gcc/gcc/configure.in,v
retrieving revision 1.587
diff -c -p -r1.587 configure.in
*** gcc/configure.in	7 May 2002 22:52:18 -0000	1.587
--- gcc/configure.in	19 May 2002 02:49:03 -0000
*************** dnl gcc_AC_C_ENUM_BF_UNSIGNED
*** 614,620 ****
  
  AC_CHECK_FUNCS(times clock dup2 kill getrlimit setrlimit atoll atoq \
  	sysconf strsignal putc_unlocked fputc_unlocked fputs_unlocked \
! 	fwrite_unlocked fprintf_unlocked getrusage nl_langinfo lstat)
  
  AC_CHECK_TYPE(ssize_t, int)
  
--- 614,621 ----
  
  AC_CHECK_FUNCS(times clock dup2 kill getrlimit setrlimit atoll atoq \
  	sysconf strsignal putc_unlocked fputc_unlocked fputs_unlocked \
! 	fwrite_unlocked fprintf_unlocked getrusage nl_langinfo lstat \
!         scandir alphasort)
  
  AC_CHECK_TYPE(ssize_t, int)
  
Index: gjavah.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/java/gjavah.c,v
retrieving revision 1.88
diff -c -p -r1.88 gjavah.c
*** gjavah.c	8 May 2002 04:51:50 -0000	1.88
--- gjavah.c	19 May 2002 16:57:21 -0000
*************** throwable_p (signature)
*** 1194,1201 ****
        JCF_FINISH (&jcf);
      }
  
-   if (current)
-     free (current);
    return result;
  }
  
--- 1194,1199 ----
Index: java-tree.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/java/java-tree.h,v
retrieving revision 1.149
diff -c -p -r1.149 java-tree.h
*** java-tree.h	7 May 2002 03:32:00 -0000	1.149
--- java-tree.h	19 May 2002 16:57:22 -0000
*************** extern tree *type_map;
*** 1440,1450 ****
     layout of a class.  */
  #define CLASS_BEING_LAIDOUT(TYPE) TYPE_LANG_FLAG_6 (TYPE)
  
- /* True if class TYPE is currently being laid out. Helps in detection
-    of inheritance cycle occurring as a side effect of performing the
-    layout of a class.  */
- #define CLASS_BEING_LAIDOUT(TYPE) TYPE_LANG_FLAG_6 (TYPE)
- 
  /* True if class TYPE has a field initializer finit$ function */
  #define CLASS_HAS_FINIT_P(TYPE) TYPE_FINIT_STMT_LIST (TYPE)
  
--- 1440,1445 ----
Index: jcf-io.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/java/jcf-io.c,v
retrieving revision 1.33
diff -c -p -r1.33 jcf-io.c
*** jcf-io.c	3 Dec 2001 19:13:40 -0000	1.33
--- jcf-io.c	19 May 2002 16:57:22 -0000
***************
*** 1,5 ****
  /* Utility routines for finding and reading Java(TM) .class files.
!    Copyright (C) 1996, 1997, 1998, 1999, 2000  Free Software Foundation, Inc.
  
  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
--- 1,5 ----
  /* Utility routines for finding and reading Java(TM) .class files.
!    Copyright (C) 1996, 1997, 1998, 1999, 2000, 2002  Free Software Foundation, Inc.
  
  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
*************** The Free Software Foundation is independ
*** 29,34 ****
--- 29,39 ----
  #include "tree.h"
  #include "toplev.h"
  #include "java-tree.h"
+ #include "hashtab.h"
+ #if JCF_USE_SCANDIR
+ #include <dirent.h>
+ #include <fnmatch.h>
+ #endif
  
  #include "zlib.h"
  
*************** DEFUN(find_classfile, (filename, jcf, de
*** 304,313 ****
  #endif
  }
  
  /* Returns a freshly malloc'd string with the fully qualified pathname
!    of the .class file for the class CLASSNAME.  Returns NULL on
!    failure.  If JCF != NULL, it is suitably initialized.
!    SOURCE_OK is true if we should also look for .java file. */
  
  const char *
  DEFUN(find_class, (classname, classname_length, jcf, source_ok),
--- 309,460 ----
  #endif
  }
  
+ #if JCF_USE_SCANDIR
+ 
+ /* A comparison function (as for qsort) that compares KEY (a char *
+    giving the basename of a file) with the name stored in ENTRY (a
+    dirent **).  */
+ 
+ static int
+ DEFUN(compare_path, (key, entry),
+       const void *key AND const void *entry)
+ {
+   return strcmp ((const char *) key, 
+ 		 (*((const struct dirent **) entry))->d_name);
+ }
+ 
+ /* Returns nonzero if ENTRY names a .java or .class file.  */
+ 
+ static int
+ DEFUN(java_or_class_file, (entry),
+       const struct dirent *entry)
+ {
+   const char *base = basename (entry->d_name);
+   return (fnmatch ("*.java", base, 0) == 0 || 
+ 	  fnmatch ("*.class", base, 0) == 0);
+ }
+ 
+ /* Information about the files present in a particular directory.  */
+ typedef struct memoized_dirlist_entry 
+ {
+   /* The name of the directory.  */
+   const char *dir;
+   /* The number of .java and .class files present, or -1 if we could
+      not, for some reason, obtain the list.  */
+   int num_files;
+   /* The .java and .class files in the directory, in alphabetical
+      order.  */
+   struct dirent **files;
+ } memoized_dirlist_entry;
+ 
+ /* Returns true if ENTRY (a memoized_dirlist_entry *) correponds to
+    the directory given by KEY (a char *) giving the directory 
+    name.  */
+ 
+ static int
+ DEFUN(memoized_dirlist_lookup_eq, (entry, key),
+       const void *entry AND const void *key)
+ {
+   return strcmp ((const char *) key,
+ 		 ((const memoized_dirlist_entry *) entry)->dir) == 0;
+ }
+ 
+ /* A hash table mapping directory names to the lists of .java and
+    .class files in that directory.  */
+ 
+ static htab_t memoized_dirlists;
+ 
+ #endif
+ 
+ /* Like stat, but avoids actually making the stat system call if we
+    know that it cannot succeed.  FILENAME and BUF are as for stat.  */
+ 
+ static int
+ DEFUN(caching_stat, (filename, buf),
+       char *filename AND struct stat *buf)
+ {
+ #if JCF_USE_SCANDIR
+   char *sep;
+   char *base;
+   memoized_dirlist_entry *dent;
+   void **slot;
+   
+   /* If the hashtable has not already been created, create it now.  */
+   if (!memoized_dirlists)
+     memoized_dirlists = htab_create (37,
+ 				     htab_hash_string,
+ 				     memoized_dirlist_lookup_eq,
+ 				     NULL);
+ 
+   /* Get the name of the directory.  */
+   sep = strrchr (filename, DIR_SEPARATOR);
+   if (sep)
+     {
+       *sep = '\0';
+       base = sep + 1;
+     }
+   else
+     base = filename;
+ 
+   /* Obtain the entry for this directory form the hash table.  */
+   slot = htab_find_slot (memoized_dirlists, filename, INSERT);
+   if (!*slot)
+     {
+       /* We have not already scanned this directory; scan it now.  */
+       dent = ((memoized_dirlist_entry *) 
+ 	      ALLOC (sizeof (memoized_dirlist_entry)));
+       dent->dir = xstrdup (filename);
+       /* Unfortunately, scandir is not fully standardized.  In
+ 	 particular, the type of the function pointer passed as the
+ 	 third argument sometimes takes a "const struct dirent *"
+ 	 parameter, and sometimes just a "struct dirent *".  We rely
+ 	 on the ability to interchange these two types of function
+ 	 pointers.  */
+       dent->num_files = scandir (filename, &dent->files, 
+ 				 java_or_class_file, 
+ 				 alphasort);
+       *slot = dent;
+     }
+   else
+     dent = *((memoized_dirlist_entry **) slot);
+ 
+   /* Put the spearator back.  */
+   if (sep)
+     *sep = DIR_SEPARATOR;
+ 
+   /* If the file is not in the list, there is no need to stat it; it
+      does not exist.  */
+   if (dent->num_files != -1
+       && !bsearch (base, dent->files, dent->num_files,
+ 		   sizeof (struct dirent *), compare_path))
+     return -1;
+ #endif
+   
+   return stat (filename, buf);
+ }
+ 
+ /* Returns 1 if the CLASSNAME (really a char *) matches the name
+    stored in TABLE_ENTRY (also a char *).  */
+ 
+ static int
+ DEFUN(memoized_class_lookup_eq, (table_entry, classname),
+       const void *table_entry AND const void *classname)
+ {
+   return strcmp ((const char *)classname, (const char *)table_entry) == 0;
+ }
+ 
+ /* A hash table keeping track of class names that were not found
+    during class lookup.  (There is no need to cache the values
+    associated with names that were found; they are saved in
+    IDENTIFIER_CLASS_VALUE.)  */
+ static htab_t memoized_class_lookups;
+ 
  /* Returns a freshly malloc'd string with the fully qualified pathname
!    of the .class file for the class CLASSNAME.  CLASSNAME must be
!    allocated in permanent storage; this function may retain a pointer
!    to it.  Returns NULL on failure.  If JCF != NULL, it is suitably
!    initialized.  SOURCE_OK is true if we should also look for .java
!    file. */
  
  const char *
  DEFUN(find_class, (classname, classname_length, jcf, source_ok),
*************** DEFUN(find_class, (classname, classname_
*** 324,334 ****
    char *dep_file;
    void *entry;
    char *java_buffer;
  
    /* Allocate and zero out the buffer, since we don't explicitly put a
       null pointer when we're copying it below.  */
!   int buflen = jcf_path_max_len () + classname_length + 10;
!   char *buffer = (char *) ALLOC (buflen);
    memset (buffer, 0, buflen);
  
    java_buffer = (char *) alloca (buflen);
--- 471,497 ----
    char *dep_file;
    void *entry;
    char *java_buffer;
+   int buflen;
+   char *buffer;
+   hashval_t hash;
+ 
+   /* Create the hash table, if it does not already exist.  */
+   if (!memoized_class_lookups)
+     memoized_class_lookups = htab_create (37, 
+ 					  htab_hash_string, 
+ 					  memoized_class_lookup_eq,
+ 					  NULL);
+ 
+   /* Loop for this class in the hashtable.  If it is present, we've
+      already looked for this class and failed to find it.  */
+   hash = htab_hash_string (classname);
+   if (htab_find_with_hash (memoized_class_lookups, classname, hash))
+     return NULL;
  
    /* Allocate and zero out the buffer, since we don't explicitly put a
       null pointer when we're copying it below.  */
!   buflen = jcf_path_max_len () + classname_length + 10;
!   buffer = (char *) ALLOC (buflen);
    memset (buffer, 0, buflen);
  
    java_buffer = (char *) alloca (buflen);
*************** DEFUN(find_class, (classname, classname_
*** 381,387 ****
  	      else
  		continue;
  	    }
! 	  class = stat (buffer, &class_buf);
  	}
  
        if (source_ok)
--- 544,550 ----
  	      else
  		continue;
  	    }
! 	  class = caching_stat(buffer, &class_buf);
  	}
  
        if (source_ok)
*************** DEFUN(find_class, (classname, classname_
*** 393,399 ****
  	  for (m = 0; m < classname_length; ++m)
  	    java_buffer[m + l] = (classname[m] == '.' ? '/' : classname[m]);
  	  strcpy (java_buffer + m + l, ".java");
! 	  java = stat (java_buffer, &java_buf);
  	  if (java == 0)
  	    break;
  	}
--- 556,562 ----
  	  for (m = 0; m < classname_length; ++m)
  	    java_buffer[m + l] = (classname[m] == '.' ? '/' : classname[m]);
  	  strcpy (java_buffer + m + l, ".java");
! 	  java = caching_stat (java_buffer, &java_buf);
  	  if (java == 0)
  	    break;
  	}
*************** DEFUN(find_class, (classname, classname_
*** 464,469 ****
--- 627,638 ----
  #endif
  
    free (buffer);
+ 
+   /* Remember that this class could not be found so that we do not
+      have to look again.  */
+   *htab_find_slot_with_hash (memoized_class_lookups, classname, hash, INSERT) 
+     = (void *) classname;
+ 
    return NULL;
   found:
  #if JCF_USE_STDIO
Index: jcf.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/java/jcf.h,v
retrieving revision 1.30
diff -c -p -r1.30 jcf.h
*** jcf.h	12 Apr 2002 14:28:49 -0000	1.30
--- jcf.h	19 May 2002 16:57:22 -0000
*************** The Free Software Foundation is independ
*** 63,68 ****
--- 63,76 ----
  #define JCF_word JCF_u4
  #endif
  
+ /* If we have both "scandir" and "alphasort", we can cache directory
+    listings to reduce the time taken to search the classpath.  */
+ #if defined(HAVE_SCANDIR) && defined(HAVE_ALPHASORT)
+ #define JCF_USE_SCANDIR 1
+ #else
+ #define JCF_USE_SCANDIR 0
+ #endif 
+ 
  struct JCF;
  typedef int (*jcf_filbuf_t) PARAMS ((struct JCF*, int needed));
  
Index: parse.y
===================================================================
RCS file: /cvs/gcc/gcc/gcc/java/parse.y,v
retrieving revision 1.377
diff -c -p -r1.377 parse.y
*** parse.y	7 May 2002 18:42:50 -0000	1.377
--- parse.y	19 May 2002 16:57:25 -0000
*************** java_expand_classes ()
*** 9048,9057 ****
    for (cur_ctxp = ctxp_for_generation; cur_ctxp; cur_ctxp = cur_ctxp->next)
      {
        tree current;
        ctxp = cur_ctxp;
!       for (current = ctxp->class_list; current; current = TREE_CHAIN (current))
  	{
! 	  current_class = TREE_TYPE (current);
  	  outgoing_cpool = TYPE_CPOOL (current_class);
  	  if (flag_emit_class_files)
  	    write_classfile (current_class);
--- 9048,9077 ----
    for (cur_ctxp = ctxp_for_generation; cur_ctxp; cur_ctxp = cur_ctxp->next)
      {
        tree current;
+       tree reversed_class_list = NULL;
+ 
        ctxp = cur_ctxp;
! 
!       /* We write out the classes in reverse order.  This ensures that
! 	 inner classes are written before their containing classes,
! 	 which is important for parallel builds.  Otherwise, the
! 	 class file for the outer class may be found, but the class
! 	 file for the inner class may not be present.  In that
! 	 situation, the compiler cannot fall back to the original
! 	 source, having already read the outer class, so we must
! 	 prevent that situation.  */
!       for (current = ctxp->class_list; 
! 	   current; 
! 	   current = TREE_CHAIN (current))
! 	reversed_class_list
! 	  = tree_cons (NULL_TREE, current, reversed_class_list);
!       ggc_add_tree_root (&reversed_class_list, 1);
! 
!       for (current = reversed_class_list; 
! 	   current; 
! 	   current = TREE_CHAIN (current))
  	{
! 	  current_class = TREE_TYPE (TREE_VALUE (current));
  	  outgoing_cpool = TYPE_CPOOL (current_class);
  	  if (flag_emit_class_files)
  	    write_classfile (current_class);
*************** java_expand_classes ()
*** 9063,9068 ****
--- 9083,9090 ----
  	      finish_class ();
  	    }
  	}
+ 
+       ggc_del_root (&reversed_class_list);
      }
  }
  



More information about the Java mailing list