This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [Patch, fortran] PR 25708 Reduce seeks when loading module files


On Wednesday 30 November 2011 18:32:11 Janne Blomqvist wrote:
> Hi,
> 
> this patch expands a bit on the recent work done by Thomas Koenig.
> Using aermod.f90 from the polyhedron benchmark suite as a test case,
> the lseek() calls as reported by strace -c -f go roughly as
> 
> - trunk before Thomas' patch: 21 million
> 
> - current trunk: 5.7 million
> 
> - with attached patch: 2.7 million
> 
> As can be seen, this patch roughly halves the seeks. Of course, 2.7
> million is still a ridiculously high number, but further work requires
> different kind of changes, perhaps also a bit riskier, which is why
> I'd like to get this in separately.
> 
> Regtested on x86_64-unknown-linux-gnu, Ok for trunk?
> 
> 2011-11-30  Janne Blomqvist  <jb@gcc.gnu.org>
> 
> 	PR fortran/25708
> 	* module.c (parse_string): Read string into resizable array
> 	instead of parsing twice and seeking.
> 	(verify_atoms): New function.
> 	(require_atom): Move checking to verify_atoms, call it.
> 	(mio_typespec): Don't peek.
> 	(mio_constructor): Likewise.
> 	(mio_typebound_proc): Likewise.
> 	(mio_full_typebound_tree): Likewise.
> 	(mio_f2k_derived): Likewise.
> 	(load_operator_interfaces): Likewise.
> 	(load_generic_interfaces): Likewise.
> 	(load_commons): Likewise.
> 	(load_equiv): Likewise.
> 	(load_derived_extensions): Likewise.
> 	(read_module): Likewise.

Hi, 

> 
> modseek1.diff
>   diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
> index 70f8565..982425d 100644
> --- a/gcc/fortran/module.c
> +++ b/gcc/fortran/module.c
> @@ -1069,51 +1069,49 @@ module_unget_char (void)
>  static void
>  parse_string (void)
>  {
> -  module_locus start;
> -  int len, c;
> -  char *p;
> -
> -  get_module_locus (&start);
> +  int c;
> +  size_t cursz = 30;
> +  size_t len = 0;
>  
> -  len = 0;
> +  atom_string = XNEWVEC (char, cursz);
>  
> -  /* See how long the string is.  */
>    for ( ; ; )
>      {
>        c = module_char ();
> -      if (c == EOF)
> -       bad_module ("Unexpected end of module in string constant");
>  
> -      if (c != '\'')
> +      if (c == '\'')
>         {
> -         len++;
> -         continue;
> +         int c2 = module_char ();
> +         if (c2 == '\'')
> +           {
> +             if (len + 1 >= cursz)
> +               {
> +                 cursz *= 2;
> +                 atom_string = XRESIZEVEC (char, atom_string, cursz);
> +               }
> +             atom_string[len] = c;
> +             len++;
> +             atom_string[len] = c2;
> +             len++;

I think you are changing the behaviour here. Single quotes in a string are 
encoded as two single quotes in a row. So when decoding, only one of them 
should be kept.
Honestly, it may well be dead/unused code, but it's something we don't want to 
change at this point.

[...]

> @@ -1293,22 +1291,15 @@ peek_atom (void)
>  }
>  
>  
> -/* Read the next atom from the input, requiring that it be a
> -   particular kind.  */
> +/* Verify that two atoms are equal, fatal error otherwise.  */
>  
>  static void
> -require_atom (atom_type type)
> +verify_atoms (atom_type got, atom_type expected)
>  {
> -  module_locus m;
> -  atom_type t;
>    const char *p;
> -
> -  get_module_locus (&m);
> -
> -  t = parse_atom ();
> -  if (t != type)
> +  if (got != expected)
>      {
> -      switch (type)
> +      switch (expected)
>         {
>         case ATOM_NAME:
>           p = _("Expected name");
> @@ -1329,12 +1320,24 @@ require_atom (atom_type type)
>           gfc_internal_error ("require_atom(): bad atom type required");
>         }
>  
> -      set_module_locus (&m);
>        bad_module (p);

I'm certainly nitpicking here, but for me it's better to have the file 
position before the unexpected atom than after it when throwing the error in 
bad_module. 
To save the call to fgetpos, (through get_module_locus) one could save/restore 
module_column and module_line as the call to bad_module doesn't need anything 
else.


About the rest of the patch, it's unfortunate that it makes the code slightly 
more difficult to read [this is my nitpicking day]. I think it's better to 
optimize peek_atom: the first character is sufficient to distinguish between 
atom types, so ungetc/module_unget_char can be used, even for strings, names 
or numbers.

Mikael


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]