Bug 31587 - Module files shouldn't be updated if their content doesn't change
Summary: Module files shouldn't be updated if their content doesn't change
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: fortran (show other bugs)
Version: 4.3.0
: P3 trivial
Target Milestone: 4.3.0
Assignee: Francois-Xavier Coudert
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Keywords: patch
Depends on:
Blocks:
 
Reported: 2007-04-16 10:17 UTC by Francois-Xavier Coudert
Modified: 2007-04-24 22:38 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2007-04-16 10:21:35


Attachments
Patch that allows for module to be overwritten only if they changed (1.79 KB, patch)
2007-04-17 07:52 UTC, Francois-Xavier Coudert
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Francois-Xavier Coudert 2007-04-16 10:17:56 UTC
If we recompile a file that outputs the same .mod files, they shouldn't get updated: it makes dependency analysis useless and probably makes parallel compiling (make -jN) less efficient.

$ cat a.f90
module foo
  integer, parameter :: bar = 42
end module foo
$ gfortran -c a.f90
$ stat foo.mod | grep Modify
Modify: 2007-04-16 11:15:54.000000000 +0200
$ gfortran -c a.f90         
$ stat foo.mod | grep Modify
Modify: 2007-04-16 11:15:58.000000000 +0200
Comment 1 Tobias Schlüter 2007-04-16 13:28:17 UTC
An easy solution that I thought about implementing in the past would be to put a checksum into the file header.  First the module file would be written to a temporary file.  This file would be checksummed and only moved to the final location if the checksum is different from that of the extant file.
Comment 2 Tobias Schlüter 2007-04-16 13:31:48 UTC
Given your other PR, another solution that comes to mind is to include the filename and modification date of the source file in the module file, and to compare these before writing a module file.  This would be more prone to inconsistencies than the approach from comment #1, as it wouldn't actually be verified that the .mod file remains unchanged.
Comment 3 Francois-Xavier Coudert 2007-04-16 13:33:02 UTC
Better than my current idea for that, which was to compare line after line the old and new files. What cheap hash function should we use? MD5?
Comment 4 Francois-Xavier Coudert 2007-04-16 13:34:19 UTC
(In reply to comment #2)
> include the
> filename and modification date of the source file in the module file, and to
> compare these before writing a module file

I think that defeats the purpose: if the source file changed but the module file didn't change (ie the module interface is the same), we don't need to recompile the files that depend on the module, and the module file shouldn't be rewritten.
Comment 5 Tobias Schlüter 2007-04-16 13:35:16 UTC
(In reply to comment #3)
> Better than my current idea for that, which was to compare line after line the
> old and new files. What cheap hash function should we use? MD5?

Probably, as it is included in libiberty, and is good enough for our purposes.

And please disregard my second idea as it wouldn't help in any way :-)

Comment 6 Francois-Xavier Coudert 2007-04-16 23:15:13 UTC
OK, I've researched the libiberty md5.c interface a bit and here's the first half of a patch :)  It computes the MD5 sum of the module file (except the first 3 lines) and it writes it at the top, like this:

GFORTRAN module created from test_mod.f90 on Tue Apr 17 00:12:12 2007
MD5:18a257e13c90e3872b7b9400c2fc6e4b -- If you edit this, you'll get what you deserve.

I checked that it gives the same result as the md5sum command on my linux box :)


Index: module.c
===================================================================
--- module.c    (revision 123388)
+++ module.c    (working copy)
@@ -72,6 +72,7 @@
 #include "arith.h"
 #include "match.h"
 #include "parse.h" /* FIXME */
+#include "md5.h"
 
 #define MODULE_EXTENSION ".mod"
 
@@ -170,6 +171,9 @@
 /* The FILE for the module we're reading or writing.  */
 static FILE *module_fp;
 
+/* MD5 context structure.  */
+static struct md5_ctx ctx;
+
 /* The name of the module we're reading (USE'ing) or writing.  */
 static char module_name[GFC_MAX_SYMBOL_LEN + 1];
 
@@ -1275,6 +1279,9 @@
   if (fputc (out, module_fp) == EOF)
     gfc_fatal_error ("Error writing modules file: %s", strerror (errno));
 
+  /* Add this to our MD5.  */
+  md5_process_bytes (&out, sizeof (out), &ctx);
+  
   if (out != '\n')
     module_column++;
   else
@@ -3926,7 +3933,10 @@
   int n;
   char *filename, *p;
   time_t now;
+  fpos_t md5_pos;
+  unsigned char md5_new[16]; /*, md5_old[16]; */
 
+
   n = strlen (name) + strlen (MODULE_EXTENSION) + 1;
   if (gfc_option.module_dir != NULL)
     {
@@ -3959,8 +3969,14 @@
 
   fprintf (module_fp, "GFORTRAN module created from %s on %s\n", 
           gfc_source_file, p);
-  fputs ("If you edit this, you'll get what you deserve.\n\n", module_fp);
+  fputs ("MD5:", module_fp);
+  fgetpos (module_fp, &md5_pos);
+  fputs ("00000000000000000000000000000000 -- "
+        "If you edit this, you'll get what you deserve.\n\n", module_fp);
 
+  /* Initialize the MD5 context that will be used for output.  */
+  md5_init_ctx (&ctx);
+  
   iomode = IO_OUTPUT;
   strcpy (module_name, name);
 
@@ -3973,6 +3989,11 @@
 
   write_char ('\n');
 
+  md5_finish_ctx (&ctx, md5_new);
+  fsetpos (module_fp, &md5_pos);
+  for (n = 0; n < 16; n++)
+    fprintf (module_fp, "%02x", md5_new[n]);
+
   if (fclose (module_fp))
     gfc_fatal_error ("Error writing module file '%s' for writing: %s",
                     filename, strerror (errno));
Comment 7 Francois-Xavier Coudert 2007-04-17 07:52:08 UTC
Created attachment 13376 [details]
Patch that allows for module to be overwritten only if they changed

This is the complete patch. Have fun!
Comment 8 Tobias Schlüter 2007-04-17 13:20:22 UTC
The patch looks good, though it would probably be a better idea to use tmpnam() to get the name for the temporary file.

A further thing one could do: instead of threatening "If you edit this, you'll get what you deserve.\n\n", one could actually verify that the MD5 sum matches the files contents.
Comment 9 Tobias Schlüter 2007-04-17 13:22:56 UTC
Oh, one more issue: do you have an idea how to write testcases for this?  I'm a bit at a loss, though I've only thought about this for a few minutes.
Comment 10 Francois-Xavier Coudert 2007-04-17 13:33:06 UTC
(In reply to comment #8)
> The patch looks good, though it would probably be a better idea to use tmpnam()
> to get the name for the temporary file.

Why not. But I like the idea that it is predictable :)

> A further thing one could do: instead of threatening "If you edit this, you'll
> get what you deserve.\n\n", one could actually verify that the MD5 sum matches
> the files contents.

Too expensive! I'm against this additional cost (plus the developer cost of writing and maintaining it).

About testcases, you mean for the dejagnu testsuite? I have really no idea...
Comment 11 Tobias Schlüter 2007-04-17 13:41:39 UTC
(In reply to comment #10)
> (In reply to comment #8)
> > The patch looks good, though it would probably be a better idea to use tmpnam()
> > to get the name for the temporary file.
> 
> Why not. But I like the idea that it is predictable :)

Given that it's going to be deleted, it's not that much of a gain :)  People always do stuff one wouldn't expect, so having filenames that could not possibly collide with user files is a good thing.

> > A further thing one could do: instead of threatening "If you edit this, you'll
> > get what you deserve.\n\n", one could actually verify that the MD5 sum matches
> > the files contents.
> 
> Too expensive! I'm against this additional cost (plus the developer cost of
> writing and maintaining it).

I agree that this is probably not worth it.

> About testcases, you mean for the dejagnu testsuite? I have really no idea...

Time to CC Janis?
Comment 12 Francois-Xavier Coudert 2007-04-20 22:13:29 UTC
(In reply to comment #11)
> Time to CC Janis?

No need. There's nothing a bit of trial-and-error can't help you write :)

The following adds the necessary dejagnu directive, and uses it in a new test. I guess the MD5 sum should be the same on all platforms, so it's a good test, although it means the test will need updating every time someone changes the format of module files :)


Index: lib/gcc-dg.exp
===================================================================
--- lib/gcc-dg.exp      (revision 123942)
+++ lib/gcc-dg.exp      (working copy)
@@ -455,6 +455,24 @@
     }
 }
 
+# Scan Fortran modules for a given regexp.
+#
+# Argument 0 is the module name
+# Argument 1 is the regexp to match
+proc scan-module { args } {
+    set modfilename [string tolower [lindex $args 0]].mod
+    set fd [open $modfilename r]
+    set text [read $fd]
+    close $fd
+
+    upvar 2 name testcase
+    if [regexp -- [lindex $args 1] $text] {
+      pass "$testcase scan-module [lindex $args 1]"
+    } else {
+      fail "$testcase scan-module [lindex $args 1]"
+    }
+}
+
 # Verify that the compiler output file exists, invoked via dg-final.
 proc output-exists { args } {
     # Process an optional target or xfail list.
Index: gfortran.dg/module_md5_1.f90
===================================================================
--- gfortran.dg/module_md5_1.f90        (revision 0)
+++ gfortran.dg/module_md5_1.f90        (revision 0)
@@ -0,0 +1,14 @@
+! Check that we can write a module file, that it has a correct MD5 sum,
+! and that we can read it back.
+!
+! { dg-do compile }
+module foo
+  integer(kind=4), parameter :: pi = 3_4
+end module foo
+
+program test
+  use foo
+  print *, pi
+end program test
+! { dg-final { scan-module "foo" "MD5:18a257e13c90e3872b7b9400c2fc6e4b" } }
+! { dg-final { cleanup-modules "foo" } }
Comment 13 Francois-Xavier Coudert 2007-04-24 22:37:47 UTC
Subject: Bug 31587

Author: fxcoudert
Date: Tue Apr 24 22:37:37 2007
New Revision: 124126

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=124126
Log:
	PR fortran/31587

	* lib/gcc-dg.exp (scan-module): New function.
	* gfortran.dg/module_md5_1.f90: New test.

	* module.c (write_char): Add character to the MD5 buffer.
	(read_md5_from_module_file): New function.
	(gfc_dump_module): Compute MD5 for new module file. Call
	read_md5_from_module_file. Only overwrite old module file
	if the new MD5 is different.

Added:
    trunk/gcc/testsuite/gfortran.dg/module_md5_1.f90
Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/module.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/lib/gcc-dg.exp

Comment 14 Francois-Xavier Coudert 2007-04-24 22:38:37 UTC
Fixed on mainline.