This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Extending -flto parallel feature to the rest of the build


Hello-

I recently started using -flto in my builds. It's a very impressive
feature; thanks very much for adding it. One thing occurred to me
while switching over: in an LTO world, object files are becoming
less relevant, at least for some applications. Since you are already
committing to a long build in return for the run-time performance
benefit, it makes sense in a lot of cases to go whole hog and just
compile everything every time. This has several advantages besides
fewer large files lying around. It simplifies things a lot: for
example, I don't need to worry about accidentally linking in an
object file compiled differently from the rest (different -march,
different compiler, etc.), since I am rebuilding from scratch every
time. In my use case I do such things a lot, and I find it very
freeing to know that no state carries over from a previous build.

In any case, the above was some justification for why I think the
following feature would be appreciated and used by others as well.
It's perhaps a little surprising, or at least disappointing, that
this:

g++ -flto=jobserver *.o

will be parallelized, but this:

g++ -flto=jobserver *.cpp

will effectively not be: each .cpp is compiled serially and then the
LTO step runs in parallel, but in many cases the first step dominates
the build time. It's clear why things are done this way: a user who
wants to parallelize the compile is free to do so by naming each
object as a separate target in a Makefile and running a parallel
make. But this takes some effort to set up, especially if you want
the intermediate .o files removed automatically. Since -flto has
already opened the door to gcc providing parallelization features, it
would be nice to enable parallelism more generally, for all parts of
the build that could benefit from it.

I took a stab at implementing this. The patch below adds an option
-fparallel=(jobserver|N) that works analogously to -flto=, but applies
to the whole build. It generates a Makefile with one job per command
in each spec, with appropriate dependencies, and then runs make to
execute it. With the combination -fparallel=X -flto, the LTO side is
parallelized as well, as if -flto=jobserver were specified; the idea
is that any downstream tool that could naturally offer parallel
features would do so in the presence of the -fparallel switch.
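
To make the dependency structure concrete, here is a simplified,
standalone sketch of the kind of Makefile the driver writes. It is not
the patch code: the Spec type and make_parallel_makefile function are
names made up for illustration, and the commands are assumed to be
already escaped for make and the shell. Jobs within one spec are
chained, separate specs are independent, and the final link job waits
for everything.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical model: one Spec per input file, holding the commands
// that must run in order for that file (e.g. compile, then assemble).
using Spec = std::vector<std::string>;

// Build Makefile text in which jobs inside a spec run serially,
// different specs may run in parallel, and the link job runs last.
// Assumes every command is already escaped for make and the shell.
std::string make_parallel_makefile(const std::vector<Spec>& specs,
                                   const std::string& link_command)
{
    std::ostringstream out;
    int job = 0;

    for (const Spec& spec : specs) {
        bool first_in_spec = true;
        for (const std::string& command : spec) {
            ++job;
            out << ".PHONY: job" << job << "\n"
                << "all: job" << job << "\n"
                << "job" << job << ":";
            if (!first_in_spec)
                out << " job" << (job - 1);  // chain within one spec
            out << "\n\t@+" << command << "\n";
            first_in_spec = false;
        }
    }

    // The final link depends on every job emitted so far.
    ++job;
    out << ".PHONY: job" << job << "\n"
        << "all: job" << job << "\n"
        << "job" << job << ":";
    for (int j = 1; j < job; ++j)
        out << " job" << j;
    out << "\n\t@+" << link_command << "\n";

    return out.str();
}
```

Running make -jN on such a file lets the per-file jobs proceed
concurrently while the link still happens last.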

I am sure this is very rough around the edges, since it's my
first-ever look at the gcc codebase, but I tried not to make it overly
restrictive. I only really have experience with Linux and C++, so I
may have inadvertently specialized something to those cases, but I did
try to keep it general. Here is a list of potential issues that could
be addressed:

-For some jobs there are environment variables set on a per-job basis.
I attempted to identify all of them and came up with COMPILER_PATH,
LIBRARY_PATH, and COLLECT_GCC_OPTIONS. This would need to be kept up
to date if others are added.

-The mechanism I used to propagate environment variables (export +
unset) is probably specific to the Bourne shell and wouldn't work on
other platforms, but presumably some simple platform-specific code
could do it right for Windows and others.

-Similarly for -pipe mode, I put pipes into the Makefile recipe, so
there may be platforms where this is not the correct syntax.
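
For reference, the quoting and export/unset scheme described above
can be sketched as follows. This is a standalone illustration that
assumes a Bourne-style shell runs the recipe; escape_for_recipe and
environment_fragment are hypothetical names, not functions from the
patch, although the escaping rules mirror mstream_escape_puts.

```cpp
#include <string>

// Escape text for a make recipe: '$' becomes "$$" so make passes a
// literal dollar sign to the shell. With single_quote set, the text is
// wrapped in single quotes and each embedded quote becomes '\''.
std::string escape_for_recipe(const std::string& text, bool single_quote)
{
    std::string out;
    if (single_quote)
        out += '\'';
    for (char c : text) {
        if (c == '$')
            out += "$$";
        else if (single_quote && c == '\'')
            out += "'\\''";  // close quote, escaped quote, reopen quote
        else
            out += c;
    }
    if (single_quote)
        out += '\'';
    return out;
}

// Render one per-job environment setting as a shell fragment:
// "NAME='value'; export NAME;" if the variable is set in the driver,
// or "unset NAME;" if it is not (value == nullptr).
std::string environment_fragment(const std::string& name, const char* value)
{
    const std::string var = escape_for_recipe(name, false);
    if (value == nullptr)
        return "unset " + var + ";";
    return var + "=" + escape_for_recipe(value, true)
           + "; export " + var + ";";
}
```

A non-Bourne platform would need a different environment_fragment,
which is the portability concern raised in the list.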

Anyway, here it is, in case there is any interest to pursue it
further. Thanks for listening...

-Lewis

=============

diff --git gcc/common.opt gcc/common.opt
index 3b8b14d..4417847 100644
--- gcc/common.opt
+++ gcc/common.opt
@@ -1575,6 +1575,10 @@ flto=
 Common RejectNegative Joined Var(flag_lto)
 Link-time optimization with number of parallel jobs or jobserver.

+fparallel=
+Common Driver RejectNegative Joined Var(flag_parallel)
+Enable parallel build with number of parallel jobs or jobserver.
+
 Enum
 Name(lto_partition_model) Type(enum lto_partition_model)
 UnknownError(unknown LTO partitioning model %qs)

diff --git gcc/gcc.c gcc/gcc.c
index a5408a4..6f9c1cd 100644
--- gcc/gcc.c
+++ gcc/gcc.c
@@ -1716,6 +1716,73 @@ static int have_c = 0;
 /* Was the option -o passed.  */
 static int have_o = 0;

+/* Parallel mode  */
+static int parallel = 0;
+static int parallel_ctr = 0;
+static int parallel_sctr = 0;
+static enum {
+  parallel_mode_off,
+  parallel_mode_first_job_in_spec,
+  parallel_mode_continued_spec
+} parallel_mode = parallel_mode_off;
+static bool jobserver = false;
+static FILE* mstream = NULL;
+static const char* makefile = NULL;
+
+/* helper to turn $ -> $$ for make and
+   maybe escape single quotes for the shell. */
+static void
+mstream_escape_puts (const char* string, bool single_quote)
+{
+  if (single_quote)
+    fputc ('\'', mstream);
+  for (; *string; string++)
+    {
+      if (*string == '$')
+        fputs ("$$", mstream);
+      else if (single_quote && *string == '\'')
+        fputs ("\'\\\'\'", mstream);
+      else
+        fputc (*string, mstream);
+    }
+  if (single_quote)
+    fputc ('\'', mstream);
+}
+
+/* In parallel mode, if environment variables are changing for each job,
+   then we need to store them in the makefile. */
+static void
+propagate_environment_to_makefile ()
+{
+  static const char *const vars[] = {
+    "COMPILER_PATH",
+    LIBRARY_PATH_ENV,
+    "COLLECT_GCC_OPTIONS",
+  };
+  unsigned int i;
+  for (i = 0; i < sizeof(vars)/sizeof(*vars); i++)
+    {
+      const char *const v = vars[i];
+      const char *const val = getenv(v);
+      fprintf (mstream, "job%d: __environment", parallel_ctr);
+      fputs (i ? "+=" : "=", mstream);
+      if (val == NULL)
+        {
+          fputs ("unset ", mstream);
+          mstream_escape_puts (v, false);
+        }
+      else
+        {
+          mstream_escape_puts (v, false);
+          fputc ('=', mstream);
+          mstream_escape_puts (val, true);
+          fputs ("; export ", mstream);
+          mstream_escape_puts (v, false);
+        }
+      fputs (";\n", mstream);
+    }
+}
+
 /* Pointer to output file name passed in with -o. */
 static const char *output_file = 0;

@@ -1727,6 +1794,7 @@ static struct temp_name {
   const char *suffix;  /* suffix associated with the code.  */
   int length;          /* strlen (suffix).  */
   int unique;          /* Indicates whether %g or %u/%U was used.  */
+  int parallel_sctr;    /* which parallel spec was it for. */
   const char *filename;        /* associated filename.  */
   int filename_length; /* strlen (filename).  */
   struct temp_name *next;
@@ -2831,6 +2899,39 @@ execute (void)
     }
 #endif

+  /* In parallel mode, just update the Makefile and return.  */
+  if (parallel_mode != parallel_mode_off)
+    {
+      parallel_ctr++;
+      fprintf (mstream,
+        ".PHONY: job%d\n" "all: job%d\n",
+        parallel_ctr, parallel_ctr);
+      propagate_environment_to_makefile ();
+      fprintf (mstream, "job%d:", parallel_ctr);
+      if (parallel_mode == parallel_mode_first_job_in_spec)
+        parallel_mode = parallel_mode_continued_spec;
+      else
+        fprintf (mstream, " job%d", parallel_ctr - 1);
+      fputs ("\n\t@+$(__environment)", mstream);
+      /* TODO: if -pipe is in effect, probably this only works on unix-like systems? */
+      for (i = 0; i < n_commands; i++)
+        {
+          if (i)
+            fputs(" |", mstream);
+          const char** arg;
+          for (arg = commands[i].argv; *arg != NULL; arg++)
+            {
+              fputc (' ', mstream);
+              mstream_escape_puts (*arg, true);
+            }
+          if (commands[i].argv[0] != commands[i].prog)
+            free (CONST_CAST (char*, commands[i].argv[0]));
+        }
+      fputc ('\n', mstream);
+      execution_count++;
+      return 0;
+    }
+
   /* Run each piped subprocess.  */

   pex = pex_init (PEX_USE_PIPES | ((report_times || report_times_to_file)
@@ -3843,6 +3944,24 @@ driver_handle_option (struct gcc_options *opts,
       handle_foffload_option (arg);
       break;

+    case OPT_fparallel_:
+      if (strcmp (arg, "jobserver") == 0)
+        {
+          jobserver = true;
+          parallel = 1;
+        }
+      else
+        {
+          parallel = atoi(arg);
+          if (parallel <= 1)
+            parallel = 0;
+        }
+      /* Downstream tools need jobserver mode since
+         they will be called from our Makefile.  */
+      if (parallel)
+        save_switch ("-fparallel=jobserver", 0, NULL, true, true);
+      return true;
+
     default:
       /* Various driver options need no special processing at this
         point, having been handled in a prescan above or being
@@ -4510,6 +4629,12 @@ do_spec (const char *spec)
 {
   int value;

+  if (parallel)
+    {
+      parallel_mode = parallel_mode_first_job_in_spec;
+      parallel_sctr++;
+    }
+
   value = do_spec_2 (spec);

   /* Force out any unfinished command.
@@ -4526,6 +4651,8 @@ do_spec (const char *spec)
        value = execute ();
     }

+  parallel_mode = parallel_mode_off;
+
   return value;
 }

@@ -5135,7 +5262,8 @@ do_spec_1 (const char *spec, int inswitch, const char *soft_matched_part)
                for (t = temp_names; t; t = t->next)
                  if (t->length == suffix_length
                      && strncmp (t->suffix, suffix, suffix_length) == 0
-                     && t->unique == (c == 'u' || c == 'U' || c == 'j'))
+                     && t->unique == (c == 'u' || c == 'U' || c == 'j')
+                     && t->parallel_sctr == parallel_sctr)
                    break;

                /* Make a new association if needed.  %u and %j
@@ -5161,6 +5289,7 @@ do_spec_1 (const char *spec, int inswitch, const char *soft_matched_part)
                    temp_filename_length = strlen (temp_filename);
                    t->filename = temp_filename;
                    t->filename_length = temp_filename_length;
+                   t->parallel_sctr = parallel_sctr;
                  }

                free (saved_suffix);
@@ -6869,6 +6998,7 @@ class driver
   bool prepare_infiles ();
   void do_spec_on_infiles () const;
   void maybe_run_linker (const char *argv0) const;
+  void maybe_run_make () const;
   void final_actions () const;
   int get_exit_code () const;

@@ -6918,6 +7048,7 @@ driver::main (int argc, char **argv)

   do_spec_on_infiles ();
   maybe_run_linker (argv[0]);
+  maybe_run_make ();
   final_actions ();
   return get_exit_code ();
 }
@@ -7624,6 +7755,17 @@ driver::prepare_infiles ()
   if (!combine_inputs && have_c && have_o && lang_n_infiles > 1)
     fatal_error ("cannot specify -o with -c, -S or -E with multiple files");

+  /* Check if we are using a makefile to implement parallel mode.  */
+  if (parallel)
+    {
+      makefile = make_temp_file (".mk");
+      record_temp_file (makefile, 1, 0);
+      mstream = fopen (makefile, "w");
+      if (mstream == NULL)
+        fatal_error ("failed to open temporary Makefile %s",
+                     makefile);
+    }
+
   /* No early exit needed from main; we can continue.  */
   return false;
 }
@@ -7863,6 +8005,75 @@ driver::maybe_run_linker (const char *argv0) const
          && !(infiles[i].language && infiles[i].language[0] == '*'))
        warning (0, "%s: linker input file unused because linking not done",
                 outfiles[i]);
+
+  /* in parallel mode, add the dependencies for the final link.  */
+  if (parallel_ctr > 1 && linker_was_run)
+    {
+      int j;
+      fprintf (mstream, "job%d:", parallel_ctr);
+      for (j = 1; j < parallel_ctr; j++)
+        fprintf (mstream, " job%d", j);
+      putc('\n', mstream);
+    }
+}
+
+/* in parallel mode, actually do the build now.  */
+void
+driver::maybe_run_make() const
+{
+  char jobs[32];
+  const char *new_argv[6];
+  const char *errmsg;
+  int err = 0;
+  int status = 0;
+
+  if (!parallel) return;
+
+  if (ferror (mstream) != 0
+      || fclose (mstream) != 0)
+    fatal_error ("error writing to Makefile %s", makefile);
+
+  if (!jobserver)
+    {
+      /* Avoid passing --jobserver-fd= and similar flags
+         unless jobserver mode is explicitly enabled.  */
+      putenv (xstrdup ("MAKEFLAGS="));
+      putenv (xstrdup ("MFLAGS="));
+    }
+
+  new_argv[0] = getenv ("MAKE");
+  if (!new_argv[0])
+    new_argv[0] = "make";
+  new_argv[1] = "-f";
+  new_argv[2] = makefile;
+  if (!jobserver)
+    {
+      snprintf (jobs, 31, "-j%d", parallel);
+      new_argv[3] = jobs;
+    }
+  else
+    new_argv[3] = "-j";
+  new_argv[4] = "all";
+  new_argv[5] = NULL;
+
+  errmsg = pex_one (PEX_SEARCH,
+                    new_argv[0],
+                    CONST_CAST(char *const*, new_argv),
+                    new_argv[0],
+                    NULL, NULL, &status, &err);
+  if (errmsg != NULL)
+    {
+      if (err == 0)
+        fatal_error (errmsg);
+      else
+        {
+          errno = err;
+          pfatal_with_name (errmsg);
+        }
+    }
+  if (WIFSIGNALED (status)
+      || (WIFEXITED (status) && WEXITSTATUS (status) >= MIN_FATAL_STATUS))
+    errorcount++;
 }

 /* The end of "main".  */
diff --git gcc/lto-wrapper.c gcc/lto-wrapper.c
index f75c0dc..ce50269 100644
--- gcc/lto-wrapper.c
+++ gcc/lto-wrapper.c
@@ -982,6 +982,9 @@ run_gcc (unsigned argc, char *argv[])
            no_partition = true;
          break;

+       case OPT_fparallel_:
+         /* Fallthru.  */
+
        case OPT_flto_:
          if (strcmp (option->arg, "jobserver") == 0)
            {
@@ -1239,7 +1242,11 @@ cont:
            {
              fprintf (mstream, "%s:\n\t@%s ", output_name, new_argv[0]);
              for (j = 1; new_argv[j] != NULL; ++j)
-               fprintf (mstream, " '%s'", new_argv[j]);
+                /* don't propagate parallel when we call gcc again,
+                   it is wasteful since we are only giving it
+                   one file.  */
+                if (strcmp (new_argv[j], "-fparallel=jobserver") != 0)
+                  fprintf (mstream, " '%s'", new_argv[j]);
              fprintf (mstream, "\n");
              /* If we are not preserving the ltrans input files then
                 truncate them as soon as we have processed it.  This

