This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: [Bug middle-end/15855] [3.4/4.0 Regression] g++ crash with -O2 and -O3 on input file

From: Jeffrey A Law <law at redhat dot com>
To: gcc-patches at gcc dot gnu dot org
Cc: gcc-bugzilla at gcc dot gnu dot org
Date: Sat, 27 Nov 2004 00:49:42 -0700
Subject: Re: [Bug middle-end/15855] [3.4/4.0 Regression] g++ crash with -O2 and -O3 on input file
Organization: Red Hat, Inc
Reply-to: law at redhat dot com

It's always amazing to see how such simple oversights can result in such
a dramatic difference in the code we generate and, to a lesser extent,
in our compile-time performance.


For this PR we actually spend considerable time compiling the static
initialization and destruction routine.  Yes, that's right...

The C++ front-end presents us with code like this:



<<< Unknown tree: if_stmt
  __priority == 65535 && __initialize_p == 1
  <<cleanup_point <<< Unknown tree: expr_stmt
  __comp_ctor  (&__ioinit) >>>
>>
   >>>
;
<<< Unknown tree: if_stmt
  __priority == 65535 && __initialize_p == 1
  <<cleanup_point <<< Unknown tree: expr_stmt
  __comp_ctor  (&phylum_info, 131, (const char *) "trans_unit", 3, 0, 1,
0, 0, 0, 0, 1, 1) >>>
>>
   >>>
;

Which repeats over and over and over (around a thousand times).  The
if conditions remain the same, but the actions within the IF statement
change.

We gimplify that into:

  if (__priority == 65535)
    {
      if (__initialize_p == 1)
        {
          __comp_ctor  (&__ioinit);
        }
      else
        {

        }
    }
  else
    {

    }
  if (__priority == 65535)
    {
      if (__initialize_p == 1)
        {
          __comp_ctor  (&phylum_info, 131, &"trans_unit"[0], 3, 0, 1, 0,
0, 0, 0, 1, 1);
        }
      else
        {

        }
    }
 


[ ... ]

It doesn't take a rocket scientist to realize that we've got a lot of
redundant tests in this code and it really should look something like

  if (__priority == 65535)
    if (__initialize_p == 1)
      {
         action1;
         action2;
          ...
         actionN;
      }
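
As a rough illustration of why the two shapes are equivalent, here is a
small C sketch.  The names (action, init_redundant, init_hoisted and the
calls counter) are made up for this example and do not come from the PR;
it also assumes the actions cannot change the two tested values, which is
what makes hoisting the tests safe.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the constructor calls: each one only
   records that it ran.  */
#define N_ACTIONS 3
static int calls[N_ACTIONS];

void reset_calls (void)
{
  for (size_t i = 0; i < N_ACTIONS; i++)
    calls[i] = 0;
}

int total_calls (void)
{
  int sum = 0;
  for (size_t i = 0; i < N_ACTIONS; i++)
    sum += calls[i];
  return sum;
}

static void action (int i)
{
  calls[i]++;
}

/* Shape of the gimplified code: every action re-tests both conditions.  */
void init_redundant (int priority, int initialize_p)
{
  if (priority == 65535)
    {
      if (initialize_p == 1)
        action (0);
    }
  if (priority == 65535)
    {
      if (initialize_p == 1)
        action (1);
    }
  if (priority == 65535)
    {
      if (initialize_p == 1)
        action (2);
    }
}

/* Shape we would like to end up with: both conditions tested once.  */
void init_hoisted (int priority, int initialize_p)
{
  if (priority == 65535)
    if (initialize_p == 1)
      {
        action (0);
        action (1);
        action (2);
      }
}
```

Calling either function with (65535, 1) runs all three actions, and any
other pair of arguments runs none of them.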

When I looked at the DOM1 dump file I was rather annoyed to find that
while it successfully threaded away all the __priority tests, it
left in all the __initialize_p tests.  Ugh.  That can't be good.

I was pleasantly surprised to see that one iteration of DOM was
sufficient to do all the threading of the __priority tests; that's
good from a compile-time performance standpoint.  What I did not expect
to find was that DOM1 did not iterate!  Thus it didn't thread all
the __initialize_p tests until DOM2.

cleanup_tree_cfg didn't find any control statements to remove, 
unreachable blocks or jumps to thread.  So it returned false.
It did however merge roughly a thousand blocks.  But we do
not propagate that to the callers of cleanup_tree_cfg.

Which is the root of the problem.  DOM's jump threader doesn't
look through multiple blocks.  So while block merging won't
expose new control flow cleanups, unreachable blocks or
jump threads for cleanup_tree_cfg, it may expose new jump
threading opportunities for DOM's jump threader.

Fixing this little oversight resulted in DOM1 threading all
the conditionals in the target function, leaving us with optimal
code in a total of 4 basic blocks.
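
The effect of reporting (or not reporting) block merging can be sketched
with a toy model in C.  This is not GCC's code: struct toy_cfg,
thread_visible, toy_cleanup and run_toy_pass are all hypothetical names,
and the model assumes only that the pass iterates while its cleanup step
reports a change, and that merging blocks is what makes the remaining
tests visible to a threader that does not look through multiple blocks.

```c
/* Toy model only; none of these names exist in GCC.  */
struct toy_cfg
{
  int visible_tests;     /* tests the threader can currently see */
  int hidden_tests;      /* tests exposed only after blocks are merged */
  int mergeable_blocks;  /* blocks left behind by threading */
};

/* Thread every test that is currently visible.  Each threaded test
   leaves a block behind that cleanup can merge.  Returns the number
   of tests threaded.  */
static int thread_visible (struct toy_cfg *cfg)
{
  int n = cfg->visible_tests;
  cfg->visible_tests = 0;
  cfg->mergeable_blocks += n;
  return n;
}

/* Toy cleanup: the only thing it ever does is merge blocks, which
   exposes the hidden tests.  With report_merges == 0 it models the old
   behaviour (merging is not reported to the caller); with
   report_merges != 0 it models the fix.  */
static int toy_cleanup (struct toy_cfg *cfg, int report_merges)
{
  int merged = cfg->mergeable_blocks;
  cfg->mergeable_blocks = 0;
  if (merged > 0)
    {
      cfg->visible_tests += cfg->hidden_tests;
      cfg->hidden_tests = 0;
    }
  return report_merges ? merged > 0 : 0;
}

/* Iterate while cleanup reports a change.  Returns the number of tests
   still left in the function when the pass stops.  */
int run_toy_pass (struct toy_cfg cfg, int report_merges)
{
  int changed;
  do
    {
      thread_visible (&cfg);
      changed = toy_cleanup (&cfg, report_merges);
    }
  while (changed);
  return cfg.visible_tests + cfg.hidden_tests;
}
```

Starting from a toy_cfg of { 1000, 1000, 0 }, the model leaves the
second batch of tests in place when merges are not reported, and removes
all of them in the same pass when they are.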

And the best news of all, this _improves_ compile time performance
for this testcase (by about a percent).

Bootstrapped and regression tested on i686-pc-linux-gnu.
