This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: New attempt: subword-based DCE, PR42575

From: Bernd Schmidt <bernds at codesourcery dot com>
To: Eric Botcazou <ebotcazou at adacore dot com>
Cc: gcc-patches at gcc dot gnu dot org
Date: Thu, 29 Jul 2010 13:33:20 +0200
Subject: Re: New attempt: subword-based DCE, PR42575
References: <4C507E3C.7060900@codesourcery.com> <201007291014.04916.ebotcazou@adacore.com>

On 07/29/2010 10:14 AM, Eric Botcazou wrote:
> A figure is worth a thousand words when it comes to compilation time.  Did you 
> measure the impact on a bootstrap of the core compiler on x86 for example?

Previously I've only compiled things like crafty/evaluate.i or combine.i
and tried to make the pass show up on the profiles, and was satisfied
when I couldn't.

So, here are some times for a full bootstrap:

First attempt, with the new optimization:

real	15m22.247s user	98m30.661s sys	2m33.157s

Second attempt, optimization guarded by if (0):

real	14m1.564s user	99m3.789s sys	2m35.787s

Slightly shocked by the difference in real time (which, thinking about
it, can probably be explained by differences in OS scheduling for a -j20
run), here's the third run with the optimization reenabled:

real	13m55.250s user	98m55.346s sys	2m34.763s

Fourth run, again with if (0), just to make sure:

real	14m5.177s user	99m13.385s sys	2m36.373s

The machine was lightly loaded in all cases.  User time was lower in the
cases when the optimization was enabled, which probably shows the
numbers aren't meaningful (unless it found an opportunity to optimize
itself), but at least it suggests there isn't a problem.  As expected -
even with occasional HOST_WIDEST_INT usage in gcc source code, I don't
think the code is even run often enough to make a difference.

>> This triggers in several testcases other than the one in PR42575, for
>> example in crafty or the XFS filesystem (tested with an ARM compiler).
>> This demonstrates that it fixes a real-world, general problem that can
>> occur for more than one reason.
> 
> What reasons?

Consider anything that prevents lower-subreg from splitting a reg.
PR42575 is about multiplication, but it can vary between targets: not
all of them may even want to split DImode operations.  Then, in
combination, either user code may compute more than it needs (think a 64
bit shift of which only the lower half is used), or copy propagation may
cause a situation like in PR42575.  Or something else may happen
entirely.  Can you enumerate all the ways dead code can happen for
non-subregs and fix each in a way that does not involve running DCE?
I'm sure you'll agree that attempting this would be absurd.

Dead code happens, so DCE is a pretty normal optimization to run.  The
fact that we can't do it on a whole class of objects is weak.

Here's an example from crafty's attacks.c:

 (insn 391 79 80 5 (set (reg:DI 269) (const_int 0 [0]))

[... setting subreg 0 in various ways ...]

-(insn 85 83 89 5 (set (subreg:SI (reg:DI 269) 4)
-        (lshiftrt:SI (reg:SI 272 [ tree_5(D)->pos.occupied_rl45+4 ])
-            (reg:SI 266))) attacks.c:47 119 {*arm_shiftsi3}
-     (expr_list:REG_DEAD (reg:SI 272 [ tree_5(D)->pos.occupied_rl45+4 ])
-        (expr_list:REG_DEAD (reg:SI 266)
-            (nil))))
-

with the high-half subreg unused after that.  No multiplication in sight
so your patches wouldn't do anything.

That example will go away when I finally commit a lower-subreg patch
Jeff approved a while ago, but that's beside the point: we have a class
of such issues, and we can certainly fix some of them piecewise, but
that'll never eliminate the whole problem.  It's not terribly hard to
imagine insn 391 being a different DImode operation on a target where it
shouldn't be split.

Speaking of the latter, when we improve lower-subreg to decide whether
to split a DImode reg based on costs, leaving an unnecessary insn like
this in the RTL may even degrade its analysis.

> There are 2 ways to "fix" problems: analyzing the causes and 
> eliminating them, or treating the symptoms.  I'm fond of the former approach, 
> you're apparently more fond of the latter.

I'd describe it as I'm fond of fixing a general problem in a general way
rather than adding hacks for every symptom.

This reminds me that I was completely baffled by your response when I
suggested a generalization of one of your patches, which was "we aren't
trying to enhance lower-subreg, only to solve a problem related to
double-word multiplication".  Same difference in approaches, I was
trying to find a big-picture, general enhancement while you were focused
on just hacking up the code for a very specific single testcase.

>> Last time, Eric posted counter-patches which tried hackish ways of
>> tweaking the generated RTL so that the problem is hidden for the PR42575
>> testcase; both of these made code generation worse on average and failed
>> to fix the more general issue of dead stores into subwords.
> 
> Well, I first asked you to investigate why lower-subreg + DCE couldn't achieve 
> what you were looking for and whether this could be fixed.  You apparently 
> didn't try very hard so I had to; early patches may indeed resemble hacks.

It's not about trying, IMO it's about understanding the big picture.
Once I realized there's dead code in the RTL (caused by copy-propagation
which is probably typical even when subwords aren't involved) it was
obvious to me that we can't do DCE on subwords, and this is a general
problem that it is desirable to fix.  Doctoring around the symptoms of a
particular instance of the problem is something that did not occur to me.

As for effort - for reference, below is the lower-subreg patch I
experimented with (and intend to finish, eventually) after you suggested
your second patch.  You can judge for yourself whether that constitutes
trying very hard.

> We already run 2 lower-subreg passes and 2+ DCE passes so we should be able to 
> eliminate dead assignments to subwords of multiword pseudo-registers with a 
> few efforts.  Instead we now have a new DCE pass and we still don't know what
> it would have taken to enhance existing lower-subreg and DCE passes.

Well, we can _never_ know what it would take to enhance existing code,
unless you manage to enumerate every way dead code can appear in the
RTL.  Your approach can never fix the problem entirely, since it would
only add one band-aid for every instance of the problem we find.

I don't disagree that we should be doing these kinds of fixes if we find
they make the compiler better (again, see the patch below), but it can
never fix the _class_ of problems.

Bernd

Attachment: ls-dest2.diff
Description: Text document

References:
- New attempt: subword-based DCE, PR42575
  - From: Bernd Schmidt
- Re: New attempt: subword-based DCE, PR42575
  - From: Eric Botcazou

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]