Bug 39543 - [4.4/4.5 Regression] Reload failure on mplayer from SVN
Summary: [4.4/4.5 Regression] Reload failure on mplayer from SVN
Alias: None
Product: gcc
Classification: Unclassified
Component: inline-asm (show other bugs)
Version: 4.4.0
: P2 normal
Target Milestone: 4.4.1
Assignee: Not yet assigned to anyone
Keywords: ice-on-valid-code
: 39990 40064 (view as bug list)
Depends on:
Reported: 2009-03-24 11:25 UTC by Jakub Jelinek
Modified: 2009-05-07 22:18 UTC (History)
5 users (show)

See Also:
Target: i686-linux
Known to work:
Known to fail:
Last reconfirmed: 2009-03-25 16:49:20

gcc44-pr39543.patch (2.30 KB, patch)
2009-03-25 15:49 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2009-03-24 11:25:23 UTC
The following testcase errors while reloading the asms with -O3 -m32 or
-O3 -m32 -DOMIT_FRAME_POINTER -fomit-frame-pointer since r139993 (before that
it worked even with IRA, after that -fno-ira cured it while we still had it).
I agree mplayer pushes the limits a little bit too far by having so many "m" and "=m" constraints, on the other side e.g. in baz when it is inlined none of the asms actually need any registers to reload them, all the mems are actually s[012345] + some small constant, so on i386 all of them can be expanded as s0+16 and similar.  In the *.optimized dump they are still constants, wonder why during expand we decide to force all the MEM addresses into registers.
Comment 1 Jakub Jelinek 2009-03-24 11:27:54 UTC
Ah, forgot the testcase:

float __attribute__ ((aligned (16))) s0[128];
const float s1 = 0.707;
float s2[8] __attribute__ ((aligned (16)));
float s3[8] __attribute__ ((aligned (16)));
float s4[16] __attribute__ ((aligned (16)));
float s5[16] __attribute__ ((aligned (16)));

foo (int k, float *x, float *y, const float *d, const float *z)
  float *a, *b, *c, *e;

  a = x + 2 * k;
  b = a + 2 * k;
  c = b + 2 * k;
  e = y + 2 * k;
  __asm__ volatile (""
    : "=m" (x[0]), "=m" (b[0]), "=m" (a[0]), "=m" (c[0])
    : "m" (y[0]), "m" (y[k * 2]), "m" (x[0]), "m" (a[0])
    : "memory");
  for (;;)
      __asm__ volatile (""
: "m" (y[2]), "m" (d[2]), "m" (e[2]), "m" (z[2])
                        : "memory");
      if (!--k)
  __asm__ volatile (""
    : "=m" (x[2]), "=m" (x[10]), "=m" (x[6]), "=m" (x[14])
    : "m" (y[2]), "m" (y[6]), "m" (x[2]), "m" (x[6]), "m" (s1)
    : "memory");

bar (float *a)
  foo (4, a, a + 16, s2, s3);
  foo (8, a, a + 32, s4, s5);

baz (void)
  bar (s0);
Comment 2 Jakub Jelinek 2009-03-24 12:54:02 UTC
The reason for forcing the MEM addresses in "m"/"=m" into register is:
  /* By passing constant addresses through registers
     we get a chance to cse them.  */
  if (! cse_not_expected && CONSTANT_P (x) && CONSTANT_ADDRESS_P (x))
    x = force_reg (Pmode, x);
in memory_address.  On a simple s0[0] = 0; s0[4] = 0; s0[8] = 0; s0[12] = 0;
the addresses are propagated back into the MEMs during fwprop1 (or during combine for -fno-forward-propagate), but this for some reason doesn't happen for the asm operands.
Either we need to disable this kind of forcing into register for (constant) memory operands, or better find out and fix why neither fwprop nor combiner merges the memory operands back.
Comment 3 Jakub Jelinek 2009-03-24 14:18:09 UTC
Ok, so the reason why fwprop doesn't propagate this is:

forward_propagate_and_simplify doing:
  rtx use_set = single_set (use_insn);
  if (!use_set)
    return false;

ASM_OPERANDS with multiple output regs obviously isn't single_set, so nothing is propagated.  Paolo, any reason to restrict this, or at least could forward_propagate_and_simplify specially propagate also to memory addresses in asms?

The reason why combiner doesn't do anything is that when the pseudos initialized to constants are used multiple times (not dead on the asm insn), try_combine
sets added_sets_2 and as the insn is a PARALLEL, appends the added set to that, making it invalid (asm_noperands (body) returns -1).

So, to fix this bug, either we can teach fwprop to do this kind of propagation (my preference), or e.g. temporarily set cse_not_expected in expand_asm_operands.  I can easily test the latter as a fallback variant if the former is deemed too unsafe for 4.4.
Comment 4 Paolo Bonzini 2009-03-24 15:03:46 UTC
The only thing to be careful is to have set_reg_equal == FALSE if the insn has multiple sets, and find which set in a multiple-set insn is actually referring to USE (a combination of note_stores and loc_mentioned_in_p will do).  Such a patch is definitely backportable to 4.4 even if it is not ready for 4.4.0.

However, I'm not sure it is good to have the best possible addressing mode in an asm, because I wonder if this could cause other kinds of reload failure.  Think of:

   char p[10];
   a = b * 4 + c;
   asm ("" : : "m" (p[a]))

Which fwprop would likely change to a base/index/displacement address, with 2 registers gone.

Setting cse_not_expected in practice would not use an addressing mode that is more complicated than the one specified by the user; for an asm, I think this is actually the least surprising behavior.
Comment 5 Jakub Jelinek 2009-03-25 13:10:32 UTC
That can be solved easily, just compare whether new_rtx doesn't need more registers than old_rtx and only propagate into asm_noperands () >= 0 insns those that require the same number or fewer registers.

Consider e.g.:

int s[128];

f1 (void)
  int i;
  asm volatile ("# %0 %1 %2 %3 %4 %5 %6 %7 %8 %9 %10 %11 %12 %13 %14 %15 %16 %17"
: "=r" (i)
: "m" (s[0]), "m" (s[2]), "m" (s[4]), "m" (s[6]), "m" (s[8]),
  "m" (s[10]), "m" (s[12]), "m" (s[14]), "m" (s[16]), "m" (s[18]),
  "m" (s[20]), "m" (s[22]), "m" (s[24]), "m" (s[26]), "m" (s[28]),
  "m" (s[30]), "m" (s[32]));
  asm volatile ("# %0 %1 %2 %3 %4 %5 %6 %7 %8 %9 %10 %11 %12 %13 %14 %15 %16 %17"
: "=r" (i)
: "m" (s[0]), "m" (s[2]), "m" (s[4]), "m" (s[6]), "m" (s[8]),
  "m" (s[10]), "m" (s[12]), "m" (s[14]), "m" (s[16]), "m" (s[18]),
  "m" (s[20]), "m" (s[22]), "m" (s[24]), "m" (s[26]), "m" (s[28]),
  "m" (s[30]), "m" (s[32]));

f2 (int *q)
  int i;
  int *p = q + 32;
  asm volatile ("# %0 %1 %2 %3 %4 %5 %6 %7 %8 %9 %10 %11 %12 %13 %14 %15 %16 %17"
: "=r" (i)
: "m" (p[0]), "m" (p[2]), "m" (p[4]), "m" (p[6]), "m" (p[8]),
  "m" (p[10]), "m" (p[12]), "m" (p[14]), "m" (p[16]), "m" (p[18]),
  "m" (p[20]), "m" (p[22]), "m" (p[24]), "m" (p[26]), "m" (p[28]),
  "m" (p[30]), "m" (p[32]));
  asm volatile ("# %0 %1 %2 %3 %4 %5 %6 %7 %8 %9 %10 %11 %12 %13 %14 %15 %16 %17"
: "=r" (i)
: "m" (p[0]), "m" (p[2]), "m" (p[4]), "m" (p[6]), "m" (p[8]),
  "m" (p[10]), "m" (p[12]), "m" (p[14]), "m" (p[16]), "m" (p[18]),
  "m" (p[20]), "m" (p[22]), "m" (p[24]), "m" (p[26]), "m" (p[28]),
  "m" (p[30]), "m" (p[32]));

at -O2, here cse_not_expected hack in stmt.c doesn't help, but I'd say fwprop could fix it up.
Comment 6 Jakub Jelinek 2009-03-25 15:49:36 UTC
Created attachment 17540 [details]

Untested patch that cures all the 3 testcases by forward propagation into asm operands, as long as the propagation doesn't increase the number of needed registers.
As I need to update a bunch of locations at once, I can't use
try_fwprop_subst.  There are 2 things that try_fwprop_subst does that I'm not doing, one is the rtx_cost (SET_SRC (set), SET, speed) > old_cost check
(I wonder whether it is needed for asms at all and if yes, how to actually do it) and the update_df stuff (I'm never adding notes for asms, but for successful
change I don't know how to inform df about the changes, if I need to do that.
Paolo, any ideas?
Comment 7 Paolo Bonzini 2009-03-25 16:49:19 UTC
> one is the rtx_cost (SET_SRC (set), SET, speed) > old_cost check
> (I wonder whether it is needed for asms at all and if yes, how to actually do
> it)

For addresses it is already done in should_replace_address.  But for everything else, there could be problems if one uses

  x = 0x123456789abcdef;

and fwprop propagates x into the asm.  If they used a "r" constraint (correctly), reload probably will fix everything, but it may cause pessimizations or I don't know what.  Sorry for the handwaving---the reason I am a bit weary about propagating into asms is that we are quite careful about it on the tree level.

pr39543-3.c crashes on f1 with optimization, but crashes on f2 without optimization.  Which means that for f2, cse_not_expected does not work because gimplification effectively has already done the same as force_reg.  In my opinion that's the root cause: we worry about making asms complex, but we're cavalier in making them simpler (hoping that something later restores the complexity).

For 4.5, maybe SSA expansion fixes this?  (pr39543-3.c in Jakub's patch).  If so, modifying fwprop is probably not the best thing to do.
Comment 8 Jakub Jelinek 2009-03-25 18:31:58 UTC
fwprop won't propagate a constant into a "r" constrained asm operand, because
verify_changes (0) fails in that case (it calls check_asm_operands, which verifies the constraints of all the operands).
Comment 9 Paolo Bonzini 2009-03-26 08:47:46 UTC
That's good to know.  As I said, I'm okay with your patch.  I just wondered if for 4.5 the cse_not_expected trick fixes it.  In that case I'd have no problem applying the patch to 4.4, but I would revert it after SSA expansion goes in.
Comment 10 Michael Matz 2009-03-26 13:41:56 UTC
I must be missing something, as I don't see the connection to SSA expansion
(or non-SSA expansion).  AFAIU the issue is (and that often was indeed a
problem in the past with mplayer) that some transformation "needlessly"
uses a temporary instead of the expression directly in an operand (here
happening to be in an asm).  I'm not sure why expanding from SSA form
directly would somehow change that (after all fwprop could still do that).

If the gimplifier already allocated a temporary then the expansion (if from
SSA or non-SSA) has to expand that as pseudo.

There are special occasions were the expander could care to fixup such
asms, of course.  E.g. doing TER for the expressions in question,
but that can be done with or without SSA form.
Comment 11 Paolo Bonzini 2009-04-14 13:52:59 UTC
> I don't see the connection to SSA expansion

Well, now that you posted the patch my suppositions were partly true.  The important thing is that now you have a way to get an SSA_NAME's RHS into expansion.  So, if we can do a kind of "mini-TER" of values appearing into an ASM with the offending constraints, this would fix the bug even at -O0 or -fno-tree-ter.
Comment 12 Michael Matz 2009-04-14 13:58:58 UTC
That's only possible sometimes and hence can't be relied upon.  The RHS
expression might not be available at the point of the use anymore (some
operands might have been clobbered meanwhile, remember we aren't in SSA
anymore).  That's one of the reasons why TER doesn't forward all expressions
even if they're used only once.

To really fix this for good it needs a separate pass tracking lifetime (and
introducing copies in case the operands aren't available) or working on SSA
form (where lifetime isn't a problem and copies would be inserted by
Comment 13 Paolo Bonzini 2009-04-14 14:02:11 UTC
Subject: Re:  [4.4/4.5 Regression] Reload failure 
	on mplayer from SVN

> That's one of the reasons why TER doesn't forward all expressions even if they're used only once.

Yes -- indeed in this case we'd need forwarding of things used more
than once.  The infrastructure to inject definitions from an SSA_NAME
was all that I was interested in.
Comment 14 Jakub Jelinek 2009-04-26 18:56:30 UTC
Subject: Bug 39543

Author: jakub
Date: Sun Apr 26 18:56:14 2009
New Revision: 146813

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=146813
	PR inline-asm/39543
	* fwprop.c (forward_propagate_asm): New function.
	(forward_propagate_and_simplify): Propagate also into __asm, if it
	doesn't increase the number of referenced registers.

	* gcc.target/i386/pr39543-1.c: New test.
	* gcc.target/i386/pr39543-2.c: New test.
	* gcc.target/i386/pr39543-3.c: New test.


Comment 15 Jakub Jelinek 2009-04-26 18:58:16 UTC
Subject: Bug 39543

Author: jakub
Date: Sun Apr 26 18:58:04 2009
New Revision: 146814

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=146814
	PR inline-asm/39543
	* fwprop.c (forward_propagate_asm): New function.
	(forward_propagate_and_simplify): Propagate also into __asm, if it
	doesn't increase the number of referenced registers.

	* gcc.target/i386/pr39543-1.c: New test.
	* gcc.target/i386/pr39543-2.c: New test.
	* gcc.target/i386/pr39543-3.c: New test.

      - copied unchanged from r146813, trunk/gcc/testsuite/gcc.target/i386/pr39543-1.c
      - copied unchanged from r146813, trunk/gcc/testsuite/gcc.target/i386/pr39543-2.c
      - copied unchanged from r146813, trunk/gcc/testsuite/gcc.target/i386/pr39543-3.c

Comment 16 Jakub Jelinek 2009-04-26 19:09:27 UTC
Comment 17 Andrew Pinski 2009-05-01 13:15:07 UTC
*** Bug 39990 has been marked as a duplicate of this bug. ***
Comment 18 Andrew Pinski 2009-05-07 22:18:44 UTC
*** Bug 40064 has been marked as a duplicate of this bug. ***