This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug target/36222] x86 fails to optimize out __v4si -> __m128i move



------- Comment #8 from uweigand at gcc dot gnu dot org  2008-05-18 15:58 -------
That special case in find_reloads is really about a different situation.
We do not have a simple move here.

The problem is also not really related to vector instructions in particular;
reload does not care at all what the instructions actually do ...

There are two problems involved in this particular case, each of which
suffices to prevent optimal register allocation.

Before reload, we have

(insn:HI 10 27 11 2 d.c:7 (set (reg:V2SI 66)
        (vec_concat:V2SI (mem/c/i:SI (reg/f:SI 16 argp) [2 x1+0 S4 A32])
            (reg/v:SI 60 [ x2 ]))) 1338 {*vec_concatv2si_sse2}
(expr_list:REG_DEAD (reg/v:SI 60 [ x2 ])
        (nil)))

where local register allocation has already selected hard registers
for *both* operands 0 and 2:

;; Register 60 in 21.
;; Register 66 in 21.

Now, the insn pattern offers these alternatives:

(define_insn "*vec_concatv2si_sse2"
  [(set (match_operand:V2SI 0 "register_operand"     "=x,x ,*y,*y")
        (vec_concat:V2SI
          (match_operand:SI 1 "nonimmediate_operand" " 0,rm, 0,rm")
          (match_operand:SI 2 "reg_or_0_operand"     " x,C ,*y, C")))]

As operand 2 is not zero ("C" constraint), reload must choose
the first alternative, which has a matching constraint between
operands 0 and 1.

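This means that the choices made by local-alloc (operands 0 and 2 in
the same register) *force* a reload of operand 0 here:  operand 1 has to
share a register with operand 0, but operand 2, which is live as an input
at the same time, already occupies that register.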

Why does local-alloc choose the same register for the two operands?
This happens because in general, there is no conflict between a
register that is used and set in the same insn, because usually the
same hard register *can* be used for both.

In this case this is not true, but local-alloc does not recognize
this.  There is indeed code in block_alloc that tries to handle
matching constraints, but this only recognizes the more typical
scenario where *every* alternative requires a match.

Here, we seemingly have alternatives that do not require a match
-- of course, that doesn't help because in these alternatives
operand 2 is extremely constrained ("C" will only accept a constant
zero) and so they aren't actually usable ...
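To make the difference concrete, here is a small stand-alone sketch (not
the code in block_alloc; the function names are made up for illustration).
It contrasts an "every alternative has a match" test on the operand 1
constraint string with a variant that first skips alternatives operand 2
cannot use when it is a register.  Treating an alternative as usable only
if the operand 2 constraint names 'x', 'y' or 'r' is an assumption tailored
to this one pattern.

```c
#include <stddef.h>
#include <string.h>

/* Does the alternative starting at P (LEN characters long) contain CH?  */
static int
alt_contains (const char *p, size_t len, char ch)
{
  return len > 0 && memchr (p, ch, len) != NULL;
}

/* Naive test: 1 if *every* comma-separated alternative of constraint
   string C1 contains the matching digit for operand OPNO, else 0.  */
int
every_alternative_matches (const char *c1, int opno)
{
  const char want = (char) ('0' + opno);

  for (;;)
    {
      size_t len = strcspn (c1, ",");
      if (!alt_contains (c1, len, want))
        return 0;
      if (c1[len] == '\0')
        return 1;
      c1 += len + 1;
    }
}

/* Hypothetical refinement: only look at alternatives whose operand 2
   constraint (C2) admits a register, approximated here as "contains
   'x', 'y' or 'r'".  Returns 1 if at least one such alternative exists
   and all of them require a match in C1.  */
int
every_usable_alternative_matches (const char *c1, const char *c2, int opno)
{
  const char want = (char) ('0' + opno);
  int seen_usable = 0;

  for (;;)
    {
      size_t l1 = strcspn (c1, ",");
      size_t l2 = strcspn (c2, ",");
      int usable = alt_contains (c2, l2, 'x')
                   || alt_contains (c2, l2, 'y')
                   || alt_contains (c2, l2, 'r');

      if (usable)
        {
          seen_usable = 1;
          if (!alt_contains (c1, l1, want))
            return 0;
        }
      if (c1[l1] == '\0' || c2[l2] == '\0')
        return seen_usable;
      c1 += l1 + 1;
      c2 += l2 + 1;
    }
}
```

For the constraint strings of *vec_concatv2si_sse2, " 0,rm, 0,rm" and
" x,C ,*y, C", the naive test says "no" while the refined one says "yes":
the only alternatives a register in operand 2 can use are the ones that
tie operand 1 to operand 0.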


Even assuming local-alloc had made better choices, reload would still
generate an output reload.  This second problem really comes down to
use of matching constraints between operands of different modes / sizes.

Once operand 0 was assigned to a hard register by local-alloc, reload
would generally attempt to also assign operand 1 to the same register,
in order to fulfill the matching constraint without requiring an
output reload.

This is done by the routine find_dummy_reload.  However, in this
particular case, that routine immediately fails due to:

  /* If operands exceed a word, we can't use either of them
     unless they have the same size.  */
  if (GET_MODE_SIZE (outmode) != GET_MODE_SIZE (inmode)
      && (GET_MODE_SIZE (outmode) > UNITS_PER_WORD
          || GET_MODE_SIZE (inmode) > UNITS_PER_WORD))
    return 0;

because operand 0 is two words in size, while operand 1 is just
a single word in size.  I'm not completely sure this check (which
has been in SVN forever) is still required today ...
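As a worked example of that condition, the sketch below restates the test
as a stand-alone function over plain byte counts (the function name is
made up; it is not part of reload).  It assumes a 32-bit target with
UNITS_PER_WORD == 4, so that V2SI is 8 bytes and SI is 4 bytes.

```c
#include <stdbool.h>

/* Stand-alone restatement of the quoted size test: true when the
   check would make find_dummy_reload give up.  OUTSIZE and INSIZE
   are the mode sizes in bytes of the output and input operands.  */
bool
size_check_rejects (unsigned outsize, unsigned insize,
                    unsigned units_per_word)
{
  return outsize != insize
         && (outsize > units_per_word || insize > units_per_word);
}
```

With outsize = 8 (V2SI), insize = 4 (SI) and a 4-byte word, the sizes
differ and the output exceeds a word, so the function returns true.  Two
operands of equal size, or two operands that both fit in a single word,
would pass.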


In any case, the simplest work-around might be to write that pattern
in a way that is easier for local-alloc / reload to handle:  the two
cases x <- 0, x  and x <- rm, C are nearly completely unrelated; why
not split them into two different insn patterns?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36222

