[PATCH] PR target/68991: Add vector_memory_operand and "Bm" constraint

Uros Bizjak ubizjak@gmail.com
Sat Jan 2 10:32:00 GMT 2016


On Thu, Dec 31, 2015 at 4:29 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Dec 31, 2015 at 1:14 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Wed, Dec 30, 2015 at 9:53 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> SSE vector arithmetic and logic instructions only accept aligned memory
>>> operand.  This patch adds vector_memory_operand and "Bm" constraint for
>>> aligned SSE memory operand.  They are applied to SSE any_logic patterns.
>>>
>>> OK for trunk and release branches if there are no regressions?
>>
>> This patch is just papering over a deeper problem, as Jakub said in the PR [1]:
>>
>> --q--
>> GCC uses the ix86_legitimate_combined_insn target hook to disallow
>> misaligned memory into certain SSE instructions.
>> (subreg:V4SI (reg:TI 245 [ MEM[(const struct bitset &)FeatureEntry_21 + 8] ]) 0)
>> is not misaligned memory, it is a subreg of a pseudo register, so it is fine.
>> If the replacement of the pseudo register with memory happens in some
>> other pass, then it probably either should use the
>> legitimate_combined_insn target hook or some other one.  I think we
>> have already a PR where that happens during live range shrinking.
>> --/q--
>>
>> Please figure out where memory replacement happens. There are several
>> other SSE insns (please grep the .md for "ssememalign" attribute) that
>> are affected by this problem, so fixing a couple of patterns won't
>> solve the problem completely.
>
> LRA turns
>
> (insn 64 63 108 6 (set (reg:V4SI 148 [ vect__28.85 ])
>         (xor:V4SI (reg:V4SI 149)
>             (subreg:V4SI (reg:TI 147 [ MEM[(const struct bitset
> &)FeatureEntry_2(D)] ]) 0))) foo.ii:26 3454 {*xorv4si3}
>      (expr_list:REG_DEAD (reg:V4SI 149)
>         (expr_list:REG_DEAD (reg:TI 147 [ MEM[(const struct bitset
> &)FeatureEntry_2(D)] ])
>             (expr_list:REG_EQUIV (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
>                         (const_int -16 [0xfffffffffffffff0])) [3
> MEM[(unsigned int *)&D.2851]+0 S16 A128])
>                 (nil)))))
>
> into
>
> (insn 64 63 108 6 (set (reg:V4SI 21 xmm0 [orig:148 vect__28.85 ] [148])
>         (xor:V4SI (reg:V4SI 21 xmm0 [149])
>             (mem:V4SI (reg/v/f:DI 4 si [orig:117 FeatureEntry ] [117])
> [6 MEM[(const struct bitset &)FeatureEntry_2(D)]+0 S16 A32])))
> foo.ii:26 3454 {*xorv4si3}
>      (expr_list:REG_EQUIV (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
>                 (const_int -16 [0xfffffffffffffff0])) [3 MEM[(unsigned
> int *)&D.2851]+0 S16 A128])
>         (nil)))
>
> since
>
> (mem:V4SI (reg/v/f:DI 4 si [orig:117 FeatureEntry ] [117]) [6
> MEM[(const struct bitset &)FeatureEntry_2(D)]+0 S16 A32])
>
> satisfies the "m" constraint.  I don't think LRA should call
> ix86_legitimate_combined_insn to validate constraints on
> an instruction.

Hm...

if LRA doesn't assume that the generic "m" constraint implies at least
natural alignment of the propagated operand, then your patch is the way to
go. But in that case, *every* SSE vector memory constraint should be
changed to Bm.
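For reference, a predicate/constraint pair along the lines the subject describes could look roughly like this (a sketch inferred from the subject line, not the literal patch text):

```lisp
;; i386/predicates.md sketch: memory whose known alignment is at least
;; the natural alignment of the vector mode.
(define_predicate "vector_memory_operand"
  (and (match_operand 0 "memory_operand")
       (match_test "MEM_ALIGN (op) >= GET_MODE_ALIGNMENT (mode)")))

;; i386/constraints.md sketch: "Bm" accepts only such aligned memory.
(define_constraint "Bm"
  "@internal Vector memory operand."
  (match_operand 0 "vector_memory_operand"))
```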

Let's ask Vladimir for details.

Uros.
