68991 – -O3 generates misaligned xorv4si3

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization
Version:	5.3.1

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:	69299

Reported:	2015-12-18 23:25 UTC by H.J. Lu
Modified:	2016-01-15 14:46 UTC
CC List:	4 users

See Also:
Host:
Target:	x86
Build:
Known to work:	6.0
Known to fail:	5.3.1
Last reconfirmed:	2015-12-18 00:00:00

Description H.J. Lu 2015-12-18 23:25:34 UTC

When compiling llvm 3.8 at -O3 for x32, GCC 5.3.1 turns

(insn 194 193 195 24 (set (reg:V4SI 246 [ vect__45.575 ])
        (xor:V4SI (mem/c:V4SI (plus:SI (reg/f:SI 20 frame)
                    (const_int -32 [0xffffffffffffffe0])) [14 MEM[(long unsigned int *)&D.120283]+0 S16 A128])
            (reg:V4SI 247))) /usr/include/c++/5.3.1/bitset:163 3434 {*xorv4si3}
     (expr_list:REG_EQUAL (not:V4SI (mem/c:V4SI (plus:SI (reg/f:SI 20 frame)
                    (const_int -32 [0xffffffffffffffe0])) [14 MEM[(long unsigned int *)&D.120283]+0 S16 A128]))
        (nil)))

into

(insn 194 193 439 22 (set (reg:V4SI 246 [ vect__45.575 ])
        (xor:V4SI (reg:V4SI 326)
            (reg:V4SI 247))) /usr/include/c++/5.3.1/bitset:163 3434 {*xorv4si3}
     (expr_list:REG_DEAD (reg:V4SI 326)
        (expr_list:REG_DEAD (reg:V4SI 247)
            (expr_list:REG_EQUAL (not:V4SI (mem/c:V4SI (plus:SI (reg/f:SI 20 frame)
                            (const_int -32 [0xffffffffffffffe0])) [14 MEM[(long unsigned int *)&D.120283]+0 S16 A128]))
                (nil)))))

Combine generates

(insn 194 193 439 22 (set (reg:V4SI 246 [ vect__45.575 ])
        (xor:V4SI (reg:V4SI 247)
            (subreg:V4SI (reg:TI 245 [ MEM[(const struct bitset &)FeatureEntry_21 + 8] ]) 0))) /usr/include/c++/5.3.1/bitset:163 3434 {*xorv4si3}
     (expr_list:REG_DEAD (reg:TI 245 [ MEM[(const struct bitset &)FeatureEntry_21 + 8] ])
        (expr_list:REG_DEAD (reg:V4SI 247) 
            (nil))))

But the memory is only aligned to 4 bytes (A32):

(insn 194 193 439 22 (set (reg:V4SI 21 xmm0 [orig:246 vect__45.575 ] [246])
        (xor:V4SI (reg:V4SI 21 xmm0 [247])
            (mem:V4SI (plus:SI (reg/v/f:SI 3 bx [orig:88 FeatureEntry ] [88])
                    (const_int 8 [0x8])) [12 MEM[(const struct bitset &)FeatureEntry_21 + 8]+0 S16 A32]))) /usr/include/c++/5.3.1/bitset:163 3434 {*xorv4si3}
     (nil))

Combining a subreg of vector memory, which may be misaligned, into an
arithmetic insn is only valid for AVX, not for SSE.
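
To make the alignment problem concrete, here is a small sketch of the address arithmetic. It is illustrative only: `is_aligned`, `count_aligned_bases` and `demo` are hypothetical helpers, not GCC code. It shows that a 16-byte access at offset 8 from a base that is only known to be 4-byte aligned (the A32 above) lands on a 16-byte boundary for only some bases, while a non-VEX SSE arithmetic insn with a memory operand needs it for all of them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if `addr` is a multiple of `align`
// (`align` must be a power of two).
constexpr bool is_aligned(std::uint64_t addr, std::uint64_t align) {
    return (addr & (align - 1)) == 0;
}

// Count the 4-byte-aligned bases in [0, 64) for which base + offset
// happens to be 16-byte aligned.
inline int count_aligned_bases(std::uint64_t offset) {
    int n = 0;
    for (std::uint64_t base = 0; base < 64; base += 4)
        if (is_aligned(base + offset, 16))
            ++n;
    return n;
}

// Small self-check: only a quarter of the 16 candidate bases work.
inline void demo() {
    assert(count_aligned_bases(8) == 4);
    assert(!is_aligned(0x1004 + 8, 16));  // 4-byte-aligned base, misaligned access
}
```

An A32 operand therefore cannot be assumed to be safe for the memory form of the insn; it has to be loaded with an unaligned move first.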

Comment 1 H.J. Lu 2015-12-18 23:50:08 UTC

You can't combine

(insn 438 191 193 22 (set (reg:V4SI 326)
        (subreg:V4SI (reg:TI 245 [ MEM[(const struct bitset &)FeatureEntry_21 + 8] ]) 0)) /usr/include/c++/5.3.1/bitset:1139 -1
     (expr_list:REG_DEAD (reg:TI 245 [ MEM[(const struct bitset &)FeatureEntry_21 + 8] ])
        (nil)))

(insn 194 193 439 22 (set (reg:V4SI 246 [ vect__45.575 ])
        (xor:V4SI (reg:V4SI 326)
            (reg:V4SI 247))) /usr/include/c++/5.3.1/bitset:163 3433 {*xorv4si3}
     (expr_list:REG_DEAD (reg:V4SI 326)
        (expr_list:REG_DEAD (reg:V4SI 247)
            (expr_list:REG_EQUAL (not:V4SI (mem/c:V4SI (plus:SI (reg/f:SI 20 frame)
                            (const_int -32 [0xffffffffffffffe0])) [14 MEM[(long unsigned int *)&D.120283]+0 S16 A128]))
                (nil)))))

into

(insn 194 193 439 22 (set (reg:V4SI 246 [ vect__45.575 ])
        (xor:V4SI (reg:V4SI 247)
            (subreg:V4SI (reg:TI 245 [ MEM[(const struct bitset &)FeatureEntry_21 + 8] ]) 0))) /usr/include/c++/5.3.1/bitset:163 3433 {*xorv4si3}
     (expr_list:REG_DEAD (reg:TI 245 [ MEM[(const struct bitset &)FeatureEntry_21 + 8] ])
        (expr_list:REG_DEAD (reg:V4SI 247)
            (nil))))

for SSE.

Comment 2 H.J. Lu 2015-12-18 23:56:50 UTC

This may be a latent bug.  SSE patterns like

(define_expand "<code><mode>3<mask_name>"
  [(set (match_operand:VF_128_256 0 "register_operand")
       (any_logic:VF_128_256
         (match_operand:VF_128_256 1 "nonimmediate_operand")
         (match_operand:VF_128_256 2 "nonimmediate_operand")))]
  "TARGET_SSE && <mask_avx512vl_condition>"
  "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")

can't use nonimmediate_operand, since it also accepts misaligned memory.

Comment 3 H.J. Lu 2015-12-19 00:29:59 UTC

Here is the misaligned source:

(subreg:V4SI (unspec:V16QI [
            (mem:V16QI (subreg/s/v:SI (reg/v/f:DI 219 [ Bits ]) 0) [14 MEM[(long unsigned int *)Bits_12(D)]+0 S16 A32])
        ] UNSPEC_LOADU) 0)

Comment 4 H.J. Lu 2015-12-19 00:54:36 UTC

Also

(subreg:V4SI (reg:TI 247 [ MEM[(const struct bitset &)FeatureEntry_115 + 8] ]) 0)

Comment 5 Jakub Jelinek 2015-12-19 11:39:43 UTC

You haven't provided preprocessed source, nor exact command line options.
GCC uses the ix86_legitimate_combined_insn target hook to disallow misaligned memory into certain SSE instructions.
(subreg:V4SI (reg:TI 245 [ MEM[(const struct bitset &)FeatureEntry_21 + 8] ]) 0)
is not misaligned memory, it is a subreg of a pseudo register, so it is fine.
If the replacement of the pseudo register with memory happens in some other pass, then that pass should probably use either the legitimate_combined_insn target hook or some other one.  I think we already have a PR where that happens during live range shrinking.

Comment 6 H.J. Lu 2015-12-19 14:59:38 UTC

I don't have a small testcase.  I am testing

https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=a93baf9afe49e059c9a7746608bdf7403fbaca42

to see if it fixes my problem.

Comment 7 H.J. Lu 2015-12-19 17:44:53 UTC

[hjl@gnu-tools-1 pr68991]$ cat x.cc
#include <bitset>
#include <string>

typedef std::string StringRef;

template<typename T>
class ArrayRef {
public:
    typedef const T *iterator;
    typedef size_t size_type;

private:
    const T *Data;
    size_type Length;

public:
    iterator begin() const { return Data; }
    iterator end() const { return Data + Length; }
};

const unsigned MAX_SUBTARGET_FEATURES = 128;
class FeatureBitset : public std::bitset<MAX_SUBTARGET_FEATURES> {
public:

    FeatureBitset() : bitset() {}
    FeatureBitset(const bitset<MAX_SUBTARGET_FEATURES>& B) : bitset(B) {}
};

struct SubtargetFeatureKV {
  const char *Key;
  FeatureBitset Value;
  FeatureBitset Implies;
  bool operator<(StringRef S) const;
};

struct SubtargetInfoKV {
  const char *Key;
  const void *Value;
};
class SubtargetFeatures {
public:
    FeatureBitset ToggleFeature(FeatureBitset Bits, StringRef String,
				ArrayRef<SubtargetFeatureKV> FeatureTable);
};

static inline bool hasFlag(StringRef Feature) {
  char Ch = Feature[0];
  return Ch == '+' || Ch =='-';
}

static inline std::string StripFlag(StringRef Feature) {
  return Feature;
}

static const SubtargetFeatureKV *Find(StringRef S,
				      ArrayRef<SubtargetFeatureKV> A) {
  auto F = std::lower_bound(A.begin(), A.end(), S);
  return F;
}

static
void ClearImpliedBits(FeatureBitset &Bits,
		      const SubtargetFeatureKV *FeatureEntry,
		      ArrayRef<SubtargetFeatureKV> FeatureTable) {
  for (auto &FE : FeatureTable) {
    if (FeatureEntry->Value == FE.Value) continue;

    if ((FE.Implies & FeatureEntry->Value).any()) {
      Bits &= ~FE.Value;
      ClearImpliedBits(Bits, &FE, FeatureTable);
    }
  }
}

FeatureBitset
SubtargetFeatures::ToggleFeature(FeatureBitset Bits, StringRef Feature,
				 ArrayRef<SubtargetFeatureKV> FeatureTable) {
  const SubtargetFeatureKV *FeatureEntry =
    Find(StripFlag(Feature), FeatureTable);
  if (FeatureEntry) {
    if ((Bits & FeatureEntry->Value) == FeatureEntry->Value) {
      Bits &= ~FeatureEntry->Value;
      ClearImpliedBits(Bits, FeatureEntry, FeatureTable);
    }
  }
  return Bits;
}
[hjl@gnu-tools-1 pr68991]$ gcc -O3 -S x.cc -std=c++11  -mx32 -da 

LRA turns

(insn 364 363 575 49 (set (reg:V4SI 288 [ vect__46.140 ])
        (xor:V4SI (reg:V4SI 289) 
            (subreg:V4SI (reg:TI 287 [ MEM[(const struct bitset &)A_56 + 4] ]) 0))) /usr/include/c++/5.3.1/bitset:163 3433 {*xorv4si3}
     (expr_list:REG_DEAD (reg:V4SI 289) 
        (expr_list:REG_DEAD (reg:TI 287 [ MEM[(const struct bitset &)A_56 + 4] ])
            (nil))))

into

(insn 364 363 575 49 (set (reg:V4SI 21 xmm0 [orig:288 vect__46.140 ] [288])
        (xor:V4SI (reg:V4SI 21 xmm0 [289])
            (mem:V4SI (plus:SI (reg/f:SI 44 r15 [orig:118 A ] [118])
                    (const_int 4 [0x4])) [9 MEM[(const struct bitset &)A_56 + 4]+0 S16 A32]))) /usr/include/c++/5.3.1/bitset:163 3433 {*xorv4si3}
     (nil))
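
For reference, the A32 on MEM[(const struct bitset &)A_56 + 4] follows from the struct layout. The sketch below mimics that layout with fixed-width types, assuming the x32 ABI where pointers and unsigned long are both 4 bytes; `Bitset128`, `FeatureKV` and `check_layout` are hypothetical stand-ins, not the types from x.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for std::bitset<128> on x32: four 32-bit words.
struct Bitset128 {
    std::uint32_t words[4];
};

// Stand-in for SubtargetFeatureKV on x32.
struct FeatureKV {
    std::uint32_t key;   // models the 4-byte `const char *Key`
    Bitset128 value;     // models `Value`
    Bitset128 implies;   // models `Implies`
};

// The 16-byte `value` member sits at offset 4 in a 4-byte-aligned struct,
// so a 16-byte vector access to it cannot be assumed to be aligned.
inline void check_layout() {
    assert(alignof(FeatureKV) == 4);
    assert(offsetof(FeatureKV, value) == 4);
    assert(offsetof(FeatureKV, implies) == 20);
    assert(offsetof(FeatureKV, value) % 16 != 0);
}
```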

Comment 8 H.J. Lu 2015-12-31 15:46:52 UTC

Another testcase:

[hjl@gnu-tools-1 pr68991]$ cat add.cc
typedef unsigned int size_type;

#define _GLIBCXX_BITSET_BITS_PER_WORD  (__CHAR_BIT__ * __SIZEOF_INT__)
#define _GLIBCXX_BITSET_WORDS(__n) \
  ((__n) / _GLIBCXX_BITSET_BITS_PER_WORD + \
   ((__n) % _GLIBCXX_BITSET_BITS_PER_WORD == 0 ? 0 : 1))

namespace std
{
  template<size_type _Nw>
    struct _Base_bitset
    {
      typedef unsigned int _WordT;
      _WordT 		_M_w[_Nw];

      _WordT&
      _M_hiword()
      { return _M_w[_Nw - 1]; }

      void
      _M_do_and(const _Base_bitset<_Nw>& __x)
      {
	for (size_type __i = 0; __i < _Nw; __i++)
	  _M_w[__i] += __x._M_w[__i];
      }

      void
      _M_do_flip()
      {
	for (size_type __i = 0; __i < _Nw; __i++)
	  _M_w[__i] = ~_M_w[__i];
      }

      bool
      _M_is_equal(const _Base_bitset<_Nw>& __x) const
      {
	for (size_type __i = 0; __i < _Nw; ++__i)
	  if (_M_w[__i] != __x._M_w[__i])
	    return false;
	return true;
      }

      bool
      _M_is_any() const
      {
	for (size_type __i = 0; __i < _Nw; __i++)
	  if (_M_w[__i] != static_cast<_WordT>(0))
	    return true;
	return false;
      }
    };

  template<size_type _Extrabits>
    struct _Sanitize
    {
      typedef unsigned int _WordT;

      static void
      _S_do_sanitize(_WordT& __val)
      { __val &= ~((~static_cast<_WordT>(0)) << _Extrabits); }
    };

  template<size_type _Nb>
    class bitset
    : private _Base_bitset<_GLIBCXX_BITSET_WORDS(_Nb)>
    {
    private:
      typedef _Base_bitset<_GLIBCXX_BITSET_WORDS(_Nb)> _Base;
      typedef unsigned int _WordT;

      void
      _M_do_sanitize()
      { 
	typedef _Sanitize<_Nb % _GLIBCXX_BITSET_BITS_PER_WORD> __sanitize_type;
	__sanitize_type::_S_do_sanitize(this->_M_hiword());
      }

    public:
      class reference
      {
	friend class bitset;

	_WordT*	_M_wp;
	size_type 	_M_bpos;
	
      public:
	reference&
	flip()
	{
	  *_M_wp ^= _Base::_S_maskbit(_M_bpos);
	  return *this;
	}
      };

      bitset<_Nb>&
      operator&=(const bitset<_Nb>& __rhs)
      {
	this->_M_do_and(__rhs);
	return *this;
      }

      bitset<_Nb>&
      flip() 
      {
	this->_M_do_flip();
	this->_M_do_sanitize();
	return *this;
      }
      
      bitset<_Nb>
      operator~() const 
      { return bitset<_Nb>(*this).flip(); }

      bool
      operator==(const bitset<_Nb>& __rhs) const 
      { return this->_M_is_equal(__rhs); }

      bool
      any() const 
      { return this->_M_is_any(); }
    };

  template<size_type _Nb>
    inline bitset<_Nb>
    operator&(const bitset<_Nb>& __x, const bitset<_Nb>& __y) 
    {
      bitset<_Nb> __result(__x);
      __result &= __y;
      return __result;
    }
}
template<typename T>
class ArrayRef {
public:
    typedef const T *iterator;

private:
    const T *Data;
    size_type Length;

public:
    iterator begin() const { return Data; }
    iterator end() const { return Data + Length; }
};

const unsigned MAX_SUBTARGET_FEATURES = 128;
class FeatureBitset : public std::bitset<MAX_SUBTARGET_FEATURES> {
};

struct SubtargetFeatureKV {
  FeatureBitset Value;
  FeatureBitset Implies;
};

struct SubtargetInfoKV {
  const void *Value;
};
class SubtargetFeatures {
public:
    FeatureBitset ToggleFeature(FeatureBitset Bits,
				const SubtargetFeatureKV *,
				ArrayRef<SubtargetFeatureKV> FeatureTable);
};

static
void ClearImpliedBits(FeatureBitset &Bits,
		      const SubtargetFeatureKV *FeatureEntry,
		      ArrayRef<SubtargetFeatureKV> FeatureTable) {
  for (auto &FE : FeatureTable) {
    if ((FE.Implies & FeatureEntry->Value).any()) {
      Bits &= ~FE.Value;
      ClearImpliedBits(Bits, &FE, FeatureTable);
    }
  }
}

FeatureBitset
SubtargetFeatures::ToggleFeature(FeatureBitset Bits,
				 const SubtargetFeatureKV *FeatureEntry,
				 ArrayRef<SubtargetFeatureKV> FeatureTable) {
    if ((Bits & FeatureEntry->Value) == FeatureEntry->Value) {
      Bits &= ~FeatureEntry->Value;
      ClearImpliedBits(Bits, FeatureEntry, FeatureTable);
    }
  return Bits;
}
[hjl@gnu-tools-1 pr68991]$ /export/build/gnu/gcc-x32/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/gcc-x32/build-x86_64-linux/gcc/ -m64 -O3 -fno-exceptions -fno-rtti -da -S -o add.s add.cc
[hjl@gnu-tools-1 pr68991]$ 

LRA turns

(insn 17 15 109 2 (set (reg:V4SI 122 [ vect__19.94 ])
        (plus:V4SI (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
                    (const_int -32 [0xffffffffffffffe0])) [6 MEM[(const struct bitset &)&Bits]+0 S16 A32])
            (subreg:V4SI (reg:V16QI 121) 0))) add.cc:24 2960 {*addv4si3}
     (expr_list:REG_DEAD (reg:V16QI 121)
        (expr_list:REG_EQUIV (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
                    (const_int -16 [0xfffffffffffffff0])) [3 MEM[(unsigned int *)&__result]+0 S16 A128])
            (nil))))

into

(insn 17 15 109 2 (set (reg:V4SI 21 xmm0 [orig:122 vect__19.94 ] [122])
        (plus:V4SI (reg:V4SI 21 xmm0 [121])
            (mem/c:V4SI (reg/f:DI 7 sp) [6 MEM[(const struct bitset &)&Bits]+0 S16 A32]))) add.cc:24 2960 {*addv4si3}
     (expr_list:REG_EQUIV (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
                (const_int -16 [0xfffffffffffffff0])) [3 MEM[(unsigned int *)&__result]+0 S16 A128])
        (nil)))

Comment 9 H.J. Lu 2016-01-04 22:25:25 UTC

With Bm constraint on SSE *mov<mode>_internal, curr_insn_transform in
lra-constraints.c generates an extra

(insn 354 353 323 8 (set (reg:V4SF 192)
        (reg:V4SF 202 [192])) 1226 {*movv4sf_internal}
     (nil))

for input:

(insn 353 322 354 8 (set (reg:V4SF 202 [192])
        (reg:V4SF 201 [192])) 1226 {*movv4sf_internal}
     (nil))

Comment 10 Jakub Jelinek 2016-01-04 23:00:36 UTC

But why should the *mov<mode>_internal use Bm or vector_operand?  It can/should handle both aligned and unaligned memory operands.

Comment 11 H.J. Lu 2016-01-04 23:26:20 UTC

(In reply to Jakub Jelinek from comment #10)
> But why should the *mov<mode>_internal use Bm or vector_operand?  It
> can/should handle both aligned and unaligned memory operands.

Only for historical reasons.

Comment 12 H.J. Lu 2016-01-04 23:27:05 UTC

(In reply to H.J. Lu from comment #9)
> With Bm constraint on SSE *mov<mode>_internal, curr_insn_transform in
> lra-constraints.c generates an extra
> 
> (insn 354 353 323 8 (set (reg:V4SF 192)
>         (reg:V4SF 202 [192])) 1226 {*movv4sf_internal}
>      (nil))
> 
> for input:
> 
> (insn 353 322 354 8 (set (reg:V4SF 202 [192])
>         (reg:V4SF 201 [192])) 1226 {*movv4sf_internal}
>      (nil))


LRA is OK when Bm is properly defined as

(define_memory_constraint "Bm"
  "@internal Vector memory operand."
  (match_operand 0 "vector_memory_operand"))

Comment 13 H.J. Lu 2016-01-05 05:31:09 UTC

(In reply to H.J. Lu from comment #12)
> (In reply to H.J. Lu from comment #9)
> > With Bm constraint on SSE *mov<mode>_internal, curr_insn_transform in
> > lra-constraints.c generates an extra
> > 
> > (insn 354 353 323 8 (set (reg:V4SF 192)
> >         (reg:V4SF 202 [192])) 1226 {*movv4sf_internal}
> >      (nil))
> > 
> > for input:
> > 
> > (insn 353 322 354 8 (set (reg:V4SF 202 [192])
> >         (reg:V4SF 201 [192])) 1226 {*movv4sf_internal}
> >      (nil))
> 
> 
> LRA is OK when Bm is properly defined as
> 
> (define_memory_constraint "Bm"
>   "@internal Vector memory operand."
>   (match_operand 0 "vector_memory_operand"))

It doesn't work since process_alt_operands in LRA will treat Bm as m:

                   case CT_MEMORY:
                      if (MEM_P (op)
                          && satisfies_memory_constraint_p (op, cn))
                        win = true;
                      else if (spilled_pseudo_p (op))
                        win = true;

and ignores vector_memory_operand.
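
A toy model of the control flow quoted above, to show the shape of the problem. All names here (`Kind`, `Operand`, `satisfies_aligned_mem`, `memory_alternative_wins`) are made up for illustration; this is not LRA code and does not claim to reproduce what LRA does beyond the quoted snippet.

```cpp
#include <cassert>

// Hypothetical operand model, not GCC's rtx.
enum class Kind { Mem, SpilledPseudo, HardReg };

struct Operand {
    Kind kind;
    unsigned align_bits;  // known alignment of the memory, in bits
};

// Stand-in for a constraint like "Bm": memory aligned to a 128-bit mode.
inline bool satisfies_aligned_mem(const Operand &op) {
    return op.kind == Kind::Mem && op.align_bits >= 128;
}

// Same shape as the quoted CT_MEMORY case: a MEM is checked against the
// constraint, but a spilled pseudo wins without any alignment check.
inline bool memory_alternative_wins(const Operand &op) {
    if (op.kind == Kind::Mem && satisfies_aligned_mem(op))
        return true;
    if (op.kind == Kind::SpilledPseudo)
        return true;  // the constraint's predicate is never consulted here
    return false;
}

inline void demo() {
    assert(!memory_alternative_wins(Operand{Kind::Mem, 32}));
    assert(memory_alternative_wins(Operand{Kind::SpilledPseudo, 32}));
}
```

In this model the second branch is the one that behaves like plain m.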

Comment 14 H.J. Lu 2016-01-05 05:51:48 UTC

(In reply to H.J. Lu from comment #13)

> > LRA is OK when Bm is properly defined as
> > 
> > (define_memory_constraint "Bm"
> >   "@internal Vector memory operand."
> >   (match_operand 0 "vector_memory_operand"))
> 
> It doesn't work since process_alt_operands in LRA will treat Bm as m:
> 
>                    case CT_MEMORY:
>                       if (MEM_P (op)
>                           && satisfies_memory_constraint_p (op, cn))
>                         win = true;
>                       else if (spilled_pseudo_p (op))
>                         win = true;
> 
> and ignores vector_memory_operand.

It happens with

                   case CT_MEMORY:
                      if (MEM_P (op)
                          && satisfies_memory_constraint_p (op, cn))
                        win = true;
                      else if (spilled_pseudo_p (op))
                        win = true;

                      /* If we didn't already win, we can reload constants
                         via force_const_mem or put the pseudo value into
                         memory, or make other memory by reloading the
                         address like for 'o'.  */
                      if (CONST_POOL_OK_P (mode, op)
                          || MEM_P (op) || REG_P (op))
                        badop = false;

             /* If this operand accepts a register, and if the
                 register class has at least one allocatable register,
                 then this operand can be reloaded.  */
              if (winreg && !no_regs_p)
                badop = false;

Comment 15 Jakub Jelinek 2016-01-05 08:23:43 UTC

(In reply to H.J. Lu from comment #11)
> (In reply to Jakub Jelinek from comment #10)
> > But why should the *mov<mode>_internal use Bm or vector_operand?  It
> > can/should handle both aligned and unaligned memory operands.
> 
> Only for historical reason.

I thought Uros said:
"Looking at the comment in Patch 3, I'd say let's keep *mov<mode>_internal constraints unchanged."
IMNSHO you only want to touch patterns which don't have ssememalign attributes (== have it 0) and leave the others as is.  Perhaps in the next step you can kill the UNSPEC_LOADU/UNSPEC_STOREU patterns and handle them in *mov<mode>_internal too - the unspecs were there just to make sure they aren't combined into SSE arithmetic instructions.

Comment 16 Uroš Bizjak 2016-01-05 08:28:25 UTC

(In reply to Jakub Jelinek from comment #15)
> (In reply to H.J. Lu from comment #11)
> > (In reply to Jakub Jelinek from comment #10)
> > > But why should the *mov<mode>_internal use Bm or vector_operand?  It
> > > can/should handle both aligned and unaligned memory operands.
> > 
> > Only for historical reason.
> 
> I thought Uros said:
> "Looking at the comment in Patch 3, I'd say let's keep *mov<mode>_internal
> constraints unchanged."
> IMNSHO you only want to touch patterns which don't have ssememalign
> attributes (== have it 0) and leave the others as is.  Perhaps in the next
> step you can kill the UNSPEC_LOADU/UNSPEC_STOREU patterns and handle them in
> *mov<mode>_internal too - the unspecs were there just to make sure they
> aren't combined into SSE arithmetic instructions.

Yes, there is no need to change *mov<mode>_internal constraints.

Comment 17 Uroš Bizjak 2016-01-05 08:41:05 UTC

(In reply to Jakub Jelinek from comment #15)

> IMNSHO you only want to touch patterns which don't have ssememalign
> attributes (== have it 0) and leave the others as is.  Perhaps in the next
> step you can kill the UNSPEC_LOADU/UNSPEC_STOREU patterns and handle them in
> *mov<mode>_internal too - the unspecs were there just to make sure they
> aren't combined into SSE arithmetic instructions.

I agree with the above. In the patch, please include only the minimum changes needed to solve the problem; we will make more radical changes after gcc-6 is released. IOW, ssememalign should stay in gcc-6.

Comment 18 hjl@gcc.gnu.org 2016-01-05 20:17:58 UTC

Author: hjl
Date: Tue Jan  5 20:17:26 2016
New Revision: 232087

URL: https://gcc.gnu.org/viewcvs?rev=232087&root=gcc&view=rev
Log:
Add vector_memory_operand and "Bm" constraint

SSE vector arithmetic and logic instructions only accept aligned memory
operand.  This patch adds vector_memory_operand and "Bm" constraint for
aligned SSE memory operand.  They are applied to SSE plusminus and
any_logic patterns.

gcc/

	PR target/68991
	* config/i386/constraints.md (Bm): New constraint.
	* config/i386/predicates.md (vector_memory_operand): New
	predicate.
	* config/i386/sse.md: Replace xm with xBm in plusminus and
	any_logic patterns.

gcc/testsuite/

	PR target/68991
	* g++.dg/pr68991-1.C: New test.
	* g++.dg/pr68991-2.C: Likewise.

Added:
    trunk/gcc/testsuite/g++.dg/pr68991-1.C
    trunk/gcc/testsuite/g++.dg/pr68991-2.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/constraints.md
    trunk/gcc/config/i386/predicates.md
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
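
The log above does not show the predicate itself. As a rough sketch of the rule it describes, combined with the AVX remark from the description: a 16-byte memory operand is acceptable when AVX is enabled or when the memory is known to be aligned to the mode. `MemRef`, `vector_memory_ok` and `demo` are hypothetical names for illustration, not the definition in predicates.md.

```cpp
#include <cassert>

// Hypothetical model of a memory operand: only its known alignment matters.
struct MemRef {
    unsigned align_bits;
};

// Sketch of the acceptance rule (an assumption, not the committed predicate):
// with AVX any alignment is fine; without it the memory must be aligned
// to the vector mode.
inline bool vector_memory_ok(const MemRef &mem, unsigned mode_align_bits,
                             bool avx_enabled) {
    return avx_enabled || mem.align_bits >= mode_align_bits;
}

inline void demo() {
    assert(!vector_memory_ok(MemRef{32}, 128, false));  // the A32 case from this PR
    assert(vector_memory_ok(MemRef{128}, 128, false));  // aligned stack slot
    assert(vector_memory_ok(MemRef{32}, 128, true));    // AVX tolerates misalignment
}
```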

Comment 19 hjl@gcc.gnu.org 2016-01-05 20:19:48 UTC

Author: hjl
Date: Tue Jan  5 20:19:16 2016
New Revision: 232088

URL: https://gcc.gnu.org/viewcvs?rev=232088&root=gcc&view=rev
Log:
Use vector_operand on SSE with 16b memory operand

Add vector_operand, which is vector_memory_operand or register_operand,
and use it, instead of nonimmediate_operand, in SSE patterns with 16-byte
memory operand.

gcc/

	PR target/68991
	* config/i386/i386.c (ix86_expand_vector_logical_operator):
	Replace nonimmediate_operand with vector_operand.
	* config/i386/predicates.md (vector_operand): New predicate.
	(general_vector_operand): Replace nonimmediate_operand with
	vector_operand.
	* config/i386/sse.md: Replace nonimmediate_operand with
	vector_operand and m constraint with Bm constraint on SSE
	patterns with 16-byte memory operand.
	* config/i386/subst.md (round_nimm_predicate): Replace
	nonimmediate_operand with vector_operand.
	(round_saeonly_nimm_predicate): Likewise.
	(round_saeonly_nimm_scalar_predicate): New.

gcc/testsuite/

	PR target/68991
	* gcc.target/i386/pr68991.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr68991.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/predicates.md
    trunk/gcc/config/i386/sse.md
    trunk/gcc/config/i386/subst.md
    trunk/gcc/testsuite/ChangeLog

Comment 20 H.J. Lu 2016-01-06 13:25:05 UTC

Fixed for 6.0.