This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Remove UNSPEC_LOADU and UNSPEC_STOREU


On Mon, Apr 18, 2016 at 8:40 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sun, Jan 10, 2016 at 11:45 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Sun, Jan 10, 2016 at 11:32 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> Since the *mov<mode>_internal and <avx512>_(load|store)<mode>_mask
>>> patterns can handle unaligned loads and stores, we can remove
>>> UNSPEC_LOADU and UNSPEC_STOREU.  We use function prototypes with
>>> pointers to the scalar element type for the unaligned load/store
>>> builtin functions, so that the memory passed to *mov<mode>_internal
>>> is treated as unaligned.
>>>
>>> Tested on x86-64.  Is this OK for trunk in stage 3?
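
To illustrate the approach described above, here is a sketch of what the
reworked _mm512_loadu_pd would look like (this paraphrases the ChangeLog
below, not the literal avx512fintrin.h text; the argument order simply
follows the usual *_mask builtin convention):

/* Sketch only: the wrapper forwards a pointer to the scalar element
   type (const double *) instead of a pointer to the vector type, so
   the memory operand that reaches mov<mode>_internal carries only
   8-byte alignment and an unaligned vmovupd is emitted.  */
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm512_loadu_pd (void const *__P)
{
  return (__m512d) __builtin_ia32_loadupd512_mask ((const double *) __P,
						   (__v8df) _mm512_undefined_pd (),
						   (__mmask8) -1);
}
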
>>
>> This patch is not appropriate for stage 3.
>>
>> Uros.
>>
>>> H.J.
>>> ----
>>> gcc/
>>>
>>>         PR target/69201
>>>         * config/i386/avx512bwintrin.h (_mm512_mask_loadu_epi16): Pass
>>>         const short * to __builtin_ia32_loaddquhi512_mask.
>>>         (_mm512_maskz_loadu_epi16): Likewise.
>>>         (_mm512_mask_storeu_epi16): Pass short * to
>>>         __builtin_ia32_storedquhi512_mask.
>>>         (_mm512_mask_loadu_epi8): Pass const char * to
>>>         __builtin_ia32_loaddquqi512_mask.
>>>         (_mm512_maskz_loadu_epi8): Likewise.
>>>         (_mm512_mask_storeu_epi8): Pass char * to
>>>         __builtin_ia32_storedquqi512_mask.
>>>         * config/i386/avx512fintrin.h (_mm512_loadu_pd): Pass
>>>         const double * to __builtin_ia32_loadupd512_mask.
>>>         (_mm512_mask_loadu_pd): Likewise.
>>>         (_mm512_maskz_loadu_pd): Likewise.
>>>         (_mm512_storeu_pd): Pass double * to
>>>         __builtin_ia32_storeupd512_mask.
>>>         (_mm512_mask_storeu_pd): Likewise.
>>>         (_mm512_loadu_ps): Pass const float * to
>>>         __builtin_ia32_loadups512_mask.
>>>         (_mm512_mask_loadu_ps): Likewise.
>>>         (_mm512_maskz_loadu_ps): Likewise.
>>>         (_mm512_storeu_ps): Pass float * to
>>>         __builtin_ia32_storeups512_mask.
>>>         (_mm512_mask_storeu_ps): Likewise.
>>>         (_mm512_mask_loadu_epi64): Pass const long long * to
>>>         __builtin_ia32_loaddqudi512_mask.
>>>         (_mm512_maskz_loadu_epi64): Likewise.
>>>         (_mm512_mask_storeu_epi64): Pass long long *
>>>         to __builtin_ia32_storedqudi512_mask.
>>>         (_mm512_loadu_si512): Pass const int * to
>>>         __builtin_ia32_loaddqusi512_mask.
>>>         (_mm512_mask_loadu_epi32): Likewise.
>>>         (_mm512_maskz_loadu_epi32): Likewise.
>>>         (_mm512_storeu_si512): Pass int * to
>>>         __builtin_ia32_storedqusi512_mask.
>>>         (_mm512_mask_storeu_epi32): Likewise.
>>>         * config/i386/avx512vlbwintrin.h (_mm256_mask_storeu_epi8): Pass
>>>         char * to __builtin_ia32_storedquqi256_mask.
>>>         (_mm_mask_storeu_epi8): Likewise.
>>>         (_mm256_mask_loadu_epi16): Pass const short * to
>>>         __builtin_ia32_loaddquhi256_mask.
>>>         (_mm256_maskz_loadu_epi16): Likewise.
>>>         (_mm_mask_loadu_epi16): Pass const short * to
>>>         __builtin_ia32_loaddquhi128_mask.
>>>         (_mm_maskz_loadu_epi16): Likewise.
>>>         (_mm256_mask_loadu_epi8): Pass const char * to
>>>         __builtin_ia32_loaddquqi256_mask.
>>>         (_mm256_maskz_loadu_epi8): Likewise.
>>>         (_mm_mask_loadu_epi8): Pass const char * to
>>>         __builtin_ia32_loaddquqi128_mask.
>>>         (_mm_maskz_loadu_epi8): Likewise.
>>>         (_mm256_mask_storeu_epi16): Pass short * to
>>>         __builtin_ia32_storedquhi256_mask.
>>>         (_mm_mask_storeu_epi16): Pass short * to
>>>         __builtin_ia32_storedquhi128_mask.
>>>         * config/i386/avx512vlintrin.h (_mm256_mask_loadu_pd): Pass
>>>         const double * to __builtin_ia32_loadupd256_mask.
>>>         (_mm256_maskz_loadu_pd): Likewise.
>>>         (_mm_mask_loadu_pd): Pass const double * to
>>>         __builtin_ia32_loadupd128_mask.
>>>         (_mm_maskz_loadu_pd): Likewise.
>>>         (_mm256_mask_storeu_pd): Pass double * to
>>>         __builtin_ia32_storeupd256_mask.
>>>         (_mm_mask_storeu_pd): Pass double * to
>>>         __builtin_ia32_storeupd128_mask.
>>>         (_mm256_mask_loadu_ps): Pass const float * to
>>>         __builtin_ia32_loadups256_mask.
>>>         (_mm256_maskz_loadu_ps): Likewise.
>>>         (_mm_mask_loadu_ps): Pass const float * to
>>>         __builtin_ia32_loadups128_mask.
>>>         (_mm_maskz_loadu_ps): Likewise.
>>>         (_mm256_mask_storeu_ps): Pass float * to
>>>         __builtin_ia32_storeups256_mask.
>>>         (_mm_mask_storeu_ps): Pass float * to
>>>         __builtin_ia32_storeups128_mask.
>>>         (_mm256_mask_loadu_epi64): Pass const long long * to
>>>         __builtin_ia32_loaddqudi256_mask.
>>>         (_mm256_maskz_loadu_epi64): Likewise.
>>>         (_mm_mask_loadu_epi64): Pass const long long * to
>>>         __builtin_ia32_loaddqudi128_mask.
>>>         (_mm_maskz_loadu_epi64): Likewise.
>>>         (_mm256_mask_storeu_epi64): Pass long long * to
>>>         __builtin_ia32_storedqudi256_mask.
>>>         (_mm_mask_storeu_epi64): Pass long long * to
>>>         __builtin_ia32_storedqudi128_mask.
>>>         (_mm256_mask_loadu_epi32): Pass const int * to
>>>         __builtin_ia32_loaddqusi256_mask.
>>>         (_mm256_maskz_loadu_epi32): Likewise.
>>>         (_mm_mask_loadu_epi32): Pass const int * to
>>>         __builtin_ia32_loaddqusi128_mask.
>>>         (_mm_maskz_loadu_epi32): Likewise.
>>>         (_mm256_mask_storeu_epi32): Pass int * to
>>>         __builtin_ia32_storedqusi256_mask.
>>>         (_mm_mask_storeu_epi32): Pass int * to
>>>         __builtin_ia32_storedqusi128_mask.
>>>         * config/i386/i386-builtin-types.def (PCSHORT): New.
>>>         (PINT64): Likewise.
>>>         (V64QI_FTYPE_PCCHAR_V64QI_UDI): Likewise.
>>>         (V32HI_FTYPE_PCSHORT_V32HI_USI): Likewise.
>>>         (V32QI_FTYPE_PCCHAR_V32QI_USI): Likewise.
>>>         (V16SF_FTYPE_PCFLOAT_V16SF_UHI): Likewise.
>>>         (V8DF_FTYPE_PCDOUBLE_V8DF_UQI): Likewise.
>>>         (V16SI_FTYPE_PCINT_V16SI_UHI): Likewise.
>>>         (V16HI_FTYPE_PCSHORT_V16HI_UHI): Likewise.
>>>         (V16QI_FTYPE_PCCHAR_V16QI_UHI): Likewise.
>>>         (V8SF_FTYPE_PCFLOAT_V8SF_UQI): Likewise.
>>>         (V8DI_FTYPE_PCINT64_V8DI_UQI): Likewise.
>>>         (V8SI_FTYPE_PCINT_V8SI_UQI): Likewise.
>>>         (V8HI_FTYPE_PCSHORT_V8HI_UQI): Likewise.
>>>         (V4DF_FTYPE_PCDOUBLE_V4DF_UQI): Likewise.
>>>         (V4SF_FTYPE_PCFLOAT_V4SF_UQI): Likewise.
>>>         (V4DI_FTYPE_PCINT64_V4DI_UQI): Likewise.
>>>         (V4SI_FTYPE_PCINT_V4SI_UQI): Likewise.
>>>         (V2DF_FTYPE_PCDOUBLE_V2DF_UQI): Likewise.
>>>         (V2DI_FTYPE_PCINT64_V2DI_UQI): Likewise.
>>>         (VOID_FTYPE_PDOUBLE_V8DF_UQI): Likewise.
>>>         (VOID_FTYPE_PDOUBLE_V4DF_UQI): Likewise.
>>>         (VOID_FTYPE_PDOUBLE_V2DF_UQI): Likewise.
>>>         (VOID_FTYPE_PFLOAT_V16SF_UHI): Likewise.
>>>         (VOID_FTYPE_PFLOAT_V8SF_UQI): Likewise.
>>>         (VOID_FTYPE_PFLOAT_V4SF_UQI): Likewise.
>>>         (VOID_FTYPE_PINT64_V8DI_UQI): Likewise.
>>>         (VOID_FTYPE_PINT64_V4DI_UQI): Likewise.
>>>         (VOID_FTYPE_PINT64_V2DI_UQI): Likewise.
>>>         (VOID_FTYPE_PINT_V16SI_UHI): Likewise.
>>>         (VOID_FTYPE_PINT_V8SI_UHI): Likewise.
>>>         (VOID_FTYPE_PINT_V4SI_UHI): Likewise.
>>>         (VOID_FTYPE_PSHORT_V32HI_USI): Likewise.
>>>         (VOID_FTYPE_PSHORT_V16HI_UHI): Likewise.
>>>         (VOID_FTYPE_PSHORT_V8HI_UQI): Likewise.
>>>         (VOID_FTYPE_PCHAR_V64QI_UDI): Likewise.
>>>         (VOID_FTYPE_PCHAR_V32QI_USI): Likewise.
>>>         (VOID_FTYPE_PCHAR_V16QI_UHI): Likewise.
>>>         (V64QI_FTYPE_PCV64QI_V64QI_UDI): Removed.
>>>         (V32HI_FTYPE_PCV32HI_V32HI_USI): Likewise.
>>>         (V32QI_FTYPE_PCV32QI_V32QI_USI): Likewise.
>>>         (V16HI_FTYPE_PCV16HI_V16HI_UHI): Likewise.
>>>         (V16QI_FTYPE_PCV16QI_V16QI_UHI): Likewise.
>>>         (V8HI_FTYPE_PCV8HI_V8HI_UQI): Likewise.
>>>         (VOID_FTYPE_PV32HI_V32HI_USI): Likewise.
>>>         (VOID_FTYPE_PV16HI_V16HI_UHI): Likewise.
>>>         (VOID_FTYPE_PV8HI_V8HI_UQI): Likewise.
>>>         (VOID_FTYPE_PV64QI_V64QI_UDI): Likewise.
>>>         (VOID_FTYPE_PV32QI_V32QI_USI): Likewise.
>>>         (VOID_FTYPE_PV16QI_V16QI_UHI): Likewise.
>>>         * config/i386/i386.c (ix86_emit_save_reg_using_mov): Don't
>>>         use UNSPEC_STOREU.
>>>         (ix86_emit_restore_sse_regs_using_mov): Don't use UNSPEC_LOADU.
>>>         (ix86_avx256_split_vector_move_misalign): Don't use unaligned
>>>         load or store.
>>>         (ix86_expand_vector_move_misalign): Likewise.
>>>         (bdesc_special_args): Use CODE_FOR_movvNXY_internal and pointer
>>>         to scalar function prototype for unaligned load/store builtins.
>>>         (ix86_expand_special_args_builtin): Updated.
>>>         * config/i386/sse.md (UNSPEC_LOADU): Removed.
>>>         (UNSPEC_STOREU): Likewise.
>>>         (VI_ULOADSTORE_BW_AVX512VL): Likewise.
>>>         (VI_ULOADSTORE_F_AVX512VL): Likewise.
>>>         (<sse>_loadu<ssemodesuffix><avxsizesuffix><mask_name>): Likewise.
>>>         (*<sse>_loadu<ssemodesuffix><avxsizesuffix><mask_name>): Likewise.
>>>         (<sse>_storeu<ssemodesuffix><avxsizesuffix>): Likewise.
>>>         (<avx512>_storeu<ssemodesuffix><avxsizesuffix>_mask): Likewise.
>>>         (<sse2_avx_avx512f>_loaddqu<mode><mask_name>): Likewise.
>>>         (*<sse2_avx_avx512f>_loaddqu<mode><mask_name>): Likewise.
>>>         (<sse2_avx_avx512f>_storedqu<mode>): Likewise.
>>>         (<avx512>_storedqu<mode>_mask): Likewise.
>>>         (*sse4_2_pcmpestr_unaligned): Likewise.
>>>         (*sse4_2_pcmpistr_unaligned): Likewise.
>>>         (*mov<mode>_internal): Renamed to ...
>>>         (mov<mode>_internal): This.  Remove check of AVX and IAMCU on
>>>         misaligned operand.  Replace vmovdqu64 with vmovdqu<ssescalarsize>.
>>>         (movsd/movhpd to movupd peephole): Don't use UNSPEC_LOADU.
>>>         (movlpd/movhpd to movupd peephole): Don't use UNSPEC_STOREU.
>>>
>>> gcc/testsuite/
>>>
>>>         PR target/69201
>>>         * gcc.target/i386/avx256-unaligned-store-1.c (a): Make it
>>>         extern to force it misaligned.
>>>         (b): Likewise.
>>>         (c): Likewise.
>>>         (d): Likewise.
>>>         Check vmovups.*movv8sf_internal/3 instead of avx_storeups256.
>>>         Don't check `*' before movv4sf_internal.
>>>         * gcc.target/i386/avx256-unaligned-store-2.c: Check
>>>         vmovups.*movv32qi_internal/3 instead of avx_storeups256.
>>>         Don't check `*' before movv16qi_internal.
>>>         * gcc.target/i386/avx256-unaligned-store-3.c (a): Make it
>>>         extern to force it misaligned.
>>>         (b): Likewise.
>>>         (c): Likewise.
>>>         (d): Likewise.
>>>         Check vmovups.*movv4df_internal/3 instead of avx_storeupd256.
>>>         Don't check `*' before movv2df_internal.
>>>         * gcc.target/i386/avx256-unaligned-store-4.c (a): Make it
>>>         extern to force it misaligned.
>>>         (b): Likewise.
>>>         (c): Likewise.
>>>         (d): Likewise.
>>>         Check movv8sf_internal instead of avx_storeups256.
>>>         Check movups.*movv4sf_internal/3 instead of avx_storeups256.
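
The testsuite entries above rely on declaring the arrays extern, so the
compiler cannot see their definitions and raise their alignment, and must
therefore emit the unaligned stores the tests scan for.  A minimal sketch
of that idiom (not the exact testcase source):

extern float a[], b[], c[], d[];

void
avx_test (void)
{
  int i;

  /* With only the ABI minimum alignment of float known for a, b, c and
     d, the vectorized stores have to use vmovups rather than vmovaps.  */
  for (i = 0; i < 1024; i++)
    d[i] = a[i] * b[i] + c[i];
}
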
>
>
> Here is the updated patch for GCC 7.  Tested on x86-64.  OK for
> trunk?

IIRC from previous discussion, are we sure we won't propagate
unaligned memory into SSE arithmetic insns?
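
For background on why that matters: without AVX, an SSE arithmetic
instruction with a memory operand requires 16-byte alignment, so folding
an unaligned load into it would fault at run time.  A small illustration
(not taken from the patch or the testsuite):

#include <emmintrin.h>

__m128d
f (const double *p, __m128d x)
{
  /* movupd is fine on a misaligned pointer ...  */
  __m128d t = _mm_loadu_pd (p);
  /* ... but the add must keep its register operand; combining the load
     into "addpd (%rdi), %xmm0" would fault when p is not 16-byte
     aligned.  */
  return _mm_add_pd (x, t);
}
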

Otherwise, the patch is OK, but please wait for Kirill for AVX512 approval.

Thanks,
Uros.

