[PATCH] x86: Define _mm*_undefined_*
Ilya Tocar
tocarip.intel@gmail.com
Mon Mar 17 11:41:00 GMT 2014
On 16 Mar 07:12, Ulrich Drepper wrote:
> [This patch is so far really meant for commenting. I haven't tested it
> at all yet.]
>
> Intel's intrinsic specification includes one set which currently is not
> defined in gcc's headers: the _mm*_undefined_* intrinsics.
What specification are you talking about? As far as I know they are present
in ICC headers, but not in manuals such as:
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
> The purpose of these intrinsics (currently three classes, three formats
> each) is to create a pseudo-value the compiler does not assume is
> uninitialized without incurring any code doing so. The purpose is to
> use these intrinsics in places where it is known the value of a register
> is never used. This is already important with AVX2 and becomes really
> crucial with AVX512.
>
> Currently three different techniques are used:
>
> - _mm*_setzero_*() is used. Even though the XOR operation does not
> cost anything it still messes with the instruction scheduling and
> more code is generated.
>
> - another parameter is duplicated. This leads most of the time to
> one additional move instruction.
>
> - uninitialized variables are used (this is in new AVX512 code). The
> compiler should generate warnings for these headers. I haven't
> tried it.
Uninitialized variables certainly are bad. Replacing them with
setzero/undefined is a good idea.
Also in most AVX512 cases those values shouldn't be present in code.
They are either optimized away in case of -1 mask or result in
zero-masking being applied. Do you know of any cases where xor is
generated (except for destination in gather/scatter)?
>
> Using the _mm*_undefined_*() intrinsics is much cleaner and also
> potentially allows the compiler to generate better code.
>
> For now the implementation uses an inline asm to suggest to the compiler
> that the variable is initialized. This does not prevent a real register
> from being allocated for this purpose, but it saves the XOR instruction.
>
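For reference, a minimal sketch of what such an asm-based definition might look like. The patch body is not reproduced in this message, so the shape below is an assumption, and the sketch_* names are hypothetical stand-ins chosen so as not to collide with names the real headers may define. The empty asm with an "=x" output only tells the compiler that an SSE register now holds a value; no instruction is emitted for it.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_cvtsi32_si128, ... */

/* Hypothetical stand-in for an asm-based _mm_undefined_si128.
   Assumption: the empty asm marks y as written, so the compiler
   neither warns about an uninitialized use nor emits an XOR.  */
static inline __m128i
sketch_undefined_asm_si128 (void)
{
  __m128i y;
  __asm__ ("" : "=x" (y));
  return y;
}

/* Use the value as a "don't care" operand: the low 32-bit lane of the
   result comes only from x, whatever the other operand contains.  */
static inline int
sketch_low_lane_asm (int x)
{
  __m128i dontcare = sketch_undefined_asm_si128 ();
  __m128i v = _mm_unpacklo_epi32 (_mm_cvtsi32_si128 (x), dontcare);
  return _mm_cvtsi128_si32 (v);
}
```

The register named by the asm output is still allocated, which is the limitation the quoted text describes.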
> The correct and optimal implementation will require a compiler built-in
> which will do something different based on how the value is used:
>
> - if the value is never modified then any register should be picked.
> In function/intrinsic calls the parameter simply need not be loaded at
> all.
>
> - if the value is modified (and allocated to a register or memory
> location) no initialization for the variable is needed (equivalent
> to the asm now).
>
>
> The questions are:
>
> - is there interest in adding the necessary compiler built-in?
>
> - if yes, anyone interested in working on this?
>
> - and: is it worth adding a patch like the one here in the meantime?
>
> As it stands now gcc's intrinsics are not complete and programs following
> Intel's manuals can fail to compile.
>
Compatibility with ICC is certainly good. I tried your patch, and
undefined is similar in behavior to setzero, but it also clobbers
flags. Maybe just define it to setzero for now?
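A minimal sketch of the setzero fallback, assuming the 128-bit integer case only. The sketch_* names are hypothetical; a real header would use the _mm*_undefined_* names and cover every type listed in the ChangeLog below.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_setzero_si128 */

/* Hypothetical stand-in for _mm_undefined_si128 defined via setzero.
   This costs one XOR at most but involves no asm statement.  */
static inline __m128i
sketch_undefined_zero_si128 (void)
{
  return _mm_setzero_si128 ();
}

/* Callers must not depend on the contents; here only the low lane,
   which comes from x, is read back.  */
static inline int
sketch_low_lane_zero (int x)
{
  __m128i dontcare = sketch_undefined_zero_si128 ();
  __m128i v = _mm_unpacklo_epi32 (_mm_cvtsi32_si128 (x), dontcare);
  return _mm_cvtsi128_si32 (v);
}
```

Code written against such a definition keeps compiling unchanged if the body is later replaced by a built-in.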
>
>
> 2014-03-16 Ulrich Drepper <drepper@gmail.com>
>
> * config/i386/avxintrin.h (_mm256_undefined_si256): Define.
> (_mm256_undefined_ps): Define.
> (_mm256_undefined_pd): Define.
> * config/i386/emmintrin.h (_mm_undefined_si128): Define.
> (_mm_undefined_pd): Define.
> * config/i386/xmmintrin.h (_mm_undefined_ps): Define.
> * config/i386/avx512fintrin.h (_mm512_undefined_si512): Define.
> (_mm512_undefined_ps): Define.
> (_mm512_undefined_pd): Define.
> Use _mm*_undefined_*.
> * config/i386/avx2intrin.h: Use _mm*_undefined_*.
>