This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] [x86] Avoid builtins for SSE/AVX2 immidiate logical shifts
- From: Allan Sandfeld Jensen <linux at carewolf dot com>
- To: gcc-patches at gcc dot gnu dot org, Jakub Jelinek <jakub at redhat dot com>
- Cc: Kirill Yukhin <kirill dot yukhin at gmail dot com>, Uros Bizjak <ubizjak at gmail dot com>
- Date: Mon, 24 Apr 2017 11:36:42 +0200
- Subject: Re: [PATCH] [x86] Avoid builtins for SSE/AVX2 immidiate logical shifts
- Authentication-results: sourceware.org; auth=none
- References: <201704221338.46300.linux@carewolf.com> <201704241101.29634.linux@carewolf.com> <20170424091737.GL1809@tucnak>
On Monday 24 April 2017, Jakub Jelinek wrote:
> On Mon, Apr 24, 2017 at 11:01:29AM +0200, Allan Sandfeld Jensen wrote:
> > On Monday 24 April 2017, Jakub Jelinek wrote:
> > > On Mon, Apr 24, 2017 at 10:34:58AM +0200, Allan Sandfeld Jensen wrote:
> > > > That is a different instruction. That is the vpsllw not vpsllwi
> > > >
> > > > The intrinsics I changed is the immediate version, I didn't change
> > > > the non- immediate version. It is probably a bug if you can give
> > > > non-immediate values to the immediate only intrinsic. At least both
> > > > versions handles it, if in different ways, but is is illegal
> > > > arguments.
> > >
> > > The documentation is unclear on that and I've only recently fixed up
> > > some cases where these intrinsics weren't able to handle non-constant
> > > arguments in some cases, while both ICC and clang coped with that
> > > fine.
> > > So it is clearly allowed and handled by all the compilers and needs to
> > > be supported, people use that in real-world code.
> >
> > Undoubtedly it happens. I just make a mistake myself that created that
> > case. But it is rather unfortunate, and means we make wrong code
> > currently for corner case values.
>
> The intrinsic documentation is poor, usually you have a good documentation
> on what the instructions do, and then you just have to guess what the
> intrinsics do. You can of course ask Intel for clarification.
>
> If you try:
> #include <x86intrin.h>
>
> __m128i
> foo (__m128i a, int b)
> {
> return _mm_slli_epi16 (a, b);
> }
> and call it with 257 from somewhere else, you can see that all the
> compilers will give you zero vector. And similarly if you use 257
> literally instead of b. So what the intrinsic (unlike the instruction)
> actually does is that it compares all bits of the imm8 argument (supposedly
> using unsigned comparison) and if it is bigger than 15 (or 7 or 31 or 63
> depending on the bitsize of element) it yields 0 vector.
>
Good point. I was using intel's documentation at
https://software.intel.com/sites/landingpage/IntrinsicsGuide/, but if all
compilers including us does something else, practicality wins.
It did make me curious and test out what _mm_slli_epi16(v, -250); compiles to.
For some reason that becomes an undefined shift using the non-immediate sll in
gcc, but returns the zero-vector in clang. With my patch it was a 6 bit shift,
but that is apparently not de-facto standard.
`Allan