Bug 83894 - [missed optimization] __v16qu shift instruction sequence on x86
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 7.2.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2018-01-16 12:41 UTC by Matthias Kretz (Vir)
Modified: 2018-01-17 08:45 UTC

See Also:
Host:
Target: x86_64-*-*, i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
benchmark (512 bytes, text/x-csrc)
2018-01-16 12:41 UTC, Matthias Kretz (Vir)
tsc.h (1.46 KB, text/x-csrc)
2018-01-16 12:41 UTC, Matthias Kretz (Vir)

Description Matthias Kretz (Vir) 2018-01-16 12:41:09 UTC
Created attachment 43148
benchmark

Shifts of vector builtins with an 8-bit integral element type (e.g. __v16qu) can be optimized better.

For example, `v << n` can be implemented as:

1. load 0x00ff00ff00ff... and 16-bit shift by n
2. xor (1) with 0xff00ff00ff00... to produce a bitmask
3. 16-bit shift v by n
4. bitwise and of (2) and (3)
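
To make the four steps concrete, here is a minimal sketch using SSE2 intrinsics. It is not the attached benchmark code: the names `shl_u8` and `matches_scalar` are hypothetical, and the sketch assumes an unsigned element type and a shift count n >= 0 that is the same for all elements.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Hypothetical helper (not taken from the attachment): shift each of the
// 16 unsigned bytes in v left by n bits using only 16-bit shifts.
inline __m128i shl_u8(__m128i v, int n) {
    assert(n >= 0);  // assumption of this sketch: non-negative shift count
    const __m128i count = _mm_cvtsi32_si128(n);
    // Step 1: shift 0x00ff in every 16-bit lane left by n.
    const __m128i shifted_ff = _mm_sll_epi16(_mm_set1_epi16(0x00ff), count);
    // Step 2: xor with 0xff00; both bytes of each lane now hold (0xff << n) & 0xff.
    const __m128i mask =
        _mm_xor_si128(shifted_ff, _mm_set1_epi16(static_cast<short>(0xff00)));
    // Step 3: 16-bit shift of the data; low-byte bits leak into the high byte.
    const __m128i shifted_v = _mm_sll_epi16(v, count);
    // Step 4: the mask clears the bits that leaked across the byte boundary.
    return _mm_and_si128(shifted_v, mask);
}

// Hypothetical check: compare shl_u8 against a plain per-byte scalar shift.
inline bool matches_scalar(const std::uint8_t (&in)[16], int n) {
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    std::uint8_t out[16];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), shl_u8(v, n));
    for (int i = 0; i < 16; ++i) {
        if (out[i] != static_cast<std::uint8_t>(in[i] << n)) return false;
    }
    return true;
}
```

Steps 1 and 2 depend only on n, so when the same shift count is applied to many vectors the mask can be computed once and reused, leaving one shift and one and per vector.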

I'll attach a benchmark with an intrinsics-based implementation.
Comment 1 Matthias Kretz (Vir) 2018-01-16 12:41:59 UTC
Created attachment 43149
tsc.h

Header required for the benchmark code.
Comment 2 Matthias Kretz (Vir) 2018-01-16 12:45:44 UTC
I compiled with:

g++-7 -march=haswell -std=c++17 -O3 -flax-vector-conversions -o char_shift char_shift.cpp