83894 – [missed optimization] __v16qu shift instruction sequence on x86


Status:	UNCONFIRMED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	7.2.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2018-01-16 12:41 UTC by Matthias Kretz (Vir)
Modified:	2018-01-17 08:45 UTC
CC List:	0 users

See Also:
Host:
Target:	x86_64--, i?86--
Build:
Known to work:
Known to fail:
Last reconfirmed:

Attachments
benchmark (512 bytes, text/x-csrc) 2018-01-16 12:41 UTC, Matthias Kretz (Vir)
tsc.h (1.46 KB, text/x-csrc) 2018-01-16 12:41 UTC, Matthias Kretz (Vir)


Description Matthias Kretz (Vir) 2018-01-16 12:41:09 UTC

Created attachment 43148
benchmark

Shifts of vector builtins with an 8-bit integral element type (e.g. __v16qu) can be optimized better on x86, which has no 8-bit vector shift instruction.

For example, `v << n` can be implemented as:

1. Load 0x00ff00ff00ff... and shift it left by n in 16-bit lanes.
2. XOR (1) with 0xff00ff00ff00... to produce a bitmask.
3. Shift v left by n in 16-bit lanes.
4. Bitwise AND of (2) and (3).

I'll attach a benchmark with an intrinsics-based implementation.
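
For illustration only, the four steps can be sketched with GCC/Clang vector extensions as below. This is not the attached benchmark code. The names `u8x16`, `u16x8` and `shl_u8` are made up for this sketch, and n is assumed to be in the range 0..7. As a worked case, for n = 3 each 16-bit lane of step 1 is 0x00ff << 3 = 0x07f8, and the XOR with 0xff00 in step 2 gives 0xf8f8, i.e. a mask that clears the low 3 bits of every byte, which are exactly the bits that bleed in from the neighbouring byte during the 16-bit shift.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type names for this sketch (GCC/Clang vector extensions).
typedef std::uint8_t  u8x16 __attribute__((vector_size(16)));
typedef std::uint16_t u16x8 __attribute__((vector_size(16)));

// Shift every 8-bit element of v left by n, using only 16-bit lane shifts.
// Assumption of this sketch: 0 <= n <= 7.
inline u8x16 shl_u8(u8x16 v, int n) {
    assert(n >= 0 && n <= 7);
    const u16x8 low  = {0x00ff, 0x00ff, 0x00ff, 0x00ff,
                        0x00ff, 0x00ff, 0x00ff, 0x00ff};
    const u16x8 high = {0xff00, 0xff00, 0xff00, 0xff00,
                        0xff00, 0xff00, 0xff00, 0xff00};
    // Steps 1 and 2: build the per-byte mask (0xff << n) & 0xff in both bytes.
    const u16x8 mask = (low << n) ^ high;
    // Step 3: reinterpret v as 16-bit lanes and shift; the low byte of each
    // lane bleeds into the low bits of the high byte.
    const u16x8 wide = (u16x8)v << n;
    // Step 4: mask off the bits that crossed a byte boundary.
    return (u8x16)(wide & mask);
}
```

Whether the compiler turns this into the intended psllw/pxor/pand-style sequence depends on the GCC version and flags, so the generated assembly should be checked rather than assumed.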

Comment 1 Matthias Kretz (Vir) 2018-01-16 12:41:59 UTC

Created attachment 43149
tsc.h

Header required for the benchmark code.

Comment 2 Matthias Kretz (Vir) 2018-01-16 12:45:44 UTC

I compiled with:

g++-7 -march=haswell -std=c++17 -O3 -flax-vector-conversions -o char_shift char_shift.cpp