[Bug target/105966] x86: operations on certain few-element vectors yield very inefficient code
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Tue Jun 14 09:56:11 GMT 2022
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105966
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
Last reconfirmed| |2022-06-14
CC| |rguenth at gcc dot gnu.org
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
So "lowering" would turn
_1 = _2 * _3;
into
_2' = { _2, {}, ... }; // vector-of-vector CTOR with zero filling
_3' = { _3, {}, ... };
_1' = _2' * _3';
_1 = BIT_FIELD_REF <_1', 0, bitsizeof(_1)>; // lowpart
Little- versus big-endian needs some thought here. We currently require all
elements to be explicitly specified for vector-of-vector CTORs; for
scalar-element CTORs we allow automatic zero-filling, which would be
convenient here as well. For division we'd use a vector of ones.
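A minimal source-level sketch of the lowering described above, written with
GCC's vector extensions rather than GIMPLE. The type names (v2hi, v8hi), the
helper names (widen, lowpart, mul_lowered, div_lowered) and the choice of a
16-byte target width are assumptions made for illustration only. It also
assumes that the "vector of ones" pads the divisor, so that the unused lanes
never divide by zero.

```c
/* Illustrative only: mimics the proposed lowering by hand in C. */
typedef short v2hi __attribute__((vector_size(4)));   /* 2 x 16-bit "few-element" vector */
typedef short v8hi __attribute__((vector_size(16)));  /* 8 x 16-bit full-width vector */

/* Analogue of the vector-of-vector CTOR: x in the low lanes, `fill` elsewhere. */
static v8hi widen(v2hi x, short fill)
{
    v8hi w = { x[0], x[1], fill, fill, fill, fill, fill, fill };
    return w;
}

/* Analogue of BIT_FIELD_REF <w, 0, bitsizeof(v2hi)>: take the low lanes. */
static v2hi lowpart(v8hi w)
{
    v2hi r = { w[0], w[1] };
    return r;
}

/* _1 = _2 * _3, done in the wide type with zero-filled operands. */
v2hi mul_lowered(v2hi a, v2hi b)
{
    return lowpart(widen(a, 0) * widen(b, 0));
}

/* Division: the divisor is padded with ones (assumption, see lead-in). */
v2hi div_lowered(v2hi a, v2hi b)
{
    return lowpart(widen(a, 0) / widen(b, 1));
}
```

Lane 0 is treated as the low part here; as noted above, how that maps to bit
positions on big-endian targets still needs thought.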
Since lowering is done on a per-stmt basis we have to optimize the glue away,
thus
_2 = BIT_FIELD_REF <_3, 0, bitsizeof(_3)>;
_1 = { _2, {}, ... };
should ideally become just _3 but then we have to know _3 is zero-filled
or decide we can also have arbitrary values in the upper halves (signed
integer overflow issues, FP with NaNs might be slow, etc.). The vector
lowering process lacks something like a lattice so it doesn't re-use
previously lowered intermediate results (boo).
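To make the zero-fill condition concrete, here is a small sketch in C with
GCC's vector extensions. The names (v2hi, v8hi, glue_roundtrip,
glue_is_identity) are hypothetical and not part of GCC. It shows that
replacing the extract-and-rebuild glue by the original wide value only
preserves the value when the upper lanes of that value are already zero.

```c
#include <string.h>

typedef short v2hi __attribute__((vector_size(4)));   /* narrow vector */
typedef short v8hi __attribute__((vector_size(16)));  /* wide vector */

/* The glue pair: take the low part of w, then rebuild a zero-filled wide vector. */
static v8hi glue_roundtrip(v8hi w)
{
    v2hi lo = { w[0], w[1] };
    v8hi r = { lo[0], lo[1], 0, 0, 0, 0, 0, 0 };
    return r;
}

/* Returns 1 when folding the glue to plain `w` would not change the value. */
int glue_is_identity(v8hi w)
{
    v8hi r = glue_roundtrip(w);
    return memcmp(&r, &w, sizeof w) == 0;
}
```

With upper lanes of zero the check returns 1; with any non-zero upper lane it
returns 0, which is the case where the fold is only valid if arbitrary values
in the upper part are declared acceptable.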