It should be trivial to support this for single-lane SLP (I am testing a patch for that), but multi-lane might also be supportable, for example using slide up/down or, more easily when the number of lanes is a power of two, using a lastb extract on a larger element. In general, when SLP discovery gives us multiple lanes and that fails, we currently scrap the respective SLP instance, but we might want to split it up into a set of single-lane instances instead (which would also avoid re-discovery).
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:116bfbc806a7aa3f1ae2a3b3eb38d6bb65e0d0a7

commit r15-3506-g116bfbc806a7aa3f1ae2a3b3eb38d6bb65e0d0a7
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Sep 5 10:46:58 2024 +0200

    tree-optimization/116609 - SLP live lane vectorization with partial vectors

    The following implements the simple case of single-lane SLP when
    using partial vectors which can use the VEC_EXTRACT_LAST code generation
    without changes.  I'll keep the PR open for further enhancements.

    This avoids FAILs of gcc.target/aarch64/sve/live_1.c when using
    single-lane SLP for non-grouped stores.

            PR tree-optimization/116609
            * tree-vect-loop.cc (vectorizable_live_operation_1): Support
            partial vectors for single-lane SLP.
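For readers unfamiliar with the live-lane case, here is a rough scalar model of what an extract-last operation has to compute when the final loop iteration runs with a partial vector. This is only a sketch under assumptions, not GCC's implementation: the names VL, extract_last and last_doubled are hypothetical, the vector length is fixed at 4 lanes for illustration, and the loop shape is assumed rather than copied from live_1.c.

```c
#include <stddef.h>

enum { VL = 4 };  /* hypothetical vector length in lanes */

/* Scalar model of an extract-last operation: return the element in the
   last active lane of VEC according to MASK, or FALLBACK when no lane
   is active.  */
static int
extract_last (const int vec[VL], const unsigned char mask[VL], int fallback)
{
  int result = fallback;
  for (int lane = 0; lane < VL; ++lane)
    if (mask[lane])
      result = vec[lane];
  return result;
}

/* A loop whose last computed value is live after the loop.  Each outer
   iteration models one (possibly partial) vector iteration.  */
int
last_doubled (const int *a, int *out, size_t n)
{
  int last = 0;
  for (size_t base = 0; base < n; base += VL)
    {
      int vec[VL];
      unsigned char mask[VL];
      for (int lane = 0; lane < VL; ++lane)
        {
          /* Lanes past the end of the array are inactive.  */
          mask[lane] = base + lane < n;
          vec[lane] = mask[lane] ? a[base + lane] * 2 : 0;
          if (mask[lane])
            out[base + lane] = vec[lane];
        }
      /* The live value comes from the last active lane, not from the
         last lane of the vector.  */
      last = extract_last (vec, mask, last);
    }
  return last;
}
```

With n not a multiple of VL the final iteration has inactive trailing lanes, which is why the live value cannot simply be read from lane VL - 1.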
It should now work for single-lane SLP when also using the SLP equivalent of ncopies == 1. Both conditions are overly restrictive, of course, so I am keeping this open.