Created attachment 48879
The current loop vectorizer only vectorizes loops whose group size is a power of 2 or 3, due to the specifics of the vector permutation generation algorithm.
However, for 2-element vectors a simple permutation scheme can support any group size: insert each vector element into its required position. With only two elements per vector, this takes a reasonable number of operations.
An initial version is attached.
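To make the element-insertion idea concrete, here is a minimal scalar sketch in C. It is not the attached patch and not vectorizer code. The `v2df` struct and the `deinterleave_v2` function are hypothetical names that model a 2-element vector and a load permutation for an arbitrary group size, assuming a vectorization factor of 2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a 2-element vector register. */
typedef struct { double lane[2]; } v2df;

/* Deinterleave `group` consecutive 2-element vectors, which together hold
   two loop iterations of a group with `group` members.  On return, out[m]
   holds member m of iteration 0 in lane 0 and of iteration 1 in lane 1.
   Works for any group size, not only powers of 2 or 3.  */
static void deinterleave_v2(const v2df *in, v2df *out, size_t group)
{
    assert(group > 0);
    for (size_t m = 0; m < group; ++m) {
        for (size_t iter = 0; iter < 2; ++iter) {
            /* Position of this element in memory order.  */
            size_t flat = iter * group + m;
            /* One element extract plus one element insert per output lane.  */
            out[m].lane[iter] = in[flat / 2].lane[flat % 2];
        }
    }
}
```

For a group of size 5 holding the values 0..9 in memory order, this yields out[0] = {0, 5} through out[4] = {4, 9}, at a cost of 2 * group element moves in total.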
Note that the code path you are changing will go away, and "improving" it puts a burden on the replacement implementation ...
The testcase suggests the issue is missing SLP support for the non-grouped
load of *k, something I've been looking at recently.