When vectorization without SLP is made to fail, we see that gcc.dg/vect/pr52252-ld.c ends up using single-lane SLP. That is much better than what GCC 14 does, which is hybrid SLP, but it might be possible to use a better strategy for lowering the following node, which merges the three-lane and single-lane values:

  node 0x4b25cf0 (max_nunits=1, refcnt=1) vector(16) unsigned char
  op: VEC_PERM_EXPR
          { }
          lane permutation { 0[0] 0[1] 0[2] 1[0] }
          children 0x4b25750 0x4b25990
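
To make the lane permutation concrete, here is a minimal scalar sketch of what { 0[0] 0[1] 0[2] 1[0] } selects, assuming each entry reads as child[lane]: lanes 0 to 2 come from the first child (the three-lane node) and lane 0 comes from the second child (the single-lane node). This is only an illustration of the selection, not GCC's internal representation or its lowering code; the names `lane_ref`, `apply_lane_permutation` and `pr52252_perm` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One entry of a lane permutation, read as child[lane].
   Hypothetical type, used only for this illustration.  */
struct lane_ref {
    unsigned child;   /* index into the list of child nodes */
    unsigned lane;    /* lane within that child */
};

/* Scalar model of a lane permutation: output lane i is taken from
   children[perm[i].child][perm[i].lane].  The caller must ensure that
   every referenced child and lane exists.  */
static void
apply_lane_permutation(const unsigned char *const *children,
                       const struct lane_ref *perm, size_t n_lanes,
                       unsigned char *out)
{
    assert(children != NULL && perm != NULL && out != NULL);
    for (size_t i = 0; i < n_lanes; ++i)
        out[i] = children[perm[i].child][perm[i].lane];
}

/* The permutation from the dump: { 0[0] 0[1] 0[2] 1[0] }.  */
static const struct lane_ref pr52252_perm[4] = {
    { 0, 0 }, { 0, 1 }, { 0, 2 }, { 1, 0 }
};
```

In this model the result has four lanes: three copied from the three-lane child followed by one from the single-lane child, which is the merge the VEC_PERM_EXPR node has to implement on full vectors.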