Take:
```
void __attribute__ ((noipa))
f0 (int *__restrict r, int *__restrict a, int *__restrict pred)
{
  for (int i = 0; i < 1024; ++i)
    {
      unsigned short p = pred[i] ? 1 << 3 : 0;
      r[i] = p;
    }
}

void __attribute__ ((noipa))
f1 (int *__restrict r, int *__restrict a, int *__restrict pred)
{
  for (int i = 0; i < 1024; ++i)
    {
      int p = pred[i] ? 1 << 3 : 0;
      r[i] = p;
    }
}
```
These two functions should produce the same code, selecting between 8 and 0, but in f0 we instead get a truncation to unsigned short followed by an extension back to int. This happens on x86_64 at -O3 and on aarch64 at -O3. aarch64 with `-O3 -march=armv8.5-a+sve2`, though, will be fixed by the patch for PR 111006 (which I will be submitting later today), because SVE uses conversions here rather than VEC_PACK_TRUNC_EXPR/VEC_UNPACK_HI_EXPR/VEC_UNPACK_LO_EXPR.
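As a quick sanity check (my harness, not part of the testcase), the two functions are value-equivalent for any input: both store either 8 or 0, so the narrowing through unsigned short in f0 cannot change the stored values. A minimal sketch, assuming it is compiled in the same file as f0/f1 above:
```
#include <assert.h>

int
main (void)
{
  static int r0[1024], r1[1024], a[1024], pred[1024];
  /* Arbitrary mix of negative, zero and positive predicate values.  */
  for (int i = 0; i < 1024; ++i)
    pred[i] = (i % 3) - 1;
  f0 (r0, a, pred);
  f1 (r1, a, pred);
  for (int i = 0; i < 1024; ++i)
    assert (r0[i] == r1[i]);
  return 0;
}
```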
One way of fixing this is to optimize the following on the scalar side:
```
  _15 = _4 != 0;
  _16 = (short unsigned int) _15;
  _17 = _16 << 3;
  _6 = (int) _17;
```
into:
```
  _t = (int) _15;
  _6 = _t << 3;
```
Note that this testcase has the same issue too:
```
void __attribute__ ((noipa))
f0_1 (int *__restrict r, int *__restrict pred)
{
  for (int i = 0; i < 1024; ++i)
    {
      short p = pred[i] ? -1 : 0;
      r[i] = p;
    }
}
```
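For the f0_1 case, the analogous scalar-side rewrite would look something like the following at the C level (my sketch; the exact GIMPLE will differ). Since both -1 and 0 survive the round trip through short, widening the boolean straight to int and negating there gives the same value as selecting in short and sign-extending afterwards:
```
/* Sketch only: pred_i stands for pred[i].  */
static inline int
f0_1_current (int pred_i)
{
  short p = pred_i ? -1 : 0;    /* select in short ...  */
  return (int) p;               /* ... then sign-extend back to int.  */
}

static inline int
f0_1_rewritten (int pred_i)
{
  return -(int) (pred_i != 0);  /* select -1/0 directly in int.  */
}
```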