For example, the testcase from the testsuite:

#define N 640
int a[N] = {0};
int b[N] = {0};

void f1 ()
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] > 0)
        break;
    }
}

When compiled with, say, -mcpu=neoverse-v2, GCC will choose to vectorise with Neon modes and emit:

        cmgt    v31.4s, v30.4s, #0
        umaxp   v31.4s, v31.4s, v31.4s
        fmov    x3, d31
        cbz     x3, .L2

for the early break check. But since this target supports SVE, it could be using the SVE sequence:

        cmpgt   p14.s, p7/z, z28.s, #0
        ptest   p15, p14.b
        b.none  .L3

which is a bit shorter and, if I read the Neoverse V2 optimisation guide correctly, should take one less cycle.

In this particular case the compiler would know that the operand to the compare came from a Neon load, so the bits beyond the low 128 are zero for VLA code. But even if it can't prove that in general, it could still make this codegen decision with -msve-vector-bits=128.
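As a rough illustration of what the four-instruction Neon check computes, here is a scalar C model of it, plugged into a simple model of a 4-lane early-break loop for the testcase. This is a sketch under assumptions: the helper names (neon_any_gt_zero, f1_scalar, f1_vector_model, f1_models_agree) are hypothetical, the umaxp/fmov lane semantics are modelled from my reading of the A64 ISA, and the loop structure (test the whole vector before storing, and on a hit re-run the scalar loop from the start of that vector) is just one straightforward way to vectorise an early break, not necessarily GCC's exact codegen.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N 640
#define VF 4 /* 4 x 32-bit lanes in a 128-bit Neon vector */

/* Scalar model of the Neon early-break check:
     cmgt  v31.4s, v30.4s, #0       lane = all-ones if v > 0
     umaxp v31.4s, v31.4s, v31.4s   pairwise unsigned max
     fmov  x3, d31                  low 64 bits to a GPR
     cbz   x3, ...                  branch if no lane was set
   Returns nonzero when cbz would fall through (some lane is > 0). */
static int neon_any_gt_zero(const int32_t v[VF])
{
  uint32_t mask[VF], pmax[VF];
  for (int l = 0; l < VF; l++)
    mask[l] = v[l] > 0 ? UINT32_MAX : 0;
  /* umaxp with both sources equal gives
     {max(m0,m1), max(m2,m3), max(m0,m1), max(m2,m3)}. */
  pmax[0] = mask[0] > mask[1] ? mask[0] : mask[1];
  pmax[1] = mask[2] > mask[3] ? mask[2] : mask[3];
  pmax[2] = pmax[0];
  pmax[3] = pmax[1];
  /* fmov x3, d31 moves the low 64 bits: lanes 0 and 1. */
  uint64_t x3 = (uint64_t)pmax[1] << 32 | pmax[0];
  return x3 != 0;
}

/* The original scalar loop from the testcase. */
static void f1_scalar(const int32_t *a, int32_t *b)
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] > 0)
        break;
    }
}

/* Hypothetical vectorised form: check first, store only if no lane breaks. */
static void f1_vector_model(const int32_t *a, int32_t *b)
{
  int i = 0;
  for (; i + VF <= N; i += VF)
    {
      if (neon_any_gt_zero(&a[i]))
        break; /* some lane breaks: leave the vector loop */
      for (int l = 0; l < VF; l++)
        b[i + l] += a[i + l]; /* no lane breaks: all four stores are safe */
    }
  /* Scalar tail: redo the vector that contained the break (or leftovers). */
  for (; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] > 0)
        break;
    }
}

/* Run both versions on the same input and compare the resulting b arrays. */
static int f1_models_agree(const int32_t a[N])
{
  static int32_t b_ref[N], b_vec[N];
  for (int i = 0; i < N; i++)
    b_ref[i] = b_vec[i] = i; /* arbitrary starting contents */
  f1_scalar(a, b_ref);
  f1_vector_model(a, b_vec);
  return memcmp(b_ref, b_vec, sizeof b_ref) == 0;
}
```

With a[] all zero the vector loop runs to completion. Setting a[13] = 1 makes the vector starting at i = 12 take the break, and the scalar tail then stops at i = 13, so both versions update b[0..13] only.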
Confirmed. I had submitted a patch for this a few years ago, but it never got a review or the help I requested: https://patchwork.sourceware.org/project/gcc/patch/ZJw6SvUWBaXlpQoL@arm.com/

I was stuck on doing this in RTL because the sequence in GIMPLE is

  b = a > 0
  if (b != 0)
    break

and so the cbranch expansion doesn't always see the actual comparison being done. And if you try to match them up in RTL it becomes very complicated, as the patch above showed.

I've been thinking about it again, but this time I'd instead change it so that expand passes the comparison as the operation to cbranch. This removes the complication of trying to match an SVE and an Adv. SIMD compare into one. But this can only be done if the target supports more than just 0 as the second operand, because e.g. MVE can do c != 0, but not b1 > b2 where both are predicates.

Also:

        cmpgt   p14.s, p7/z, z28.s, #0
        ptest   p15, p14.b
        b.none  .L3

should be:

        cmpgt   p14.s, p7/z, z28.s, #0
        b.none  .L3

We don't need the ptest since we only use b.none or b.any and the size of the predicate of the ptest is smaller than the original. This, however, is a general problem with SVE codegen in GCC and not cbranch-related. The CC elimination pass needs to be extended here; I think Richard was playing with extending it to do this.
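Why the ptest is redundant for b.none/b.any can also be sketched in C. This assumes the SVE flag rules as I understand them from the Arm ARM: a flag-setting predicate operation (cmpgt as well as ptest) sets N to the first active element of the result, Z if no active element is true, and C to the inverse of the last active element, with V = 0, where "active" means selected by the governing predicate. b.none and b.any are aliases of b.eq and b.ne, so they read only Z. Since cmpgt with p7/z zeroes every lane outside p7, "is any lane true" comes out the same under p7 (flags from the cmpgt) and under a wider p15 (flags from the ptest), whereas C can differ. The model works at element rather than byte granularity and assumes a hypothetical 8-lane vector with p7 covering the low 4 lanes; the helper names (sve_pred_flags, cmpgt_zero_z) are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define LANES 8 /* hypothetical: a 256-bit SVE vector of 32-bit elements */

typedef struct { bool n, z, c; } flags; /* V is always 0, so it is omitted */

/* Flags set by a predicate-setting instruction (or PTEST) for the result
   predicate res under governing predicate pg.  With no active elements the
   result is N=0, Z=1, C=1. */
static flags sve_pred_flags(const bool pg[LANES], const bool res[LANES])
{
  flags f = { false, true, true };
  int first = -1, last = -1;
  for (int l = 0; l < LANES; l++)
    if (pg[l])
      {
        if (first < 0)
          first = l;
        last = l;
        if (res[l])
          f.z = false; /* some active element is true */
      }
  if (first >= 0)
    {
      f.n = res[first];  /* N: first active element */
      f.c = !res[last];  /* C: NOT last active element */
    }
  return f;
}

/* Model of cmpgt pd.s, pg/z, zn.s, #0: lanes outside pg are zeroed. */
static void cmpgt_zero_z(const bool pg[LANES], const int v[LANES],
                         bool pd[LANES])
{
  for (int l = 0; l < LANES; l++)
    pd[l] = pg[l] && v[l] > 0;
}
```

For v = {0, 0, 0, 9, 5, 5, 5, 5}, both flag computations give Z = 0 (so b.none is not taken either way), but C differs: lane 3 is the last active lane under p7 and is true, while lane 7 is the last active lane under p15 and was zeroed.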
Mine. I have a patch; the above now generates:

.L6:
        ldr     q30, [x2, x0]
        cmple   p15.s, p7/z, z30.s, #0
        beq     .L2

In the queue for GCC 16 stage 1.
And using the SVE CC regs:

.L6:
        ldr     q30, [x2, x0]
        cmple   p15.s, p7/z, z30.s, #0
        b.none  .L2