The BPF verifier is a static analyzer that checks BPF programs for safety before they are allowed to run in the kernel. The example below is a small reproducer obtained from debugging the last issue presented in https://github.com/systemd/systemd/issues/31888.

```
extern int bar ();

int baz;

int foo ()
{
  int quux = bar ();

  if (baz)
    {
      if (quux)
        return 1;
    }
  else
    {
      if (!quux)
        return 1;
    }

  return 0;
}
```

This code gets optimized by phiopt2 like:

```
Removing basic block 6
Removing basic block 4
Removing basic block 3
Removing basic block 5
Merging blocks 2 and 7

int foo ()
{
  int quux;
  int baz.0_1;
  int _2;
  _Bool _6;
  _Bool _11;
  _Bool _12;

  <bb 2> [local count: 1073741824]:
  quux_5 = bar ();
  baz.0_1 = baz;
  _6 = quux_5 == 0;
  _11 = baz.0_1 != 0;
  _12 = _6 ^ _11;
  _2 = (int) _12;
  return _2;
}
```

The code produced by this optimization results, in the original GitHub issue, in code that is unverifiable by the BPF execution environment. For now it is unclear whether the problem is within the verifier or not. As a resolution for the problem, two paths should be taken:

- Create a reproducer in the testing environment for bpf-next and report the problem on the respective mailing list.
- Disable the optimization for BPF until the verifier fixes the limitation, if possible.
There is nothing magical about what phiopt is doing (in this case) that couldn't be written by a user directly. What does the final code from the backend look like?
> - Disable the optimization for BPF until the verifier fixes the limitation if possible.

NO.
```
int foo1 ()
{
  int quux = bar ();
  _Bool t = quux == 0;
  _Bool t1 = baz != 0;
  int t2 = t;
  t2 ^= t1;
  return t2;
}
```

Produces the same resulting asm. The way clang produces comparisons is:

```
        r2 = 1
        if r3 != 0 goto LBB1_2
# %bb.1:
        r2 = 0
LBB1_2:
```
```
int foo2 ()
{
  int quux = bar ();
  unsigned long t3 = (unsigned int) quux;
  t3 += -1;
  t3 = t3 >> 63;
  unsigned char t = t3;
  unsigned long t4 = (unsigned int) baz;
  t4 = -t4;
  t4 = t4 >> 63;
  unsigned char t1 = t4;
  int t2 = t;
  t2 ^= t1;
  return t2;
}
```

Produces the same trick for doing branchless compares. Hmm, clang produces a branch for the above code though :).
here is one which makes similar code for both gcc and clang:

```
int foo2 ()
{
  int quux = bar ();
  unsigned long t3 = (unsigned int) quux;
  t3 += -1;
  asm ("" : "+r" (t3));
  t3 = t3 >> 63;
  unsigned char t = t3;
  unsigned long t4 = (unsigned int) baz;
  t4 = -t4;
  asm ("" : "+r" (t4));
  t4 = t4 >> 63;
  asm ("" : "+r" (t4));
  unsigned char t1 = t4;
  int t2 = t;
  t2 ^= t1;
  return t2;
}
```

though gcc produces mov32 and not <<32/>>32.
I am going to close this as invalid, because the bug is definitely in the verifier: my testcases show GCC and clang producing the same code, with reasonable inputs.
The problem is the verifier:

```
Mär 26 23:57:12 H systemd[1]: 17: (ac) w0 ^= w3   ; R0_w=scalar() R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1))
Mär 26 23:57:12 H systemd[1]: 18: (57) r0 &= 255  ; R0_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff))
```

It lost track of what the input of r0 was. It had the right result after the xor, but then the bit_and didn't take into account what came before it; it just used 0xff.
(In reply to Andrew Pinski from comment #7)
> The problem is the verifier:
> Mär 26 23:57:12 H systemd[1]: 17: (ac) w0 ^= w3 ;
> R0_w=scalar()
> R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1))
> Mär 26 23:57:12 H systemd[1]: 18: (57) r0 &= 255 ;
> R0_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff))
>
> it lost track of what the input of r0 was. It had the right result after the
> xor but then the bit_and didn't take into account the what was before it. It
> just used 0xff.

Or rather, the verifier is keeping track of rN and rN_w separately, but GCC produces code that uses both, and that definitely confuses the verifier.

```
Mär 26 23:57:12 H systemd[1]: 16: (77) r3 >>= 63  ; R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1))
```

Wait, why did the verifier use R3_w here rather than just R3? That seems like the real issue, and fixing it would resolve this more easily ...
Please do not close this bug. We are well aware this is a limitation in the kernel verifier, but we need to keep track of it.

In general, like in this case, many of the problems related to unverifiable generated code can be reduced to "this is a kernel verifier bug, make it smarter", but that doesn't mean we can't put workarounds in the back-end so we can compile actually-existing BPF programs (such as that systemd utility) on actually-existing current kernels. That includes disabling certain optimizations. I don't like it any more than you do, trust me on that.

Note that the development of the BPF backends, in both clang and GCC, is by necessity very much in lock-step with the kernel BPF support. So this is not something we will forget to "undo" as soon as the verifier can handle these instructions. Very likely we will be doing the kernel patch as well.
The only way to fix this is to add extra zero extends all the time.
The BPF verifier is just plain broken when it comes to subreg usage. So, to fix that, after every 32-bit use you need to output a zero extend.
Let me correct the title, since phiopt has nothing to do with the issue at hand. It is the subreg combined with a bitwise AND that confuses the verifier.
Thanks. The new title is way better. And thank you for the further analysis, and for the reproducer that makes clang generate the non-verifiable code as well!

I wonder, is the issue also there when -mno-alu32 is used? In that case the generated code doesn't involve "subregs" (i.e. 32-bit operations in the assembly-like syntax):

```
        .file   "foo.c"
        .text
        .align  3
        .global foo
        .type   foo, @function
foo:
        call    bar
        lddw    %r1,baz
        mov     %r0,%r0
        and     %r0,0xffffffff
        ldxw    %r2,[%r1+0]
        add     %r0,-1
        neg     %r2
        xor     %r0,%r2
        rsh     %r0,63
        exit
        .size   foo, .-foo
        .global baz
        .type   baz, @object
        .lcomm  baz,4,4
        .ident  "GCC: (GNU) 14.0.1 20240226 (experimental)"
```

Cuper, is the verifier able to track proper values through the xor in this case?