The BPF verifier is a static analyzer that checks BPF programs for safety before they are allowed to run in the kernel. The example below is a small reproducer obtained from debugging the last issue presented in https://github.com/systemd/systemd/issues/31888.

```
extern int bar ();

int baz;

int foo ()
{
  int quux = bar ();

  if (baz)
    {
      if (quux)
        return 1;
    }
  else
    {
      if (!quux)
        return 1;
    }

  return 0;
}
```

This code gets optimized by phiopt2 like:

```
Removing basic block 6
Removing basic block 4
Removing basic block 3
Removing basic block 5
Merging blocks 2 and 7

int foo ()
{
  int quux;
  int baz.0_1;
  int _2;
  _Bool _6;
  _Bool _11;
  _Bool _12;

  <bb 2> [local count: 1073741824]:
  quux_5 = bar ();
  baz.0_1 = baz;
  _6 = quux_5 == 0;
  _11 = baz.0_1 != 0;
  _12 = _6 ^ _11;
  _2 = (int) _12;
  return _2;
}
```

The code produced by this optimization results, in the original GitHub issue, in code that is unverifiable by the BPF execution environment. For now it is unclear whether the problem is within the verifier or not. As a resolution for the problem, two paths should be taken:

- Create a reproducer in the testing environment for bpf-next and report the problem on the respective mailing list.
- Disable the optimization for BPF until the verifier fixes the limitation, if possible.
There is nothing magical about what phiopt is doing (in this case) that couldn't be written by a user directly. What does the final code from the backend look like?
> - Disable the optimization for BPF until the verifier fixes the limitation if possible.

NO.
```
int foo1 ()
{
  int quux = bar ();
  _Bool t = quux == 0;
  _Bool t1 = baz != 0;
  int t2 = t;
  t2 ^= t1;
  return t2;
}
```

Produces the same resulting asm. The way clang produces comparisons is:

```
        r2 = 1
        if r3 != 0 goto LBB1_2
# %bb.1:
        r2 = 0
LBB1_2:
```
```
int foo2 ()
{
  int quux = bar ();
  unsigned long t3 = (unsigned int) quux;
  t3 += -1;
  t3 = t3 >> 63;
  unsigned char t = t3;
  unsigned long t4 = (unsigned int) baz;
  t4 = -t4;
  t4 = t4 >> 63;
  unsigned char t1 = t4;
  int t2 = t;
  t2 ^= t1;
  return t2;
}
```

Produces the same trick for doing branchless compares. Hmm, clang produces a branch for the above code though :).
here is one which makes similar code for both gcc and clang:

```
int foo2 ()
{
  int quux = bar ();
  unsigned long t3 = (unsigned int) quux;
  t3 += -1;
  asm ("" : "+r" (t3));
  t3 = t3 >> 63;
  unsigned char t = t3;
  unsigned long t4 = (unsigned int) baz;
  t4 = -t4;
  asm ("" : "+r" (t4));
  t4 = t4 >> 63;
  asm ("" : "+r" (t4));
  unsigned char t1 = t4;
  int t2 = t;
  t2 ^= t1;
  return t2;
}
```

though gcc produces mov32 and not <<32/>>32.
I am going to close this as invalid, because the bug is definitely in the verifier: my testcases show GCC and clang producing the same code, with reasonable inputs.
The problem is the verifier:

```
Mär 26 23:57:12 H systemd[1]: 17: (ac) w0 ^= w3   ; R0_w=scalar() R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1))
Mär 26 23:57:12 H systemd[1]: 18: (57) r0 &= 255  ; R0_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff))
```

It lost track of what the input of r0 was. It had the right result after the xor, but then the bit_and didn't take into account what came before it; it just used 0xff.
(In reply to Andrew Pinski from comment #7)
> The problem is the verifier:
> Mär 26 23:57:12 H systemd[1]: 17: (ac) w0 ^= w3 ;
> R0_w=scalar()
> R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1))
> Mär 26 23:57:12 H systemd[1]: 18: (57) r0 &= 255 ;
> R0_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff))
>
> it lost track of what the input of r0 was. It had the right result after the
> xor but then the bit_and didn't take into account the what was before it. It
> just used 0xff.

Or rather, the verifier is keeping track of rN and rN_w separately, but GCC produces code that uses both, and that definitely confuses the verifier.

```
Mär 26 23:57:12 H systemd[1]: 16: (77) r3 >>= 63  ; R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1))
```

Wait, why did the verifier use R3_w here rather than just R3? That seems like the real issue, and fixing it would resolve this more easily ...
Please do not close this bug. We are well aware this is a limitation in the kernel verifier, but we need to keep track of it.

In general, like in this case, many of the problems related to unverifiable generated code can be reduced to "this is a kernel verifier bug, make it smarter", but that doesn't mean we can't put workarounds in the back-end so we can compile actually-existing BPF programs (such as that systemd utility) on actually-existing current kernels. That includes disabling certain optimizations. I don't like it any more than you do, trust me on that.

Note that the development of the BPF backends, in both clang and GCC, is by necessity very much in lock-step with the kernel BPF support. So this is not something we will forget to "undo" as soon as the verifier can handle these instructions. Very likely we will be doing the kernel patch as well.
The only way to fix this is to add extra zero extends all the time.
The BPF verifier is just plain broken when it comes to subreg usage. So, to fix that, after every 32-bit use you need to output a zero extend.
Let me correct the title, since phiopt has nothing to do with the issue at hand. It is the subreg combined with a bitwise AND that confuses the verifier.
Thanks. The new title is way better. And thank you for the further analysis, and for the reproducer that makes clang generate the non-verifiable code as well!

I wonder, is the issue also there when -mno-alu32 is used? In that case the generated code doesn't involve "subregs" (i.e. 32-bit operations in the assembly-like syntax):

```
        .file   "foo.c"
        .text
        .align  3
        .global foo
        .type   foo, @function
foo:
        call    bar
        lddw    %r1,baz
        mov     %r0,%r0
        and     %r0,0xffffffff
        ldxw    %r2,[%r1+0]
        add     %r0,-1
        neg     %r2
        xor     %r0,%r2
        rsh     %r0,63
        exit
        .size   foo, .-foo
        .global baz
        .type   baz, @object
        .lcomm  baz,4,4
        .ident  "GCC: (GNU) 14.0.1 20240226 (experimental)"
```

Cuper, is the verifier able to track proper values through the xor in this case?