Re: [PATCH] PR tree-optimization/90836 Missing popcount pattern matching


+  (convert
+    (rshift
+      (mult

> is the outer convert really necessary?  That is, if we change
> the simplification result to

Indeed, that should be "convert?" to make it optional.

> Is the Hamming weight popcount
> faster than the libgcc table-based approach?  I wonder if we really
> need to restrict this conversion to the case where the target
> has an expander.

Well, libgcc uses the exact same sequence (not a table):

objdump -d ./aarch64-unknown-linux-gnu/libgcc/_popcountsi2.o

0000000000000000 <__popcountdi2>:
   0:	d341fc01 	lsr	x1, x0, #1
   4:	b200c3e3 	mov	x3, #0x101010101010101     	// #72340172838076673
   8:	9200f021 	and	x1, x1, #0x5555555555555555
   c:	cb010001 	sub	x1, x0, x1
  10:	9200e422 	and	x2, x1, #0x3333333333333333
  14:	d342fc21 	lsr	x1, x1, #2
  18:	9200e421 	and	x1, x1, #0x3333333333333333
  1c:	8b010041 	add	x1, x2, x1
  20:	8b411021 	add	x1, x1, x1, lsr #4
  24:	9200cc20 	and	x0, x1, #0xf0f0f0f0f0f0f0f
  28:	9b037c00 	mul	x0, x0, x3
  2c:	d378fc00 	lsr	x0, x0, #56
  30:	d65f03c0 	ret

So if you don't check for an expander, the pattern also matches libgcc's own
implementation and turns it into a call to itself, giving endless recursion,
since the makefile doesn't appear to use -fno-builtin anywhere...
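
For reference, the sequence in the dump corresponds roughly to the C below.
This is only a sketch written to mirror the disassembly, not a copy of the
libgcc source, and the function name popcount64_sketch is made up for
illustration. It is this shape of code that the new pattern would recognise,
which is why compiling libgcc's own popcount without an expander check could
end up calling itself.

```c
#include <stdint.h>

/* Hypothetical helper: a Hamming weight popcount that follows the
   instruction sequence shown in the objdump output above.  */
int
popcount64_sketch (uint64_t x)
{
  /* Count bits in each 2-bit field: x - ((x >> 1) & 0x55...).  */
  x = x - ((x >> 1) & 0x5555555555555555ULL);
  /* Sum adjacent 2-bit counts into 4-bit fields.  */
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
  /* Sum adjacent 4-bit counts into bytes.  */
  x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
  /* The multiply accumulates all byte counts into the top byte.  */
  return (int) ((x * 0x0101010101010101ULL) >> 56);
}
```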


