[PATCH] PR tree-optimization/90836 Missing popcount pattern matching
Richard Biener
richard.guenther@gmail.com
Mon Sep 9 08:24:00 GMT 2019
On Fri, Sep 6, 2019 at 2:13 PM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>
> Hi,
>
> +(simplify
> + (convert
> + (rshift
> + (mult
>
> > is the outer convert really necessary? That is, if we change
> > the simplification result to
>
> Indeed, that should be "convert?" to make it optional.
Rather, drop it; a generated conversion should be elided by
conversion simplification.
> > Is the Hamming weight popcount
> > faster than the libgcc table-based approach? I wonder if we really
> > need to restrict this conversion to the case where the target
> > has an expander.
>
> Well, libgcc uses exactly the same sequence (not a table):
>
> objdump -d ./aarch64-unknown-linux-gnu/libgcc/_popcountsi2.o
>
> 0000000000000000 <__popcountdi2>:
> 0: d341fc01 lsr x1, x0, #1
> 4: b200c3e3 mov x3, #0x101010101010101 // #72340172838076673
> 8: 9200f021 and x1, x1, #0x5555555555555555
> c: cb010001 sub x1, x0, x1
> 10: 9200e422 and x2, x1, #0x3333333333333333
> 14: d342fc21 lsr x1, x1, #2
> 18: 9200e421 and x1, x1, #0x3333333333333333
> 1c: 8b010041 add x1, x2, x1
> 20: 8b411021 add x1, x1, x1, lsr #4
> 24: 9200cc20 and x0, x1, #0xf0f0f0f0f0f0f0f
> 28: 9b037c00 mul x0, x0, x3
> 2c: d378fc00 lsr x0, x0, #56
> 30: d65f03c0 ret
>
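> So if you don't check for an expander, you get an endless loop in libgcc
> (the body of __popcountdi2 is turned into a popcount call, which becomes
> a call to __popcountdi2 itself), since the makefile doesn't appear to use
> -fno-builtin anywhere...
Hm, must be aarch64-specific. But indeed it should use -fno-builtin ...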
Richard.
>
> Wilco
>