[PATCH] PR tree-optimization/90836 Missing popcount pattern matching

Richard Biener richard.guenther@gmail.com
Mon Sep 9 08:24:00 GMT 2019


On Fri, Sep 6, 2019 at 2:13 PM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>
> Hi,
>
> +(simplify
> +  (convert
> +    (rshift
> +      (mult
>
> > is the outer convert really necessary?  That is, if we change
> > the simplification result to
>
> Indeed that should be "convert?" to make it optional.

Rather drop it; a generated conversion should be elided by
conversion simplification.

> > Is the Hamming weight popcount
> > faster than the libgcc table-based approach?  I wonder if we really
> > need to restrict this conversion to the case where the target
> > has an expander.
>
> Well libgcc uses the exact same sequence (not a table):
>
> objdump -d ./aarch64-unknown-linux-gnu/libgcc/_popcountsi2.o
>
> 0000000000000000 <__popcountdi2>:
>    0:   d341fc01        lsr     x1, x0, #1
>    4:   b200c3e3        mov     x3, #0x101010101010101          // #72340172838076673
>    8:   9200f021        and     x1, x1, #0x5555555555555555
>    c:   cb010001        sub     x1, x0, x1
>   10:   9200e422        and     x2, x1, #0x3333333333333333
>   14:   d342fc21        lsr     x1, x1, #2
>   18:   9200e421        and     x1, x1, #0x3333333333333333
>   1c:   8b010041        add     x1, x2, x1
>   20:   8b411021        add     x1, x1, x1, lsr #4
>   24:   9200cc20        and     x0, x1, #0xf0f0f0f0f0f0f0f
>   28:   9b037c00        mul     x0, x0, x3
>   2c:   d378fc00        lsr     x0, x0, #56
>   30:   d65f03c0        ret
>
> So if you don't check for an expander you get an endless loop in libgcc since
> the makefile doesn't appear to use -fno-builtin anywhere...

Hm, must be aarch64-specific.  But indeed it should use -fno-builtin ...
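
For reference, the instruction sequence in the disassembly above
corresponds roughly to the C below.  This is only a sketch written to
match the objdump output, not a copy of the libgcc source, and the
function name popcount64_swar is made up for illustration.

```c
#include <stdint.h>

/* Bit-parallel population count following the quoted aarch64 sequence.
   The name popcount64_swar is hypothetical, not a libgcc symbol.  */
static inline int
popcount64_swar (uint64_t x)
{
  /* lsr/and/sub: each 2-bit field now holds the count of its two bits.  */
  x = x - ((x >> 1) & 0x5555555555555555ULL);
  /* and/lsr/and/add: sum adjacent 2-bit counts into 4-bit fields.  */
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
  /* add with lsr #4, then and: sum adjacent 4-bit counts into bytes.  */
  x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
  /* mul/lsr #56: the top byte of the product is the sum of all bytes.  */
  return (int) ((x * 0x0101010101010101ULL) >> 56);
}
```

A pattern matching the final multiply and shift would recognise a
function of this shape, which is why building such code without
-fno-builtin could turn it into a call to itself.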

Richard.

>
> Wilco
>


