This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] PR tree-optimization/90836 Missing popcount pattern matching
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>, "dmitrij dot pochepko at bell-sw dot com" <dmitrij dot pochepko at bell-sw dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Cc: nd <nd at arm dot com>
- Date: Fri, 6 Sep 2019 12:13:34 +0000
- Subject: Re: [PATCH] PR tree-optimization/90836 Missing popcount pattern matching
Hi,
+(simplify
+ (convert
+ (rshift
+ (mult
> is the outer convert really necessary? That is, if we change
> the simplification result to
Indeed that should be "convert?" to make it optional.
> Is the Hamming weight popcount
> faster than the libgcc table-based approach? I wonder if we really
> need to restrict this conversion to the case where the target
> has an expander.
Well, libgcc uses the exact same sequence (not a table):
objdump -d ./aarch64-unknown-linux-gnu/libgcc/_popcountsi2.o
0000000000000000 <__popcountdi2>:
0: d341fc01 lsr x1, x0, #1
4: b200c3e3 mov x3, #0x101010101010101 // #72340172838076673
8: 9200f021 and x1, x1, #0x5555555555555555
c: cb010001 sub x1, x0, x1
10: 9200e422 and x2, x1, #0x3333333333333333
14: d342fc21 lsr x1, x1, #2
18: 9200e421 and x1, x1, #0x3333333333333333
1c: 8b010041 add x1, x2, x1
20: 8b411021 add x1, x1, x1, lsr #4
24: 9200cc20 and x0, x1, #0xf0f0f0f0f0f0f0f
28: 9b037c00 mul x0, x0, x3
2c: d378fc00 lsr x0, x0, #56
30: d65f03c0 ret
So if you don't check for an expander, you get an endless loop in libgcc: the
pattern would match libgcc's own implementation and turn it into a call to
itself, since the makefile doesn't appear to use -fno-builtin anywhere...
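
For reference, the disassembly above corresponds roughly to the C below. This
is only a sketch: the function name popcount64_swar is made up for
illustration, and the code is written to mirror the instruction sequence, not
copied from the libgcc source.

```c
#include <stdint.h>

/* Hypothetical helper name; mirrors the AArch64 sequence shown above. */
static int popcount64_swar(uint64_t x)
{
    /* lsr/and/sub: each 2-bit field now holds the count of its two bits. */
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    /* and/lsr/and/add: sum adjacent 2-bit counts into 4-bit fields. */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    /* add with lsr #4, then and: sum adjacent nibbles into per-byte counts. */
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    /* mul/lsr #56: the multiply accumulates all byte counts in the top byte. */
    return (int)((x * 0x0101010101010101ULL) >> 56);
}
```

The final multiply and shift by 56 is the (rshift (mult ...)) shape that the
proposed pattern starts matching from.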
Wilco