This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH][GCC] Simplification of 1U << (31 - x)

From: Richard Biener <richard dot guenther at gmail dot com>
To: Jakub Jelinek <jakub at redhat dot com>
Cc: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, Sudi Das <Sudi dot Das at arm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>, Richard Earnshaw <Richard dot Earnshaw at arm dot com>, James Greenhalgh <James dot Greenhalgh at arm dot com>
Date: Thu, 13 Apr 2017 13:49:01 +0200
Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)
Authentication-results: sourceware.org; auth=none
References: <AM5PR0802MB2610B3E04DF2484B04208CEC83020@AM5PR0802MB2610.eurprd08.prod.outlook.com> <20170413112151.GD1809@tucnak> <AM5PR0802MB2610B75CC3BDBA5C021B3DA083020@AM5PR0802MB2610.eurprd08.prod.outlook.com> <20170413114125.GE1809@tucnak>

On Thu, Apr 13, 2017 at 1:41 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Apr 13, 2017 at 11:33:12AM +0000, Wilco Dijkstra wrote:
>> Jakub Jelinek wrote:
>>
>> > No.  Some constants sometimes need even 7 instructions (e.g. sparc64; not talking
>> > in particular about 1ULL << 63 constant), or have one instruction
>> > that is more expensive than normal small constant load.  Compare say x86_64
>> > movl/movq vs. movabsq, I think the latter has 3 times longer latency on many
>> > CPUs.  So no, I think it isn't an unconditional win.
>>
>> We're specifically only talking about the constants (1L << 63), (1 << 31) and (1 << 15).
>> On all targets these need at most 2 simple instructions. That makes it an unconditional win.
>
> It is not a win on at least Haswell-E:
> __attribute__((noinline, noclone)) unsigned long long int
> foo (int x)
> {
>   asm volatile ("" : : : "memory");
>   return 1ULL << (63 - x);
> }
>
> __attribute__((noinline, noclone)) unsigned long long int
> bar (int x)
> {
>   asm volatile ("" : : : "memory");
>   return (1ULL << 63) >> x;
> }
>
> int
> main (int argc, const char **argv)
> {
>   int i;
>   if (argc == 1)
>     for (i = 0; i < 1000000000; i++)
>       asm volatile ("" : : "r" (foo (13)));
>   else
>     for (i = 0; i < 1000000000; i++)
>       asm volatile ("" : : "r" (bar (13)));
>   return 0;
> }
>
> $ time /tmp/test
>
> real    0m1.290s
> user    0m1.288s
> sys     0m0.002s
> $ time /tmp/test 1
>
> real    0m1.542s
> user    0m1.540s
> sys     0m0.002s
>
> As I said, movabsq is expensive.

It is IMHO a valid GIMPLE optimization / canonicalization.

        movabsq $-9223372036854775808, %rax

so should this constant then have been generated as 1 << 63?

At some point variable shifts were quite expensive as well.

Richard.

>         Jakub

