This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

[Bug c/57157] New: Poor optimization of portable rotate idiom

From: "nisse at lysator dot liu.se" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Fri, 03 May 2013 09:19:58 +0000
Subject: [Bug c/57157] New: Poor optimization of portable rotate idiom
Auto-submitted: auto-generated

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157

             Bug #: 57157
           Summary: Poor optimization of portable rotate idiom
    Classification: Unclassified
           Product: gcc
           Version: 4.6.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: nisse@lysator.liu.se


The standard rotate idiom,

  (x << n) | (x >> (32 - n))

is recognized by gcc (for concreteness, I discuss only the case that x
is an uint32_t here).

However, this is portable C only for n in the range 0 < n < 32. For n
== 0, we get x >> 32 which gives undefined behaviour according to the
C standard (6.5.7, Bitwise shift operators). To portably support n ==
0, one has to write the rotate as something like

  (x << n) | (x >> ((-n) & 31))

And this is apparently not recognized by gcc. I compiled this test
program with "gcc -O3 -c -save-temps rot.c". Using gcc-4.6.3 on
GNU/Linux x86_64 (ubuntu):

typedef unsigned int uint32_t;

/* Allows 0 < n < 32 (n == 0 gives undefined behaviour) */
uint32_t
rot1(unsigned n, uint32_t x)
{
  return (x << n) | (x >> (32 - n));
}

/* Allows 0 <= n < 32 */
uint32_t
rot2(unsigned n, uint32_t x)
{
  return (x << n) | (x >> ((- n) & 31));
}

Generated assembler

    .file    "rot.c"
    .text
    .p2align 4,,15
    .globl    rot1
    .type    rot1, @function
rot1:
.LFB0:
    .cfi_startproc
    movl    %esi, %eax
    movl    %edi, %ecx
    roll    %cl, %eax
    ret
    .cfi_endproc
.LFE0:
    .size    rot1, .-rot1
    .p2align 4,,15
    .globl    rot2
    .type    rot2, @function
rot2:
.LFB1:
    .cfi_startproc
    movl    %edi, %ecx
    movl    %esi, %eax
    negl    %ecx
    shrl    %cl, %eax
    movl    %edi, %ecx
    sall    %cl, %esi
    orl    %esi, %eax
    ret
    .cfi_endproc
.LFE1:
    .size    rot2, .-rot2
    .ident    "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
    .section    .note.GNU-stack,"",@progbits

The desired result is of course to get a rotl instruction also for
rot2, instead of the combination of negl, shrl, sall, and orl.

Applying the above portability fix to my ROTL32 macro in GNU Nettle
results in a slowdown of almost 20% for cast128. This function depends
a lot on key-dependant rotates, where rotation counts of zero will
happen for some keys.

Follow-Ups:
- [Bug c/57157] Poor optimization of portable rotate idiom
  - From: rguenth at gcc dot gnu.org
- [Bug c/57157] Poor optimization of portable rotate idiom
  - From: glisse at gcc dot gnu.org
- [Bug c/57157] Poor optimization of portable rotate idiom
  - From: jakub at gcc dot gnu.org
- [Bug c/57157] Poor optimization of portable rotate idiom
  - From: jakub at gcc dot gnu.org
- [Bug c/57157] Poor optimization of portable rotate idiom
  - From: jakub at gcc dot gnu.org

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]