This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug c/57157] New: Poor optimization of portable rotate idiom
- From: "nisse at lysator dot liu.se" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 03 May 2013 09:19:58 +0000
- Subject: [Bug c/57157] New: Poor optimization of portable rotate idiom
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157
Bug #: 57157
Summary: Poor optimization of portable rotate idiom
Classification: Unclassified
Product: gcc
Version: 4.6.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: nisse@lysator.liu.se
The standard rotate idiom,
(x << n) | (x >> (32 - n))
is recognized by gcc (for concreteness, I discuss only the case that x
is an uint32_t here).
However, this is portable C only for n in the range 0 < n < 32. For n
== 0, we get x >> 32 which gives undefined behaviour according to the
C standard (6.5.7, Bitwise shift operators). To portably support n ==
0, one has to write the rotate as something like
(x << n) | (x >> ((-n) & 31))
And this is apparently not recognized by gcc. I compiled this test
program with "gcc -O3 -c -save-temps rot.c". Using gcc-4.6.3 on
GNU/Linux x86_64 (ubuntu):
typedef unsigned int uint32_t;
/* Allows 0 < n < 32 (n == 0 gives undefined behaviour) */
uint32_t
rot1(unsigned n, uint32_t x)
{
return (x << n) | (x >> (32 - n));
}
/* Allows 0 <= n < 32 */
uint32_t
rot2(unsigned n, uint32_t x)
{
return (x << n) | (x >> ((- n) & 31));
}
Generated assembler
.file "rot.c"
.text
.p2align 4,,15
.globl rot1
.type rot1, @function
rot1:
.LFB0:
.cfi_startproc
movl %esi, %eax
movl %edi, %ecx
roll %cl, %eax
ret
.cfi_endproc
.LFE0:
.size rot1, .-rot1
.p2align 4,,15
.globl rot2
.type rot2, @function
rot2:
.LFB1:
.cfi_startproc
movl %edi, %ecx
movl %esi, %eax
negl %ecx
shrl %cl, %eax
movl %edi, %ecx
sall %cl, %esi
orl %esi, %eax
ret
.cfi_endproc
.LFE1:
.size rot2, .-rot2
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
The desired result is of course to get a rotl instruction also for
rot2, instead of the combination of negl, shrl, sall, and orl.
Applying the above portability fix to my ROTL32 macro in GNU Nettle
results in a slowdown of almost 20% for cast128. This function depends
a lot on key-dependant rotates, where rotation counts of zero will
happen for some keys.