[Bug middle-end/57529] New: Redundant masking of zero-extended values
jewillco at osl dot iu.edu
gcc-bugzilla@gcc.gnu.org
Tue Jun 4 18:36:00 GMT 2013
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57529
Bug ID: 57529
Summary: Redundant masking of zero-extended values
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: jewillco at osl dot iu.edu
Using version "gcc (GCC) 4.9.0 20130519 (experimental)" with target
"x86_64-unknown-linux-gnu" and the flags "-Ofast -std=gnu99 -march=bdver1", the
following code:
#include <stdint.h>
void foo(const uint16_t* restrict indexes, const uint64_t* restrict bits,
         unsigned int* restrict sum, int count) {
  for (int i = 0; i < count; ++i) {
    unsigned int val = indexes[i];
    if (bits[val / 64] & (1UL << (val % 64))) {sum[val] += 1;}
  }
}
produces two shifts to implement the "val / 64" operation instead of one,
seemingly because the compiler is trying to mask val to 16 bits even though it
was loaded with movzwl and thus was already masked and zero-extended. Here is
the assembly for the function body:
testl %ecx, %ecx # count
movl %ecx, %r9d # count, count
jle .L8 #,
xorl %eax, %eax # ivtmp.5
.p2align 4,,10
.p2align 3
.L4:
movzwl (%rdi,%rax,2), %ecx # MEM[base: indexes_8(D), index: ivtmp.5_52, step: 2, offset: 0B], D.2242
movq %rcx, %r8 # D.2242, D.2244
# **************** Redundant masking operation:
salq $48, %r8 #, D.2244
shrq $54, %r8 #, D.2244
# ****************
movq (%rsi,%r8,8), %r8 # *_16, D.2244
# ++++++++++++++++
shrq %cl, %r8 # D.2242, D.2244
andl $1, %r8d #, D.2244
# ++++++++++++++++
je .L3 #,
# xxxxxxxxxxxxxxxx
movzwl %cx, %r8d # D.2242, D.2244
# xxxxxxxxxxxxxxxx
incl (%rdx,%r8,4) # *_25
.L3:
incq %rax # ivtmp.5
cmpl %eax, %r9d # ivtmp.5, count
jg .L4 #,
.L8:
rep; ret
The seemingly unnecessary operation is marked with stars: the salq $48 / shrq
$54 pair both masks the value to 16 bits and shifts it right by 6, whereas a
single shrq by 6 would implement the unsigned division correctly, because
movzwl has already zero-extended the value. The zero-extension marked with x's
is also unnecessary (%rcx could have been used directly in the index
expression). On a somewhat unrelated issue, the code marked with +'s seems to
be sub-optimal as well, and could probably be replaced by a bt instruction (GCC
4.4.7 uses "btq" there with -O3 and the same -march flag).
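
The claim that the starred pair is redundant can be checked exhaustively. The
following is only a sketch, not part of the original test case: the function
names are made up for illustration, and it models the two instruction sequences
in C rather than inspecting what GCC emits. It assumes, as movzwl guarantees,
that the upper 48 bits of the register are zero on entry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the starred salq $48 / shrq $54 pair on a 64-bit
   register: drop everything above bit 15, then shift right by 6. */
uint64_t index_via_two_shifts(uint64_t x) {
    return (x << 48) >> 54;
}

/* Hypothetical model of the single shrq $6 proposed in the report. */
uint64_t index_via_one_shift(uint64_t x) {
    return x >> 6;
}

/* Compare both forms on every value a zero-extended 16-bit load can produce. */
void check_shift_equivalence(void) {
    for (uint32_t v = 0; v <= UINT16_MAX; ++v) {
        uint64_t x = (uint16_t)v; /* models movzwl: upper 48 bits are zero */
        assert(index_via_two_shifts(x) == index_via_one_shift(x));
    }
}
```

Calling check_shift_equivalence() from a test program should complete without
tripping an assertion. The two forms differ only when bits above bit 15 are
set, which cannot happen directly after the movzwl.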