48877 – Inline asm for rdtsc generates silly code

Summary: Inline asm for rdtsc generates silly code

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization
Version:	4.6.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2011-05-05 02:26 UTC by Andy Lutomirski
Modified:	2020-12-25 18:50 UTC
CC List:	3 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2011-05-05 06:50:18

Description Andy Lutomirski 2011-05-05 02:26:02 UTC

gcc -O2 -S on this input:

typedef unsigned long long u64;

u64 test()
{
  u64 low, high;
  asm volatile ("rdtsc" : "=a" (low), "=d" (high));
  return low | (high << 32);
}

generates this:

test:
.LFB0:
        .cfi_startproc
#APP
# 6 "rax_rdx.c" 1
        rdtsc
# 0 "" 2
#NO_APP
        movq    %rax, %rcx
        movq    %rdx, %rax
        salq    $32, %rax
        orq     %rcx, %rax
        ret
        .cfi_endproc

which is silly -- both movq instructions are unnecessary.

clang -O3 -fomit-frame-pointer does much better:

test:
.Leh_func_begin0:
        #APP
        rdtsc
        #NO_APP
        shlq    $32, %rdx
        orq     %rdx, %rax
        ret

Getting rid of the << 32 makes gcc generate the obvious code.

FWIW, this code:

unsigned long long rdtsc (void)
{
  unsigned int tickl, tickh;
  __asm__ __volatile__("rdtsc":"=a"(tickl),"=d"(tickh));
  return ((unsigned long long)tickh << 32)|tickl;
}
          
is copied verbatim from the "Machine Constraints" section of the manual (http://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints) and generates the same silly code.
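For reference, the combining arithmetic that both snippets rely on can be checked in isolation. This is only a sketch: `combine_tsc` is a hypothetical helper name, not something from the manual or from GCC. It shows why the manual's version casts `tickh` before shifting: rdtsc leaves the low half in EAX and the high half in EDX, and shifting a 32-bit value by 32 is undefined in C, so the high half has to be widened first.

```c
#include <stdint.h>

/* Hypothetical helper (not from the manual): join the two 32-bit halves
 * that rdtsc leaves in EAX (low) and EDX (high) into one 64-bit value. */
static uint64_t combine_tsc(uint32_t low, uint32_t high)
{
    /* Widen before shifting: shifting a 32-bit operand by 32 is undefined. */
    return ((uint64_t)high << 32) | low;
}
```

The first testcase avoids the cast by declaring both outputs as 64-bit, which works on x86-64 because rdtsc clears the upper halves of RAX and RDX.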

Comment 1 Jakub Jelinek 2011-05-05 06:50:18 UTC

If you use return __builtin_ia32_rdtsc (); instead, both 4.6 and 4.7 generate:
        rdtsc
        salq    $32, %rdx
        orq     %rdx, %rax
        ret
Current GCC trunk generates:
#APP
# 7 "pr48877.c" 1
        rdtsc
# 0 "" 2
#NO_APP
        salq    $32, %rdx
        orq     %rax, %rdx
        movq    %rdx, %rax
        ret
for the asm testcase, which isn't as bad as 4.6 but isn't perfect. What matters for IRA is which pseudo is LHS/RHS1 and which is RHS2 on the orq insn:
for the builtin version, LHS/RHS1 is the pseudo set by the unspecv with the "=a" constraint; for the asm version, it is the LHS of the shift insn.
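For anyone comparing the two forms on their own compiler, here is a minimal side-by-side sketch. The function names `tsc_builtin` and `tsc_asm` are made up for illustration, and the non-x86 stubs exist only so the file compiles elsewhere; the code assumes GCC or Clang. Compiling it with -O2 -S should show whether the asm version still gets an extra move on a given release.

```c
#include <stdint.h>

/* Assumption: GCC or Clang. Non-x86 targets get stubs so the file builds. */
#if defined(__x86_64__) || defined(__i386__)

/* Builtin form from comment 1. */
static inline uint64_t tsc_builtin(void)
{
    return __builtin_ia32_rdtsc();
}

/* Inline-asm form from the report, using 32-bit outputs as in the manual. */
static inline uint64_t tsc_asm(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

#else

static inline uint64_t tsc_builtin(void) { return 0; }
static inline uint64_t tsc_asm(void) { return 0; }

#endif
```

Both functions return the same kind of value; the difference discussed above is only in how the register allocator sees the operands of the final orq.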

Comment 2 Ivan Sorokin 2020-12-25 11:39:05 UTC

Modern GCC doesn't generate excessive moves for this example. It looks like the problem was fixed in 4.9.0: https://godbolt.org/z/MqE7sP .

I think the bug can be closed now.

Comment 3 Andy Lutomirski 2020-12-25 18:50:03 UTC

(In reply to Ivan Sorokin from comment #2)
> Modern GCC doesn't generate excessive moves for this example. It looks like
> the problem was fixed in 4.9.0: https://godbolt.org/z/MqE7sP .
> 
> I think the bug can be closed now.

Indeed.