[Bug target/104250] New: [i386] GCC may want to use 32-bit (I)DIV if it can for 64-bit operands

Wed Jan 26 18:30:09 GMT 2022

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104250

            Bug ID: 104250
           Summary: [i386] GCC may want to use 32-bit (I)DIV if it can for
                    64-bit operands
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: thiago at kde dot org
  Target Milestone: ---

For this function:

long long f1(long long n, long long d)
{
    return n / d;
}

GCC generates:

        movq    %rdi, %rax
        cqto
        idivq   %rsi
        ret

This is fine, except that the 64-bit IDIV instruction is significantly slower
than the 32-bit (I)DIV. On recent CPUs (such as PMC, SNC, WLC, GLC), that's 18
vs 14 cycles, but the gap was much larger on older CPUs. There is still a
significant difference on Atom cores, such as those used in Alder Lake-E.

Clang generates:

        movq    %rdi, %rax
        movq    %rdi, %rcx
        orq     %rsi, %rcx
        shrq    $32, %rcx
        je      .LBB0_1
        cqto
        idivq   %rsi
        retq
.LBB0_1:
        xorl    %edx, %edx
        divl    %esi
        retq

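That is, it ORs the two operands and checks whether any bit in the upper half
is set. If so, it performs the 64-bit division; otherwise, both operands are
non-negative and fit in 32 bits, so it performs the 32-bit unsigned division
(DIVL), which produces the same quotient.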
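
For illustration, the check Clang emits corresponds roughly to the following C.
This is only a sketch of the idea, not a proposed implementation; the name
f1_narrow is hypothetical, and whether a compiler actually selects DIVL for the
narrow path depends on the target and optimization settings.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the branch in Clang's output above. */
long long f1_narrow(long long n, long long d)
{
    /* No bit set in the upper half of either operand means both values
       are non-negative and fit in 32 bits. */
    if ((((unsigned long long)n | (unsigned long long)d) >> 32) == 0)
        return (uint32_t)n / (uint32_t)d;  /* narrow, unsigned division */

    return n / d;                          /* full 64-bit signed division */
}
```

As with the original f1, the result is undefined when d is zero, on either
path.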

References:
https://gcc.godbolt.org/z/385a3da8q
https://uops.info/html-instr/IDIV_R32.html
https://uops.info/html-instr/IDIV_R64.html