User account creation filtered due to spam.

Bug 71321 - [6 Regression] x86: worse code for uint8_t % 10 and / 10
Summary: [6 Regression] x86: worse code for uint8_t % 10 and / 10
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 6.1.0
: P2 normal
Target Milestone: 6.5
Assignee: Bernd Schmidt
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2016-05-28 02:37 UTC by Peter Cordes
Modified: 2017-07-04 08:50 UTC (History)
3 users (show)

See Also:
Host:
Target: i386-linux-gnu, x86_64-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-05-28 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Cordes 2016-05-28 02:37:34 UTC
If we have an integer (0..99), we can modulo and divide by 10 to get two decimal digits, then convert to a pair of ASCII bytes with a newline by adding `00\n`.   When replacing div and mod with a multiplicative inverse, gcc 6.1 uses more instructions than gcc 5.3, due to poor choices.

See also https://godbolt.org/g/vvS5J6

#include <stdint.h>
// assuming little-endian
__attribute__((always_inline)) 
unsigned cvt_to_2digit(uint8_t i, uint8_t base) {
  return ((i / base) | (uint32_t)(i % base)<<8);
}
  // movzbl %dil,%eax    # 5.3 and 6.1, with -O3 -march=haswell
  // div    %sil
  // movzwl %ax,%eax

// at -Os, gcc uses a useless  AND eax, 0xFFF, instead of a movzx eax,ax.  I think to avoid partial-register stalls?
unsigned cvt_to_2digit_ascii(uint8_t i) {
  return cvt_to_2digit(i, 10) + 0x0a3030;    // + "00\n" converts to ASCII
}

Compiling with -O3 -march=haswell
        ## gcc 5.3                         ## gcc 6.1
        movzbl  %dil, %edx                 movzbl  %dil, %eax
        leal    (%rdx,%rdx,4), %ecx        leal    0(,%rax,4), %edx   # requires a 4B zero displacement
        leal    (%rdx,%rcx,8), %edx        movl    %eax, %ecx         # lea should let us avoid mov
        leal    (%rdx,%rdx,4), %edx        addl    %eax, %edx
                                           leal    (%rcx,%rdx,8), %edx
                                           leal    0(,%rdx,4), %eax   # requires a 4B zero displacement
                                           addl    %eax, %edx
        shrw    $11, %dx                   shrw    $11, %dx
        leal    (%rdx,%rdx,4), %eax        leal    0(,%rdx,4), %eax   # requires a 4B zero displacement.  gcc5.3 didn't use any of these
                                           addl    %edx, %eax
        movzbl  %dl, %edx                  movzbl  %dl, %edx       # same after this
        addl    %eax, %eax                 addl    %eax, %eax
        subl    %eax, %edi                 subl    %eax, %edi
        movzbl  %dil, %eax                 movzbl  %dil, %eax
        sall    $8, %eax                   sall    $8, %eax
        orl     %eax, %edx                 orl     %eax, %edx
        leal    667696(%rdx), %eax         leal    667696(%rdx), %eax

with -mtune=haswell, it's  prob. best to merge with   mov ah, dil  or something, rather than movzx/shift/or.  Haswell has no penalty for partial-registers, but still has partial-reg renaming to avoid false dependencies: the best of both worlds.



BTW, with -Os, both gcc versions compile it to

        movb    $10, %dl
        movzbl  %dil, %eax
        divb    %dl
        andl    $4095, %eax      # partial reg stall.  gcc does this even with -march=core2 where it matters
        addl    $667696, %eax

The AND appears to be totally useless, because the upper bytes of eax are already zero (from movzbl %dil, %eax before div).  I thought the movzbl %ax, %eax  in the unknown-divisor version was to avoid partial-register slowdowns, but maybe it's just based on the possible range of the result.

Off-topic, but I noticed this while writing FizzBuzz in asm.  http://stackoverflow.com/a/37494090/224132
Comment 1 Jakub Jelinek 2016-05-28 15:14:21 UTC
Note, for always_inline attribute you need also inlike keyword.
This regressed with r222874.
Comment 2 Richard Biener 2016-08-22 08:44:34 UTC
GCC 6.2 is being released, adjusting target milestone.
Comment 3 Richard Biener 2016-08-22 08:46:05 UTC
GCC 6.2 is being released, adjusting target milestone.
Comment 4 Bernd Schmidt 2016-12-15 15:13:04 UTC
I have some ideas. Investigating...
Comment 5 Jakub Jelinek 2016-12-21 10:59:10 UTC
GCC 6.3 is being released, adjusting target milestone.
Comment 6 Bernd Schmidt 2016-12-21 16:46:08 UTC
Author: bernds
Date: Wed Dec 21 16:45:33 2016
New Revision: 243861

URL: https://gcc.gnu.org/viewcvs?rev=243861&root=gcc&view=rev
Log:

	PR target/71321
	* config/i386/i386.md (lea<mode>_general_2b, lea<mode>_general_3b): New
	patterns.
	* config/i386/predicates.md (const123_operand): New.

	PR target/71321
	* gcc.target/i386/pr71321.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/i386/pr71321.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
    trunk/gcc/config/i386/predicates.md
    trunk/gcc/testsuite/ChangeLog
Comment 7 Jeffrey A. Law 2017-01-13 18:24:16 UTC
THe regression relative to gcc-5 has been fixed on the trunk.

It would be worth creating a separate bug for the unnecessary AND when compiling with -Os that all versions of GCC exhibit.
Comment 8 Richard Biener 2017-07-04 08:50:09 UTC
GCC 6.4 is being released, adjusting target milestone.