If we have an integer (0..99), we can divide and modulo by 10 to get two decimal digits, then convert to a pair of ASCII bytes plus a newline by adding `00\n`. When replacing the div and mod with a multiplicative inverse, gcc 6.1 uses more instructions than gcc 5.3, due to poor instruction choices.

See also https://godbolt.org/g/vvS5J6

    #include <stdint.h>
    // assuming little-endian
    __attribute__((always_inline))
    unsigned cvt_to_2digit(uint8_t i, uint8_t base) {
      return ((i / base) | (uint32_t)(i % base) << 8);
    }
    // movzbl %dil,%eax   # 5.3 and 6.1, with -O3 -march=haswell
    // div    %sil
    // movzwl %ax,%eax
    // at -Os, gcc uses a useless AND eax, 0xFFF, instead of a movzx eax,ax.
    // I think to avoid partial-register stalls?

    unsigned cvt_to_2digit_ascii(uint8_t i) {
      return cvt_to_2digit(i, 10) + 0x0a3030;   // + "00\n" converts to ASCII
    }

Compiling with -O3 -march=haswell:

    ## gcc 5.3                        ## gcc 6.1
    movzbl  %dil, %edx                movzbl  %dil, %eax
    leal    (%rdx,%rdx,4), %ecx       leal    0(,%rax,4), %edx   # requires a 4B zero displacement
    leal    (%rdx,%rcx,8), %edx       movl    %eax, %ecx         # lea should let us avoid mov
    leal    (%rdx,%rdx,4), %edx       addl    %eax, %edx
                                      leal    (%rcx,%rdx,8), %edx
                                      leal    0(,%rdx,4), %eax   # requires a 4B zero displacement
                                      addl    %eax, %edx
    shrw    $11, %dx                  shrw    $11, %dx
    leal    (%rdx,%rdx,4), %eax       leal    0(,%rdx,4), %eax   # requires a 4B zero displacement. gcc5.3 didn't use any of these
                                      addl    %edx, %eax
    movzbl  %dl, %edx                 movzbl  %dl, %edx          # same after this
    addl    %eax, %eax                addl    %eax, %eax
    subl    %eax, %edi                subl    %eax, %edi
    movzbl  %dil, %eax                movzbl  %dil, %eax
    sall    $8, %eax                  sall    $8, %eax
    orl     %eax, %edx                orl     %eax, %edx
    leal    667696(%rdx), %eax        leal    667696(%rdx), %eax

With -mtune=haswell, it's probably best to merge with  mov ah, dil  or something, rather than movzx/shift/or. Haswell has no penalty for partial registers, but still has partial-register renaming to avoid false dependencies: the best of both worlds.

BTW, with -Os, both gcc versions compile it to

    movb    $10, %dl
    movzbl  %dil, %eax
    divb    %dl
    andl    $4095, %eax    # partial-reg stall.  gcc does this even with -march=core2, where it matters
    addl    $667696, %eax

The AND appears to be totally useless, because the upper bytes of eax are already zero (from the movzbl %dil, %eax before the div). I thought the movzwl %ax, %eax in the unknown-divisor version was there to avoid partial-register slowdowns, but maybe it's just based on the possible range of the result.

Off-topic, but I noticed this while writing FizzBuzz in asm: http://stackoverflow.com/a/37494090/224132
Note: for the always_inline attribute you also need the inline keyword. This regressed with r222874.
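For reference, a minimal sketch (mine, not from the report) of the declaration form the comment describes, with the inline keyword alongside the attribute so always_inline takes effect:

```c
#include <stdint.h>

/* always_inline on its own is not sufficient after the change the comment
   mentions; the function should also be declared inline. */
static inline __attribute__((always_inline))
unsigned cvt_to_2digit(uint8_t i, uint8_t base) {
    return ((i / base) | (uint32_t)(i % base) << 8);
}

unsigned cvt_to_2digit_ascii(uint8_t i) {
    return cvt_to_2digit(i, 10) + 0x0a3030;  /* + "00\n" converts to ASCII */
}
```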
GCC 6.2 is being released, adjusting target milestone.
I have some ideas. Investigating...
GCC 6.3 is being released, adjusting target milestone.
Author: bernds
Date: Wed Dec 21 16:45:33 2016
New Revision: 243861

URL: https://gcc.gnu.org/viewcvs?rev=243861&root=gcc&view=rev
Log:
PR target/71321
	* config/i386/i386.md (lea<mode>_general_2b, lea<mode>_general_3b):
	New patterns.
	* config/i386/predicates.md (const123_operand): New.

PR target/71321
	* gcc.target/i386/pr71321.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr71321.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
    trunk/gcc/config/i386/predicates.md
    trunk/gcc/testsuite/ChangeLog
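To illustrate what the new lea patterns are for (this is my hypothetical example, not the contents of the added pr71321.c test): small constant multiplies are where the discussion above showed gcc 6 emitting a zero-displacement lea plus extra instructions, where a single base+index*scale lea like `leal (%rdi,%rdi,4), %eax` suffices:

```c
/* Multiplies by 3, 5, and 9 can each be a single lea with the source
   register as both base and index (scale 2, 4, 8), with no 4-byte zero
   displacement needed. */
unsigned mul_by_3(unsigned x) { return x * 3; }
unsigned mul_by_5(unsigned x) { return x * 5; }
unsigned mul_by_9(unsigned x) { return x * 9; }
```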
The regression relative to gcc 5 has been fixed on trunk. It would be worth creating a separate bug for the unnecessary AND when compiling with -Os, which all versions of GCC exhibit.
GCC 6.4 is being released, adjusting target milestone.