Bug 100809 - PPC: __int128 divide/modulo does not use P10 instructions vdivsq/vdivuq
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 10.2.1
Importance: P3 normal
Target Milestone: ---
Assignee: Michael Meissner
URL:
Keywords: missed-optimization
Depends on:
Blocks: 61030
Reported: 2021-05-28 07:58 UTC by Jens Seifert
Modified: 2021-08-15 12:11 UTC
CC List: 6 users

See Also:
Host:
Target: powerpc*-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-06-01 00:00:00


Description Jens Seifert 2021-05-28 07:58:35 UTC
unsigned __int128 div(unsigned __int128 a, unsigned __int128 b)
{
   return a/b;
}

__int128 div(__int128 a, __int128 b)
{
   return a/b;
}

gcc -mcpu=power10 -save-temps -O2 int128.C

Output:
_Z3divoo:
.LFB0:
        .cfi_startproc
        .localentry     _Z3divoo,1
        mflr 0
        std 0,16(1)
        stdu 1,-32(1)
        .cfi_def_cfa_offset 32
        .cfi_offset 65, 16
        bl __udivti3@notoc
        addi 1,1,32
        .cfi_def_cfa_offset 0
        ld 0,16(1)
        mtlr 0
        .cfi_restore 65
        blr
        .long 0
        .byte 0,9,0,1,128,0,0,0
        .cfi_endproc
.LFE0:
        .size   _Z3divoo,.-_Z3divoo
        .globl __divti3
        .align 2
        .p2align 4,,15
        .globl _Z3divnn
        .type   _Z3divnn, @function
_Z3divnn:
.LFB1:
        .cfi_startproc
        .localentry     _Z3divnn,1
        mflr 0
        std 0,16(1)
        stdu 1,-32(1)
        .cfi_def_cfa_offset 32
        .cfi_offset 65, 16
        bl __divti3@notoc
        addi 1,1,32
        .cfi_def_cfa_offset 0
        ld 0,16(1)
        mtlr 0
        .cfi_restore 65
        blr
        .long 0
        .byte 0,9,0,1,128,0,0,0
        .cfi_endproc

Expected: the compiler emits vdivsq/vdivuq instead of calling __divti3/__udivti3.

GCC version:

/opt/rh/devtoolset-10/root/usr/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/opt/rh/devtoolset-10/root/usr/bin/gcc
COLLECT_LTO_WRAPPER=/opt/rh/devtoolset-10/root/usr/libexec/gcc/ppc64le-redhat-linux/10/lto-wrapper
Target: ppc64le-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/opt/rh/devtoolset-10/root/usr --mandir=/opt/rh/devtoolset-10/root/usr/share/man --infodir=/opt/rh/devtoolset-10/root/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-targets=powerpcle-linux --disable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --with-default-libstdcxx-abi=gcc4-compatible --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-10.2.1-20200804/obj-ppc64le-redhat-linux/isl-install --disable-libmpx --enable-gnu-indirect-function --enable-secureplt --with-long-double-128 --with-cpu-32=power8 --with-tune-32=power8 --with-cpu-64=power8 --with-tune-64=power8 --build=ppc64le-redhat-linux
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.2.1 20200804 (Red Hat 10.2.1-2) (GCC)
Comment 1 Jens Seifert 2021-05-28 08:12:05 UTC
The same applies to modulo: the P10 modulo instructions are not used either.
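For reference, a minimal sketch of the modulo counterpart to the test case in the description. The function name `mod` is my own choice, not from the original report, and I am assuming the same build command (`gcc -mcpu=power10 -save-temps -O2`). On an unpatched compiler I would expect calls to the libgcc helpers `__umodti3` and `__modti3` here, analogous to the divide case.

```cpp
// Modulo counterparts of the div() test case.
// The name "mod" is illustrative only; it does not appear in the report.
unsigned __int128 mod(unsigned __int128 a, unsigned __int128 b)
{
   return a % b;   // unsigned 128-bit remainder
}

__int128 mod(__int128 a, __int128 b)
{
   return a % b;   // signed 128-bit remainder (result takes the sign of a)
}
```

Inspecting the generated `.s` file for `bl __umodti3` / `bl __modti3` versus an inline instruction should show whether the compiler in use has the fix.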
Comment 2 Bill Schmidt 2021-06-01 14:59:58 UTC
I believe this work is pending, but the patches are still under review.
Comment 3 Michael Meissner 2021-06-01 22:55:20 UTC
Carl Love submitted a patch for this on April 26th.
Comment 4 Michael Meissner 2021-06-01 22:58:31 UTC
Note: looking at Carl's patch, it only adds the built-ins.  I don't believe it adds direct support for {,u}divti3 and {,u}modti3 to implement these for normal __int128 variables.
Comment 5 Michael Meissner 2021-06-04 20:17:32 UTC
Patch submitted:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571942.html
Comment 6 CVS Commits 2021-07-08 01:56:23 UTC
The master branch has been updated by Michael Meissner <meissner@gcc.gnu.org>:

https://gcc.gnu.org/g:852b11da11a181df517c0348df044354ff0656d6

commit r12-2135-g852b11da11a181df517c0348df044354ff0656d6
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Jul 7 21:55:38 2021 -0400

    Generate 128-bit int divide/modulus on power10.
    
    This patch adds support for the VDIVSQ, VDIVUQ, VMODSQ, and VMODUQ
    instructions to do 128-bit arithmetic.
    
    2021-07-07  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
            PR target/100809
            * config/rs6000/rs6000.md (udivti3): New insn.
            (divti3): New insn.
            (umodti3): New insn.
            (modti3): New insn.
    
    gcc/testsuite/
            PR target/100809
            * gcc.target/powerpc/p10-vdivq-vmodq.c: New test.
Comment 7 CVS Commits 2021-07-14 17:25:51 UTC
The releases/gcc-11 branch has been updated by Michael Meissner <meissner@gcc.gnu.org>:

https://gcc.gnu.org/g:8ebcd3608584e544ae8e7c422b3f2400758c47f5

commit r11-8743-g8ebcd3608584e544ae8e7c422b3f2400758c47f5
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Jul 14 13:23:51 2021 -0400

    Generate 128-bit int divide/modulus on power10.
    
    This patch adds support for the VDIVSQ, VDIVUQ, VMODSQ, and VMODUQ
    instructions to do 128-bit arithmetic.
    
    Backported from master: 2021-07-07  Michael Meissner  <meissner@linux.ibm.com>
    
    2021-07-14  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
            PR target/100809
            * config/rs6000/rs6000.md (udivti3): New insn.
            (divti3): New insn.
            (umodti3): New insn.
            (modti3): New insn.
    
    gcc/testsuite/
            PR target/100809
            * gcc.target/powerpc/p10-vdivq-vmodq.c: New test.
Comment 8 Michael Meissner 2021-07-14 17:53:22 UTC
Patch applied to mainline and GCC 11 branches.  PR closed.