Bug 7625

Summary: gcc pessimized 64-bit % operator on hppa2.0
Product: gcc Reporter: dtucker
Component: target Assignee: Not yet assigned to anyone <unassigned>
Status: ASSIGNED ---    
Severity: normal CC: gcc-bugs, steven
Priority: P3 Keywords: missed-optimization
Version: 3.2   
Target Milestone: ---   
Host: hppa2.0w-hp-hpux11.00 Target: hppa2.0w-hp-hpux11.00
Build: hppa2.0w-hp-hpux11.00 Known to work:
Known to fail: Last reconfirmed: 2005-09-07 17:37:19
Attachments: longmodtest.i.bz2

Description dtucker 2002-08-18 04:36:00 UTC
GCC seems to compile code for the 64-bit "%" operator that is about 6 times
slower than the HP native compiler on HPPA2.0 machines, even with -march=2.0.

This was noticed affecting OpenSSL DSA operations and identified by Deron
Meranda. For background, please see
http://marc.theaimsgroup.com/?l=openssh-unix-dev&m=102646106016694&w=2

$ cat longmodtest.c
#include <stdio.h>
#include <stdlib.h>

int
main()
{
        unsigned long long i, a=0;

        for(i=2000000; i; --i)
                a += (i+10) % i;

        printf("Result=%llu\n", a);
        exit(0);
}

$ cc +O3 longmodtest.c
$ time ./a.out
Result=19999913

real    0m0.649s
user    0m0.650s
sys     0m0.000s

$ gcc -O3 -march=2.0 longmodtest.c
$ time ./a.out
Result=19999913

real    0m3.712s
user    0m3.700s
sys     0m0.020s

Release:
3.2

Environment:
System: HP-UX c240 B.11.00 A 9000/782 2007058445 two-user license

host: hppa2.0w-hp-hpux11.00
build: hppa2.0w-hp-hpux11.00
target: hppa2.0w-hp-hpux11.00
configured with: ../gcc-3.2/configure --with-as=/usr/local/hppa2.0w-hp-hpux11.00/bin/as --with-gnu-as --with-ld=/usr/ccs/bin/ld --enable-languages=c,c++

How-To-Repeat:
$ gcc -O3 -march=2.0 -v -save-temps longmodtest.c
Reading specs from /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/specs
Configured with: ../gcc-3.2/configure --with-as=/usr/local/hppa2.0w-hp-hpux11.00/bin/as --with-gnu-as --with-ld=/usr/ccs/bin/ld --enable-languages=c,c++
Thread model: single
gcc version 3.2
 /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/cpp0 -lang-c -v -D__GNUC__=3 -D__GNUC_MINOR__=2 -D__GNUC_PATCHLEVEL__=0 -D__GXX_ABI_VERSION=102 -Dhppa -Dhp9000s800 -D__hp9000s800 -Dhp9k8 -DPWB -Dhpux -Dunix -D__hppa__ -D__hp9000s800__ -D__hp9000s800 -D__hp9k8__ -D__PWB__ -D__hpux__ -D__unix__ -D__hppa -D__hp9000s800 -D__hp9k8 -D__PWB -D__hpux -D__unix -Asystem=unix -Asystem=hpux -Acpu=hppa -Amachine=hppa -D__OPTIMIZE__ -D__STDC_HOSTED__=1 -D_PA_RISC1_1 -D__hp9000s700 -D_HPUX_SOURCE -D_HIUX_SOURCE -D__STDC_EXT__ -D_INCLUDE_LONGLONG longmodtest.c longmodtest.i
GNU CPP version 3.2 (cpplib) (hppa)
ignoring nonexistent directory "NONE/include"
ignoring nonexistent directory "/usr/local/hppa2.0w-hp-hpux11.00/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/include
 /usr/include
End of search list.
 /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/cc1 -fpreprocessed longmodtest.i -quiet -dumpbase longmodtest.c -march=2.0 -O3 -version -o longmodtest.s
GNU CPP version 3.2 (cpplib) (hppa)
GNU C version 3.2 (hppa2.0w-hp-hpux11.00)
        compiled by GNU C version 3.2.
 /usr/local/hppa2.0w-hp-hpux11.00/bin/as --traditional-format -o longmodtest.o longmodtest.s
 /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/collect2 -L/lib/pa1.1 -L/usr/lib/pa1.1 -z -u main /usr/ccs/lib/crt0.o -L/usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2 -L/usr/ccs/bin -L/usr/ccs/lib -L/opt/langtools/lib -L/usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/../../.. longmodtest.o -lgcc -lgcc_eh -lc -lgcc -lgcc_eh

See attachments for longmodtest.i.bz2

For comparison, if you split the "%" operation out into a separate source
file:

unsigned long long longmod(unsigned long long a, unsigned long long b)
{
        return(a % b);
}

the HP compiler produces the following assembler output:

        .LEVEL  2.0N

        .SPACE  $TEXT$,SORT=8
        .SUBSPA $CODE$,QUAD=0,ALIGN=4,ACCESS=0x2c,CODE_ONLY,SORT=24
longmod
        .PROC
        .CALLINFO FRAME=0,ARGS_SAVED,ORDERING_AWARE
        .ENTRY
        DEPD    %r25,31,32,%r26 ;offset 0x0
        DEPD    %r23,31,32,%r24 ;offset 0x4
        EXTRD,U %r26,31,32,%r25 ;offset 0x8
        .CALL   ;in=23,24,25,26;out=21,22,28,29; (MILLICALL)
        B,L     $$rem2U,%r31    ;offset 0xc
        EXTRD,U %r24,31,32,%r23 ;offset 0x10
        DEPD    %r28,31,32,%r29 ;offset 0x14
$00000002
$L0
        BVE     (%r2)   ;offset 0x18
        .EXIT
        EXTRD,U %r29,31,32,%r28 ;offset 0x1c
        .PROCEND        ;in=23,25;out=28,29;fpin=105,107;

        .SPACE  $TEXT$
        .SUBSPA $CODE$
        .SPACE  $PRIVATE$,SORT=16
        .SPACE  $TEXT$
        .SUBSPA $CODE$
        .EXPORT longmod,ENTRY,PRIV_LEV=3,ARGW0=GR,ARGW1=GR,ARGW2=GR,ARGW3=GR,RTNVAL=GR,LONG_RETURN
        .IMPORT $$rem2U,MILLICODE
        .END
Comment 1 dtucker 2002-08-18 04:36:00 UTC
Fix:
	Be patient :-)
Comment 2 John David Anglin 2002-11-15 15:25:18 UTC
Responsible-Changed-From-To: unassigned->danglin
Responsible-Changed-Why: Assignment.
Comment 3 John David Anglin 2002-11-15 15:25:18 UTC
State-Changed-From-To: open->analyzed
State-Changed-Why: Problem confirmed.  GCC currently uses __umoddi3 from
    libgcc2.c for the operation.  We need to add a pattern
    to allow use of $$rem2U when available.  We don't
    currently have this routine in the millicode routines
    used with Linux.
    
    I suspect there may be other 64-bit operations that
    are pessimized by using generic libgcc code.
Comment 4 Steven Bosscher 2006-04-10 19:49:04 UTC
Boooooooiinngggggg.......

Or, is anyone working on this?
Comment 5 dave 2006-04-10 20:17:25 UTC
Subject: Re:  gcc pessimized 64-bit % operator on hppa2.0

> Boooooooiinngggggg.......
> 
> Or, is anyone working on this?

I'm not.  Note that the HP code is using 64-bit registers and instructions
in 32-bit mode for the call to $$rem2U.  I think doing this in GCC is going
to be tricky as normal calls only save the least significant 32 bits.
Maybe we could somehow confine 64-bit register values to the call
clobbered registers.  Normally register pairs are used for 64-bit values.

In 64-bit mode, we can probably easily benefit from using the new 64-bit
millicode.

Dave
Comment 6 Steven Bosscher 2012-01-29 22:39:37 UTC
Perhaps this should be closed as WONTFIX?
Comment 7 dave.anglin 2012-01-30 17:00:45 UTC
On 1/29/2012 5:39 PM, steven at gcc dot gnu.org wrote:
> Perhaps this should be closed as WONTFIX?
This enhancement should be done.  It appears both the
32-bit and 64-bit targets would benefit from using $$rem2U.
The / operator is probably also pessimized.

Dave