User account creation filtered due to spam.

Bug 7625 - gcc pessimized 64-bit % operator on hppa2.0
Summary: gcc pessimized 64-bit % operator on hppa2.0
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 3.2
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Keywords: missed-optimization
Depends on:
Reported: 2002-08-18 04:36 UTC by dtucker
Modified: 2015-08-30 14:49 UTC (History)
2 users (show)

See Also:
Host: hppa2.0w-hp-hpux11.00
Target: hppa2.0w-hp-hpux11.00
Build: hppa2.0w-hp-hpux11.00
Known to work:
Known to fail:
Last reconfirmed: 2005-09-07 17:37:19

longmodtest.i.bz2 (2.06 KB, application/octet-stream)
2003-05-21 15:17 UTC, dtucker

Note You need to log in before you can comment on or make changes to this bug.
Description dtucker 2002-08-18 04:36:00 UTC
GCC seems to compile code for the 64-bit "%" operator that is about 6 times
slower that the HP native compiler on HPPA2.0 machines, even with -march=2.0.

This was noticed affecting OpenSSL DSA operations and identified by Deron
Meranda . For background, please see

$ cat logmodtest.c
#include <stdio.h>

        unsigned long long i, a=0;

        for(i=2000000; i; --i)
                a += (i+10) % i;

        printf("Result=%llu\n", a);

$ cc +O3 longmodtest.c
$ time ./a.out

real    0m0.649s
user    0m0.650s
sys     0m0.000s

$ gcc -O3 -march=2.0 longmodtest.c
$ time ./a.out

real    0m3.712s
user    0m3.700s
sys     0m0.020s


System: HP-UX c240 B.11.00 A 9000/782 2007058445 two-user license

host: hppa2.0w-hp-hpux11.00
build: hppa2.0w-hp-hpux11.00
target: hppa2.0w-hp-hpux11.00
configured with: ../gcc-3.2/configure --with-as=/usr/local/hppa2.0w-hp-hpux11.00/bin/as --with-gnu-as --with-ld=/usr/ccs/bin/ld --enable-languages=c,c++

$ gcc -O3 -march=2.0 -v -save-temps longmodtest.c
Reading specs from /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/specs
Configured with: ../gcc-3.2/configure --with-as=/usr/local/hppa2.0w-hp-hpux11.00/bin/as --with-gnu-as --with-ld=/usr/ccs/bin/ld --enable-languages=c,c++
Thread model: single
gcc version 3.2
 /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/cpp0 -lang-c -v -D__GNUC__=3 -D__GNUC_MINOR__=2 -D__GNUC_PATCHLEVEL__=0 -D__GXX_ABI_VERSION=102 -Dhppa -Dhp9000s800 -D__hp9000s800 -Dhp9k8 -DPWB -Dhpux -Dunix -D__hppa__ -D__hp9000s800__ -D__hp9000s800 -D__hp9k8__ -D__PWB__ -D__hpux__ -D__unix__ -D__hppa -D__hp9000s800 -D__hp9k8 -D__PWB -D__hpux -D__unix -Asystem=unix -Asystem=hpux -Acpu=hppa -Amachine=hppa -D__OPTIMIZE__ -D__STDC_HOSTED__=1 -D_PA_RISC1_1 -D__hp9000s700 -D_HPUX_SOURCE -D_HIUX_SOURCE -D__STDC_EXT__ -D_INCLUDE_LONGLONG longmodtest.c longmodtest.i
GNU CPP version 3.2 (cpplib) (hppa)
ignoring nonexistent directory "NONE/include"
ignoring nonexistent directory "/usr/local/hppa2.0w-hp-hpux11.00/include"
#include "..." search starts here:
#include <...> search starts here:
End of search list.
 /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/cc1 -fpreprocessed longmodtest.i -quiet -dumpbase longmodtest.c -march=2.0 -O3 -version -o longmodtest.s
GNU CPP version 3.2 (cpplib) (hppa)
GNU C version 3.2 (hppa2.0w-hp-hpux11.00)
        compiled by GNU C version 3.2.
 /usr/local/hppa2.0w-hp-hpux11.00/bin/as --traditional-format -o longmodtest.o longmodtest.s
 /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/collect2 -L/lib/pa1.1 -L/usr/lib/pa1.1 -z -u main /usr/ccs/lib/crt0.o -L/usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2 -L/usr/ccs/bin -L/usr/ccs/lib -L/opt/langtools/lib -L/usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/../../.. longmodtest.o -lgcc -lgcc_eh -lc -lgcc -lgcc_eh

See attachments for longmodtest.i.bz2

For comparison, if you split the "%" operation out into a separate source

unsigned long long longmod(unsigned long long a, unsigned long long b)
        return(a % b);

the HP compiler produces the following assembler output:

        .LEVEL  2.0N

        .SPACE  $TEXT$,SORT=8
        DEPD    %r25,31,32,%r26 ;offset 0x0
        DEPD    %r23,31,32,%r24 ;offset 0x4
        EXTRD,U %r26,31,32,%r25 ;offset 0x8
        .CALL   ;in=23,24,25,26;out=21,22,28,29; (MILLICALL)
        B,L     $$rem2U,%r31    ;offset 0xc
        EXTRD,U %r24,31,32,%r23 ;offset 0x10
        DEPD    %r28,31,32,%r29 ;offset 0x14
        BVE     (%r2)   ;offset 0x18
        EXTRD,U %r29,31,32,%r28 ;offset 0x1c
        .PROCEND        ;in=23,25;out=28,29;fpin=105,107;

        .SPACE  $TEXT$
        .SUBSPA $CODE$
        .SPACE  $PRIVATE$,SORT=16
        .SPACE  $TEXT$
        .SUBSPA $CODE$
        .IMPORT $$rem2U,MILLICODE
Comment 1 dtucker 2002-08-18 04:36:00 UTC
	Be patient :-)
Comment 2 John David Anglin 2002-11-15 15:25:18 UTC
Responsible-Changed-From-To: unassigned->danglin
Responsible-Changed-Why: Assignment.
Comment 3 John David Anglin 2002-11-15 15:25:18 UTC
State-Changed-From-To: open->analyzed
State-Changed-Why: Problem confirmed.  GCC currently uses __umoddi3 from
    libgcc2.c for the operation.  We need to add pattern
    to allow use of $$rem2U when available.  We don't
    currently have this routine in the millicode routines
    used with linux.
    I suspect there may be other 64-bit operations that
    are pessimized by using generic libgcc code.
Comment 4 Steven Bosscher 2006-04-10 19:49:04 UTC

Or, is anyone working on this?
Comment 5 dave 2006-04-10 20:17:25 UTC
Subject: Re:  gcc pessimized 64-bit % operator on hppa2.0

> Boooooooiinngggggg.......
> Or, is anyone working on this?

I'm not.  Note that the HP code is using 64-bit registers and instructions
in 32-bit mode for the call to $$rem2.  I think doing this in GCC is going
to be tricky as normal calls only save the the least significant 32-bits.
Maybe we could somehow confine 64-bit register values to the call
clobbered registers.  Normally register pairs are used for 64-bit values.

In 64-bit mode, we can probably easily benefit from using the new 64-bit

Comment 6 Steven Bosscher 2012-01-29 22:39:37 UTC
Perhaps this should be closed as WONTFIX?
Comment 7 dave.anglin 2012-01-30 17:00:45 UTC
On 1/29/2012 5:39 PM, steven at gcc dot wrote:
> Perhaps this should be closed as WONTFIX?
This enhancement should be done.  It appears both the
32 and 64-bit targets would benefit in using $$rem2U.
The / operator is probably also pessimized.