Bug#: 7625
Status: ASSIGNED
Assigned To: John David Anglin <danglin@gcc.gnu.org>
Reporter: dtucker@zip.com.au

Attachment Description Type Created Size
longmodtest.i.bz2 longmodtest.i.bz2 application/octet-stream 2003-05-21 15:17 2.06 KB


Description:   Last confirmed: 2005-09-07 17:37 Opened: 2002-08-18 04:36
GCC seems to compile code for the 64-bit "%" operator that is about 6 times
slower than the code produced by the HP native compiler on HPPA2.0 machines,
even with -march=2.0.

This was noticed while it was affecting OpenSSL DSA operations, and was
identified by Deron Meranda. For background, please see
http://marc.theaimsgroup.com/?l=openssh-unix-dev&m=102646106016694&w=2

$ cat longmodtest.c
#include <stdio.h>
#include <stdlib.h>

int
main()
{
        unsigned long long i, a=0;

        for(i=2000000; i; --i)
                a += (i+10) % i;

        printf("Result=%llu\n", a);
        exit(0);
}

$ cc +O3 longmodtest.c
$ time ./a.out
Result=19999913

real    0m0.649s
user    0m0.650s
sys     0m0.000s

$ gcc -O3 -march=2.0 longmodtest.c
$ time ./a.out
Result=19999913

real    0m3.712s
user    0m3.700s
sys     0m0.020s
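
For anyone who wants to repeat the measurement without relying on the shell's
"time", the following is a minimal sketch that wraps the same loop in a
function and measures CPU time with clock(). The names mod_sum and
time_mod_sum are made up for this illustration and are not part of
longmodtest.c.

```c
#include <time.h>

/* Same loop as longmodtest.c, with the iteration count as a parameter.
 * (Hypothetical helper, not part of the original test case.) */
unsigned long long mod_sum(unsigned long long n)
{
        unsigned long long i, a = 0;

        for (i = n; i; --i)
                a += (i + 10) % i;      /* the 64-bit "%" under test */

        return a;
}

/* Runs mod_sum(n), stores the sum in *result and returns the CPU time
 * consumed, in seconds, as reported by clock(). */
double time_mod_sum(unsigned long long n, unsigned long long *result)
{
        clock_t start = clock();

        *result = mod_sum(n);
        return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_mod_sum(2000000, &a) should leave the same value in a that the
test program prints; the returned time will depend on the compiler and
machine.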

Release:
3.2

Environment:
System: HP-UX c240 B.11.00 A 9000/782 2007058445 two-user license

host: hppa2.0w-hp-hpux11.00
build: hppa2.0w-hp-hpux11.00
target: hppa2.0w-hp-hpux11.00
configured with: ../gcc-3.2/configure --with-as=/usr/local/hppa2.0w-hp-hpux11.00/bin/as --with-gnu-as --with-ld=/usr/ccs/bin/ld --enable-languages=c,c++

How-To-Repeat:
$ gcc -O3 -march=2.0 -v -save-temps longmodtest.c
Reading specs from /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/specs
Configured with: ../gcc-3.2/configure --with-as=/usr/local/hppa2.0w-hp-hpux11.00/bin/as --with-gnu-as --with-ld=/usr/ccs/bin/ld --enable-languages=c,c++
Thread model: single
gcc version 3.2
 /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/cpp0 -lang-c -v -D__GNUC__=3 -D__GNUC_MINOR__=2 -D__GNUC_PATCHLEVEL__=0 -D__GXX_ABI_VERSION=102 -Dhppa -Dhp9000s800 -D__hp9000s800 -Dhp9k8 -DPWB -Dhpux -Dunix -D__hppa__ -D__hp9000s800__ -D__hp9000s800 -D__hp9k8__ -D__PWB__ -D__hpux__ -D__unix__ -D__hppa -D__hp9000s800 -D__hp9k8 -D__PWB -D__hpux -D__unix -Asystem=unix -Asystem=hpux -Acpu=hppa -Amachine=hppa -D__OPTIMIZE__ -D__STDC_HOSTED__=1 -D_PA_RISC1_1 -D__hp9000s700 -D_HPUX_SOURCE -D_HIUX_SOURCE -D__STDC_EXT__ -D_INCLUDE_LONGLONG longmodtest.c longmodtest.i
GNU CPP version 3.2 (cpplib) (hppa)
ignoring nonexistent directory "NONE/include"
ignoring nonexistent directory "/usr/local/hppa2.0w-hp-hpux11.00/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/include
 /usr/include
End of search list.
 /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/cc1 -fpreprocessed longmodtest.i -quiet -dumpbase longmodtest.c -march=2.0 -O3 -version -o longmodtest.s
GNU CPP version 3.2 (cpplib) (hppa)
GNU C version 3.2 (hppa2.0w-hp-hpux11.00)
        compiled by GNU C version 3.2.
 /usr/local/hppa2.0w-hp-hpux11.00/bin/as --traditional-format -o longmodtest.o longmodtest.s
 /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/collect2 -L/lib/pa1.1 -L/usr/lib/pa1.1 -z -u main /usr/ccs/lib/crt0.o -L/usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2 -L/usr/ccs/bin -L/usr/ccs/lib -L/opt/langtools/lib -L/usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/../../.. longmodtest.o -lgcc -lgcc_eh -lc -lgcc -lgcc_eh

See attachments for longmodtest.i.bz2

For comparison, if you split the "%" operation out into a separate source
file:

unsigned long long longmod(unsigned long long a, unsigned long long b)
{
        return(a % b);
}

the HP compiler produces the following assembler output:

        .LEVEL  2.0N

        .SPACE  $TEXT$,SORT=8
        .SUBSPA $CODE$,QUAD=0,ALIGN=4,ACCESS=0x2c,CODE_ONLY,SORT=24
longmod
        .PROC
        .CALLINFO FRAME=0,ARGS_SAVED,ORDERING_AWARE
        .ENTRY
        DEPD    %r25,31,32,%r26 ;offset 0x0
        DEPD    %r23,31,32,%r24 ;offset 0x4
        EXTRD,U %r26,31,32,%r25 ;offset 0x8
        .CALL   ;in=23,24,25,26;out=21,22,28,29; (MILLICALL)
        B,L     $$rem2U,%r31    ;offset 0xc
        EXTRD,U %r24,31,32,%r23 ;offset 0x10
        DEPD    %r28,31,32,%r29 ;offset 0x14
$00000002
$L0
        BVE     (%r2)   ;offset 0x18
        .EXIT
        EXTRD,U %r29,31,32,%r28 ;offset 0x1c
        .PROCEND        ;in=23,25;out=28,29;fpin=105,107;

        .SPACE  $TEXT$
        .SUBSPA $CODE$
        .SPACE  $PRIVATE$,SORT=16
        .SPACE  $TEXT$
        .SUBSPA $CODE$
        .EXPORT longmod,ENTRY,PRIV_LEV=3,ARGW0=GR,ARGW1=GR,ARGW2=GR,ARGW3=GR,RTNVAL=GR,LONG_RETURN
        .IMPORT $$rem2U,MILLICODE
        .END
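
Read loosely, the listing above appears to merge the two 32-bit halves of each
argument into a single 64-bit register (the DEPD instructions), call the
$$rem2U millicode, and then split the 64-bit result back into two halves (the
EXTRD,U instructions). The C sketch below only illustrates that pack / call /
unpack shape. The type reg_pair and the functions pack64, unpack64 and
longmod_pairs are hypothetical names, the native "%" stands in for $$rem2U,
and no claim is made about which register holds which half.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, as in a register pair.
 * (Hypothetical type for illustration.) */
typedef struct {
        uint32_t hi;    /* most significant 32 bits */
        uint32_t lo;    /* least significant 32 bits */
} reg_pair;

/* Merge the two halves into one 64-bit value. */
uint64_t pack64(reg_pair p)
{
        return ((uint64_t)p.hi << 32) | p.lo;
}

/* Split a 64-bit value back into two 32-bit halves. */
reg_pair unpack64(uint64_t v)
{
        reg_pair p;

        p.hi = (uint32_t)(v >> 32);
        p.lo = (uint32_t)v;
        return p;
}

/* Pack both operands, take the remainder on full 64-bit values, and
 * unpack the result.  The "%" here is only a stand-in for the call to
 * the $$rem2U millicode routine.  b must be nonzero. */
reg_pair longmod_pairs(reg_pair a, reg_pair b)
{
        return unpack64(pack64(a) % pack64(b));
}
```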

------- Comment #1 From dtucker@zip.com.au 2002-08-18 04:36 -------
Fix:
	Be patient :-)

------- Comment #2 From John David Anglin 2002-11-15 15:25 -------
Responsible-Changed-From-To: unassigned->danglin
Responsible-Changed-Why: Assignment.

------- Comment #3 From John David Anglin 2002-11-15 15:25 -------
State-Changed-From-To: open->analyzed
State-Changed-Why: Problem confirmed.  GCC currently uses __umoddi3 from
    libgcc2.c for the operation.  We need to add a pattern
    to allow use of $$rem2U when available.  We don't
    currently have this routine in the millicode routines
    used with linux.
    
    I suspect there may be other 64-bit operations that
    are pessimized by using generic libgcc code.
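
To give a feel for what computing a 64-bit remainder in portable C involves
when no dedicated routine is called, here is a small sketch using bit-by-bit
restoring division. This is NOT the __umoddi3 code from libgcc2.c, which is
written differently; umod64_sketch is a hypothetical name used only for this
illustration.

```c
#include <stdint.h>

/* Unsigned 64-bit remainder by restoring (shift-and-subtract) division.
 * Illustration only; not the libgcc implementation.  b must be nonzero. */
uint64_t umod64_sketch(uint64_t a, uint64_t b)
{
        uint64_t r = 0;
        int bit;

        for (bit = 63; bit >= 0; --bit) {
                /* Remember the bit that the shift below pushes out of r. */
                uint64_t carry = r >> 63;

                /* Bring down the next bit of the dividend. */
                r = (r << 1) | ((a >> bit) & 1);

                /* If the (possibly 65-bit) partial remainder is at least b,
                 * subtract b.  When carry is set the subtraction wraps
                 * around modulo 2^64 and still gives the right value. */
                if (carry || r >= b)
                        r -= b;
        }

        return r;
}
```

For nonzero b, the result should match what the built-in "%" operator gives
for the same operands.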

------- Comment #4 From Steven Bosscher 2006-04-10 19:49 -------
Boooooooiinngggggg.......

Or, is anyone working on this?

------- Comment #5 From dave@hiauly1.hia.nrc.ca 2006-04-10 20:17 -------
Subject: Re:  gcc pessimized 64-bit % operator on hppa2.0

> Boooooooiinngggggg.......
> 
> Or, is anyone working on this?

I'm not.  Note that the HP code is using 64-bit registers and instructions
in 32-bit mode for the call to $$rem2U.  I think doing this in GCC is going
to be tricky, as normal calls only save the least significant 32 bits.
Maybe we could somehow confine 64-bit register values to the call-clobbered
registers.  Normally register pairs are used for 64-bit values.

In 64-bit mode, we can probably easily benefit from using the new 64-bit
millicode.

Dave
