GCC seems to compile code for the 64-bit "%" operator that is about 6 times
slower than the code produced by the HP native compiler on HPPA2.0 machines,
even with -march=2.0. This was noticed affecting OpenSSL DSA operations and
was identified by Deron Meranda.

For background, please see
http://marc.theaimsgroup.com/?l=openssh-unix-dev&m=102646106016694&w=2

$ cat longmodtest.c
#include <stdio.h>
#include <stdlib.h>
int main()
{
    unsigned long long i, a=0;
    for(i=2000000; i; --i)
        a += (i+10) % i;
    printf("Result=%llu\n", a);
    exit(0);
}

$ cc +O3 longmodtest.c
$ time ./a.out
Result=19999913

real    0m0.649s
user    0m0.650s
sys     0m0.000s

$ gcc -O3 -march=2.0 longmodtest.c
$ time ./a.out
Result=19999913

real    0m3.712s
user    0m3.700s
sys     0m0.020s

Release:
3.2

Environment:
System: HP-UX c240 B.11.00 A 9000/782 2007058445 two-user license
host: hppa2.0w-hp-hpux11.00
build: hppa2.0w-hp-hpux11.00
target: hppa2.0w-hp-hpux11.00
configured with: ../gcc-3.2/configure --with-as=/usr/local/hppa2.0w-hp-hpux11.00/bin/as --with-gnu-as --with-ld=/usr/ccs/bin/ld --enable-languages=c,c++

How-To-Repeat:
$ gcc -O3 -march=2.0 -v -save-temps longmodtest.c
Reading specs from /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/specs
Configured with: ../gcc-3.2/configure --with-as=/usr/local/hppa2.0w-hp-hpux11.00/bin/as --with-gnu-as --with-ld=/usr/ccs/bin/ld --enable-languages=c,c++
Thread model: single
gcc version 3.2
 /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/cpp0 -lang-c -v -D__GNUC__=3 -D__GNUC_MINOR__=2 -D__GNUC_PATCHLEVEL__=0 -D__GXX_ABI_VERSION=102 -Dhppa -Dhp9000s800 -D__hp9000s800 -Dhp9k8 -DPWB -Dhpux -Dunix -D__hppa__ -D__hp9000s800__ -D__hp9000s800 -D__hp9k8__ -D__PWB__ -D__hpux__ -D__unix__ -D__hppa -D__hp9000s800 -D__hp9k8 -D__PWB -D__hpux -D__unix -Asystem=unix -Asystem=hpux -Acpu=hppa -Amachine=hppa -D__OPTIMIZE__ -D__STDC_HOSTED__=1 -D_PA_RISC1_1 -D__hp9000s700 -D_HPUX_SOURCE -D_HIUX_SOURCE -D__STDC_EXT__ -D_INCLUDE_LONGLONG longmodtest.c longmodtest.i
GNU CPP version 3.2 (cpplib) (hppa)
ignoring nonexistent directory "NONE/include"
ignoring nonexistent directory "/usr/local/hppa2.0w-hp-hpux11.00/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/include
 /usr/include
End of search list.
 /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/cc1 -fpreprocessed longmodtest.i -quiet -dumpbase longmodtest.c -march=2.0 -O3 -version -o longmodtest.s
GNU CPP version 3.2 (cpplib) (hppa)
GNU C version 3.2 (hppa2.0w-hp-hpux11.00)
        compiled by GNU C version 3.2.
 /usr/local/hppa2.0w-hp-hpux11.00/bin/as --traditional-format -o longmodtest.o longmodtest.s
 /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/collect2 -L/lib/pa1.1 -L/usr/lib/pa1.1 -z -u main /usr/ccs/lib/crt0.o -L/usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2 -L/usr/ccs/bin -L/usr/ccs/lib -L/opt/langtools/lib -L/usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/../../.. longmodtest.o -lgcc -lgcc_eh -lc -lgcc -lgcc_eh

See attachments for longmodtest.i.bz2

For comparison, if you split the "%" operation out into a separate source
file:

unsigned long long longmod(unsigned long long a, unsigned long long b)
{
    return(a % b);
}

the HP compiler produces the following assembler output:

        .LEVEL  2.0N

        .SPACE  $TEXT$,SORT=8
        .SUBSPA $CODE$,QUAD=0,ALIGN=4,ACCESS=0x2c,CODE_ONLY,SORT=24
longmod
        .PROC
        .CALLINFO FRAME=0,ARGS_SAVED,ORDERING_AWARE
        .ENTRY
        DEPD    %r25,31,32,%r26 ;offset 0x0
        DEPD    %r23,31,32,%r24 ;offset 0x4
        EXTRD,U %r26,31,32,%r25 ;offset 0x8
        .CALL   ;in=23,24,25,26;out=21,22,28,29; (MILLICALL)
        B,L     $$rem2U,%r31    ;offset 0xc
        EXTRD,U %r24,31,32,%r23 ;offset 0x10
        DEPD    %r28,31,32,%r29 ;offset 0x14
$00000002
$L0
        BVE     (%r2)   ;offset 0x18
        .EXIT
        EXTRD,U %r29,31,32,%r28 ;offset 0x1c

        .PROCEND        ;in=23,25;out=28,29;fpin=105,107;

        .SPACE  $TEXT$
        .SUBSPA $CODE$
        .SPACE  $PRIVATE$,SORT=16
        .SPACE  $TEXT$
        .SUBSPA $CODE$
        .EXPORT longmod,ENTRY,PRIV_LEV=3,ARGW0=GR,ARGW1=GR,ARGW2=GR,ARGW3=GR,RTNVAL=GR,LONG_RETURN
        .IMPORT $$rem2U,MILLICODE
        .END
Fix: Be patient :-)
Responsible-Changed-From-To: unassigned->danglin
Responsible-Changed-Why: Assignment.
State-Changed-From-To: open->analyzed
State-Changed-Why: Problem confirmed. GCC currently uses __umoddi3 from
libgcc2.c for the operation. We need to add a pattern to allow the use of
$$rem2U when available. We don't currently have this routine in the
millicode routines used with linux. I suspect there may be other 64-bit
operations that are pessimized by using generic libgcc code.
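To give a feel for why a generic software routine can lose to tuned millicode, here is a minimal sketch of a portable 64-bit unsigned remainder written as bit-at-a-time restoring division. This is an illustration only: it is not the __umoddi3 code from libgcc2.c, and the name umod64_sketch is hypothetical.

```c
#include <stdint.h>

/* Hypothetical illustration, NOT the libgcc2.c implementation:
   compute n % d one bit at a time (restoring division). */
uint64_t umod64_sketch(uint64_t n, uint64_t d)
{
    uint64_t r = 0;
    int i;

    if (d == 0)
        return n;   /* arbitrary choice for this sketch; % by zero is undefined in C */

    for (i = 63; i >= 0; --i) {
        uint64_t carry = r >> 63;           /* bit about to be shifted out of r */
        r = (r << 1) | ((n >> i) & 1);      /* bring down the next bit of n */
        if (carry || r >= d)                /* partial remainder >= divisor? */
            r -= d;                         /* wraps correctly when carry is set */
    }
    return r;
}
```

A loop like this runs 64 iterations per operation regardless of the operands, which is the kind of cost a hand-written routine using 64-bit instructions can avoid.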
Boooooooiinngggggg....... Or, is anyone working on this?
Subject: Re: gcc pessimized 64-bit % operator on hppa2.0

> Boooooooiinngggggg.......
>
> Or, is anyone working on this?

I'm not. Note that the HP code is using 64-bit registers and instructions
in 32-bit mode for the call to $$rem2. I think doing this in GCC is going
to be tricky, as normal calls only save the least significant 32 bits.
Maybe we could somehow confine 64-bit register values to the call-clobbered
registers. Normally register pairs are used for 64-bit values.

In 64-bit mode, we can probably easily benefit from using the new 64-bit
millicode.

Dave
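As a rough C model of the register-pair point above: in 32-bit mode a 64-bit value arrives as two 32-bit halves, and the DEPD / EXTRD,U instructions in the HP output appear to pack those halves into one 64-bit register before the millicode call and split the result afterwards. The sketch below assumes that reading of the assembler; all function names are hypothetical, and the plain "%" merely stands in for the $$rem2U call.

```c
#include <stdint.h>

/* Pack a (hi, lo) 32-bit pair into one 64-bit value. */
uint64_t pair_to_u64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Split a 64-bit value back into its 32-bit halves. */
uint32_t u64_hi(uint64_t v) { return (uint32_t)(v >> 32); }
uint32_t u64_lo(uint64_t v) { return (uint32_t)v; }

/* Hypothetical model of the longmod wrapper: operands come in as
   register pairs, are packed, divided, and the remainder is unpacked. */
void longmod_pairs(uint32_t a_hi, uint32_t a_lo,
                   uint32_t b_hi, uint32_t b_lo,
                   uint32_t *r_hi, uint32_t *r_lo)
{
    uint64_t a = pair_to_u64(a_hi, a_lo);
    uint64_t b = pair_to_u64(b_hi, b_lo);
    uint64_t r = a % b;     /* stands in for the millicode call */

    *r_hi = u64_hi(r);
    *r_lo = u64_lo(r);
}
```

The difficulty described above is that the packed 64-bit value only exists safely in registers whose upper halves are not expected to survive a normal call.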
Perhaps this should be closed as WONTFIX?
On 1/29/2012 5:39 PM, steven at gcc dot gnu.org wrote:
> Perhaps this should be closed as WONTFIX?

This enhancement should be done. It appears both the 32-bit and 64-bit
targets would benefit from using $$rem2U. The / operator is probably also
pessimized.

Dave
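To check the suspicion about "/", the original test loop can be paired with a division counterpart and both timed the same way. The following is a sketch only: div_loop and cpu_seconds are hypothetical helpers that were not part of the report, and no timings are claimed for them.

```c
#include <time.h>

/* Same loop as longmodtest.c, returned as a value instead of printed. */
unsigned long long mod_loop(unsigned long long n)
{
    unsigned long long i, a = 0;
    for (i = n; i; --i)
        a += (i + 10) % i;
    return a;
}

/* Hypothetical counterpart exercising the 64-bit "/" operator. */
unsigned long long div_loop(unsigned long long n)
{
    unsigned long long i, a = 0;
    for (i = n; i; --i)
        a += (i + 10) / i;
    return a;
}

/* Run one of the loops and return the CPU time it used, in seconds. */
double cpu_seconds(unsigned long long (*loop)(unsigned long long),
                   unsigned long long n, unsigned long long *result)
{
    clock_t start = clock();
    *result = loop(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling cpu_seconds(mod_loop, 2000000, &r) and cpu_seconds(div_loop, 2000000, &r) from a small main(), once built with each compiler, would show whether the two operators are affected to a similar degree.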