7625 – gcc pessimized 64-bit % operator on hppa2.0

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	3.2

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2002-08-18 04:36 UTC by dtucker
Modified:	2022-01-20 11:31 UTC
CC List:	3 users

See Also:
Host:	hppa2.0w-hp-hpux11.00
Target:	hppa2.0w-hp-hpux11.00
Build:	hppa2.0w-hp-hpux11.00
Known to work:
Known to fail:
Last reconfirmed:	2005-09-07 17:37:19

Attachments
longmodtest.i.bz2 (2.06 KB, application/octet-stream) 2003-05-21 15:17 UTC, dtucker

Description dtucker 2002-08-18 04:36:00 UTC

For the 64-bit "%" operator, GCC seems to generate code that is about 6 times
slower than the HP native compiler's on HPPA2.0 machines, even with -march=2.0.

This was noticed because it affects OpenSSL DSA operations, and was identified
by Deron Meranda. For background, please see
http://marc.theaimsgroup.com/?l=openssh-unix-dev&m=102646106016694&w=2

$ cat longmodtest.c
#include <stdio.h>
#include <stdlib.h>     /* declares exit() */

int
main()
{
        unsigned long long i, a=0;

        for(i=2000000; i; --i)
                a += (i+10) % i;

        printf("Result=%llu\n", a);
        exit(0);
}

$ cc +O3 longmodtest.c
$ time ./a.out
Result=19999913

real    0m0.649s
user    0m0.650s
sys     0m0.000s

$ gcc -O3 -march=2.0 longmodtest.c
$ time ./a.out
Result=19999913

real    0m3.712s
user    0m3.700s
sys     0m0.020s

Release:
3.2

Environment:
System: HP-UX c240 B.11.00 A 9000/782 2007058445 two-user license

host: hppa2.0w-hp-hpux11.00
build: hppa2.0w-hp-hpux11.00
target: hppa2.0w-hp-hpux11.00
configured with: ../gcc-3.2/configure --with-as=/usr/local/hppa2.0w-hp-hpux11.00/bin/as --with-gnu-as --with-ld=/usr/ccs/bin/ld --enable-languages=c,c++

How-To-Repeat:
$ gcc -O3 -march=2.0 -v -save-temps longmodtest.c
Reading specs from /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/specs
Configured with: ../gcc-3.2/configure --with-as=/usr/local/hppa2.0w-hp-hpux11.00/bin/as --with-gnu-as --with-ld=/usr/ccs/bin/ld --enable-languages=c,c++
Thread model: single
gcc version 3.2
 /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/cpp0 -lang-c -v -D__GNUC__=3 -D__GNUC_MINOR__=2 -D__GNUC_PATCHLEVEL__=0 -D__GXX_ABI_VERSION=102 -Dhppa -Dhp9000s800 -D__hp9000s800 -Dhp9k8 -DPWB -Dhpux -Dunix -D__hppa__ -D__hp9000s800__ -D__hp9000s800 -D__hp9k8__ -D__PWB__ -D__hpux__ -D__unix__ -D__hppa -D__hp9000s800 -D__hp9k8 -D__PWB -D__hpux -D__unix -Asystem=unix -Asystem=hpux -Acpu=hppa -Amachine=hppa -D__OPTIMIZE__ -D__STDC_HOSTED__=1 -D_PA_RISC1_1 -D__hp9000s700 -D_HPUX_SOURCE -D_HIUX_SOURCE -D__STDC_EXT__ -D_INCLUDE_LONGLONG longmodtest.c longmodtest.i
GNU CPP version 3.2 (cpplib) (hppa)
ignoring nonexistent directory "NONE/include"
ignoring nonexistent directory "/usr/local/hppa2.0w-hp-hpux11.00/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/include
 /usr/include
End of search list.
 /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/cc1 -fpreprocessed longmodtest.i -quiet -dumpbase longmodtest.c -march=2.0 -O3 -version -o longmodtest.s
GNU CPP version 3.2 (cpplib) (hppa)
GNU C version 3.2 (hppa2.0w-hp-hpux11.00)
        compiled by GNU C version 3.2.
 /usr/local/hppa2.0w-hp-hpux11.00/bin/as --traditional-format -o longmodtest.o longmodtest.s
 /usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/collect2 -L/lib/pa1.1 -L/usr/lib/pa1.1 -z -u main /usr/ccs/lib/crt0.o -L/usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2 -L/usr/ccs/bin -L/usr/ccs/lib -L/opt/langtools/lib -L/usr/local/lib/gcc-lib/hppa2.0w-hp-hpux11.00/3.2/../../.. longmodtest.o -lgcc -lgcc_eh -lc -lgcc -lgcc_eh

See attachments for longmodtest.i.bz2

For comparison, if you split the "%" operation out into a separate source
file:

unsigned long long longmod(unsigned long long a, unsigned long long b)
{
        return(a % b);
}

the HP compiler produces the following assembler output:

        .LEVEL  2.0N

        .SPACE  $TEXT$,SORT=8
        .SUBSPA $CODE$,QUAD=0,ALIGN=4,ACCESS=0x2c,CODE_ONLY,SORT=24
longmod
        .PROC
        .CALLINFO FRAME=0,ARGS_SAVED,ORDERING_AWARE
        .ENTRY
        DEPD    %r25,31,32,%r26 ;offset 0x0
        DEPD    %r23,31,32,%r24 ;offset 0x4
        EXTRD,U %r26,31,32,%r25 ;offset 0x8
        .CALL   ;in=23,24,25,26;out=21,22,28,29; (MILLICALL)
        B,L     $$rem2U,%r31    ;offset 0xc
        EXTRD,U %r24,31,32,%r23 ;offset 0x10
        DEPD    %r28,31,32,%r29 ;offset 0x14
$00000002
$L0
        BVE     (%r2)   ;offset 0x18
        .EXIT
        EXTRD,U %r29,31,32,%r28 ;offset 0x1c
        .PROCEND        ;in=23,25;out=28,29;fpin=105,107;

        .SPACE  $TEXT$
        .SUBSPA $CODE$
        .SPACE  $PRIVATE$,SORT=16
        .SPACE  $TEXT$
        .SUBSPA $CODE$
        .EXPORT longmod,ENTRY,PRIV_LEV=3,ARGW0=GR,ARGW1=GR,ARGW2=GR,ARGW3=GR,RTNVAL=GR,LONG_RETURN
        .IMPORT $$rem2U,MILLICODE
        .END

Comment 1 dtucker 2002-08-18 04:36:00 UTC

Fix:
	Be patient :-)

Comment 2 John David Anglin 2002-11-15 15:25:18 UTC

Responsible-Changed-From-To: unassigned->danglin
Responsible-Changed-Why: Assignment.

Comment 3 John David Anglin 2002-11-15 15:25:18 UTC

State-Changed-From-To: open->analyzed
State-Changed-Why: Problem confirmed.  GCC currently uses __umoddi3 from
    libgcc2.c for the operation.  We need to add a pattern
    to allow the use of $$rem2U when available.  We don't
    currently have this routine in the millicode routines
    used with Linux.
    
    I suspect there may be other 64-bit operations that
    are pessimized by using generic libgcc code.

Comment 4 Steven Bosscher 2006-04-10 19:49:04 UTC

Boooooooiinngggggg.......

Or, is anyone working on this?

Comment 5 dave 2006-04-10 20:17:25 UTC

Subject: Re:  gcc pessimized 64-bit % operator on hppa2.0

> Boooooooiinngggggg.......
> 
> Or, is anyone working on this?

I'm not.  Note that the HP code is using 64-bit registers and instructions
in 32-bit mode for the call to $$rem2.  I think doing this in GCC is going
to be tricky, as normal calls only save the least significant 32 bits.
Maybe we could somehow confine 64-bit register values to the
call-clobbered registers.  Normally register pairs are used for 64-bit values.

In 64-bit mode, we can probably easily benefit from using the new 64-bit
millicode.

Dave

Comment 6 Steven Bosscher 2012-01-29 22:39:37 UTC

Perhaps this should be closed as WONTFIX?

Comment 7 dave.anglin 2012-01-30 17:00:45 UTC

On 1/29/2012 5:39 PM, steven at gcc dot gnu.org wrote:
> Perhaps this should be closed as WONTFIX?
This enhancement should be done.  It appears both the
32-bit and 64-bit targets would benefit from using $$rem2U.
The / operator is probably also pessimized.

Dave