[Bug optimization/8126] [3.3/3.4 regression] Floating point computation far slower in 3.2 than in 2.95

Thu Jan 1 10:25:00 GMT 2004

------- Additional Comments From hubicka at ucw dot cz  2004-01-01 10:25 -------
Subject: Re:  [3.3/3.4 regression] Floating point computation far slower in 3.2 than in 2.95

> 
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2004-01-01 04:11 -------
> What is weird is that -march=i386 is faster than -march=i686 on a pentium3:
> grendel:~/src/gnu/gcctest>gcc -O3 -ffast-math -fomit-frame-pointer pr8126.c -march=i386
> grendel:~/src/gnu/gcctest>time ./a.out
> Start?
> Stop!
> Result = 0.000000, 0.000000, 1.000000
> 2.726u 0.000s 0:02.74 99.2%     0+0k 0+0io 2pf+0w
> grendel:~/src/gnu/gcctest>time ./a.out
> Start?
> Stop!
> Result = 0.000000, 0.000000, 1.000000
> 2.710u 0.000s 0:02.74 98.9%     0+0k 0+0io 0pf+0w
> grendel:~/src/gnu/gcctest>gcc -O3 -ffast-math -fomit-frame-pointer pr8126.c -march=i686
> grendel:~/src/gnu/gcctest>time ./a.out
> Start?
> Stop!
> Result = 0.000000, 0.000000, 1.000000
> 2.843u 0.007s 0:02.87 98.9%     0+0k 0+0io 2pf+0w
> grendel:~/src/gnu/gcctest>gcc -O3 -ffast-math -fomit-frame-pointer pr8126.c -march=i586
> grendel:~/src/gnu/gcctest>time ./a.out
> Start?
> Stop!
> Result = 0.000000, 0.000000, 1.000000
> 2.703u 0.000s 0:02.72 99.2%     0+0k 0+0io 2pf+0w
> grendel:~/src/gnu/gcctest>gcc -O3 -ffast-math -fomit-frame-pointer pr8126.c -march=pentium3
> grendel:~/src/gnu/gcctest>time ./a.out
> Start?
> Stop!
> Result = 0.000000, 0.000000, 1.000000
> 2.843u 0.007s 0:02.87 98.9%     0+0k 0+0io 2pf+0w
> 
> It looks like it is choosing the wrong instruction for pentium3. (pentium4 is different and 
> does not matter that much).

No, it is the scheduler (you will likely reproduce similar results via
-fno-schedule-insns2).  The scheduler does not take the stack register
file into account, and reg-stack does not reorder instructions; it works
by blindly inserting exchange operations wherever the code does not match
the stack discipline, so performance-wise we get essentially random
results out of the backend.  Unscheduled code usually fares slightly
better because the structure of the original expression trees is still
somewhat preserved, but it is still far from optimal.  There is not much
to do on this front in the short term, unfortunately.
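
To illustrate why instruction order matters so much here, below is a toy
model in C, not GCC code.  Everything in it is an assumption made for
illustration: the names fp_op and count_fxch are hypothetical, every
operation is treated as "dst = dst OP src" with dst required on top of
the stack, and the reversed and popping forms of the real x87
instructions are ignored.  It only counts how many exchanges a blind
fix-up pass would insert for a given order.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_DEPTH 8  /* the x87 register stack has eight slots */

/* Hypothetical toy instruction: dst = dst OP src (virtual register ids). */
struct fp_op { int dst; int src; };

/* Return the stack position of virtual register reg, or -1 if absent. */
static int find_slot(const int stack[], int depth, int reg)
{
    for (int i = 0; i < depth; i++)
        if (stack[i] == reg)
            return i;
    return -1;
}

/*
 * Count the exchanges needed to run ops in the given order.
 * stack[0] is the top of the stack; the array is modified in place.
 * Toy assumption: dst must be on top, src may be anywhere.
 */
int count_fxch(int stack[], int depth, const struct fp_op ops[], size_t n_ops)
{
    assert(depth > 0 && depth <= STACK_DEPTH);
    int swaps = 0;
    for (size_t i = 0; i < n_ops; i++) {
        int slot = find_slot(stack, depth, ops[i].dst);
        assert(slot >= 0);
        assert(find_slot(stack, depth, ops[i].src) >= 0);
        if (slot != 0) {
            /* Blindly swap the wanted register to the top. */
            int tmp = stack[0];
            stack[0] = stack[slot];
            stack[slot] = tmp;
            swaps++;
        }
    }
    return swaps;
}
```

With the stack starting as {0, 1, 2}, the order (0,1) (0,2) (1,2) (1,0)
needs one exchange in this model, while the equally valid interleaved
order (0,1) (1,2) (0,2) (1,0) needs three.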

I've had limited luck with a patch teaching the scheduler that two
consecutive FP operations are cheaper when the second uses the
destination of the first as an operand, but it does not fit the current
scheduler model very well (and it is a misdesign).  The proper solution
is to reorganize the scheduler core into a kind of library and make
reg-stack use it to fix the ordering as needed.  I am not planning to dig
into it anytime soon though, hoping that the importance of x87 will fade.
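
The heuristic can be sketched roughly as follows.  This is not the patch
itself and none of these names exist in GCC: toy_insn,
adjacency_penalty and pick_next are hypothetical, and the penalty values
0 and 1 are arbitrary placeholders.  The sketch only shows the idea of
preferring, among the ready instructions, one that consumes the result
the previous instruction just left on top of the stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction, not a GCC data structure. */
struct toy_insn { int dst; int src1; int src2; };

/*
 * Hypothetical penalty for issuing next directly after prev:
 * 0 if next reads prev's destination (which is already on top of
 * the stack), 1 otherwise (an exchange is likely to be needed).
 */
int adjacency_penalty(const struct toy_insn *prev, const struct toy_insn *next)
{
    if (next->src1 == prev->dst || next->src2 == prev->dst)
        return 0;
    return 1;
}

/*
 * Return the index of the ready instruction with the smallest
 * penalty after prev.  Ties keep the earliest entry, so the
 * original order is preserved when nothing is gained.
 */
size_t pick_next(const struct toy_insn *prev,
                 const struct toy_insn ready[], size_t n_ready)
{
    assert(n_ready > 0);
    size_t best = 0;
    int best_penalty = adjacency_penalty(prev, &ready[0]);
    for (size_t i = 1; i < n_ready; i++) {
        int p = adjacency_penalty(prev, &ready[i]);
        if (p < best_penalty) {
            best = i;
            best_penalty = p;
        }
    }
    return best;
}
```

A real scheduler weighs many other costs (latencies, unit conflicts), so
a term like this would be only one input among several.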

Honza
> 
> -- 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>    Last reconfirmed|0000-00-00 00:00:00         |2004-01-01 04:11:34
>                date|                            |
> 
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8126
> 

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8126