Bug 53340 - [4.8 Regression] rnflow.f90 is ~20% slower after revision 187092
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.8.0
Importance: P3 normal
Target Milestone: 4.8.0
Assignee: Richard Biener
 
Reported: 2012-05-14 09:35 UTC by Dominique d'Humieres
Modified: 2012-05-14 11:37 UTC
CC List: 2 users

Last reconfirmed: 2012-05-14 00:00:00


Attachments
source cptrf2.f90 extracted from rnflow.f90 (1.24 KB, text/plain)
2012-05-14 09:44 UTC, Dominique d'Humieres

Description Dominique d'Humieres 2012-05-14 09:35:31 UTC
On x86_64-apple-darwin10, rnflow.f90 is ~20% slower after revision 187092:

[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187091/bin/gfortran -O3 -ffast-math -funroll-loops rnflow.f90
[macbook] test/dbg_rnflow% time a.out > /dev/null
22.038u 0.352s 0:22.52 99.3%	0+0k 2+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187092/bin/gfortran -O3 -ffast-math -funroll-loops rnflow.f90
[macbook] test/dbg_rnflow% time a.out > /dev/null
27.480u 0.349s 0:27.83 99.9%	0+0k 0+0io 0pf+0w
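The "~20%" in the title corresponds to the loss of throughput; measured as extra run time, the regression is closer to 25%. A quick check using the user times from the two runs above (the helper names are only illustrative):

```python
# User CPU times in seconds, copied from the `time` output of the two builds above.
t_r187091 = 22.038
t_r187092 = 27.480


def runtime_increase(before, after):
    """Extra run time relative to the old build, in percent."""
    return (after - before) / before * 100.0


def throughput_loss(before, after):
    """Loss of speed (work done per second) relative to the old build, in percent."""
    return (1.0 - before / after) * 100.0


print(f"run time increase: {runtime_increase(t_r187091, t_r187092):.1f}%")
print(f"throughput loss:   {throughput_loss(t_r187091, t_r187092):.1f}%")
```

This prints 24.7% for the run-time increase and 19.8% for the throughput loss.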

The slowdown comes from the optimization of cptrf2: rebuilding only cptrf2.f90 with r187091 restores the original timing.

[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187092/bin/gfortran -c -O3 -ffast-math -funroll-loops timctr.f90 cmpcpt.f90 cptrf2.f90 dger.f90 dgetri.f90 dswap.f90 dtrsm.f90 evlrnf.f90 idamax.f90 main.f90 mattrs.f90 cmpmat.f90 dgemm.f90 dgetf2.f90 dlaswp.f90 dtrmm.f90 dtrti2.f90 extpic.f90 ilaenv.f90 matcnt.f90 reaseq.f90 xerbla.f90 cptrf1.f90 dgemv.f90 dgetrf.f90 dscal.f90 dtrmv.f90 dtrtri.f90 gentrs.f90 lsame.f90 matsim.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
27.567u 0.349s 0:27.92 99.9%	0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187091/bin/gfortran -c -O3 -ffast-math -funroll-loops cptrf2.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.136u 0.345s 0:22.48 99.9%	0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187091/bin/gfortran -c -O2 cptrf2.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
21.453u 0.348s 0:21.80 99.9%	0+0k 0+0io 0pf+0w
Comment 1 Dominique d'Humieres 2012-05-14 09:44:33 UTC
Created attachment 27399
source cptrf2.f90 extracted from rnflow.f90
Comment 2 Dominique d'Humieres 2012-05-14 09:49:22 UTC
If I understand the profiling correctly, the slowdown comes from the first inlined function, minlst. The fast assembly is:

L45:
	movss   (%r10), %xmm10
	leal    -1(%rsi), %edi
	movss   -4(%r10), %xmm11
	comiss  %xmm10, %xmm6
	movss   -8(%r10), %xmm12
	minss   %xmm10, %xmm6
	movss   -12(%r10), %xmm13
	cmova   %esi, %edx
	comiss  %xmm11, %xmm6
	minss   %xmm11, %xmm6
	cmova   %edi, %edx
	comiss  %xmm12, %xmm6
	minss   %xmm12, %xmm6
	leal    -2(%rsi), %edi
	cmova   %edi, %edx
	comiss  %xmm13, %xmm6
	leal    -3(%rsi), %edi
	minss   %xmm13, %xmm6
	cmova   %edi, %edx
	subl    $4, %esi
	subq    $16, %r10
	cmpl    %r8d, %esi
	jne     L45

while the slow one is:

L39:
	movslq  %edx, %r9
	movss   -4(%rdi,%r9,4), %xmm9
	leal    -1(%r8), %r9d
	comiss  (%rbx), %xmm9
	cmova   %r8d, %edx
	movslq  %edx, %r14
	movss   -4(%rdi,%r14,4), %xmm10
	comiss  -4(%rbx), %xmm10
	cmova   %r9d, %edx
	leal    -2(%r8), %r9d
	movslq  %edx, %r14
	movss   -4(%rdi,%r14,4), %xmm11
	comiss  -8(%rbx), %xmm11
	cmova   %r9d, %edx
	leal    -3(%r8), %r9d
	movslq  %edx, %r14
	movss   -4(%rdi,%r14,4), %xmm12
	comiss  -12(%rbx), %xmm12
	cmova   %r9d, %edx
	subl    $4, %r8d
	subq    $16, %rbx
	cmpl    %r10d, %r8d
	jne     L39
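Reading the two listings side by side: in the fast loop the running minimum stays in %xmm6 and is updated with minss, so each element is loaded once. In the slow loop the compare operand is re-read as -4(%rdi,%r9,4), with %r9 sign-extended from the index in %edx. In other words, x(imin) is loaded again through the index that the previous cmova just produced, which puts a load on the loop-carried dependency chain. The sketch below models the two loop shapes in Python. It is a hypothetical reconstruction from the assembly, not the minlst source in the attachment: it assumes a descending loop with a strict "<" compare and counts element reads as a stand-in for loads, so it says nothing about latency.

```python
class CountingArray:
    """Wraps a list and counts element reads (a stand-in for memory loads)."""

    def __init__(self, data):
        self.data = list(data)
        self.loads = 0

    def __getitem__(self, i):
        self.loads += 1
        return self.data[i]

    def __len__(self):
        return len(self.data)


def argmin_kept_in_register(x):
    """Fast shape (r187091): the running minimum lives in a scalar (minss + cmova)."""
    n = len(x)
    imin = n - 1
    xmin = x[imin]
    for i in range(n - 2, -1, -1):  # descending, like 'subl $4' in the unrolled loop
        xi = x[i]                   # one load per element
        if xi < xmin:               # comiss + cmova: update the index
            imin = i
        xmin = min(xmin, xi)        # minss: update the value without touching memory
    return imin


def argmin_reloaded(x):
    """Slow shape (r187092): x[imin] is re-read through the index on every iteration."""
    n = len(x)
    imin = n - 1
    for i in range(n - 2, -1, -1):
        if x[i] < x[imin]:          # two loads; x[imin] depends on the previous cmova
            imin = i
    return imin


data = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
fast, slow = CountingArray(data), CountingArray(data)
print(argmin_kept_in_register(fast), fast.loads)  # 3 8
print(argmin_reloaded(slow), slow.loads)          # 3 14
```

Both shapes return the same index. The reloaded one performs roughly twice as many element reads, and each x[imin] read has to wait for the previous comparison to choose imin.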
Comment 3 Richard Biener 2012-05-14 10:38:51 UTC
Ouch.  Mine.
Comment 4 Richard Biener 2012-05-14 11:37:02 UTC
Author: rguenth
Date: Mon May 14 11:36:58 2012
New Revision: 187457

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=187457
Log:
2012-05-14  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/53340
	* tree-ssa-pre.c (op_valid_in_sets): Fix error in last commit.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-pre.c
Comment 5 Richard Biener 2012-05-14 11:37:18 UTC
Fixed.