53340 – [4.8 Regression] rnflow.f90 is ~20% slower after revision 187092

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization
Version:	4.8.0

Importance:	P3 normal
Target Milestone:	4.8.0
Assignee:	Richard Biener

URL:
Keywords:

Depends on:
Blocks:

Reported:	2012-05-14 09:35 UTC by Dominique d'Humieres
Modified:	2012-05-14 11:37 UTC
CC List:	2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2012-05-14 00:00:00

Attachments
source cptrf2.f90 extracted from rnflow.f90 (1.24 KB, text/plain) 2012-05-14 09:44 UTC, Dominique d'Humieres

Description Dominique d'Humieres 2012-05-14 09:35:31 UTC

On x86_64-apple-darwin10, rnflow.f90 is ~20% slower after revision 187092

[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187091/bin/gfortran -O3 -ffast-math -funroll-loops rnflow.f90
[macbook] test/dbg_rnflow% time a.out > /dev/null
22.038u 0.352s 0:22.52 99.3%	0+0k 2+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187092/bin/gfortran -O3 -ffast-math -funroll-loops rnflow.f90
[macbook] test/dbg_rnflow% time a.out > /dev/null
27.480u 0.349s 0:27.83 99.9%	0+0k 0+0io 0pf+0w

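The slowdown comes from the optimization of cptrf2: with every file compiled by revision 187092 the run is slow, while recompiling only cptrf2.f90 with revision 187091 (at -O3 or at -O2) restores the fast timing.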

[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187092/bin/gfortran -c -O3 -ffast-math -funroll-loops timctr.f90 cmpcpt.f90 cptrf2.f90 dger.f90 dgetri.f90 dswap.f90 dtrsm.f90 evlrnf.f90 idamax.f90 main.f90 mattrs.f90 cmpmat.f90 dgemm.f90 dgetf2.f90 dlaswp.f90 dtrmm.f90 dtrti2.f90 extpic.f90 ilaenv.f90 matcnt.f90 reaseq.f90 xerbla.f90 cptrf1.f90 dgemv.f90 dgetrf.f90 dscal.f90 dtrmv.f90 dtrtri.f90 gentrs.f90 lsame.f90 matsim.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
27.567u 0.349s 0:27.92 99.9%	0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187091/bin/gfortran -c -O3 -ffast-math -funroll-loops cptrf2.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.136u 0.345s 0:22.48 99.9%	0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187091/bin/gfortran -c -O2 cptrf2.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
21.453u 0.348s 0:21.80 99.9%	0+0k 0+0io 0pf+0w

Comment 1 Dominique d'Humieres 2012-05-14 09:44:33 UTC

Created attachment 27399
source cptrf2.f90 extracted from rnflow.f90

Comment 2 Dominique d'Humieres 2012-05-14 09:49:22 UTC

If I understand the profiling correctly, the slowdown comes from the first inlined function, minlst. The fast assembly is

L45:
	movss   (%r10), %xmm10
	leal    -1(%rsi), %edi
	movss   -4(%r10), %xmm11
	comiss  %xmm10, %xmm6
	movss   -8(%r10), %xmm12
	minss   %xmm10, %xmm6
	movss   -12(%r10), %xmm13
	cmova   %esi, %edx
	comiss  %xmm11, %xmm6
	minss   %xmm11, %xmm6
	cmova   %edi, %edx
	comiss  %xmm12, %xmm6
	minss   %xmm12, %xmm6
	leal    -2(%rsi), %edi
	cmova   %edi, %edx
	comiss  %xmm13, %xmm6
	leal    -3(%rsi), %edi
	minss   %xmm13, %xmm6
	cmova   %edi, %edx
	subl    $4, %esi
	subq    $16, %r10
	cmpl    %r8d, %esi
	jne     L45

while the slow one is

L39:
	movslq  %edx, %r9
	movss   -4(%rdi,%r9,4), %xmm9
	leal    -1(%r8), %r9d
	comiss  (%rbx), %xmm9
	cmova   %r8d, %edx
	movslq  %edx, %r14
	movss   -4(%rdi,%r14,4), %xmm10
	comiss  -4(%rbx), %xmm10
	cmova   %r9d, %edx
	leal    -2(%r8), %r9d
	movslq  %edx, %r14
	movss   -4(%rdi,%r14,4), %xmm11
	comiss  -8(%rbx), %xmm11
	cmova   %r9d, %edx
	leal    -3(%r8), %r9d
	movslq  %edx, %r14
	movss   -4(%rdi,%r14,4), %xmm12
	comiss  -12(%rbx), %xmm12
	cmova   %r9d, %edx
	subl    $4, %r8d
	subq    $16, %rbx
	cmpl    %r10d, %r8d
	jne     L39
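
For readers who do not want to trace the assembly, the difference between the two loops can be sketched in C. This is only an illustration inferred from the listings above, not the Fortran source of minlst (which is not shown in this report); the function names minloc_cached and minloc_reload are made up. The fast loop appears to keep the running minimum in a register (%xmm6, updated with minss), while the slow loop appears to reload the element at the current best index (movslq plus movss) before every comparison, so each compare waits on the previous cmova through a load.

```c
#include <stddef.h>

/* Hypothetical stand-in for the fast loop: the running minimum is kept in
   a scalar, so each step compares against a value already in a register.
   Requires n >= 1. Scans from the last element down to the first. */
size_t minloc_cached(const float *a, size_t n)
{
    size_t best = n - 1;
    float best_val = a[best];
    for (size_t i = n - 1; i-- > 0; ) {
        if (best_val > a[i]) {
            best = i;
            best_val = a[i];
        }
    }
    return best;
}

/* Hypothetical stand-in for the slow loop: only the index is carried, and
   a[best] is read from memory again on every iteration.
   Requires n >= 1. Same scan order and same result as minloc_cached. */
size_t minloc_reload(const float *a, size_t n)
{
    size_t best = n - 1;
    for (size_t i = n - 1; i-- > 0; ) {
        if (a[best] > a[i])
            best = i;
    }
    return best;
}
```

Both functions return the same index for any input. A compiler is free to turn either form into the other, so this sketch only shows the shape of the generated code, not which source form leads to it.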

Comment 3 Richard Biener 2012-05-14 10:38:51 UTC

Ouch.  Mine.

Comment 4 Richard Biener 2012-05-14 11:37:02 UTC

Author: rguenth
Date: Mon May 14 11:36:58 2012
New Revision: 187457

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=187457
Log:
2012-05-14  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/53340
	* tree-ssa-pre.c (op_valid_in_sets): Fix error in last commit.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-pre.c

Comment 5 Richard Biener 2012-05-14 11:37:18 UTC

Fixed.