54245 – [4.8 regression] incorrect optimisation

Bug 54245 - [4.8 regression] incorrect optimisation

Summary: [4.8 regression] incorrect optimisation

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization (show other bugs)
Version:	4.8.0

Importance:	P3 normal
Target Milestone:	4.8.0
Assignee:	Bill Schmidt

URL:
Keywords:

Depends on:
Blocks:

Reported:	2012-08-13 16:25 UTC by Mans Rullgard
Modified:	2012-08-15 13:28 UTC (History)
CC List:	2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2012-08-13 00:00:00

Attachments
Test case (320 bytes, text/x-csrc) 2012-08-13 16:25 UTC, Mans Rullgard	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Mans Rullgard 2012-08-13 16:25:55 UTC

Created attachment 28007 [details]
Test case

Since r190220 the attached test is compiled incorrectly at -O1 and higher.

Comment 1 Jakub Jelinek 2012-08-13 17:19:40 UTC

Confirmed.  slsr replaces:
  D.2219_3 = *row_2(D);
  D.2220_4 = (int) D.2219_3;
  a1_5 = D.2220_4 * 22725;
  D.2222_6 = MEM[(short int *)row_2(D) + 4B];
  D.2223_7 = (int) D.2222_6;
  D.2224_8 = D.2223_7 * 21407;
  a0_9 = D.2224_8 + a1_5;
  D.2225_10 = D.2223_7 * 8867;
- a1_11 = a1_5 + D.2225_10;
+ slsr.4_25 = D.2222_6 * 12540;
+ slsr.5_26 = (int) slsr.4_25;
+ a1_11 = a0_9 - slsr.5_26;

The multiplication is newly performed in short int, supposedly that is the problem here.  Anyway, while the number of multiplications in the end is the same, with slsr the code sequence is also 3 insns/4 bytes longer on x86_64.

Comment 2 Bill Schmidt 2012-08-13 19:29:06 UTC

I'll take a look.  Might be a day or two as my queue is kind of full.

Comment 3 Bill Schmidt 2012-08-14 12:34:24 UTC

I'm putting together a for-now patch that disables the optimization when a widening cast produces the stride.  In the long run this can be re-enabled so long as we can retain the original cast and base the new multiply on that value rather than doing it in the smaller type.  This is a bit subtle to get right so I plan to defer that work until I have more time to concentrate on it.

I.e., there are two problems in this example: we did the multiply in a smaller type and we introduced an extra cast that wasn't necessary if we were smart enough (causing the longer sequence as Jakub noted).  The replacement would be ok with a little re-design.

Comment 4 Bill Schmidt 2012-08-15 13:27:38 UTC

Author: wschmidt
Date: Wed Aug 15 13:27:29 2012
New Revision: 190412

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190412
Log:
gcc:

2012-08-15  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR tree-optimization/54245
	* gimple-ssa-strength-reduction.c (legal_cast_p_1): New function.
	(legal_cast_p): Split out logic to legal_cast_p_1.
	(analyze_increments): Avoid introducing multiplies in smaller types.


gcc/testsuite:

2012-08-15  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR tree-optimization/54245
	* gcc.dg/tree-ssa/pr54245.c: New test.


Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr54245.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/gimple-ssa-strength-reduction.c
    trunk/gcc/testsuite/ChangeLog

Comment 5 Bill Schmidt 2012-08-15 13:28:49 UTC

Fixed.