Bug 54245 - [4.8 regression] incorrect optimisation
Summary: [4.8 regression] incorrect optimisation
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.8.0
: P3 normal
Target Milestone: 4.8.0
Assignee: Bill Schmidt
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-08-13 16:25 UTC by Mans Rullgard
Modified: 2012-08-15 13:28 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-08-13 00:00:00


Attachments
Test case (320 bytes, text/x-csrc)
2012-08-13 16:25 UTC, Mans Rullgard
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mans Rullgard 2012-08-13 16:25:55 UTC
Created attachment 28007 [details]
Test case

Since r190220 the attached test is compiled incorrectly at -O1 and higher.
Comment 1 Jakub Jelinek 2012-08-13 17:19:40 UTC
Confirmed.  slsr replaces:
  D.2219_3 = *row_2(D);
  D.2220_4 = (int) D.2219_3;
  a1_5 = D.2220_4 * 22725;
  D.2222_6 = MEM[(short int *)row_2(D) + 4B];
  D.2223_7 = (int) D.2222_6;
  D.2224_8 = D.2223_7 * 21407;
  a0_9 = D.2224_8 + a1_5;
  D.2225_10 = D.2223_7 * 8867;
- a1_11 = a1_5 + D.2225_10;
+ slsr.4_25 = D.2222_6 * 12540;
+ slsr.5_26 = (int) slsr.4_25;
+ a1_11 = a0_9 - slsr.5_26;

The multiplication is newly performed in short int, supposedly that is the problem here.  Anyway, while the number of multiplications in the end is the same, with slsr the code sequence is also 3 insns/4 bytes longer on x86_64.
Comment 2 Bill Schmidt 2012-08-13 19:29:06 UTC
I'll take a look.  Might be a day or two as my queue is kind of full.
Comment 3 Bill Schmidt 2012-08-14 12:34:24 UTC
I'm putting together a for-now patch that disables the optimization when a widening cast produces the stride.  In the long run this can be re-enabled so long as we can retain the original cast and base the new multiply on that value rather than doing it in the smaller type.  This is a bit subtle to get right so I plan to defer that work until I have more time to concentrate on it.

I.e., there are two problems in this example: we did the multiply in a smaller type and we introduced an extra cast that wasn't necessary if we were smart enough (causing the longer sequence as Jakub noted).  The replacement would be ok with a little re-design.
Comment 4 Bill Schmidt 2012-08-15 13:27:38 UTC
Author: wschmidt
Date: Wed Aug 15 13:27:29 2012
New Revision: 190412

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190412
Log:
gcc:

2012-08-15  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR tree-optimization/54245
	* gimple-ssa-strength-reduction.c (legal_cast_p_1): New function.
	(legal_cast_p): Split out logic to legal_cast_p_1.
	(analyze_increments): Avoid introducing multiplies in smaller types.


gcc/testsuite:

2012-08-15  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR tree-optimization/54245
	* gcc.dg/tree-ssa/pr54245.c: New test.


Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr54245.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/gimple-ssa-strength-reduction.c
    trunk/gcc/testsuite/ChangeLog
Comment 5 Bill Schmidt 2012-08-15 13:28:49 UTC
Fixed.