Bug 40992 - [4.4 Regression] cunroll ignoring asm size
: [4.4 Regression] cunroll ignoring asm size
Status: NEW
Product: gcc
Classification: Unclassified
Component: tree-optimization
: 4.5.0
: P2 normal
: 4.4.7
Assigned To: Not yet assigned to anyone
: http://gcc.gnu.org/ml/gcc-patches/200...
: missed-optimization, patch
:
:
  Show dependency treegraph
 
Reported: 2009-08-07 04:24 UTC by Alexander Strange
Modified: 2011-11-29 23:06 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2009-10-05 00:39:52


Attachments
the source (5.01 KB, text/plain)
2009-08-07 04:25 UTC, Alexander Strange
Details
Patch which should fix this (1.58 KB, patch)
2009-10-05 00:57 UTC, Andrew Pinski
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alexander Strange 2009-08-07 04:24:57 UTC
The attached file is a loop over the same function implemented in C and inline
asm. 

When compiled with:
gcc -O3 -fno-pic -fomit-frame-pointer -fdump-tree-cunroll-details -S
cabac_unroll.i
cunroll thinks they're different sizes:

size: 55-4, last_iteration: 55-4
  Loop size: 55
  Estimated size after unrolling: 442

size: 8-4, last_iteration: 8-4
  Loop size: 8
  Estimated size after unrolling: 34

and expands the asm loop all 13 times.

This is reduced from ffmpeg decode_cabac_residual, where it apparently causes
significant decoding slowdown.

Besides that, cunroll seems to be hurting ffmpeg in general on x86-32
(http://multimedia.cx/eggs/last-performance-smackdown-for-awhile/), maybe we'll
turn it down some.
Comment 1 Alexander Strange 2009-08-07 04:25:31 UTC
Created attachment 18315 [details]
the source
Comment 2 Richard Guenther 2009-08-07 12:06:57 UTC
asms are predicted to have a small size/cost because they should be used to
implement single instructions that cannot be mapped to C easily.

No easy way out without pessimizing more legitimate uses of asms.
Comment 3 Alexander Strange 2009-08-08 16:44:24 UTC
Maybe the C version will be usable after everyone is using 4.4+, earlier
versions tend to make a mess.

Anyway, counting newlines for size estimation wouldn't pessimize anything.
Comment 4 Andrew Pinski 2009-10-05 00:39:52 UTC
>No easy way out without pessimizing more legitimate uses of asms.

Actually that is not true.  asm_insn_count in final.c will be able to count the
number of instructions that an inline-asm will be.  This size count is
documented too.

I am going to fix this by making asm_insn_count public and use that in
tree-inline.c.
Comment 5 Andrew Pinski 2009-10-05 00:56:02 UTC
  size:  41 __asm__ __volatile__

Before:
  size:   1 __asm__ __volatile__
Comment 6 Andrew Pinski 2009-10-05 00:57:35 UTC
Created attachment 18705 [details]
Patch which should fix this
Comment 7 Andrew Pinski 2009-10-05 17:46:48 UTC
Fixed on the trunk.
Comment 8 Andrew Pinski 2009-10-05 17:46:48 UTC
Subject: Bug 40992

Author: pinskia
Date: Mon Oct  5 17:46:35 2009
New Revision: 152458

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=152458
Log:
2009-10-05  Andrew Pinski  <andrew_pinski@playstation.sony.com>

        PR tree-opt/40992
        * final.c (asm_str_count): Split out from asm_insn_count.
        * rtl.h (asm_str_count): New prototype.
        * tree-inline (estimate_num_insns) <case GIMPLE_ASM>: Call
        asm_str_count.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/final.c
    trunk/gcc/rtl.h
    trunk/gcc/tree-inline.c
Comment 9 Richard Guenther 2010-05-22 18:13:43 UTC
GCC 4.3.5 is being released, adjusting target milestone.
Comment 10 Richard Guenther 2011-06-27 12:13:57 UTC
4.3 branch is being closed, moving to 4.4.7 target.
Comment 11 Andrew Pinski 2011-11-29 23:06:02 UTC
No longer working on this.