This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
RE: About loop unrolling and optimize for size
- From: "sarah at hederstierna dot com" <fredrik at hederstierna dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Date: Fri, 14 Aug 2015 09:53:42 +0000
- Subject: RE: About loop unrolling and optimize for size
- References: <263ef4cd6f55417e97d180530b2e64ba at DAG03 dot HMC dot local>,<CAFiYyc1QtZOitp-4Q6vhAAfr0tQBj6L34Z0kMgrzK_vGksi4hA at mail dot gmail dot com>
I think I found the explanation: -fpeel-loops triggers some extra flags.
from "toplev.c":
  /* web and rename-registers help when run after loop unrolling. */
  if (flag_web == AUTODETECT_VALUE)
    flag_web = flag_unroll_loops || flag_peel_loops;
  if (flag_rename_registers == AUTODETECT_VALUE)
    flag_rename_registers = flag_unroll_loops || flag_peel_loops;
Actually, it's -frename-registers that causes the code size to decrease.
This flag seems to be set when -fpeel-loops is enabled.
Maybe this flag could be enabled at -Os? It shouldn't have any downside besides possibly making debugging harder.
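To make the effect of the snippet above easier to try out, here is a minimal standalone sketch of the same resolution logic. The struct, the function name resolve_autodetect and the numeric value chosen for AUTODETECT_VALUE are assumptions made for illustration only; they are not GCC's actual definitions.

```c
/* Hypothetical stand-in for GCC's internal "not set on the command line"
   sentinel; the value 2 is an assumption for this sketch. */
#define AUTODETECT_VALUE 2

/* Hypothetical container for the four flags involved. */
struct flags {
  int unroll_loops;      /* 1 if -funroll-loops was given */
  int peel_loops;        /* 1 if -fpeel-loops was given */
  int web;               /* 0/1 if set explicitly, else AUTODETECT_VALUE */
  int rename_registers;  /* 0/1 if set explicitly, else AUTODETECT_VALUE */
};

/* Mirrors the quoted logic: a flag left at the sentinel is switched on
   when either unrolling or peeling is enabled; an explicit setting is
   left untouched. */
void resolve_autodetect(struct flags *f)
{
  if (f->web == AUTODETECT_VALUE)
    f->web = f->unroll_loops || f->peel_loops;
  if (f->rename_registers == AUTODETECT_VALUE)
    f->rename_registers = f->unroll_loops || f->peel_loops;
}
```

In this sketch, passing only -fpeel-loops turns rename_registers on, while an explicit 0 (as with -fno-rename-registers) stays 0, which matches the observation that -Os -fpeel-loops implicitly brings in -frename-registers.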
Thanks/Fredrik
________________________________________
From: Richard Biener [richard.guenther@gmail.com]
Sent: Friday, August 14, 2015 09:28
To: sarah@hederstierna.com
Cc: gcc@gcc.gnu.org
Subject: Re: About loop unrolling and optimize for size
On Thu, Aug 13, 2015 at 6:26 PM, sarah@hederstierna.com
<fredrik@hederstierna.com> wrote:
> Hi
> I'm using an ARM Thumb cross compiler for embedded systems and always optimize for small size with -Os.
>
> I've been experimenting with optimization flags and loop unrolling.
>
> Normally loop unrolling is bad for size: code is duplicated and size increases.
>
> However, I discovered that in some special cases where the number of iterations is very small, e.g. a loop of 2-3 iterations,
> unrolling can make the code smaller, e.g. by freeing up registers used for loop indices.
>
> For example, when I use the flag "-fpeel-loops" together with -Os, I get smaller code size in 99% of cases for the ARM Thumb target.
>
> So my question is how unrolling works with -Os: is it always totally disabled,
> or are there some cases where it could be tried, e.g. with a small number of iterations, so the loop can be eliminated?
>
> Could e.g. "-fpeel-loops" be enabled by default for -Os? Now it's only enabled for -O2 and above, I think.
Complete peeling is already enabled with -Os; it is just restricted to
those cases where GCC's cost modeling of the
unrolling operation determines that the code size shrinks. If you enable
-fpeel-loops then the cost model allows the
code size to grow, something not (always) intended with -Os.
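As an illustration of the kind of loop under discussion, here is a minimal sketch with a fixed trip count of 3, next to a hand-written version of what complete peeling would produce. The function names are hypothetical and not taken from the thread, and whether GCC actually performs this transformation at -Os depends on its cost model for the target.

```c
/* Loop form: needs an index variable, a compare and a branch per iteration. */
int sum3_loop(const int *a)
{
  int sum = 0;
  for (int i = 0; i < 3; i++)
    sum += a[i];
  return sum;
}

/* Hand-peeled equivalent: no index register and no loop control, which is
   why it can come out smaller than the loop on some targets. */
int sum3_peeled(const int *a)
{
  return a[0] + a[1] + a[2];
}
```

Compiling both functions with and without -fpeel-loops and comparing the sizes of the resulting object code is one way to build the kind of small testcase asked for below.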
The solution is of course to improve the cost modeling and GCC's idea
of followup optimization opportunities.
I do have some incomplete patches to improve that and hope to get back
to it for GCC 6.
If you have (small) testcases that show code size improvements with
-Os -fpeel-loops over -Os and you are
confident they are caused by unrolling, please open a Bugzilla report containing them.
Thanks,
Richard.
> Thanks and Best Regards
> Fredrik