This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [Patch tree-ssa] RFC: Enable path threading for control variables (PR tree-optimization/54742).
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: Steve Ellcey <sellcey at mips dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, "richard dot guenther at gmail dot com" <richard dot guenther at gmail dot com>, "law at redhat dot com" <law at redhat dot com>, "dnovillo at google dot com" <dnovillo at google dot com>, "amacleod at redhat dot com" <amacleod at redhat dot com>, "ook at ucw dot cz" <ook at ucw dot cz>, "sellcey at imgtec dot com" <sellcey at imgtec dot com>
- Date: Fri, 21 Jun 2013 17:43:30 +0100
- Subject: Re: [Patch tree-ssa] RFC: Enable path threading for control variables (PR tree-optimization/54742).
- References: <1371233239 dot 12204 dot 285 dot camel at ubuntu-sellcey> <1371647944-9788-1-git-send-email-james dot greenhalgh at arm dot com> <1371666799 dot 1804 dot 55 dot camel at ubuntu-sellcey>
> While testing it I noticed that the final executable
> is larger with your patch than with mine. Here are the sizes of the
> bare-metal executables I created using the same flags I sent you
> earlier, the first has no switch optimization, the second one uses my
> plugin optimization, and the third uses your latest patch. I haven't
> looked into why the size difference for your patch and mine exists, do
> you see a size difference on your platforms?
Yes, I do, but after experimenting with it, the difference seems to be very
dependent on pass ordering.
I've built various arm-none-eabi compilers to test with:
clean: a compiler without path threading.
steve.pass: your original pass patch.
james: my patch (which is invoked from within the vrp and dom passes).
steve.after-vrp1: your pass, moved to immediately after the first call to vrp.
steve.before, steve.after, steve.after-vrp-before-dom,
steve.before-vrp-after-dom: your pass, run immediately before or after
both vrp passes and both dom passes, as the names indicate.
james.ch: my patch, with pass_ch rerun after dom1.
I then built with the flags:
  -finline-limit=1000 -funroll-all-loops -finline-functions [[-ftree-switch-shortcut]] -O3 -mthumb
and passed the resulting binaries through:
  $ arm-none-eabi-strip blob.*
I see:
$ size blob.arm.* | sort -n
   text    data     bss     dec     hex filename
  53984    2548     296   56828    ddfc ../blobs/blob.arm.clean
  54464    2548     296   57308    dfdc ../blobs/blob.arm.steve.pass
  54496    2548     296   57340    dffc ../blobs/blob.arm.steve.after
  54496    2548     296   57340    dffc ../blobs/blob.arm.steve.after-vrp-before-dom
  54504    2548     296   57348    e004 ../blobs/blob.arm.james.ch
  54504    2548     296   57348    e004 ../blobs/blob.arm.steve.only-after-vrp1
  54656    2548     296   57500    e09c ../blobs/blob.arm.james
  54704    2548     296   57548    e0cc ../blobs/blob.arm.steve.before-vrp-after-dom
  54736    2548     296   57580    e0ec ../blobs/blob.arm.steve.before
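Since data and bss are identical across all builds, the differences are entirely in the text column, and each build's cost is just its text size minus the clean build's 53984 bytes. A minimal shell sketch that derives those deltas from Berkeley-format `size` output follows; the function name `text_growth` is hypothetical and not part of either patch or of binutils, and it assumes the baseline row's filename ends in `.clean`.

```shell
#!/bin/sh
# Hypothetical helper: print each build's text size and its growth over
# the "clean" build. Expects Berkeley-format `size` output on stdin
# (columns: text data bss dec hex filename).
text_growth() {
  awk '
    NR == 1 { next }                      # skip the column header
    {
      n++; name[n] = $6; text[n] = $1
      if ($6 ~ /\.clean$/) base = $1      # baseline: no path threading
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%-45s %6d %+5d\n", name[i], text[i], text[i] - base
    }'
}

# Usage with the figures from the table; the james.ch row reports +520
# and the james row +672.
text_growth <<'EOF'
   text    data     bss     dec     hex filename
  53984    2548     296   56828    ddfc ../blobs/blob.arm.clean
  54464    2548     296   57308    dfdc ../blobs/blob.arm.steve.pass
  54496    2548     296   57340    dffc ../blobs/blob.arm.steve.after
  54496    2548     296   57340    dffc ../blobs/blob.arm.steve.after-vrp-before-dom
  54504    2548     296   57348    e004 ../blobs/blob.arm.james.ch
  54504    2548     296   57348    e004 ../blobs/blob.arm.steve.only-after-vrp1
  54656    2548     296   57500    e09c ../blobs/blob.arm.james
  54704    2548     296   57548    e0cc ../blobs/blob.arm.steve.before-vrp-after-dom
  54736    2548     296   57580    e0ec ../blobs/blob.arm.steve.before
EOF
```

Because `sort -n` already places the clean build first, the first data row could serve as the baseline; matching on the `.clean` suffix instead keeps the helper independent of row order.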
So to my mind, this is all far too tied up in pass-ordering details to
resolve. Given that my patch finds all of its threading opportunities in
dom1, and that the position of dom1 in the pipeline is fragile, there is
not a great deal I can do to change the ordering.
The biggest improvement I could find comes from rerunning pass_ch
immediately after dom1 (james.ch: 54504 bytes of text, against 54656
for james), though I'm not sure what that would cost.
I wonder if you or others have any thoughts on what the right thing to
do would be?
> I am not sure if path threading in general is turned off for -Os but it
> probably should be.
I agree: jump threading is enabled at -Os, but path threading should not be.
Thanks,
James