This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [Patch tree-ssa] RFC: Enable path threading for control variables (PR tree-optimization/54742).


>  While testing it I noticed that the final executable
> is larger with your patch than with mine.  Here are the sizes of the
> bare-metal executables I created using the same flags I sent you
> earlier, the first has no switch optimization, the second one uses my
> plugin optimization, and the third uses your latest patch.  I haven't
> looked into why the size difference for your patch and mine exists, do
> you see a size difference on your platforms? 

Yes, I do, but after experimenting with it, the difference seems very
dependent on pass ordering.

I've built various arm-none-eabi compilers to test with:

  clean: is a compiler without path threading.
  steve.pass: is your original pass patch.
  james: is my patch (which is called from within the vrp and dom passes)

  steve.only-after-vrp1: moves your pass to immediately after the first
    call to vrp

  steve.before, steve.after, steve.after-vrp-before-dom,
  steve.before-vrp-after-dom: run your pass immediately before or after
    both vrp and both dom passes.

  james.ch: is my patch, with pass_ch rerun after dom1.

Then, building with flags:

  -finline-limit=1000 -funroll-all-loops
  -finline-functions [[-ftree-switch-shortcut]] -O3 -mthumb

And passing the resulting binary through:

$ arm-none-eabi-strip blob.*

I see:

$ size blob.arm.* | sort -n

   text	   data	    bss	    dec	    hex	filename
  53984	   2548	    296	  56828	   ddfc	../blobs/blob.arm.clean
  54464	   2548	    296	  57308	   dfdc	../blobs/blob.arm.steve.pass
  54496	   2548	    296	  57340	   dffc	../blobs/blob.arm.steve.after
  54496	   2548	    296	  57340	   dffc	../blobs/blob.arm.steve.after-vrp-before-dom
  54504	   2548	    296	  57348	   e004	../blobs/blob.arm.james.ch
  54504	   2548	    296	  57348	   e004	../blobs/blob.arm.steve.only-after-vrp1
  54656	   2548	    296	  57500	   e09c	../blobs/blob.arm.james
  54704	   2548	    296	  57548	   e0cc	../blobs/blob.arm.steve.before-vrp-after-dom
  54736	   2548	    296	  57580	   e0ec	../blobs/blob.arm.steve.before
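
To make the differences easier to compare, the text-size growth of each
build over the clean compiler can be tabulated with a short script. This is
only a sketch: the numbers are copied from the text column above, and the
helper name growth_over_baseline is purely illustrative.

```python
# Text-segment sizes in bytes, copied from the `size` output above.
text_sizes = {
    "clean": 53984,
    "steve.pass": 54464,
    "steve.after": 54496,
    "steve.after-vrp-before-dom": 54496,
    "james.ch": 54504,
    "steve.only-after-vrp1": 54504,
    "james": 54656,
    "steve.before-vrp-after-dom": 54704,
    "steve.before": 54736,
}


def growth_over_baseline(sizes, baseline="clean"):
    """Return {name: (delta_bytes, delta_percent)} relative to the baseline build."""
    base = sizes[baseline]
    return {
        name: (size - base, 100.0 * (size - base) / base)
        for name, size in sizes.items()
        if name != baseline
    }


# Print the builds from smallest to largest growth.
growth = growth_over_baseline(text_sizes)
for name, (delta, pct) in sorted(growth.items(), key=lambda kv: kv[1][0]):
    print(f"{name:32s} +{delta:4d} bytes  (+{pct:.2f}%)")
```

On these numbers the growth over clean ranges from 480 bytes (about 0.89%)
for steve.pass to 752 bytes (about 1.39%) for steve.before, and james.ch is
152 bytes smaller than james.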

So to my mind, this is all far too tied up in pass ordering details to
resolve. Given that all the threading opportunities for my patch are found
in dom1 and how fragile the positioning of dom1 is, there is not a great
deal I can do to modify the ordering.

The biggest improvement I could find comes from rerunning pass_ch
immediately after dom1, though I'm not sure what the cost of that
would be.

I wonder if you or others have any thoughts on what the right thing to
do would be?

> I am not sure if path threading in general is turned off for -Os but it
> probably should be.

I agree: jump threading is on at -Os, but path threading should not be.

Thanks,
James

