Finding the optimization that is making the change

Will Hawkins whh8b@virginia.edu
Fri Aug 11 21:27:00 GMT 2017


On Fri, Aug 11, 2017 at 4:36 PM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
> On 11 August 2017 at 19:44, Will Hawkins <whh8b@virginia.edu> wrote:
>> On Fri, Aug 11, 2017 at 1:15 PM, Will Hawkins <whh8b@virginia.edu> wrote:
>>> On Fri, Aug 11, 2017 at 1:14 PM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
>>>> On 11 August 2017 at 18:09, Will Hawkins <whh8b@virginia.edu> wrote:
>>>>> Hello everyone!
>>>>>
>>>>> First, thank you all for your participation in the gcc community -- I
>>>>> firmly believe that one of the great strengths of free software is the
>>>>> community of people that participate in its development, maintenance
>>>>> and support. So, thank you!
>>>>>
>>>>> I have a simple C program and I am attempting to determine which of
>>>>> the optimizations at O1 cause a particular transformation. In order to
>>>>> isolate the optimizations enabled at O1 vs O0, I followed an idea set
>>>>> out in the gcc man page and ran the following command:
>>>>>
>>>>> $ diff <(gcc -Q -O1 --help=optimizers) <(gcc -Q --help=optimizers) |
>>>>> grep enabled | awk '{print $2;}' > optimizations
>>>>>
>>>>> Then I compiled with the following command:
>>>>>
>>>>> gcc -o scfi.poptim `cat optimizations | tr '\n' ' '` scfi.c
>>>>>
>>>>> I compared scfi.poptim with scfi.optim that came from running this command:
>>>>>
>>>>> gcc -o scfi.optim -O1 scfi.c
>>>>>
>>>>> I expected that scfi.optim and scfi.poptim would be (largely)
>>>>> identical. That is not the case, however. It does not look like the
>>>>> scfi.poptim program has been optimized at all.
>>>>
>>>> Because you didn't specify any -O optimization option, which means
>>>> there is no optimization done at all. See
>>>> https://gcc.gnu.org/wiki/FAQ#optimization-options
>>>>
>>>> Options to enable/disable individual optimizations have no effect if
>>>> the optimizers aren't run at all.
>>>>
>>>>> I was wondering if anyone could shed some light on why this is not the
>>>>> case. I ask only because the gcc man page seems to imply that this is
>>>>> the "right" way to isolate the different optimizations performed at
>>>>> different levels.
>>>>
>>>> No, you need to use -O1 -fno-xxx -fno-yyy -fno-zzz
>>>>
>>>> i.e. turn on optimization, then disable individual passes. You can't
>>>> start from nothing and enable individual ones, that gives you nothing.
>>>
>>> Wow! That makes perfect sense. Thank you so much! I will give it a try
>>> and let you know what I find. Thanks for the quick response!
>>
>> I tried your suggestion and I am still getting very odd behavior. I
>> have essentially done the opposite of what I was doing before.
>>
>> gcc -o scfi.poptim -O1 `cat optimizations | sed -e 's/^-f/-fno-/' | tr
>> '\n' ' '` scfi.c
>>
>> which yields the following invocation:
>>
>> gcc -o scfi.poptim -O1 -fno-combine-stack-adjustments
>> -fno-compare-elim -fno-cprop-registers -fno-defer-pop
>> -fno-forward-propagate -fno-guess-branch-probability
>> -fno-if-conversion -fno-if-conversion2
>> -fno-inline-functions-called-once -fno-ipa-profile -fno-ipa-pure-const
>> -fno-ipa-reference -fno-merge-constants -fno-shrink-wrap
>> -fno-split-wide-types -fno-tree-bit-ccp -fno-tree-ccp -fno-tree-ch
>> -fno-tree-copy-prop -fno-tree-copyrename -fno-tree-dce
>> -fno-tree-dominator-opts -fno-tree-dse -fno-tree-fre -fno-tree-sink
>> -fno-tree-slsr -fno-tree-sra -fno-tree-ter scfi.c
>>
>> I would have expected that to build the (largely) same binary as
>>
>>  gcc -o scfi -O0 scfi.c
>>
>> and yet it does not. The former is still optimized and the latter is
>> (obviously) not.
>
> As it says at https://gcc.gnu.org/wiki/FAQ#optimization-options "the
> -Ox flags enable many optimizations that are not controlled by any
> individual -f* option."
>
> You can't reproduce the effects of -O1 by adding flags to unoptimized
> code, and you can't recreate unoptimized code by disabling individual
> optimizations. Unoptimized code is still completely unoptimized, and
> optimized code is not completely unoptimized.

First of all, thank you for continuing to offer your feedback!

>
> This probably won't stop you isolating which optimization causes the
> effect you're interested in, because it's probably one that is
> controlled by a -f flag. Instead of trying to compare apples and
> oranges (unoptimized and optimized) compare -O1 -fno-xxx -fno-yyy
> -fno-zzz and -O1, and then add/remove those -fno-* flags until you
> find the one that causes the effect you're interested in.

Interestingly enough, I think that it will stop me. Here's why I say that.

I have the following C source code:

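/* Defined elsewhere in the program; the definitions are not shown here. */
void testing_c(void);
void testing_d(void);

int calling_cd(int c_or_d) {
  void (*cd)(void) = testing_c;

  switch (c_or_d) {
    case 1:
      cd = testing_c;
      break;
    case 2:
      cd = testing_d;
      break;
  }
  cd();
  return 1;
}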

When I compile at -O0, I get the following:

  4005f1: push   %rbp
  4005f2: mov    %rsp,%rbp
  4005f5: sub    $0x20,%rsp
  4005f9: mov    %edi,-0x14(%rbp)
  4005fc: movq   $0x40054d,-0x8(%rbp)
  400604: mov    -0x14(%rbp),%eax
  400607: cmp    $0x1,%eax
  40060a: je     400613 <calling_cd+0x22>
  40060c: cmp    $0x2,%eax
  40060f: je     40061d <calling_cd+0x2c>
  400611: jmp    400626 <calling_cd+0x35>
  400613: movq   $0x40054d,-0x8(%rbp)
  40061b: jmp    400626 <calling_cd+0x35>
  40061d: movq   $0x40055d,-0x8(%rbp)
  400625: nop
  400626: mov    -0x8(%rbp),%rax
  40062a: callq  *%rax
  40062c: mov    $0x1,%eax
  400631: leaveq
  400632: retq

And, when I compile with -O1 and *every* optimization from that list
disabled (the -fno-* invocation in my previous email), I get the
following:

  400643: sub    $0x8,%rsp
  400647: mov    $0x40059a,%eax
  40064c: cmp    $0x1,%edi
  40064f: je     400658 <calling_cd+0x15>
  400651: cmp    $0x2,%edi
  400654: je     400662 <calling_cd+0x1f>
  400656: jmp    400667 <calling_cd+0x24>
  400658: mov    $0x40059a,%eax
  40065d: nopl   (%rax)
  400660: jmp    400667 <calling_cd+0x24>
  400662: mov    $0x4005b7,%eax
  400667: callq  *%rax
  400669: mov    $0x1,%eax
  40066e: add    $0x8,%rsp
  400672: retq

The difference that I am interested in is this: which "optimization"
causes the local variable cd to be kept in a register (%eax) for the
whole function rather than written to its stack slot (-0x8(%rbp)) on
every assignment?
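
To compare two listings like these mechanically, the instruction
addresses have to go first, since they differ between builds even where
the instructions match. Below is a minimal sketch, assuming the input is
`objdump -d --no-show-raw-insn` output; the helper names `extract_func`
and `strip_addrs` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for comparing one function across two builds.

# Print only the instructions of function $1 from `objdump -d` output on stdin.
extract_func() {
    awk -v fn="$1" '
        $2 == ("<" fn ">:") { printing = 1; next }  # header: ADDR <name>:
        printing && NF == 0 { exit }                 # blank line ends the function
        printing            { print }
    '
}

# Drop per-instruction addresses and absolute jump targets, which differ
# between builds even when the instructions themselves are the same.
strip_addrs() {
    sed -e 's/^ *[0-9a-f]*:[[:space:]]*//' \
        -e 's/[0-9a-f][0-9a-f]* \(<[^>]*>\)/\1/'
}
```

In bash, the two binaries built earlier could then be compared with
something like:

  diff <(objdump -d --no-show-raw-insn scfi | extract_func calling_cd | strip_addrs) \
       <(objdump -d --no-show-raw-insn scfi.poptim | extract_func calling_cd | strip_addrs)

Symbolic offsets such as <calling_cd+0x22> are kept, so jumps will still
show up in the diff when the code layout changes.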

As you mentioned, I can definitely walk through the remaining
optimizations to get to the code that is generated at "full" O1 but
that's not really the behavior that I am trying to decipher.
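
That walk-through can be sketched as a small shell helper. This is only
a sketch: it assumes the `optimizations` file generated earlier in the
thread (one -f flag per line), and the function name and the output
naming scheme are made up for illustration. It prints the gcc commands
rather than running them, so they can be inspected first.

```shell
#!/bin/sh
# Hypothetical helper: print one gcc command per flag in file $1 (one -f flag
# per line) for source file $2. Each command keeps -O1, disables every listed
# optimization with -fno-*, and leaves exactly one of them enabled.
reenable_one_at_a_time() {
    # Skip empty lines and turn each -ffoo into -fno-foo.
    all_no=$(sed -e '/^$/d' -e 's/^-f/-fno-/' "$1")
    for no_flag in $all_no; do
        # Every -fno-* flag except the one being re-enabled in this build.
        others=$(printf '%s\n' "$all_no" | grep -v -x -F -e "$no_flag" | paste -s -d ' ' -)
        # Name the output after the optimization left on, e.g. scfi.tree-ccp.
        printf 'gcc -O1 %s -o %s.%s %s\n' "$others" "${2%.c}" "${no_flag#-fno-}" "$2"
    done
}
```

Running `reenable_one_at_a_time optimizations scfi.c | sh` would produce
one binary per optimization, each of which can be disassembled and
compared against the all-disabled build. If none of them differs in the
way of interest, that is consistent with the FAQ's point that -O1 also
enables behavior not controlled by any individual -f option.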

I know that this is probably not a "reasonable" question. I am not
trying to get some particular behavior for production; I am more
interested in understanding the compiler/optimizer well enough that I
could dig into the code if I needed to.

Thanks again for taking the time to correspond. I hope that this is
not a waste of your time.

Have a great afternoon!
Will

>
> If you tell us more about what you're trying to achieve (rather than
> how you're trying to do it) maybe someone can save you some time.


