Order of variables in specific sections when enabling optimization in gcc

David Brown david.brown@hesbynett.no
Thu Mar 7 20:45:00 GMT 2019


On 07/03/2019 16:51, Freddie Chopin wrote:
> On Thu, 2019-03-07 at 15:18 +0100, David Brown wrote:
>> On 07/03/2019 14:56, Freddie Chopin wrote:
>>> On Thu, 2019-03-07 at 14:05 +0100, David Brown wrote:
>>>> The -fno-toplevel-reorder switch can be handy too - it will stop
>>>> re-ordering within a translation unit.
>>>
>>> Great - I'll try that soon (; This seems to be what I was looking
>>> for!
>>>
>>
>> I'd recommend putting it as:
>>
>> 	#pragma GCC optimize ("-fno-toplevel-reorder")
>>
>> in the source code defining your data.  That way it should work no
>> matter what switches are used.  (I hope that option is allowed in the
>> pragma - not all options are.)
> 
> Unfortunately GCC says this option is not valid for a pragma... But
> compiling the file with this option does indeed result in identical
> order in both source and object file (;
> 
> But the #pragma gave me an idea - I can actually disable the
> optimizations with the pragma and this does work too:
> 
> #pragma GCC optimize ("0")

Fair enough - especially if this file has nothing but the variable 
definitions.
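
As a minimal sketch of what such a file might look like (the file name,
the ".persistent" section name and the variable names are all made up
for illustration, and the ordering effect of the pragma is only what
was reported above, not something the GCC documentation promises):

```c
/* persistent.c - hypothetical file holding nothing but variable definitions. */
#include <stdint.h>

/* Disable optimisation for this translation unit.  In the test reported
 * above this kept the variables in source order; treat that as an
 * observation, not a guarantee. */
#pragma GCC optimize ("0")

/* Hypothetical section name; the linker script decides where it lives. */
#define PERSISTENT __attribute__((section(".persistent")))

PERSISTENT uint32_t boot_count = 0;
PERSISTENT uint32_t last_fault_address = 0;
PERSISTENT uint16_t config_crc = 0;
```

Checking the result with "nm -n" or the linker map file is still a good
idea after any compiler upgrade.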

> 
>>>> However, if you are using LTO or -fdata-sections and the
>>>> --gc-sections linker option, variables that are not needed get
>>>> eliminated.  (Note that this could happen even if they are
>>>> actually used, if the compiler can figure out that the storage is
>>>> not needed.)
>>>
>>> --gc-sections and -fdata-sections do not affect variables for which
>>> I explicitly set the section. They are not removed, even if not used
>>> (I'm not using LTO).
>>>
>>
>> -fdata-sections won't affect your explicit sections.  (This is a
>> setting that is often used in embedded systems, as a way of
>> minimising sizes, but it actually adds significantly to code size and
>> run-time on devices like ARM Cortex.)
> 
> This may be a bit off-topic here, but the firmware I'm working on (for
> ARM Cortex-M4 chip) is:
> - 205232 text, 6476 data and 83512 bss _WITH_ -fdata-sections;
> - 207652 text, 6572 data and 83512 bss _WITHOUT_ -fdata-sections;

Off-topic for this thread, perhaps, but interesting and relevant to the 
mailing list.

> 
> The first version is smaller both for flash and RAM - the difference is
> not significant (~1%), but there's no flash vs. RAM trade-off. Maybe
> the whole thing is a bit slower, but I wouldn't be so sure about that.
> 

Testing on a project of mine:

Without -fdata-sections:

    text	   data	    bss	    dec
   20352	    280	   6120	  26752

With -fdata-sections:

    text	   data	    bss	    dec
   20800	    280	   6120	  27200

But I have figured out the difference.  I have -fsection-anchors 
enabled, which is what makes the code smaller.  Enabling -fdata-sections 
disables -fsection-anchors (or at least, makes it useless).  So it is 
not the -fno-data-sections itself that improves my code - it is the 
effect on -fsection-anchors.  Try -fsection-anchors instead of 
-fdata-sections.
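
A small sketch for comparing the two options yourself (the variable and
function names are invented for illustration, and the exact code
generated will depend on your compiler version and target):

```c
/* anchors.c - three hypothetical globals that are used together. */
int gain = 2;
int offset = 3;
int limit = 100;

/* With section anchors the compiler may load one base address and reach
 * all three variables at small offsets from it.  With -fdata-sections
 * each variable sits in its own section, so the addresses cannot be
 * assumed to be close together and each one typically needs its own
 * address load. */
int scale(int x)
{
    int y = x * gain + offset;
    return y > limit ? limit : y;
}
```

Compile it to assembly twice, for example with
"arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -O2 -S anchors.c" plus either
-fsection-anchors or -fdata-sections, and compare the number of address
literals used in scale().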

The only reason you could have a smaller data section with
-fdata-sections is if you have variables that are defined but never used
(or perhaps static variables that are used in a way that can be
eliminated).


>> Yes, it would be.  But it's very easy to accidentally mess up your
>> variables when trying to add new ones or change existing ones.
> 
> Yes, understandable.
> 
>> A particular benefit I find with the struct solution is that you can
>> use _Static_assert to check that the offsets of the different parts
>> are as you expect them to be.  When you change things - replacing
>> padding and "reserved" space with real variables - you will be glad
>> of this extra check.
>>
>> Another thing you can do with structs is to have multiple struct
>> types - you can have "struct params_v1" now, and later have "struct
>> params_v2" for a new version.  You can use pointers to these types to
>> read off the "param structure version number" item and then update
>> the old structure to the new one when you first run the new software.
>> This can be a lot harder when the variables are defined
>> independently.
>>
>> The struct method also makes it vastly easier to have multiple sets
>> of parameters - perhaps a factory default set in flash.  Reset to
>> default then becomes a nice memcpy from an initialised const struct.
>> Doing this with individual items in a special section in ram means
>> duplicating these items as const items within a special section in
>> flash - it's a maintenance problem waiting to happen.  And there is
>> no equivalent of "-Wmissing-field-initializers" to help you spot your
>> bugs.
> 
> This is all fine as long as the use of such variables is very similar -
> for example ONLY as device configuration. The moment you start using
> them for completely different things then all the advantages you listed
> above (except checks with static assertions) are gone. If you have 10
> objects as device configuration, 10 objects as "persistent scratch-pad"
> (for logging information about hard crashes, faults and asserts) and
> another 10 as factory-only configuration, then there really is no
> advantage in keeping them so closely coupled together...
> 

I still don't see that as a problem - but of course it is your code, and 
you may prefer to organise it in a different way than I would.
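
For reference, here is a rough sketch of the struct approach described
above.  The field names, sizes, default values and the helper functions
params_reset() and params_upgrade() are all hypothetical; only the
"struct params_v1" / "struct params_v2" naming comes from the earlier
mail.  It assumes a target where uint32_t fields are 4-byte aligned with
no extra padding, which the static assertions check.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Version 1 layout: the version number always comes first. */
struct params_v1 {
    uint32_t version;
    uint32_t baud_rate;
    uint8_t  reserved[8];
};

/* Version 2 turns part of the reserved space into a real field. */
struct params_v2 {
    uint32_t version;
    uint32_t baud_rate;
    uint16_t timeout_ms;
    uint8_t  reserved[6];
};

/* Catch accidental layout changes at compile time. */
_Static_assert(offsetof(struct params_v2, baud_rate) == 4, "baud_rate moved");
_Static_assert(offsetof(struct params_v2, timeout_ms) == 8, "timeout_ms moved");
_Static_assert(sizeof(struct params_v2) == sizeof(struct params_v1),
               "v2 must fit in the space used by v1");

/* Factory defaults; in real firmware this would live in flash. */
static const struct params_v2 factory_defaults = {
    .version = 2,
    .baud_rate = 115200,
    .timeout_ms = 500,
};

/* Live parameters; in real firmware this would be placed in the
 * persistent section by an attribute or the linker script. */
struct params_v2 params;

/* Reset to default is a single copy from the const struct. */
void params_reset(void)
{
    memcpy(&params, &factory_defaults, sizeof params);
}

/* Upgrade data left behind by version 1 software, in place. */
void params_upgrade(void)
{
    if (params.version == 1) {
        params.timeout_ms = factory_defaults.timeout_ms;
        memset(params.reserved, 0, sizeof params.reserved);
        params.version = 2;
    }
}
```

Calling params_upgrade() once at start-up, before anything else reads
the parameters, would be one way to use this.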


