Optimising away memset() calls?

Fri Oct 10 22:34:00 GMT 2014

On 10/10/14 09:46, David Brown wrote:
> When a function is specified in the C standards, then the compiler
> /does/ know all about it.  It knows that the memset_s library function
> does not "store s in a global variable", because the C standard does not
> allow it to do that (or at least, it does not allow such an action to be
> visible to the program).
I was referring both to compilers that don't know the definition of
memset_s and to those that do. For the latter, the compiler knows that
memset_s won't store s in a global variable, but the semantics of "shall
assume that the memory indicated by s and n may be accessible in the
future and thus must contain the values indicated by c" would be
equivalent, in this respect, to "the pointer is stored in a global
variable".
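For reference, this is the kind of plain memset() call the thread subject
is about. It is only a minimal sketch: check_secret and the hard-coded
string are hypothetical, made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: compares a candidate against a secret that has
   been copied into a local buffer. */
int check_secret(const char *candidate)
{
    char secret[16];
    int ok;

    assert(candidate != NULL);
    strcpy(secret, "hunter2");      /* stand-in for loading a real key */
    ok = (strcmp(candidate, secret) == 0);

    /* secret is never read again before it goes out of scope, so under
       the as-if rule a compiler may treat this call as a dead store and
       drop it. */
    memset(secret, 0, sizeof secret);
    return ok;
}
```

Whether the memset() survives depends on the compiler and the
optimisation level; the return value is the same either way, which is
exactly why the removal is allowed.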
> And the compiler is free to implement memset_s
> in any way it wants, including inlining it
That shouldn't be a problem.

> or perhaps even removing it
> as long as the behaviour is correct as seen by the C abstract machine.
My point was that both things (removing + correct behaviour) could not
be done at the same time.
(I was mildly expecting someone to readily present a counterexample, though)

> This is complicated by the fact that the standards don't actually
> specify what is meant by things like "memory accesses".
>
> Adding to that, as has been noted by others, particular architectures
> might need things like memory barriers, cache flushes, synchronisation
> instructions, etc., in order for the writes to be visible across the
> system.  The C compiler knows nothing about these things (it can provide
> helpful intrinsic functions, but can't use them automatically), because
> the C standards don't cover them.
This is an interesting point. I agree that the compiler could reorder
the memset_s call, but I don't think it can do more than delay it past a
few statements for which it can prove that they don't access that area.
At the latest it would have to be done before the next library call
(even if that one is in the C spec; remember that the way they are
implemented is not defined).

Thus, the zeroed contents might not be immediately available to an
omniscient inspector, but they would be within a small delta. Or, if we
have a bizarre architecture needing a barrier in order to "commit" the
memory write, that barrier shall be performed by memset_s in order to
fulfil the "must contain the values indicated by c" requirement.

It is true that you may need special instructions to make sure the new
value is seen from a concurrent thread (I don't consider memset_s
suitable for clearing a spinlock), but C doesn't deal with threads or
shared memory, thus you are . Then you either use another primitive to
synchronize them (and at that point the memory will have to have been
memset), or they are in a race condition and nothing is specified about
what it may contain.

I also toyed with the idea of a processor that optimized the microcode
in such a way that it ended up removing the memset_s call, but concluded
(apart from it being unrealistic for a CPU to have such global
knowledge) that memset_s would then have to include its own special
barriers in order to fulfil the "shall assume that the memory indicated
by s and n may be accessible in the future and thus must contain the
values indicated by c" requirement, if it would otherwise be implemented
with instructions that allow the processor not to store the values in
memory.

> So the only way to be absolutely sure that a memory area really is
> cleared is to use an external function that the compiler does not know
> about, and which also incorporates any required additional
> machine-specific code.  Thus you need to use memzero_explicit(),
> bzero_explicit(), or equivalent.
And that is architecture-specific and not portable. IMHO memset_s serves
the task equally well, with the benefit of being a standard function.
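For comparison, one commonly used way of getting "a function the
compiler does not know about" without a separate translation unit is to
call memset through a volatile function pointer. This is only a sketch
under that assumption: wipe and memset_ptr are hypothetical names, this
is not how memzero_explicit() is implemented, and it adds none of the
barriers or cache flushes mentioned above.

```c
#include <assert.h>
#include <string.h>

/* The pointer object is volatile, so the compiler has to load it at
   each call and cannot assume that it still points at memset. */
static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;

/* Hypothetical name; not a standard or libc function. */
void wipe(void *s, size_t n)
{
    assert(s != NULL);
    (memset_ptr)(s, 0, n);
}
```

The standard does not promise that the stores survive here either; the
trick relies on compilers not looking through the volatile load.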

PS: As I was finishing this email, I thought of the following dull
implementation (error checking skipped):

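#include <stdlib.h>   /* malloc, free */

/* errno_t and rsize_t are Annex K types; typedef them as int and size_t
   if your libc does not provide them. */
errno_t memset_s(void *s, rsize_t smax, int c, rsize_t n) {
    size_t i; unsigned char *tmp;
    (void)smax;                 /* error checking skipped */
    tmp = malloc(n);            /* result not checked either */
    for (i = 0; i < n; i++) {   /* tmp[i] = s[i] XOR c */
      tmp[i] = ((unsigned char *)s)[i] ^ (unsigned char) c;
    }
    for (i = 0; i < n; i++) {   /* s[i] XOR tmp[i] == c */
      ((unsigned char *)s)[i] ^= tmp[i];
    }
    free(tmp);                  /* old contents XOR c remain in freed heap memory */
    return 0;
}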

Although completely missing the point, I think it would be conformant.*
If you know that a specific implementation is flawed, please avoid it
(you can use a replacement) or, better yet, replace your libc with a
good one.
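To see why the two XOR passes in the PS leave c in every byte, here is a
small check of the identity s ^ (s ^ c) == c over all byte values. The
names xor_fill_byte and xor_fill_check are hypothetical, made up for
this sketch.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: applies the two XOR passes to a single byte. */
unsigned char xor_fill_byte(unsigned char old, int c)
{
    unsigned char tmp = old ^ (unsigned char)c;  /* first loop: tmp = s XOR c */
    old ^= tmp;                                  /* second loop: s XOR (s XOR c) */
    return old;
}

/* Checks every possible old byte value against (unsigned char)c and
   returns the number of values checked. */
int xor_fill_check(int c)
{
    unsigned int old;
    for (old = 0; old <= UCHAR_MAX; old++) {
        assert(xor_fill_byte((unsigned char)old, c) == (unsigned char)c);
    }
    return (int)old;
}
```

The destination ends up correct, while tmp holds the old contents XORed
with c, so the data being erased has been copied somewhere else first.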