Post-increment constraint in inline assembly (SuperH)

Georg-Johann Lay avr@gjlay.de
Mon Jan 29 20:39:00 GMT 2018


Sébastien Michelland wrote:
> On 01/29/2018 05:57 PM, Georg-Johann Lay wrote:
>> On 29.01.2018 14:19, Oleg Endo wrote:
>>> The problem (or rather disadvantage) with this approach is that the
>>> compiler doesn't know what the value of "src" is after it has been
>>> modified by the asm code.  Segher's suggestion looks like the better
>>> option.
> 
> I think Segher's option is technically the way to go, except (as you 
> pointed out) it doesn't /ensure/ that the compiler will use post-increment.
> 
> On the other hand, syntactically forcing the compiler to use 
> post-increment prevents it from optimizing my function in another way. 
> Getting in its way is (from my experience) usually not a good idea.
> 
>> The "m>"(*src) operand doesn't even express that src is changing, and 
>> the constraint allows post-increment, but does not force it.
> 
> How could GCC choose post-increment in this situation? The documentation 
> says that using an operand which is under the ">" constraint in multiple 
> instructions isn't valid. Does "*src" not count as a use of "src"?
> 
> Now, what would happen if GCC decided to use post-increment mode for 
> "m>"(*src)? Would that interfere with the way src opaquely changes 
> because of the "+r"(src) constraint?

My bad.  If GCC uses post-increment, then the value in the 
post-incremented register no longer represents src. But when src+1 is 
used in the remainder, gcc detects that this value has already been 
computed and reuses the post-incremented reg instead of recomputing src+1.

Hence src does /not/ change, whereas the register used to address *src 
/does/ if post-increment is used.  As src does not change, there's no 
need to express it in terms of constraints.
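
A minimal sketch of that situation, not taken from the test case attached 
to this thread: the names load_byte and sum_bytes are made up for 
illustration, and the non-SH branch is only a fallback so that the code 
builds on other hosts.  The "m>" operand merely permits @Rm+ addressing; 
whether gcc actually picks it depends on the optimization level and on the 
surrounding code.

```c
#include <stddef.h>

/* Hypothetical helper: load one byte through a memory operand that is
   *allowed* to use auto-increment addressing. */
static inline signed char load_byte(const signed char *src)
{
    signed char value;
#if defined(__sh__)
    /* "m>" permits @Rm+ but does not force it; gcc may also print @Rm.
       The operand is used exactly once, as the constraint requires. */
    __asm__ ("mov.b %1, %0" : "=r" (value) : "m>" (*src));
#else
    value = *src;   /* portable fallback, assumed for non-SH hosts */
#endif
    return value;
}

/* Sum n bytes.  src is only changed by the C code (src + 1), never by
   the asm, so no "+r" (src) operand is needed. */
static int sum_bytes(const signed char *src, size_t n)
{
    int sum = 0;
    while (n--) {
        sum += load_byte(src);
        src = src + 1;   /* gcc may reuse the post-incremented register */
    }
    return sum;
}
```

If gcc chooses @Rm+ for the load, the explicit src + 1 can be satisfied 
by the already incremented register; if it chooses @Rm, a separate add is 
emitted instead.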

>> Moreover, the explicit usage of src+1 might add additional overhead; 
>> it's clear that the respective operation "p = src+1;" should not be 
>> optimized away to have an effect, but /if/ it has an effect then this 
>> might be an overhead (which Sébastien wanted to get rid of).
> 
> Well, the problem I originally set out to solve is writing a decent 
> memcpy() function for unaligned source/destination pairs. I should have 
> mentioned it sooner (wild XY problem draws near).

Some libc implementations already perform such pre- and post-alignment, 
e.g. Newlib provided it is compiled for speed (and the machine part 
doesn't deviate from that default).

> I can see what you suggest: the compiler may choose plain dereferencing 
> and add another instruction to increment src afterwards. It does 
> so until -O2.
> 
> I don't /really/ like not knowing whether the compiler will generate the 
> code I planned, but in the worst-case situation I can still write my 
> trivial function in assembler and enable LTO.

The downside of assembler is that it cannot be inlined, so if the call 
overhead matters, you may want an assembler version with a defined code 
sequence and with call overhead for large sizes, and a C implementation 
that can be inlined for small sizes.

Notice that gcc already performs inline expansion of memcpy provided it 
is not inhibited by -fno-builtin-memcpy, -ffreestanding etc.  In the 
latter case you can use __builtin_memcpy for small sizes.  The point 
where gcc switches from inline expansion and unrolling to libcall (if 
any) depends on optimization options, (known) alignment, size to copy 
and also how much work has been put into the respective backend.
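
A rough sketch of such a split, under stated assumptions: copy_large is 
a plain C stand-in for the hand-written assembler routine, copy_bytes and 
the threshold SMALL_COPY_MAX are names and values invented for this 
example, and the right cut-off has to be measured for the target.

```c
#include <stddef.h>

/* Assumed cut-off between inline expansion and the out-of-line call. */
#define SMALL_COPY_MAX 16

/* Stand-in for an assembler routine with a defined code sequence;
   written in C here only so that the sketch builds anywhere. */
static void *copy_large(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Inlinable front end: small sizes known at compile time go through
   __builtin_memcpy, which gcc may expand inline even when the plain
   memcpy builtin is disabled; everything else takes the call. */
static inline void *copy_bytes(void *dst, const void *src, size_t n)
{
    if (__builtin_constant_p(n) && n <= SMALL_COPY_MAX)
        return __builtin_memcpy(dst, src, n);
    return copy_large(dst, src, n);
}
```

Without optimization __builtin_constant_p(n) evaluates to 0 here, so 
every copy takes the out-of-line path; the result is the same either way.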

Johann

> Right now I will stick to Segher's solution, partly because it behaves 
> better while compiling - see my attached file for the test case.
> 
> I'll still remember the trick. Thank you all for your insights!
> 
> Regards,
> Sébastien


