limiting call clobbered registers for library functions

Paul Shortis pshortis@dataworx.com.au
Mon Feb 2 21:55:00 GMT 2015


On 02/02/15 18:55, Yury Gribov wrote:
> On 01/30/2015 11:16 AM, Matthew Fortune wrote:
>> Yury Gribov <y.gribov@samsung.com> writes:
>>> On 01/29/2015 08:32 PM, Richard Henderson wrote:
>>>> On 01/29/2015 02:08 AM, Paul Shortis wrote:
>>>>> I've ported GCC to a small 16 bit CPU that has single bit 
>>>>> shifts. So
>>>>> I've handled variable / multi-bit shifts using a mix of 
>>>>> inline shifts
>>>>> and calls to assembler support functions.
>>>>>
>>>>> The calls to the asm library functions clobber only one (by 
>>>>> const) or
>>>>> two
>>>>> (variable) registers but of course calling these functions 
>>>>> causes all
>>>>> of the standard call clobbered registers to be considered 
>>>>> clobbered,
>>>>> thus wasting lots of candidate registers for use in 
>>>>> expressions
>>>>> surrounding these shifts and causing unnecessary register 
>>>>> saves in
>>>>> the surrounding function prologue/epilogue.
>>>>>
>>>>> I've scrutinized and cloned the actions of other ports that 
>>>>> do the
>>>>> same, however I'm unable to convince the various passes 
>>>>> that only r1
>>>>> and r2 can be clobbered by these library calls.
>>>>>
>>>>> Is anyone able to point me in the proper direction for a 
>>>>> solution to
>>>>> this problem ?
>>>>
>>>> You wind up writing a pattern that contains a call, but isn't
>>>> represented in rtl as a call.
>>>
>>> Could it be useful to provide a pragma for specifying 
>>> function register
>>> usage? This would allow e.g. library writer to write a 
>>> hand-optimized
>>> assembly version and then inform compiler of its binary 
>>> interface.
>>>
>>> Currently a surrogate of this can be achieved by putting 
>>> inline asm code
>>> in static inline functions in public library headers but this 
>>> has its
>>> own disadvantages (e.g. code bloat).
>>
>> This sounds like a good idea in principle. I seem to recall 
>> seeing something
>> similar to this in other compiler frameworks that allow a 
>> number of special
>> calling conventions to be defined and enable functions to be 
>> attributed to use
>> one of them. I.e. not quite so general as specifying an 
>> arbitrary clobber list
>> but some sensible pre-defined alternative conventions.
>
> FYI a colleague from kernel mentioned that they already achieve 
> this by wrapping the actual call with inline asm e.g.
>
> static inline int foo(int x) {
>   asm(
>     ".global foo_core\n"
>     // foo_core accepts single parameter in %rax,
>     // returns result in %rax and
>     // clobbers %rbx
>     "call foo_core\n"
>     : "+a"(x)
>     :
>     : "rbx"
>   );
>   return x;
> }
>
> We still can't mark inline asm with things like 
> __attribute__((pure)), etc. though so it's not an ideal solution.
>
> -Y
>
>
Thanks everyone.

I've finally settled on an extension of the solution offered by 
Richard (from the SH port).

I've had to ...

     1.    write an expander that expands ...

             a)    short (bit count) constant shifts to an
                   instruction pattern that emits asm for one or
                   more inline shifts
             b)    other constant shifts to another instruction
                   pattern that emits asm for a library call and
                   clobbers cc
             c)    variable shifts to another instruction pattern
                   that emits asm for a library call and clobbers
                   cc plus r2

     2.    then, for compare elimination, write two other
           instruction patterns corresponding to b) and c) that set
           CC from a compare instead of clobbering it.
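
The three-way choice made by the expander can be sketched in plain C. 
This is only an illustration, not the port's actual expander: the enum, 
the function name and the cut-off of 3 inline shifts are all 
hypothetical, since the real threshold is not given here.

```c
#include <stdbool.h>

/* Hypothetical names; they only label the three cases a), b), c). */
enum shift_pattern {
  SHIFT_INLINE,         /* a) short constant count: inline single-bit shifts */
  SHIFT_LIBCALL_CONST,  /* b) other constant count: library call, clobbers cc */
  SHIFT_LIBCALL_VAR     /* c) variable count: library call, clobbers cc plus r2 */
};

/* Assumed cut-off between "short" and "other" constant shifts.
   The value 3 is made up for this sketch. */
#define MAX_INLINE_SHIFT_COUNT 3u

/* Pick the instruction pattern for a shift.  COUNT is only meaningful
   when COUNT_IS_CONST is true. */
enum shift_pattern
choose_shift_pattern (bool count_is_const, unsigned count)
{
  if (!count_is_const)
    return SHIFT_LIBCALL_VAR;
  if (count <= MAX_INLINE_SHIFT_COUNT)
    return SHIFT_INLINE;
  return SHIFT_LIBCALL_CONST;
}
```

In a real port this decision lives in the define_expand for the shift, 
which then emits the matching instruction pattern with its own clobber 
list.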

I could have avoided the expander and used a single instruction 
pattern for a), b) and c) if I could have found a way to have 
alternative-dependent clobbers in an instruction pattern. I 
investigated attributes but couldn't see how I would be able to 
achieve what I needed. I also tried clobber (match_dup 2), but when 
one of the alternatives has a constant for operands[2] the 
clobber is accepted silently by the .md compiler but doesn't 
actually clobber the non-constant alternatives.

A mechanism to implement alternative dependent clobbers would 
have allowed all this to be represented in a much more succinct 
and compact manner.

On another note... compare elimination didn't work for pattern c) 
and on inspection I found this snippet in compare-elim.c

static bool
arithmetic_flags_clobber_p (rtx insn)
{
...
...
   if (GET_CODE (pat) == PARALLEL && XVECLEN (pat, 0) == 2)
     {

which of course rejects any parallel pattern with a main rtx and 
more than one clobber (i.e. c) above). So I changed this 
function so that it accepts patterns where every element from 
index one upwards is a clobber, one of them being cc. The 
resulting generated asm ...

         ld      r2,r4          ; r2 holds the shift count; it will be
                                ; clobbered to calculate an index into
                                ; the sequence of shift instructions
         call    __ashl_v       ; variable ashl
         beq     .L4            ; branch on result
         ld      r1,r3

If anyone can see any fault in this change, please call it out.
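
The difference between the original two-element test and the relaxed 
one can be modelled in self-contained C. This is a sketch only: the 
struct, enum and CC_REGNO below are toy stand-ins invented for the 
example, not GCC's rtx types or the real compare-elim.c code, and the 
strict version only approximates what the elided parts of the original 
function check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the elements of a PARALLEL; NOT GCC's rtx. */
enum elt_kind { ELT_SET, ELT_CLOBBER, ELT_OTHER };

struct elt {
  enum elt_kind kind;
  int regno;                /* register being set or clobbered */
};

#define CC_REGNO 17         /* hypothetical flags register number */

/* Original-style test: exactly one main operation followed by
   exactly one clobber, which must be the flags register. */
bool
flags_clobber_strict (const struct elt *vec, size_t len)
{
  return len == 2
         && vec[0].kind == ELT_SET
         && vec[1].kind == ELT_CLOBBER
         && vec[1].regno == CC_REGNO;
}

/* Relaxed test: element 0 is the main operation, every later
   element is a clobber, and at least one of them clobbers the
   flags register. */
bool
flags_clobber_relaxed (const struct elt *vec, size_t len)
{
  bool saw_cc = false;

  if (len < 2 || vec[0].kind != ELT_SET)
    return false;

  for (size_t i = 1; i < len; i++)
    {
      if (vec[i].kind != ELT_CLOBBER)
        return false;
      if (vec[i].regno == CC_REGNO)
        saw_cc = true;
    }
  return saw_cc;
}
```

A pattern shaped like c), with a set plus clobbers of cc and r2, fails 
the strict test because it has three elements, but passes the relaxed 
one. A pattern whose extra elements are not all clobbers, or that never 
clobbers cc, is still rejected.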

Of course, the problem with using inline asm wrappers is that you 
have to arrange for them to be included in EVERY compile, and they 
don't allow compare elimination to be easily implemented.

Paul.