[PATCH 2/6] RISC-V Port: gcc

Richard Henderson rth@redhat.com
Tue Jan 31 18:35:00 GMT 2017


On 01/30/2017 05:10 PM, Andrew Waterman wrote:
>>> +(define_expand "clear_cache"
>>> +  [(match_operand 0 "pmode_register_operand")
>>> +   (match_operand 1 "pmode_register_operand")]
>>> +  ""
>>> +  "
>>> +{
>>> +  emit_insn (gen_fence_i ());
>>> +  DONE;
>>> +}")
>>
>>
>> Do you need a FENCE before the FENCE.I?
> 
> It's actually not clear to me what the semantics of clear_cache are
> for multiprocessors.  Can you shed some light?
> 
> If thread A does modifies code then sets a flag, then thread B reads
> the flag and executes a FENCE.I, then thread A needs a FENCE before
> setting the flag and thread B needs a fence before the FENCE.I.  But,
> is it not the software's responsibility to insert both fences, rather
> than assuming one of the fences is folded into clear_cache?

Your introduction of "flag" confuses the issue.

Having re-read the description in section 2.7, I see that FENCE.I is
thread-local and is all that is required for a single thread to sync its own I
and D caches.  I think perhaps I'd mis-read or mis-remembered before.

Practically speaking, I'm not sure we have put any real thought about what
needs to happen for threads using on-stack trampolines.  Certainly no other gcc
port attempts to broadcast the need for an icache flush to other cpus.

So just leave as-is -- __builtin_clear_cache works properly for the local thread.


>>> +(define_insn "call_value_multiple_internal"
>>> +  [(set (match_operand 0 "register_operand" "")
>>> +       (call (mem:SI (match_operand 1 "call_insn_operand" "l,S"))
>>> +             (match_operand 2 "" "")))
>>> +   (set (match_operand 3 "register_operand" "")
>>> +       (call (mem:SI (match_dup 1))
>>> +             (match_dup 2)))
>>
>>
>> Any reason for this?  Your return value registers are sequential.  The
>> normal thing to do is just use e.g. (reg:TI 10).
> 
> I think we'd need different patterns for mixed int/FP struct returns
> (which use a0 and fa0) if we took that approach.

Ah.  Other targets get away with using a PARALLEL.

>From sparc.c, function_arg_record_value:

  data.ret = gen_rtx_PARALLEL (mode, rtvec_alloc (data.stack + nregs));

and sparc.md:

(define_insn "*call_value_symbolic_sp64"
  [(set (match_operand 0 "" "")
        (call (mem:DI (match_operand:DI 1 "symbolic_operand" "s"))
              (match_operand 2 "" "")))
   (clobber (reg:DI O7_REG))]

So you wind up with things like

  (set (parallel [
         (reg:DI o0)
         (reg:DF fp0)
         (reg:DI o1)
         (reg:DF fp1)
        ])
       (call (mem:DI (symbol_ref:DI "function"))
             (const_int 0)))

We do the same for x86_64 -- in function_value_64:

  ret = construct_container (mode, orig_mode, valtype, 1,
                             X86_64_REGPARM_MAX, X86_64_SSE_REGPARM_MAX,
                             x86_64_int_return_registers, 0);


r~



More information about the Gcc-patches mailing list