This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH 2/6] RISC-V Port: gcc
On Tue, Jan 31, 2017 at 10:32 AM, Richard Henderson <rth@redhat.com> wrote:
> On 01/30/2017 05:10 PM, Andrew Waterman wrote:
>>>> +(define_expand "clear_cache"
>>>> + [(match_operand 0 "pmode_register_operand")
>>>> + (match_operand 1 "pmode_register_operand")]
>>>> + ""
>>>> + "
>>>> +{
>>>> + emit_insn (gen_fence_i ());
>>>> + DONE;
>>>> +}")
>>>
>>>
>>> Do you need a FENCE before the FENCE.I?
>>
>> It's actually not clear to me what the semantics of clear_cache are
>> for multiprocessors. Can you shed some light?
>>
>> If thread A modifies code and then sets a flag, then thread B reads
>> the flag and executes a FENCE.I, then thread A needs a FENCE before
>> setting the flag and thread B needs a fence before the FENCE.I. But,
>> is it not the software's responsibility to insert both fences, rather
>> than assuming one of the fences is folded into clear_cache?
>
> Your introduction of "flag" confuses the issue.
>
> Having re-read the description in section 2.7, I see that FENCE.I is
> thread-local and is all that is required for a single thread to sync its own I
> and D caches. I think perhaps I'd mis-read or mis-remembered before.
>
> Practically speaking, I'm not sure we have put any real thought about what
> needs to happen for threads using on-stack trampolines. Certainly no other gcc
> port attempts to broadcast the need for an icache flush to other cpus.
>
> So just leave as-is -- __builtin_clear_cache works properly for the local thread.
Sorry, I jumped right to the assumption that you were suggesting a
FENCE for multiprocessor synchronization. Indeed, it's not necessary
for enforcing ordering between local stores and FENCE.I.
>
>
>>>> +(define_insn "call_value_multiple_internal"
>>>> + [(set (match_operand 0 "register_operand" "")
>>>> + (call (mem:SI (match_operand 1 "call_insn_operand" "l,S"))
>>>> + (match_operand 2 "" "")))
>>>> + (set (match_operand 3 "register_operand" "")
>>>> + (call (mem:SI (match_dup 1))
>>>> + (match_dup 2)))
>>>
>>>
>>> Any reason for this? Your return value registers are sequential. The
>>> normal thing to do is just use e.g. (reg:TI 10).
>>
>> I think we'd need different patterns for mixed int/FP struct returns
>> (which use a0 and fa0) if we took that approach.
>
> Ah. Other targets get away with using a PARALLEL.
>
> From sparc.c, function_arg_record_value:
>
> data.ret = gen_rtx_PARALLEL (mode, rtvec_alloc (data.stack + nregs));
>
> and sparc.md:
>
> (define_insn "*call_value_symbolic_sp64"
> [(set (match_operand 0 "" "")
> (call (mem:DI (match_operand:DI 1 "symbolic_operand" "s"))
> (match_operand 2 "" "")))
> (clobber (reg:DI O7_REG))]
>
> So you wind up with things like
>
> (set (parallel [
> (reg:DI o0)
> (reg:DF fp0)
> (reg:DI o1)
> (reg:DF fp1)
> ])
> (call (mem:DI (symbol_ref:DI "function"))
> (const_int 0)))
>
> We do the same for x86_64 -- in function_value_64:
>
> ret = construct_container (mode, orig_mode, valtype, 1,
> X86_64_REGPARM_MAX, X86_64_SSE_REGPARM_MAX,
> x86_64_int_return_registers, 0);
Thanks--these pointers were quite helpful.
>
>
> r~