This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [patch,avr] Fix PR 65657 - read from __memx address space tramples arguments to function call

From: Senthil Kumar Selvaraj <senthil_kumar dot selvaraj at atmel dot com>
To: Georg-Johann Lay <avr at gjlay dot de>
Cc: <gcc-patches at gcc dot gnu dot org>, <chertykov at gmail dot com>
Date: Thu, 23 Apr 2015 12:34:24 +0530
Subject: Re: [patch,avr] Fix PR 65657 - read from __memx address space tramples arguments to function call
Authentication-results: sourceware.org; auth=none
References: <20150416064347 dot GA31491 at atmel dot com> <552F7A8D dot 1080709 at gjlay dot de> <20150416092802 dot GC31491 at atmel dot com> <552FE79E dot 2040509 at gjlay dot de> <20150417074655 dot GA3186 at atmel dot com>

On Fri, Apr 17, 2015 at 01:16:58PM +0530, Senthil Kumar Selvaraj wrote:
> On Thu, Apr 16, 2015 at 06:47:26PM +0200, Georg-Johann Lay wrote:
> > Am 04/16/2015 um 11:28 AM schrieb Senthil Kumar Selvaraj:
> > >On Thu, Apr 16, 2015 at 11:02:05AM +0200, Georg-Johann Lay wrote:
> > >>Am 04/16/2015 um 08:43 AM schrieb Senthil Kumar Selvaraj:
> > >>>This patch fixes PR 65657.
> > >>
> > >>The following artifact appears to be PR63633.
> > >>
> > >
> > >I did see that one - unfortunately, that fix won't help here. IIUC, you
> > >check if input/output operand hard regs are in the clobber list,
> > >and if yes, you generate pseudos to save and restore clobbered hard
> > >regs.
> > >
> > >In this case, the reg is actually clobbered by a different insn (one
> > 
> > Arrgh, yes...
> > 
> > >that loads the next argument to the function). So unless I blindly generate pseudos for
> > >all regs in the clobber list, the clobbering will still happen.
> > >
> > >FWIW, I did try saving/restoring all clobbered regs, and it did fix the
> > >problem - just that it appeared like a (worse) hack to me. Aren't we
> > >manually replicating what RA/reload should be doing?
> > 
> > As it appears, we'll have to do it by hand.  The attaches patch is just a
> > sketch that indicates how the problem could be approached.  Notice the new
> > assertions in the split expanders; they will throw ICE until the fix is
> > actually installed.
> > 
> > The critical insn are generated in movMM expander and are changed to have no
> > clobbers (SCRATCHes actually).  An a later pass, when appropriate life info
> > can be made available, run yet another avr pass that
> > 
> > 1a) Save-restore needed hard regs around the insn.
> > 
> > 2a) Kick out hard regs overlapping the clobbers, e.g. in set_src, into new
> > pseudos.  Maybe that could happen due to some hard regs progagation, or we
> > can use a new predicate similar combine_pseudo_register_operand.
> > 
> > 3) Replace scratch -> hard regs for all scratch_operands.
> > 
> > 2b) Restore respective hard regs from their pseudos.
> > 
> > 1b) Restore respective hard regs from their pseudos.
> > 
> > 
> > And maybe we can get better code by allocating things like address register
> > by hand and get better code then.
> > 
> > When I implemented some of the libgcc insns I tried to express the operand
> > by means of constraints, e.h. for (reg:HI 22) and let register allocator do
> > the job.
> > 
> > The resulting code was functional but *horrific*.
> > 
> > The register allocator is not yet ready to generate efficient code in such
> > demanding situations...
> > 
> > >
> > >What do you think?
> > >
> > 
> > IMO sooner or later we'll need such an infrastructure; maybe also for
> > non-mov insn that are implemented by transparent libcalls like divmod, mul,
> > etc.
> 
> I'm curious how other embedded targets handle this though - arm, for
> example? Surely they should have some libcalls/builtins which need 
> specific hard regs? I should check that out.
> 
> > 
> > >>>When cfgexpand.c expands a function call, it first figures out the
> > >>>registers in which the arguments will go, followed by expansion of the
> > >>>arguments themselves (right to left). It later emits mov insns to set
> > >>>the precomputed registers with the expanded RTXes.
> > >>>
> > >>>If one of the arguments is a __memx char pointer dereference, the mov
> > >>>eventually expands to gen_xload<mode>_A (for certain cases), which
> > >>>clobbers R22, R21 and Z.  This causes problems if one of those
> > >>>clobbered registers was precomputed to hold another argument.
> > >>>
> > >>>In general, call expansion does not appear to take clobbers into account -
> > 
> > We had been warned that using hard regs is evil...  But without that
> > technique the code quality would decrease way too much.
> 
> Oh ok - do you know some place where this is documented or was
> discussed?
> > 
> > >>>when it checks for argument overlap, the RTX (args[i].value) is only a MEM
> > >>>in QImode for the memx deref - the clobber shows up when it eventually
> > >>>calls emit_move_insn, at which point, it is too late.
> > 
> > Such situations could only be handled by a target hook which allowed to
> > expand specific trees by hand...  Such a hook could cater for insn that must
> > use hard registers.
> 
> Yes, I guess so :(
> > 
> > >>>This does not happen for a int pointer dereference - turns out that
> > >>>precompute_register_parameters does a copy_to_mode_reg if the
> > >>>cost of args[i].value is more than COSTS_N_INSNS(1) i.e., it creates a
> > >>>pseudo and later assigns the pseudo to the arg register. This is done
> > >>>before any moves to arg registers is done, so other arguments are not
> > >>>overwritten.
> > >>>
> > >>>Doing the same thing - providing a better cost estimate for a MEM rtx in
> > >>>the non-default address space, makes this problem go away, and that is
> > >>>what this patch does. Regression testing does not show any new failures.
> > 
> > Can you tell something about overall code quality? If it is not
> > significantly worse then I'd propose to apply your rtx-costs solution soon.
> > The full fix will take more time to work it out.
> > 
> 
> For this piece of code - 
> 
> void foo ( char c, unsigned int d);
> void readx (const char __memx *x)
> {
>     foo (*x, 0xABCD);
> }
> 
> the buggy code (for readx) looks like this
> 
> 	mov r18,r22
> 	mov r25,r23
> 	ldi r22,lo8(-51)
> 	ldi r23,lo8(-85)
> 	mov r30,r18
> 	mov r31,r25
> 	mov r21,r24
> 	call __xload_1
> 	mov r24,r22
> 	jmp foo
> 
> With the patch, the code looks like this
> 
> 	movw r30,r22
> 	mov r21,r24
> 	call __xload_1
> 	mov r24,r22
> 	ldi r22,lo8(-51)
> 	ldi r23,lo8(-85)
> 	jmp foo
> 
> The code turns out to be better in this case, as the loads to r22,r23
> get done later.
> 
> The dump for cfgexpand is where the change originates - the memx dereference is 
> saved in a pseudo before the moves to argument registers start.
> 
> Before:
> 
> (insn 2 4 3 2 (set (reg/v/f:PSI 43 [ x ])
>         (reg:PSI 22 r22 [ x ])) foo.c:4 -1
>      (nil))
> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> (insn 6 3 7 2 (set (reg:HI 22 r22)
>         (const_int -21555 [0xffffffffffffabcd])) foo.c:5 -1
>      (nil))
> (insn 7 6 8 2 (parallel [
>             (set (reg:QI 24 r24)
>                 (mem:QI (reg/v/f:PSI 43 [ x ]) [0 *x_2(D)+0 S1 A8 AS7]))
>             (clobber (reg:QI 22 r22))
>             (clobber (reg:QI 21 r21))
>             (clobber (reg:HI 30 r30))
>         ]) foo.c:5 -1
>      (nil))
> (call_insn/j 8 7 9 2 (parallel [
>             (call (mem:HI (symbol_ref:HI ("foo") [flags 0x41]  <function_decl 0x7f7129589900 foo>) [0 foo S2 A8])
>                 (const_int 0 [0]))
>             (use (const_int 1 [0x1]))
>         ]) foo.c:5 -1
>      (nil)
>     (expr_list:REG_EQUAL (use (reg:QI 24 r24))
>         (expr_list:REG_NONNEG (use (reg:HI 22 r22))
>             (nil))))
> (barrier 9 8 0)
> 
> 
> After:
> 
> (insn 2 4 3 2 (set (reg/v/f:PSI 43 [ x ])
>         (reg:PSI 22 r22 [ x ])) foo.c:4 -1
>      (nil))
> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> (insn 6 3 7 2 (parallel [
>             (set (reg:QI 44)
>                 (mem:QI (reg/v/f:PSI 43 [ x ]) [0 *x_2(D)+0 S1 A8 AS7]))
>             (clobber (reg:QI 22 r22))
>             (clobber (reg:QI 21 r21))
>             (clobber (reg:HI 30 r30))
>         ]) foo.c:5 -1
>      (nil))
> (insn 7 6 8 2 (set (reg:HI 22 r22)
>         (const_int -21555 [0xffffffffffffabcd])) foo.c:5 -1
>      (nil))
> (insn 8 7 9 2 (set (reg:QI 24 r24)
>         (reg:QI 44)) foo.c:5 -1
>      (nil))
> (call_insn/j 9 8 10 2 (parallel [
>             (call (mem:HI (symbol_ref:HI ("foo") [flags 0x41]  <function_decl 0x7fa986a0f900 foo>) [0 foo S2 A8])
>                 (const_int 0 [0]))
>             (use (const_int 1 [0x1]))
>         ]) foo.c:5 -1
>      (nil)
>     (expr_list:REG_EQUAL (use (reg:QI 24 r24))
>         (expr_list:REG_NONNEG (use (reg:HI 22 r22))
>             (nil))))
> (barrier 10 9 0)
> 
> Without the patch, and with the arguments
> reversed, like this
> 
> void foo (  unsigned int d, char c);
> void readx (const char __memx *x)
> {
>     foo (0xABCD, *x);
> }
> 
> the compiler elides the dereference itself - it just emits
> 
> 	ldi r24,lo8(-51)
> 	ldi r25,lo8(-85)
> 	jmp foo
> 
> Scary :). That's because expand has
> 
> (insn 6 3 7 2 (parallel [
>             (set (reg:QI 22 r22)
>                 (mem:QI (reg/v/f:PSI 43 [ x ]) [0 *x_2(D)+0 S1 A8 AS7]))
>             (clobber (reg:QI 22 r22))
>             (clobber (reg:QI 21 r21))
>             (clobber (reg:HI 30 r30))
>         ]) foo.c:5 -1
>      (nil))
> 
> but this, I guess, will not happen if the fix for PR63633 was applied.
> 
> With my cost patch, I get
> 
> 	movw r30,r22
> 	mov r21,r24
> 	call __xload_1
> 	ldi r24,lo8(-51)
> 	ldi r25,lo8(-85)
> 	jmp foo
> 
> For the testcase posted originally for the bug,
> <snip>
> 
> extern void writeFlashByte(uint8_t byte, uint16_t handle);
> 
> void writeBuf(const uint8_t __memx *from, const uint8_t __memx *to)
> {
>     uint16_t handle = ((uint16_t)(((__uint24)to) & 0xFFFFUL));
>     writeFlashByte(*from++, handle++);
> }
> 
> </snip>
> 
> Before:
> 
> 	mov r27,r22
> 	mov r26,r23
> 	mov r21,r24
> 	mov r24,r20
> 	movw r22,r18
> 	mov r30,r27
> 	mov r31,r26
> 	call __xload_1
> 	mov r24,r22
> 	jmp writeFlashByte
> 
> After:
> 
> 	push r16
> 	push r17
> 	movw r16,r18
> 	mov r18,r20
> 	movw r30,r22
> 	mov r21,r24
> 	call __xload_1
> 	mov r24,r22
> 	movw r22,r16
> 	pop r17
> 	pop r16
> 	jmp writeFlashByte
> 
> Looks like worse code, but reload had to choose call saved regs (r16 and r17) for zero_extendpsisi's 
> output operand, as we don't allow REG_X (or greater) for SI mode values
> in avr_hard_regno_mode_ok.
> 
> Looks reasonable to me - what do you guys say?

I realize Johann is working on eliminating hard regs usage in favor of register
classes/constraints instead, but I think this patch that sets the correct 
cost for MEM rtx in non-default address space does nothing "wrong", even
if it is only a tangential fix to the problem.

What do you guys say?

> 
> Regards
> Senthil
> > 
> > Johann
> >

References:
- [patch,avr] Fix PR 65657 - read from __memx address space tramples arguments to function call
  - From: Senthil Kumar Selvaraj
- Re: [patch,avr] Fix PR 65657 - read from __memx address space tramples arguments to function call
  - From: Georg-Johann Lay
- Re: [patch,avr] Fix PR 65657 - read from __memx address space tramples arguments to function call
  - From: Senthil Kumar Selvaraj
- Re: [patch,avr] Fix PR 65657 - read from __memx address space tramples arguments to function call
  - From: Georg-Johann Lay
- Re: [patch,avr] Fix PR 65657 - read from __memx address space tramples arguments to function call
  - From: Senthil Kumar Selvaraj

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]