Bug 48609 - Inefficient complex float argument passing/return
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.7.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Duplicates: 48607 77851
Depends on:
Blocks: argument, return
 
Reported: 2011-04-14 13:57 UTC by H.J. Lu
Modified: 2023-05-15 07:22 UTC (History)
CC List: 3 users

See Also:
Host:
Target: x86_64
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-08-02 00:00:00


Description H.J. Lu 2011-04-14 13:57:30 UTC
[hjl@gnu-6 pr1000]$ cat s2.i
typedef _Complex float SCtype;
extern SCtype bar;
void
foo (SCtype x)
{
  bar = x;
}
[hjl@gnu-6 pr1000]$ /usr/gcc-4.7/bin/gcc -S -O2 s2.i   
[hjl@gnu-6 pr1000]$ cat s2.s
	.file	"s2.i"
	.text
	.p2align 4,,15
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	movq	%xmm0, -8(%rsp)
	movl	-8(%rsp), %eax
	movl	%eax, bar(%rip)
	movl	-4(%rsp), %eax
	movl	%eax, bar+4(%rip)
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo

We should simply do

movq	%xmm0, bar(%rip)
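
For context on why a single 8-byte move is enough: C99 gives a complex type the same representation as an array of two elements of the corresponding real type, real part first. The sketch below is not from the report. `store_whole` and `store_parts` are hypothetical helper names, and the code uses GCC's `__real__`/`__imag__` extensions. It only shows that copying the value in one piece and copying it part by part leave the same bytes in memory.

```c
#include <assert.h>
#include <string.h>

typedef _Complex float SCtype;

/* A complex float occupies exactly two adjacent floats.  */
static_assert (sizeof (SCtype) == 2 * sizeof (float),
               "SCtype is two adjacent floats");

/* Hypothetical helper: store x with one 8-byte copy.  */
void
store_whole (SCtype *dst, SCtype x)
{
  memcpy (dst, &x, sizeof x);
}

/* Hypothetical helper: store x as two separate 4-byte parts,
   real part first, then imaginary part.  */
void
store_parts (SCtype *dst, SCtype x)
{
  float parts[2] = { __real__ x, __imag__ x };
  memcpy (dst, parts, sizeof parts);
}
```

Both helpers produce the same object representation in `*dst`, which is the property the single `movq` relies on.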
Comment 1 H.J. Lu 2011-04-14 14:04:20 UTC
Load has the same problem:

[hjl@gnu-6 pr1000]$ cat load.i
typedef _Complex float SCtype;
extern SCtype foo;
SCtype
bar ()
{
  return foo;
}
[hjl@gnu-6 pr1000]$ /usr/gcc-4.7/bin/gcc -S -O2  load.i   
[hjl@gnu-6 pr1000]$ cat load.s
	.file	"load.i"
	.text
	.p2align 4,,15
	.globl	bar
	.type	bar, @function
bar:
.LFB0:
	.cfi_startproc
	movl	foo(%rip), %eax
	movl	%eax, -8(%rsp)
	movl	foo+4(%rip), %eax
	movl	%eax, -4(%rsp)
	movq	-8(%rsp), %xmm0
	ret
	.cfi_endproc
.LFE0:
	.size	bar, .-bar
Comment 2 Andrew Pinski 2021-08-02 18:02:04 UTC
Confirmed. In this case it is a middle-end issue; I suspect that if we used V2SFmode for the incoming argument, it might work better. Right now we produce:
(insn 2 9 3 2 (set (reg:DI 86)
        (reg:DI 20 xmm0 [ x ])) "/app/example.cpp":5:1 -1
     (nil))
(insn 3 2 4 2 (set (mem/c:DI (plus:DI (reg/f:DI 77 virtual-stack-vars)
                (const_int -8 [0xfffffffffffffff8])) [0  S8 A64])
        (reg:DI 86)) "/app/example.cpp":5:1 -1
     (nil))
(insn 4 3 5 2 (set (reg:SF 84)
        (mem/c:SF (plus:DI (reg/f:DI 77 virtual-stack-vars)
                (const_int -8 [0xfffffffffffffff8])) [0  S4 A64])) "/app/example.cpp":5:1 -1
     (nil))
(insn 5 4 6 2 (set (reg:SF 85)
        (mem/c:SF (plus:DI (reg/f:DI 77 virtual-stack-vars)
                (const_int -4 [0xfffffffffffffffc])) [0  S4 A32])) "/app/example.cpp":5:1 -1
     (nil))

----- CUT ----
Return has the same issue:
(insn 13 12 14 2 (set (mem/c:SF (plus:DI (reg/f:DI 77 virtual-stack-vars)
                (const_int -8 [0xfffffffffffffff8])) [0  S4 A32])
        (reg:SF 84)) "/app/example.cpp":7:1 -1
     (nil))
(insn 14 13 15 2 (set (mem/c:SF (plus:DI (reg/f:DI 77 virtual-stack-vars)
                (const_int -4 [0xfffffffffffffffc])) [0  S4 A32])
        (reg:SF 85)) "/app/example.cpp":7:1 -1
     (nil))
(insn 15 14 16 2 (set (reg:DI 20 xmm0)
        (mem/c:DI (plus:DI (reg/f:DI 77 virtual-stack-vars)
                (const_int -8 [0xfffffffffffffff8])) [0  S8 A32])) "/app/example.cpp":7:1 -1
     (nil))
Comment 3 Andrew Pinski 2021-08-02 18:06:42 UTC
*** Bug 48607 has been marked as a duplicate of this bug. ***
Comment 4 Andrew Pinski 2021-08-07 05:28:17 UTC
*** Bug 77851 has been marked as a duplicate of this bug. ***
Comment 5 Hongtao.liu 2021-08-17 10:35:16 UTC
(In reply to Andrew Pinski from comment #2)
> Confirmed, In this case, it is a middle-end issue, I suspect if we used
> V2SFmode for the incoming argument, it might work better.  Right now we
Yes, under TARGET_SSE2 and TARGET_64BIT we support movv2sf, so I think it's reasonable to use V2SFmode instead of DImode as the incoming argument mode for SCmode.
Comment 6 Hongtao.liu 2021-08-17 10:44:03 UTC
(In reply to Hongtao.liu from comment #5)
> (In reply to Andrew Pinski from comment #2)
> > Confirmed, In this case, it is a middle-end issue, I suspect if we used
> > V2SFmode for the incoming argument, it might work better.  Right now we
> Yes, under TARGET_SSE2 and TARGET_64BIT, we support movv2sf, i think it's
> reasonable to use V2SFmode instead of DImode as incoming argument mode for
> SCmode.

That doesn't help here:

foo:
.LFB0:
	.cfi_startproc
	movlps	%xmm0, -8(%rsp)	# 3	[c=4 l=5]  *movv2sf_internal/14
	movss	-8(%rsp), %xmm0	# 16	[c=8 l=6]  *movsf_internal/7
	movss	%xmm0, bar(%rip)	# 11	[c=4 l=8]  *movsf_internal/8
	movss	-4(%rsp), %xmm0	# 17	[c=8 l=6]  *movsf_internal/7
	movss	%xmm0, bar+4(%rip)	# 12	[c=4 l=8]  *movsf_internal/8
	ret		# 21	[c=0 l=1]  simple_return_internal
Comment 7 Andrew Pinski 2021-08-17 10:55:55 UTC
(In reply to Hongtao.liu from comment #6)
> (In reply to Hongtao.liu from comment #5)
> > (In reply to Andrew Pinski from comment #2)
> > > Confirmed, In this case, it is a middle-end issue, I suspect if we used
> > > V2SFmode for the incoming argument, it might work better.  Right now we
> > Yes, under TARGET_SSE2 and TARGET_64BIT, we support movv2sf, i think it's
> > reasonable to use V2SFmode instead of DImode as incoming argument mode for
> > SCmode.
> 
> Doesn't help here
> 
> foo:
> .LFB0:
> 	.cfi_startproc
> 	movlps	%xmm0, -8(%rsp)	# 3	[c=4 l=5]  *movv2sf_internal/14
> 	movss	-8(%rsp), %xmm0	# 16	[c=8 l=6]  *movsf_internal/7
> 	movss	%xmm0, bar(%rip)	# 11	[c=4 l=8]  *movsf_internal/8
> 	movss	-4(%rsp), %xmm0	# 17	[c=8 l=6]  *movsf_internal/7
> 	movss	%xmm0, bar+4(%rip)	# 12	[c=4 l=8]  *movsf_internal/8
> 	ret		# 21	[c=0 l=1]  simple_return_internal

You have to do a little bit more, such as changing how the two parts are extracted for the concat.
Comment 8 Hongtao.liu 2021-08-17 11:42:15 UTC
> You have to do a little bit more. Like change how the extraction for the two
> parts for the concat.

We already have vec_extractv2sfsf/vec_setv2sf; I will debug to figure out why they're not used.
Comment 9 Hongtao.liu 2021-08-17 12:25:40 UTC
(In reply to Hongtao.liu from comment #8)
> > You have to do a little bit more. Like change how the extraction for the two
> > parts for the concat.
> 
> We already have vec_extractv2sfsf/vec_setv2sf, will debug to figure out why
> they're not used.

We are using the stack to copy from V2SF to concat:SC:
src: (reg:V2SF 20 xmm0 [ x ])
dest: (concat:SC (reg:SF 84)
        (reg:SF 85))

----cut from emit_group_store-----------------
	      else
		{
		  dest = assign_stack_temp (tmp_mode,
					    GET_MODE_SIZE (tmp_mode));
		  emit_move_insn (dest, tmps[i]);
		  dst = adjust_address (dest, dest_mode, bytepos);
		}
	      break;
-------cut end---------------------
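
As a source-level illustration of what Comment 7 asks for, the sketch below takes the two parts of a two-float vector by element extraction instead of storing the vector and reloading each half. It relies on GCC's vector extension and on `__real__`/`__imag__`. The names `v2sf` and `v2sf_to_complex` are hypothetical, and nothing here claims to be the eventual middle-end fix or to guarantee what code is generated.

```c
#include <assert.h>

/* Two packed floats, 8 bytes wide (GCC vector extension).  */
typedef float v2sf __attribute__ ((vector_size (8)));
typedef _Complex float SCtype;

/* Both types cover the same 8 bytes.  */
static_assert (sizeof (v2sf) == sizeof (SCtype),
               "v2sf and SCtype have the same size");

/* Hypothetical helper: build the complex value from the two
   vector elements directly, with no explicit memory round trip
   in the source.  */
SCtype
v2sf_to_complex (v2sf v)
{
  SCtype z;
  __real__ z = v[0];	/* element 0 becomes the real part */
  __imag__ z = v[1];	/* element 1 becomes the imaginary part */
  return z;
}
```

This corresponds to filling a (concat:SC (reg:SF) (reg:SF)) from the elements of a V2SF value, which is the step that emit_group_store currently performs through a stack temporary.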