67206 – Redundant spills in simple copy loop for 32-bit x86 target

Summary: Redundant spills in simple copy loop for 32-bit x86 target

Status:	UNCONFIRMED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization
Version:	6.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2015-08-13 14:46 UTC by Yuri Rumyantsev
Modified:	2021-07-25 00:48 UTC
CC List:	0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:

Attachments
test-case to reproduce (132 bytes, text/x-csrc) 2015-08-13 14:48 UTC, Yuri Rumyantsev

Description Yuri Rumyantsev 2015-08-13 14:46:48 UTC

For the attached simple test case we can see strange spills to the stack in the code generated for this loop:
    for (i=0; i<n; i++)
      out[j * n + i] = in[j * n + i];

.L9:
	movdqa	(%eax), %xmm0
	addl	$1, %edx
	movdqu	%xmm0, (%ecx)
	addl	$16, %eax
	movdqa	%xmm0, 32(%esp)  ?? Redundant
	addl	$16, %ecx
	movl	%eax, 32(%esp)   ?? Redundant
	cmpl	52(%esp), %edx
	movl	%ecx, 48(%esp)   ?? Redundant
	jb	.L9

Another issue is that loop distribution does not recognize this loop as a memmove loop. Note that this is reproduced with the 4.9 compiler.

Comment 1 Yuri Rumyantsev 2015-08-13 14:48:45 UTC

Created attachment 36180
test-case to reproduce

Must be compiled with -O3 -m32 -march=slm to reproduce.
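
The attachment itself is not reproduced in this thread. As a rough sketch only, a test case of the same shape as the loop quoted in the description might look like the following. The function name copy_rows, the char element type, the unsigned index type, and the outer loop bound m are all assumptions and are not taken from the attachment.

```c
/* Hypothetical reproducer: the name, element type and parameter list are
   assumptions. Only the inner loop body comes from the description. */
void copy_rows(char *out, const char *in, unsigned n, unsigned m)
{
    unsigned i, j;

    for (j = 0; j < m; j++)
        for (i = 0; i < n; i++)
            out[j * n + i] = in[j * n + i];   /* loop quoted in the description */
}
```

Compiling such a file with gcc -O3 -m32 -march=slm -S and inspecting the inner loop should show whether the spills are present for a given compiler version.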

Comment 2 Richard Biener 2015-08-14 08:47:10 UTC

The memmove issue is because of

(compute_affine_dependence
  stmt_a: _16 = *_15;
  stmt_b: *_12 = _16;
) -> dependence analysis failed

      /* Now check that if there is a dependence this dependence is
         of a suitable form for memmove.  */
      vec<loop_p> loops = vNULL;
      ddr_p ddr;
      loops.safe_push (loop);
      ddr = initialize_data_dependence_relation (single_load, single_store,
                                                 loops);
      compute_affine_dependence (ddr, loop);
      if (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know)
        {
          free_dependence_relation (ddr);
          loops.release ();
          return;
        }

Note that we don't use dependence analysis only to decide memcpy vs. memmove
(we use general alias analysis for that); it is used to guard against
a[i+1] = a[i], which is not a memmove.  The loop in the example could be of
that form if out == in + 1.
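
To illustrate why a[i+1] = a[i] is not a memmove, here is a minimal sketch comparing a forward element-wise loop with memmove when out == in + 1. The helper names forward_copy and overlap_results_match are made up for this example and do not appear in GCC.

```c
#include <stddef.h>
#include <string.h>

/* Forward element-wise copy, the same shape as the loop in the report. */
void forward_copy(unsigned char *out, const unsigned char *in, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
}

/* Returns 1 if the loop and memmove agree when out == in + 1, else 0. */
int overlap_results_match(void)
{
    unsigned char a[5] = {1, 2, 3, 4, 5};
    unsigned char b[5] = {1, 2, 3, 4, 5};

    forward_copy(a + 1, a, 4); /* each store feeds the next load: {1, 1, 1, 1, 1} */
    memmove(b + 1, b, 4);      /* copies as if through a temporary: {1, 1, 2, 3, 4} */

    return memcmp(a, b, sizeof a) == 0;
}
```

The forward loop propagates the first element through the whole array, while memmove shifts the original contents by one, so replacing the loop with memmove would change the result in this overlapping case.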

Comment 3 Andrew Pinski 2021-07-25 00:48:27 UTC

.L4:
        movzbl  (%eax), %ebx
        addl    $1, %eax
        addl    $1, %edx
        movb    %bl, -1(%edx)
        cmpl    %ecx, %eax
        jne     .L4

The memmove issue is still there.