Bug 67206 - Redundant spills in simple copy loop for 32-bit x86 target
Summary: Redundant spills in simple copy loop for 32-bit x86 target
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 6.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2015-08-13 14:46 UTC by Yuri Rumyantsev
Modified: 2021-07-25 00:48 UTC
CC List: 0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
test-case to reproduce (132 bytes, text/x-csrc)
2015-08-13 14:48 UTC, Yuri Rumyantsev

Description Yuri Rumyantsev 2015-08-13 14:46:48 UTC
For the attached simple test case, GCC generates redundant spills to the stack for the following loop:
    for (i=0; i<n; i++)
      out[j * n + i] = in[j * n + i];

.L9:
	movdqa	(%eax), %xmm0
	addl	$1, %edx
	movdqu	%xmm0, (%ecx)
	addl	$16, %eax
	movdqa	%xmm0, 32(%esp)  ?? Redundant
	addl	$16, %ecx
	movl	%eax, 32(%esp)   ?? Redundant
	cmpl	52(%esp), %edx
	movl	%ecx, 48(%esp)   ?? Redundant
	jb	.L9

Another issue is that loop distribution does not recognize this loop as a memmove loop. Note that this also reproduces with the 4.9 compiler.
Comment 1 Yuri Rumyantsev 2015-08-13 14:48:45 UTC
Created attachment 36180
test-case to reproduce

Must be compiled with -O3 -m32 -march=slm to reproduce.
Comment 2 Richard Biener 2015-08-14 08:47:10 UTC
The memmove issue is because of

(compute_affine_dependence
  stmt_a: _16 = *_15;
  stmt_b: *_12 = _16;
) -> dependence analysis failed

      /* Now check that if there is a dependence this dependence is
         of a suitable form for memmove.  */
      vec<loop_p> loops = vNULL;
      ddr_p ddr;
      loops.safe_push (loop);
      ddr = initialize_data_dependence_relation (single_load, single_store,
                                                 loops);
      compute_affine_dependence (ddr, loop);
      if (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know)
        {
          free_dependence_relation (ddr);
          loops.release ();
          return;
        }

Note that dependence analysis is not what decides memcpy vs. memmove
(we use general alias analysis for that); it is used to guard against
a[i+1] = a[i], which is not a memmove.  The loop in the example could be of
that form if out == in + 1.
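
To illustrate the a[i+1] = a[i] hazard, here is a minimal sketch assuming byte elements. The names forward_copy and loop_matches_memmove and the 64-byte scratch buffers are hypothetical and used only for illustration; none of this is GCC code. It compares a forward element-wise copy with memmove when the destination starts a few bytes after the source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Forward element-wise copy, the same shape as the loop in the report. */
static void forward_copy(char *out, const char *in, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
}

/* Hypothetical helper: returns 1 if the forward loop and memmove give the
   same result when the destination starts `shift` bytes after the source,
   and 0 otherwise. */
static int loop_matches_memmove(const char *init, size_t len, size_t shift)
{
    char a[64], b[64];

    assert(len + shift <= sizeof a);   /* scratch buffer limit of this sketch */
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    memcpy(a, init, len);
    memcpy(b, init, len);

    /* With shift == 1 each iteration reads the byte the previous one wrote. */
    forward_copy(a + shift, a, len);
    /* memmove behaves as if the source were first copied to a temporary. */
    memmove(b + shift, b, len);

    return memcmp(a, b, sizeof a) == 0;
}
```

With the initial contents "abcdef" and shift == 1, the loop smears the first byte across the buffer while memmove shifts the string by one position, so the two disagree. With shift == 0 or with non-overlapping ranges they agree, which is why the transform needs a guard rather than being unconditionally valid.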
Comment 3 Andrew Pinski 2021-07-25 00:48:27 UTC
.L4:
        movzbl  (%eax), %ebx
        addl    $1, %eax
        addl    $1, %edx
        movb    %bl, -1(%edx)
        cmpl    %ecx, %eax
        jne     .L4

The memmove issue is still there.