Bug 38592 - Optimize memmove / memcmp combination
Summary: Optimize memmove / memcmp combination
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.4.0
: P5 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on: 62156
Blocks: Fortran_character 36854
  Show dependency treegraph
 
Reported: 2008-12-21 10:46 UTC by Thomas Koenig
Modified: 2023-11-21 21:18 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-11-11 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Thomas Koenig 2008-12-21 10:46:10 UTC
$ cat foo.f90
program main
  character(len=3) :: a
  a = 'yes'
  print *,'yes' == a
end program main
$ gfortran -fdump-tree-optimized -O3 -S foo.f90
$ grep compare_string foo.s
        call    _gfortran_compare_string
Comment 1 Thomas Koenig 2008-12-21 12:03:37 UTC
To clarify, we could detect that the variable only ever
has the value of 'yes'.  We already do the right
thing with constants:

$ cat foo-2.f90
program main
  character(len=3), parameter :: a = 'yes'
  print *,'yes' == a
end program main
$ gfortran -fdump-tree-original foo-2.f90
$ grep compare foo-2.f90.003t.original
$ 
Comment 2 Daniel Franke 2008-12-28 01:28:44 UTC
This is generally the case, not just for/with strings.

I don't think that anything will ever happen in this regard, at least, not on the frontend side. If (some) intrinsics, including string comparisons, were inlined, the optimizers could probably do something about it. 
Comment 3 Paul Thomas 2009-02-19 05:44:12 UTC
Thomas, As a matter of curiosity, do other compilers catch this?

Confirmed

Paul
Comment 4 Francois-Xavier Coudert 2009-05-08 09:30:55 UTC
(In reply to comment #3)
> As a matter of curiosity, do other compilers catch this?

Intel does not.
Comment 5 Tobias Burnus 2009-05-08 11:25:40 UTC
> > As a matter of curiosity, do other compilers catch this?
> Intel does not.

Sure? If I look at the assembler of ifort 11.1 with -O3, I only see:
        call      __intel_new_proc_init                         #1.9
        call      for_set_reentrancy                            #1.9
        call      for_write_seq_lis                             #4.3
And if I'm not mistaken, sunf95 also does not have any call anymore.
Comment 6 kargls 2016-10-07 19:39:41 UTC
Sometime between 4.8.4 and 4.9.4, the _gfortran_compare_string
has been replaced by a memcmp().

% gfc48 -O3 -fdump-tree-optimized -S -fdump-tree-original a.f90
% grep compare a.s
        call    _gfortran_compare_string
% gfc49 -O3 -fdump-tree-optimized -S -fdump-tree-original a.f90
% grep compare a.s


With gcc7, the -fdump-tree-original gives

  __builtin_memmove ((void *) &a, (void *) &"yes"[1]{lb: 1 sz: 1}, 3);
  {
    struct __st_parameter_dt dt_parm.0;

    dt_parm.0.common.filename = &"a.f90"[1]{lb: 1 sz: 1};
    dt_parm.0.common.line = 4;
    dt_parm.0.common.flags = 128;
    dt_parm.0.common.unit = 6;
    _gfortran_st_write (&dt_parm.0);
    {
      logical(kind=4) D.3408;

      D.3408 = __builtin_memcmp ((void *) &"yes"[1]{lb: 1 sz: 1},
                                 (void *) &a, 3) == 0;
      _gfortran_transfer_logical_write (&dt_parm.0, &D.3408, 4);
    }
    _gfortran_st_write_done (&dt_parm.0);

and -fdump-tree-optimized shows

  <bb 2>:
  MEM[(c_char * {ref-all})&a] = "yes";
  dt_parm.0.common.filename = &"a.f90"[1]{lb: 1 sz: 1};
  dt_parm.0.common.line = 4;
  dt_parm.0.common.flags = 128;
  dt_parm.0.common.unit = 6;
  _gfortran_st_write (&dt_parm.0);
  _1 = __builtin_memcmp_eq (&"yes"[1]{lb: 1 sz: 1}, &a, 3);
  _2 = _1 == 0;
  D.3408 = _2;
  _gfortran_transfer_logical_write (&dt_parm.0, &D.3408, 4);
  D.3408 ={v} {CLOBBER};
  _gfortran_st_write_done (&dt_parm.0);
  dt_parm.0 ={v} {CLOBBER};
  a ={v} {CLOBBER};
  return;

I personally think this is non-issue now and the bug can be
closed.
Comment 7 Thomas Koenig 2016-10-09 11:23:49 UTC
We still do the comparison, although with memcmp now.

More interesting question is if we could/should do
forward propagation of values in the front end,
or if this is something that the middle-end should,
in principle, do.
Comment 8 Thomas Koenig 2017-08-15 12:02:52 UTC
I think the best place to fix this is in the middle-end.

Here is a C test case:

int yes()
{
  char a[3];
  __builtin_memmove (a, "yes", 3);
    return __builtin_memcmp(a,"yes",3);
}

This generates a memcmp with current trunk:

.LC0:
        .string "yes"
        .text
        .p2align 4,,15
        .globl  yes
        .type   yes, @function
yes:
.LFB0:
        .cfi_startproc
        subq    $24, %rsp
        .cfi_def_cfa_offset 32
        movl    $25977, %eax
        movl    $3, %edx
        leaq    13(%rsp), %rdi
        movl    $.LC0, %esi
        movw    %ax, 13(%rsp)
        movb    $115, 15(%rsp)
        call    memcmp
        addq    $24, %rsp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE0:
        .size   yes, .-yes
        .ident  "GCC: (GNU) 8.0.0 20170730 (experimental)"
        .section        .note.GNU-stack,"",@progbits
Comment 9 Martin Sebor 2017-08-15 14:11:18 UTC
The optimization should be easily doable in tree-ssa-strlen.c.  What makes it less than straightforward is that gimple-fold.c folds constant size memcpy/memmove into a MEM_REF which the strlen pass doesn't know how to handle.  I think the best way is to defer the folding until after the strlen pass has run.  That will not only make the optimization easily implementable but also make it possible to detect past the end reads/writes in calls to the functions because only the strlen pass knows the sizes of the source sequences.  (Bug 81703 tracks another instance of missing strlen optimization due to early folding.  As an aside, deferring the folding is complementary to handling MEM_REF in the strlen pass.)
Comment 10 Thomas Koenig 2020-11-11 09:41:05 UTC
For the C test case, we now get

yes:
.LFB0:
        .cfi_startproc
        movl    $25977, %eax
        movb    $115, -1(%rsp)
        movw    %ax, -3(%rsp)
        movzbl  -2(%rsp), %eax
        subl    $101, %eax
        jne     .L1
        movzbl  -1(%rsp), %eax
        subl    $115, %eax
.L1:
        ret

So, optimized further, but not folded.

clang 7 folds this completely:

yes:                                    # @yes
        .cfi_startproc
# %bb.0:
        xorl    %eax, %eax
        retq
.Lfunc_end0: