$ cat foo.f90
program main
  character(len=3) :: a
  a = 'yes'
  print *,'yes' == a
end program main
$ gfortran -fdump-tree-optimized -O3 -S foo.f90
$ grep compare_string foo.s
        call    _gfortran_compare_string
To clarify, we could detect that the variable only ever has the value 'yes'. We already do the right thing with constants:

$ cat foo-2.f90
program main
  character(len=3), parameter :: a = 'yes'
  print *,'yes' == a
end program main
$ gfortran -fdump-tree-original foo-2.f90
$ grep compare foo-2.f90.003t.original
$
This is true in general, not just for strings. I don't think that anything will ever happen in this regard, at least not on the front-end side. If (some) intrinsics, including string comparisons, were inlined, the optimizers could probably do something about it.
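As a sketch of what inlining would expose, here is a hypothetical C analogue of the Fortran test case (this is illustrative only, not anything the front end emits today): once the comparison is a plain memcmp over a buffer whose contents are known, constant propagation could in principle fold the result.

```c
#include <string.h>

/* Hypothetical C analogue of the Fortran test case: 'a' only ever
   holds "yes", so an optimizer that sees the inlined comparison
   could fold the result to a constant (true). */
int compare_known(void)
{
    char a[3];
    memcpy(a, "yes", 3);              /* a = 'yes' */
    return memcmp("yes", a, 3) == 0;  /* 'yes' == a */
}
```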
Thomas,

As a matter of curiosity, do other compilers catch this?

Confirmed.

Paul
(In reply to comment #3)
> As a matter of curiosity, do other compilers catch this?

Intel does not.
> > As a matter of curiosity, do other compilers catch this?
> Intel does not.

Sure? If I look at the assembler of ifort 11.1 with -O3, I only see:

        call      __intel_new_proc_init    #1.9
        call      for_set_reentrancy       #1.9
        call      for_write_seq_lis        #4.3

And if I'm not mistaken, sunf95 also does not have any call anymore.
Sometime between 4.8.4 and 4.9.4, the _gfortran_compare_string has been replaced by a memcmp():

% gfc48 -O3 -fdump-tree-optimized -S -fdump-tree-original a.f90
% grep compare a.s
        call    _gfortran_compare_string
% gfc49 -O3 -fdump-tree-optimized -S -fdump-tree-original a.f90
% grep compare a.s

With gcc7, the -fdump-tree-original gives

  __builtin_memmove ((void *) &a, (void *) &"yes"[1]{lb: 1 sz: 1}, 3);
  {
    struct __st_parameter_dt dt_parm.0;

    dt_parm.0.common.filename = &"a.f90"[1]{lb: 1 sz: 1};
    dt_parm.0.common.line = 4;
    dt_parm.0.common.flags = 128;
    dt_parm.0.common.unit = 6;
    _gfortran_st_write (&dt_parm.0);
    {
      logical(kind=4) D.3408;

      D.3408 = __builtin_memcmp ((void *) &"yes"[1]{lb: 1 sz: 1}, (void *) &a, 3) == 0;
      _gfortran_transfer_logical_write (&dt_parm.0, &D.3408, 4);
    }
    _gfortran_st_write_done (&dt_parm.0);

and -fdump-tree-optimized shows

  <bb 2>:
  MEM[(c_char * {ref-all})&a] = "yes";
  dt_parm.0.common.filename = &"a.f90"[1]{lb: 1 sz: 1};
  dt_parm.0.common.line = 4;
  dt_parm.0.common.flags = 128;
  dt_parm.0.common.unit = 6;
  _gfortran_st_write (&dt_parm.0);
  _1 = __builtin_memcmp_eq (&"yes"[1]{lb: 1 sz: 1}, &a, 3);
  _2 = _1 == 0;
  D.3408 = _2;
  _gfortran_transfer_logical_write (&dt_parm.0, &D.3408, 4);
  D.3408 ={v} {CLOBBER};
  _gfortran_st_write_done (&dt_parm.0);
  dt_parm.0 ={v} {CLOBBER};
  a ={v} {CLOBBER};
  return;

I personally think this is a non-issue now and the bug can be closed.
We still do the comparison, although now with a memcmp. The more interesting question is whether we could/should do forward propagation of values in the front end, or whether this is something that the middle end should, in principle, do.
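For scalar values the middle end already does exactly this kind of forward propagation without any help from the front end; a minimal C illustration (the contrast is that character arrays, unlike scalars, have historically escaped it):

```c
/* The middle end forward-propagates scalar values on its own:
   this whole function is folded to 'return 1' at compile time. */
int scalar_propagated(void)
{
    int a = 3;      /* a only ever holds 3 */
    return a == 3;  /* comparison folded to a constant */
}
```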
I think the best place to fix this is in the middle end. Here is a C test case:

int yes()
{
  char a[3];
  __builtin_memmove (a, "yes", 3);
  return __builtin_memcmp (a, "yes", 3);
}

This generates a memcmp with current trunk:

.LC0:
        .string "yes"
        .text
        .p2align 4,,15
        .globl  yes
        .type   yes, @function
yes:
.LFB0:
        .cfi_startproc
        subq    $24, %rsp
        .cfi_def_cfa_offset 32
        movl    $25977, %eax
        movl    $3, %edx
        leaq    13(%rsp), %rdi
        movl    $.LC0, %esi
        movw    %ax, 13(%rsp)
        movb    $115, 15(%rsp)
        call    memcmp
        addq    $24, %rsp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE0:
        .size   yes, .-yes
        .ident  "GCC: (GNU) 8.0.0 20170730 (experimental)"
        .section        .note.GNU-stack,"",@progbits
The optimization should be easily doable in tree-ssa-strlen.c. What makes it less than straightforward is that gimple-fold.c folds constant-size memcpy/memmove into a MEM_REF, which the strlen pass doesn't know how to handle. I think the best way is to defer the folding until after the strlen pass has run. That will not only make the optimization easily implementable but also make it possible to detect past-the-end reads and writes in calls to these functions, because only the strlen pass knows the sizes of the source sequences. (Bug 81703 tracks another instance of a missing strlen optimization due to early folding. As an aside, deferring the folding is complementary to handling MEM_REF in the strlen pass.)
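A minimal sketch of the kind of knowledge only the strlen pass has (assuming the copy, including the terminating NUL, stays visible to it rather than being folded away early):

```c
#include <string.h>

/* After copying a literal (including its NUL), the strlen pass knows
   the exact length of the destination and can fold strlen(buf) to 3;
   folding the memcpy into a MEM_REF first hides that knowledge. */
size_t length_after_copy(void)
{
    char buf[8];
    memcpy(buf, "yes", 4);  /* 4 bytes: "yes" plus the NUL */
    return strlen(buf);
}
```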
For the C test case, we now get

yes:
.LFB0:
        .cfi_startproc
        movl    $25977, %eax
        movb    $115, -1(%rsp)
        movw    %ax, -3(%rsp)
        movzbl  -2(%rsp), %eax
        subl    $101, %eax
        jne     .L1
        movzbl  -1(%rsp), %eax
        subl    $115, %eax
.L1:
        ret

So it is optimized further, but not fully folded. clang 7 folds this completely:

yes:                                    # @yes
        .cfi_startproc
# %bb.0:
        xorl    %eax, %eax
        retq
.Lfunc_end0:
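Whatever the codegen, the test case's return value is always 0, which is exactly what clang's folded version encodes in `xorl %eax, %eax`. A runnable version of the same function, using the standard library calls rather than the GCC builtins:

```c
#include <string.h>

/* Same semantics as the test case above: the buffer is filled with
   "yes" and immediately compared against "yes", so memcmp is always 0. */
int yes_always_zero(void)
{
    char a[3];
    memmove(a, "yes", 3);
    return memcmp(a, "yes", 3);
}
```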