[Bug target/80479] [7/8 Regression] strcmp() produces valgrind errors on ppc64le

Fri Apr 21 18:10:00 GMT 2017

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80479

--- Comment #10 from acsawdey at gcc dot gnu.org ---
OK, so I'm the culprit who added the strncmp/strcmp inline expansion.

If both strings are known to have alignment of at least 8, we cannot inadvertently
cross a page boundary doing 8B loads. For any argument with smaller alignment, the
expansion emits a runtime check to see whether the inline code would cross a 4k
boundary; if so, the library function is called instead of running the inline code.
The testcase gcc.dg/strncmp-2.c checks that we never step over such a boundary: it
allocates 2 pages, uses mprotect with PROT_NONE on the second, and then tries to
provoke a fault by placing strings right up against the boundary. The code
generation for strncmp and strcmp is done by the same code in rs6000.c, so testing
strncmp this way mostly also tests whether strcmp has any issues.

The generated comparison code does use 8B loads, but it also uses cmpb to make
sure that data beyond the zero byte is not significant in the result.

Startup: load two doublewords and check whether they are equal:

        ldbrx 9,28,10
        ldbrx 10,30,10
        subf. 3,10,9
        beq 0,.L23

If they are, we branch to this block, which checks whether there was a zero byte:

.L23:
        cmpb 10,9,3
        cmpdi 7,10,0
        beq 7,.L22

If we don't branch, the strings are equal: the result (zero) is already in r3
and we are done.

If we didn't branch to .L23 above, we fall through to this block, which computes
the final result by locating the first byte that either differs or is a zero
byte in the first string, and subtracting the two bytes found there:

.L11:
        cmpb 3,9,10
        cmpb 8,9,26
        addi 31,31,1
        orc 3,8,3
        cntlzd 3,3
        addi 3,3,8
        rldcl 9,9,3,56
        rldcl 3,10,3,56
        subf 3,3,9
        extsw 9,3

If we did branch to .L22 (equal doublewords with no zero byte), we continue with
a repeating sequence like this one, which loads and compares 8B at a time:

.L22:
        addi 9,8,8
        addi 10,4,8
        ldbrx 9,0,9
        ldbrx 10,0,10
        subf. 3,10,9
        bne 0,.L11
        cmpb 10,9,3
        cmpdi 7,10,0
        bne 7,.L10

Here we either branch to .L11 to extract the differing bytes and subtract, or
we found a zero byte, the strings are equal (r3=0), and we bail out to .L10.
Otherwise we fall through to the next copy of the sequence.

If all 64 bytes of inline comparison are equal and contain no zero byte, we bail
out to strcmp on the rest of both strings:

        addi 4,4,64
        addi 3,8,64
        bl strcmp

So, yes we do read 8B at a time, but the code makes use of cmpb so that the
bytes following the zero byte are never significant to the comparison.

On the other hand, I've already had to fix this a couple of times, so it is
certainly possible that errors remain; please do let me know if you see
something.