freeing memory in a shared library affects the performance of a program that uses it
Alberto Gcchelp
alberto.gcchelp@gmail.com
Tue May 25 08:29:22 GMT 2021
> If the time is where you indicated, while the slowdown is present, then
> the likely cause is CPU cache misses. Those cache misses could be caused
> by reuse of the fragmented memory freed by that line in the library (vs.
> if those many fragments were not freed, the subsequent allocations would
> take a contiguous chunk of additional address space, which might be more
> cache friendly).
I believe that diagnosis may explain what I'm observing.
I profiled several testcases and, as before, most of the runtime is spent
in the matrix-vector products.
If I call gmsh::finalize, matrix-vector products take up to 2 times longer
than if I don't. Other parts of the program aren't significantly affected.
There are no allocations or deallocations in those matrix-vector products.
The instructions involved should be approximately those that I pasted
below. IDVecVec is a std::vector<std::vector<std::size_t>> containing the
indices of each cell's neighbour cells.
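For readers who prefer source to assembly, here is a minimal sketch of the kind of loop involved. It is not the actual code: the function name gather_product, the coeffs and x arguments, and the formula are assumptions for illustration. Only the type of the index container comes from the description above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for IDVecVec: neighbours[i] holds the indices of the
// cells adjacent to cell i. Each inner vector owns its own heap block.
using IndexLists = std::vector<std::vector<std::size_t>>;

// Computes y[i] = sum over k of coeffs[i][k] * x[neighbours[i][k]].
// Assumption: the real kernel also adds a per-cell term, omitted here.
std::vector<double> gather_product(const IndexLists& neighbours,
                                   const std::vector<std::vector<double>>& coeffs,
                                   const std::vector<double>& x)
{
    std::vector<double> y(neighbours.size(), 0.0);
    for (std::size_t i = 0; i < neighbours.size(); ++i) {
        const auto& ids = neighbours[i];
        const auto& c = coeffs[i];
        double sum = 0.0;
        // Stop when either the index list or the coefficient list runs out.
        for (std::size_t k = 0; k < ids.size() && k < c.size(); ++k)
            sum += c[k] * x[ids[k]];  // indirect load through the index list
        y[i] = sum;
    }
    return y;
}
```

The loop itself never allocates, but every inner vector points to a separate heap block, so where the allocator placed those blocks decides how cache friendly the traversal is.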
It's still surprising to me that freeing memory in a shared library, when
there is plenty of free RAM available (I forgot to mention that my testcases
consume very little memory), affects the performance of totally unrelated
code. Is there a remedy other than not calling gmsh::finalize?
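One possible remedy on the application side, sketched under assumptions, would be to copy the nested index vector into flat, contiguous storage, so that its layout no longer depends on which freed fragments the allocator reuses. The FlatIndexLists type and the flatten function are hypothetical names, and whether this removes the slowdown is untested.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical flat (CSR-like) replacement for a vector of vectors:
// the neighbours of cell i are ids[offsets[i]] .. ids[offsets[i + 1] - 1].
struct FlatIndexLists {
    std::vector<std::size_t> offsets;  // size = number of cells + 1
    std::vector<std::size_t> ids;      // all neighbour indices, back to back
};

FlatIndexLists flatten(const std::vector<std::vector<std::size_t>>& nested)
{
    FlatIndexLists flat;
    flat.offsets.reserve(nested.size() + 1);
    flat.offsets.push_back(0);

    // Reserve once so that all indices end up in a single allocation.
    std::size_t total = 0;
    for (const auto& row : nested) total += row.size();
    flat.ids.reserve(total);

    for (const auto& row : nested) {
        flat.ids.insert(flat.ids.end(), row.begin(), row.end());
        flat.offsets.push_back(flat.ids.size());
    }
    return flat;
}
```

The product loop would then walk ids from offsets[i] to offsets[i + 1] for each cell instead of dereferencing one inner vector per cell.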
The good thing is that I should be able to prepare a more or less reduced
testcase for the Gmsh devs to test.
Thanks so much for your help!
push r15
push r14
push r13
push r12
push rbp
push rbx
mov rbp, QWORD PTR [rdi]
test rbp, rbp
je .L23
mov rax, QWORD PTR [rsi+24]
mov r14, QWORD PTR [rdi+8]
mov r11, QWORD PTR [rax]
mov rax, QWORD PTR [rsi+8]
mov r15, QWORD PTR [rsi+32]
mov r13, QWORD PTR [rax+8]
mov rax, QWORD PTR [rsi]
mov r10, QWORD PTR VF::TMalla<2ul>::IDVecVec[rip]
mov r12, QWORD PTR [rax+8]
vmovsd xmm3, QWORD PTR .LC1[rip]
mov rbx, rsi
sal rbp, 3
xor r9d, r9d
vxorpd xmm4, xmm4, xmm4
.L16:
mov rax, QWORD PTR [r11+8+r9*2]
mov rcx, QWORD PTR [r11+r9*2]
mov rdx, QWORD PTR [r10]
lea rdi, [rax+rcx*8]
mov rsi, QWORD PTR [r10+8]
cmp rax, rdi
je .L18
cmp rdx, rsi
je .L18
mov r8, QWORD PTR [r15+8]
vmovsd xmm0, xmm4, xmm4
.L14:
mov rcx, QWORD PTR [rdx]
vmovsd xmm5, QWORD PTR [rax]
add rdx, 8
vfmadd231sd xmm0, xmm5, QWORD PTR [r8+rcx*8]
add rax, 8
cmp rsi, rdx
je .L13
cmp rdi, rax
jne .L14
.L13:
vmovsd xmm1, QWORD PTR [r12+r9]
vdivsd xmm2, xmm3, QWORD PTR [rbx+16]
vmulsd xmm1, xmm1, QWORD PTR [r13+0+r9]
add r10, 24
vfmadd132sd xmm1, xmm0, xmm2
vmovsd QWORD PTR [r14+r9], xmm1
add r9, 8
cmp rbp, r9
jne .L16
.L23:
pop rbx
pop rbp
pop r12
pop r13
pop r14
pop r15
ret
.L18:
vmovsd xmm0, xmm4, xmm4
jmp .L13