freeing memory in a shared library affects the performance of a program that uses it
Tue May 25 08:29:22 GMT 2021
> If the time is where you indicated, while the slowdown is present, then
> the likely cause is CPU cache misses. Those cache misses could be caused
> by reuse of the fragmented memory freed by that line in the library (vs.
> if those many fragments were not freed, the subsequent allocations would
> take a contiguous chunk of additional address space, which might be more
> cache friendly).
I believe that that diagnosis may explain what I'm observing.
I profiled several testcases and, as before, most of the runtime is spent
in the matrix-vector products.
If I call gmsh::finalize, matrix-vector products take up to 2 times longer
than if I don't. Other parts of the program aren't significantly affected.
There are no allocations or deallocations in those matrix-vector products.
The instructions involved should be approximately those that I pasted
below. IDVecVec is a std::vector<std::vector<std::size_t>> containing the
indices of each cell's neighbour cells.
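To give an idea of the access pattern, the loop has roughly the following
shape. This is a simplified sketch, not the actual source: the function
name, the coefficient container and the output argument are made up for
illustration; only the nested std::vector of indices mirrors IDVecVec.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the real data structures.
using IndexLists = std::vector<std::vector<std::size_t>>;  // like IDVecVec
using CoeffLists = std::vector<std::vector<double>>;       // assumed layout

// Computes y[i] = sum over j of coeffs[i][j] * x[neighbours[i][j]].
// No allocation happens here: y must already have the right size.
void neighbour_product(const IndexLists& neighbours,
                       const CoeffLists& coeffs,
                       const std::vector<double>& x,
                       std::vector<double>& y)
{
    assert(neighbours.size() == coeffs.size());
    assert(y.size() == neighbours.size());
    for (std::size_t i = 0; i < neighbours.size(); ++i) {
        assert(neighbours[i].size() == coeffs[i].size());
        double sum = 0.0;
        // Each inner vector is a separate heap block, so every row
        // follows a pointer to wherever the allocator placed it.
        for (std::size_t j = 0; j < neighbours[i].size(); ++j)
            sum += coeffs[i][j] * x[neighbours[i][j]];
        y[i] = sum;
    }
}
```

The point is only that each row dereferences its own small heap block, so
the placement of those blocks in memory can matter even though the loop
itself never allocates.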
It's still surprising to me that freeing memory in a shared library, when
there is plenty of free RAM available (I forgot to mention that my testcases
consume very little memory), affects the performance of totally unrelated
code. Is there a remedy other than not calling gmsh::finalize?
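One thing that might be worth trying, assuming the diagnosis above is
right, is to copy the index lists into a single contiguous array once
everything is set up, so that the hot loop no longer depends on where the
many small inner vectors ended up on the heap. The sketch below is untested
against my real code, and FlatIndexLists and flatten are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flattened (CSR-like) copy of a vector of index vectors.
struct FlatIndexLists {
    std::vector<std::size_t> offsets;  // rows + 1 entries; row i spans
                                       // indices[offsets[i] .. offsets[i+1])
    std::vector<std::size_t> indices;  // all rows stored back to back
};

FlatIndexLists flatten(const std::vector<std::vector<std::size_t>>& nested)
{
    FlatIndexLists flat;
    flat.offsets.reserve(nested.size() + 1);
    flat.offsets.push_back(0);

    // First pass: record where each row will start.
    std::size_t total = 0;
    for (const auto& row : nested) {
        total += row.size();
        flat.offsets.push_back(total);
    }

    // Second pass: copy every row into one contiguous allocation.
    flat.indices.reserve(total);
    for (const auto& row : nested)
        flat.indices.insert(flat.indices.end(), row.begin(), row.end());

    assert(flat.indices.size() == flat.offsets.back());
    return flat;
}
```

Whether this actually restores the speed would have to be measured; it
only removes the dependence on how the small blocks are scattered.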
The good thing is that I should be able to prepare a more or less reduced
testcase for the Gmsh devs to test.
Thanks so much for your help!
mov rbp, QWORD PTR [rdi]
test rbp, rbp
mov rax, QWORD PTR [rsi+24]
mov r14, QWORD PTR [rdi+8]
mov r11, QWORD PTR [rax]
mov rax, QWORD PTR [rsi+8]
mov r15, QWORD PTR [rsi+32]
mov r13, QWORD PTR [rax+8]
mov rax, QWORD PTR [rsi]
mov r10, QWORD PTR VF::TMalla<2ul>::IDVecVec[rip]
mov r12, QWORD PTR [rax+8]
vmovsd xmm3, QWORD PTR .LC1[rip]
mov rbx, rsi
sal rbp, 3
xor r9d, r9d
vxorpd xmm4, xmm4, xmm4
mov rax, QWORD PTR [r11+8+r9*2]
mov rcx, QWORD PTR [r11+r9*2]
mov rdx, QWORD PTR [r10]
lea rdi, [rax+rcx*8]
mov rsi, QWORD PTR [r10+8]
cmp rax, rdi
cmp rdx, rsi
mov r8, QWORD PTR [r15+8]
vmovsd xmm0, xmm4, xmm4
mov rcx, QWORD PTR [rdx]
vmovsd xmm5, QWORD PTR [rax]
add rdx, 8
vfmadd231sd xmm0, xmm5, QWORD PTR [r8+rcx*8]
add rax, 8
cmp rsi, rdx
cmp rdi, rax
vmovsd xmm1, QWORD PTR [r12+r9]
vdivsd xmm2, xmm3, QWORD PTR [rbx+16]
vmulsd xmm1, xmm1, QWORD PTR [r13+0+r9]
add r10, 24
vfmadd132sd xmm1, xmm0, xmm2
vmovsd QWORD PTR [r14+r9], xmm1
add r9, 8
cmp rbp, r9
vmovsd xmm0, xmm4, xmm4