Dark's Code Dump

Possibly useful

Debian 10 kernel slab memory leak

On one of my VPSes, kernel slab memory spontaneously started leaking:

https://imgkk.com/i/q3v5.png

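In slabtop, I found the culprit to be the kmalloc-64 cache. On its own this is fairly meaningless, since kmalloc-64 is a general-purpose cache shared by any small kmalloc() allocation of around 64 bytes. After some searching, though, I found that adding slub_debug=U (user tracking of allocs and frees) to the kernel command line lets you see where allocations in this cache come from, by reading /sys/kernel/slab/kmalloc-64/alloc_calls.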

This pointed me to an issue with KVM’s paravirtualised (asynchronous) page fault handling:

[20:48:53][root@kyubey][/sys/kernel/slab/kmalloc-64]# cat alloc_calls
     27 x86_vector_alloc_irqs+0xf6/0x3b0 age=1061960/1062568/1063054 pid=0-92
     15 mp_irqdomain_alloc+0x79/0x290 age=1063054/1063054/1063054 pid=0
 138245 kvm_async_pf_task_wake+0x83/0x110 age=0/474464/1062430 pid=0-1517
     31 reserve_memtype+0xb3/0x2c0 age=1060763/1062079/1063055 pid=0-273
     24 __request_region+0x6e/0x190 age=1060825/1062237/1063050 pid=1-282
...
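When the listing has many call sites, it can be ranked by object count. The following is a minimal Python sketch. It assumes the alloc_calls line format shown above: an object count, then symbol+offset/size, then optional statistics. `top_alloc_sites` is a hypothetical helper, not a kernel tool. The sysfs path is the one from this post and may differ on other kernels.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path


def top_alloc_sites(text: str, limit: int = 5) -> list[tuple[str, int]]:
    """Sum object counts per calling function from alloc_calls text.

    Assumed line format (as in the listing above):
        "  138245 kvm_async_pf_task_wake+0x83/0x110 age=... pid=..."
    """
    totals: Counter[str] = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything not starting with a count (e.g. "No data").
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        count = int(fields[0])
        # Drop the "+offset/size" suffix so several call offsets
        # inside the same function are grouped together.
        function = fields[1].split("+", 1)[0]
        totals[function] += count
    return totals.most_common(limit)


if __name__ == "__main__":
    # Needs slub_debug=U on the kernel command line; reading usually needs root.
    path = Path("/sys/kernel/slab/kmalloc-64/alloc_calls")
    try:
        report = top_alloc_sites(path.read_text())
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
    else:
        for function, count in report:
            print(f"{count:>10}  {function}")
```

A plain `sort -rn alloc_calls | head` gives a similar per-line ranking. The sketch additionally merges entries that differ only in call offset within the same function.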

My host says it is unlikely to be on their end, so I’m stumped as to why this started out of the blue.

Anyway, a working workaround is to add no-kvmapf to the kernel command line, which disables paravirtualised asynchronous page fault handling in the guest.
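After rebooting, it is worth confirming that the parameter actually took effect. On a stock Debian GRUB setup, it would typically be added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by `update-grub`. The following is a minimal Python sketch that checks /proc/cmdline. `has_kernel_param` is a hypothetical helper. It assumes parameters are separated by whitespace, so quoted values containing spaces are not handled.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, name: str) -> bool:
    """Return True if `name` is present as a bare flag or as `name=value`.

    Assumes whitespace-separated parameters; quoted values that
    contain spaces are not handled.
    """
    for token in cmdline.split():
        if token == name or token.startswith(name + "="):
            return True
    return False


if __name__ == "__main__":
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError as exc:
        # Not running on Linux, or /proc is not mounted.
        print(f"cannot read /proc/cmdline: {exc}")
    else:
        for param in ("no-kvmapf", "slub_debug"):
            state = "set" if has_kernel_param(cmdline, param) else "not set"
            print(f"{param}: {state}")
```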

Comments

Daniel L says:

I ended up filing a kernel bug for this to see if the KVM team can look into it: https://bugzilla.kernel.org/show_bug.cgi?id=208081. If you have anything to add, feel free to comment on the bug report 🙂

Thank you for this post! I just encountered a very similar issue on one of my VPSes, and also found `kvm_async_pf_task_wake` to be the cause.

Is there any downside to adding `no-kvmapf` to the kernel command line?

Dark says:

I’ve had it in place since writing this post with no obvious ill effects.

I imagine the only impact is marginally higher CPU usage, which is the host’s problem 🙂

Daniel L says:

Thanks! I think I’ll try it out.

My situation is really strange… I have two VPSes configured identically (same Debian version, same kernel version, same software), and only one of them is exhibiting this behaviour. I contacted the host, and they said there’s no difference between the nodes: they were physically built at the same time and run the same software versions. I also posted on Server Fault earlier today (https://serverfault.com/questions/1020241/debugging-kmalloc-64-slab-allocations-memory-leak) before finding your post.
