I use Mailu as a simple mail server to manage the mail for one of my domains. This application uses ClamAV as its antivirus solution. To host it I use a VPS with only 4 GB of RAM. A few weeks ago my Zabbix started sending notifications once a day that an OOM had occurred on the mail server. It was strange, as the problem appeared only once per day, at roughly the same time. I investigated and found that the reason was ClamAV…
As usual, I started from the logs and found the following in /var/log/messages:
$ cat /var/log/messages
Apr 13 09:06:15 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=docker-64230e9dc02047fce79aa1756597ac1ad35f32e06757b458c832e0eb1e56824a.scope,mems_allowed=0,global_oom,task_memcg=/system.slice/docker-7ecfd2105c98da8c0d5e77367741505a9ef5d08d86e8654184aa367ec57bc99b.scope,task=clamd,pid=2758294,uid=0
Apr 13 09:06:15 kernel: Out of memory: Killed process 2758294 (clamd) total-vm:2868800kB, anon-rss:2749056kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:5476kB oom_score_adj:0
Apr 13 09:06:15 systemd[1]: docker-7ecfd2105c98da8c0d5e77367741505a9ef5d08d86e8654184aa367ec57bc99b.scope: A process of this unit has been killed by the OOM killer.
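The cgroup scope name in the kernel message contains the full container ID, so it can be mapped back to a container name with docker ps (the first characters of the ID are enough for the grep):
$ docker ps --no-trunc --format '{{.ID}} {{.Names}}' | grep 7ecfd2105c98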
I compared the timestamps from these logs with the times of the OOM notifications from Zabbix and found that I received a notification every time the same entry appeared in the server log:
$ grep " A process of this unit has been killed" /var/log/messages
Apr 7 08:42:26 systemd[1]: docker-7ecfd2105c98da8c0d5e77367741505a9ef5d08d86e8654184aa367ec57bc99b.scope: A process of this unit has been killed by the OOM killer.
Apr 8 08:47:17 systemd[1]: docker-7ecfd2105c98da8c0d5e77367741505a9ef5d08d86e8654184aa367ec57bc99b.scope: A process of this unit has been killed by the OOM killer.
Apr 9 08:50:19 systemd[1]: docker-7ecfd2105c98da8c0d5e77367741505a9ef5d08d86e8654184aa367ec57bc99b.scope: A process of this unit has been killed by the OOM killer.
Apr 10 08:56:21 systemd[1]: docker-7ecfd2105c98da8c0d5e77367741505a9ef5d08d86e8654184aa367ec57bc99b.scope: A process of this unit has been killed by the OOM killer.
Apr 11 08:59:05 systemd[1]: docker-7ecfd2105c98da8c0d5e77367741505a9ef5d08d86e8654184aa367ec57bc99b.scope: A process of this unit has been killed by the OOM killer.
Apr 12 09:02:30 systemd[1]: docker-7ecfd2105c98da8c0d5e77367741505a9ef5d08d86e8654184aa367ec57bc99b.scope: A process of this unit has been killed by the OOM killer.
Apr 13 09:06:15 systemd[1]: docker-7ecfd2105c98da8c0d5e77367741505a9ef5d08d86e8654184aa367ec57bc99b.scope: A process of this unit has been killed by the OOM killer.
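To make the daily pattern easier to see, the timestamps alone can be pulled out of those entries (a small one-liner, assuming the syslog format shown above):
$ grep "killed by the OOM killer" /var/log/messages | awk '{print $1, $2, $3}'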
Next, I created a simple logger to collect Docker container stats, to understand whether the clamd process was really the root cause of the OOM, or whether some other process was consuming the memory and clamd was killed only because it used more RAM than the others.
$ sudo crontab -e
* * * * * date >> /root/docker.stat && docker stats --no-stream >> /root/docker.stat
* * * * * sleep 15 && date >> /root/docker.stat && docker stats --no-stream >> /root/docker.stat
* * * * * sleep 30 && date >> /root/docker.stat && docker stats --no-stream >> /root/docker.stat
* * * * * sleep 45 && date >> /root/docker.stat && docker stats --no-stream >> /root/docker.stat
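The same sampling can also be done with a small loop script instead of four cron entries; a sketch that writes to the same /root/docker.stat file:
#!/bin/sh
# append a timestamp and a one-shot snapshot of container stats every 15 seconds
while true; do
    date >> /root/docker.stat
    docker stats --no-stream >> /root/docker.stat
    sleep 15
done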
The next time the problem occurred, I checked the log and found that the antivirus container had consumed all the memory, which confirmed that the main consumer was the ClamAV process:
$ sudo cat /root/docker.stat
...
TIME CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM %
09:05:32 7ecfd2105c98 mailu-antivirus-1 2.97% 1.409GiB / 3.726GiB 37.81%
09:05:47 7ecfd2105c98 mailu-antivirus-1 0.00% 1.409GiB / 3.726GiB 37.81%
09:06:02 7ecfd2105c98 mailu-antivirus-1 3.10% 1.409GiB / 3.726GiB 37.81%
09:06:47 7ecfd2105c98 mailu-antivirus-1 96.25% 2.143GiB / 3.726GiB 57.53%
09:07:01 7ecfd2105c98 mailu-antivirus-1 98.21% 2.493GiB / 3.726GiB 66.92%
09:07:16 7ecfd2105c98 mailu-antivirus-1 98.81% 131.7MiB / 3.726GiB 3.45%
09:07:31 7ecfd2105c98 mailu-antivirus-1 98.81% 131.7MiB / 3.726GiB 3.45%
Here we can observe that the normal memory consumption of the mailu-antivirus-1 container was around 37%, but at some point it grew to 57%, then to 67%, and then dropped to 3%.
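It is also worth confirming that the container has no memory limit of its own and therefore competes with all the other containers for the host's 4 GB. docker inspect shows this directly (0 means no per-container limit):
$ docker inspect --format '{{.HostConfig.Memory}}' mailu-antivirus-1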
The other containers' consumption at the same time:
Sun Apr 14 09:07:01 AM UTC 2024
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
accebd376afe mailu-webmail-1 0.01% 7.945MiB / 3.726GiB 0.21% 192kB / 0B 42.5GB / 1.73MB 4
b279ebca86ca mailu-imap-1 0.00% 24.39MiB / 3.726GiB 0.64% 21.6MB / 26.4MB 29.7GB / 17.3GB 8
817ff4c37dd0 mailu-smtp-1 0.17% 40.53MiB / 3.726GiB 1.06% 719MB / 968MB 326GB / 14.3MB 10
9cda4709ab85 mailu-antispam-1 0.37% 151.7MiB / 3.726GiB 3.98% 10.5GB / 3.84GB 373GB / 13GB 5
64230e9dc020 mailu-admin-1 0.31% 135.2MiB / 3.726GiB 3.54% 2.87GB / 3.57GB 431GB / 90.2MB 5
a5b6ba3103cf mailu-front-1 0.02% 34.41MiB / 3.726GiB 0.90% 7.7GB / 10.8GB 149GB / 7.81MB 4
965b16c56bc6 mailu-redis-1 0.42% 8.164MiB / 3.726GiB 0.21% 4.9GB / 10GB 422GB / 9.49GB 5
>>>>>>>>>>>> mailu-antivirus-1 98.21% 2.493GiB / 3.726GiB 66.92% 31.1kB / 3.86kB 1.06GB / 403MB 5
b9e40d970cc0 mailu-fetchmail-1 0.00% 14.42MiB / 3.726GiB 0.38% 30.7MB / 37.4MB 20.9GB / 28.7kB 1
7328c52c73f5 mailu-resolver-1 0.00% 25.75MiB / 3.726GiB 0.67% 1.35GB / 1.29GB 10.6GB / 3.2MB 1
96d493d224c4 portainer 0.00% 17.49MiB / 3.726GiB 0.46% 0B / 0B 370GB / 4.8GB 6
I checked the ClamAV configuration inside the antivirus container:
$ docker exec -ti mailu-antivirus-1 /bin/sh
$ ls /etc/clamav/
clamd.conf freshclam.conf
I checked freshclam.conf
$ cat /etc/clamav/freshclam.conf
###############
# General
###############
DatabaseDirectory /data
UpdateLogFile /dev/stdout
LogTime yes
PidFile /run/freshclam.pid
DatabaseOwner root
###############
# Updates
###############
DatabaseMirror database.clamav.net
ScriptedUpdates yes
NotifyClamd /etc/clamav/clamd.conf
Bytecode yes
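Since DatabaseDirectory points to /data, the modification times of the signature files there show when freshclam last pulled an update, which can be compared with the daily OOM times:
$ docker exec mailu-antivirus-1 ls -lh /data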
I checked clamd.conf
$ cat /etc/clamav/clamd.conf
###############
# General
###############
DatabaseDirectory /data
TemporaryDirectory /tmp
LogTime yes
PidFile /run/clamd.pid
LocalSocket /tmp/clamd.sock
TCPSocket 3310
Foreground yes
###############
# Results
###############
DetectPUA yes
ExcludePUA NetTool
ExcludePUA PWTool
HeuristicAlerts yes
Bytecode yes
###############
# Scan
###############
ScanPE yes
DisableCertCheck yes
ScanELF yes
AlertBrokenExecutables yes
ScanOLE2 yes
ScanPDF yes
ScanSWF yes
ScanMail yes
PhishingSignatures yes
PhishingScanURLs yes
ScanHTML yes
ScanArchive yes
###############
# Scan
###############
MaxScanSize 150M
MaxFileSize 30M
MaxRecursion 10
MaxFiles 15000
MaxEmbeddedPE 10M
MaxHTMLNormalize 10M
MaxHTMLNoTags 2M
MaxScriptNormalize 5M
MaxZipTypeRcg 1M
MaxPartitions 128
MaxIconsPE 200
PCREMatchLimit 10000
PCRERecMatchLimit 10000
I didn't find any option that could explain such behavior.
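ClamAV also ships a clamconf utility that dumps the settings it actually sees, which can be a quicker way to audit the effective configuration than reading the files by hand (assuming the tool is present in this container image):
$ docker exec mailu-antivirus-1 clamconf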
Then I found that ClamAV 0.103.0 introduced "non-blocking database reloads", which can roughly double memory usage while the virus database is being reloaded:
– https://blog.clamav.net/2020/09/clamav-01030-released.html
Major changes
- clamd can now reload the signature database without blocking scanning. This multi-threaded database reload improvement was made possible thanks to a community effort.
This looked very similar to my case: I didn't have this option in my configuration, which meant it was enabled by default.
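A quick way to confirm that the running build is recent enough (0.103.0 or later) to have this behavior is to ask clamd for its version:
$ docker exec mailu-antivirus-1 clamd --version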
According to the ClamAV release notes, it can be disabled by adding the following option to the clamd config:
$ sudo nano /etc/clamav/clamd.conf
ConcurrentDatabaseReload no
Then clamd needs to be restarted to pick up the change; on a host installation the service is usually named clamav-daemon:
$ sudo systemctl restart clamav-daemon
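In my Mailu setup, where clamd runs inside the antivirus container, the equivalent is simply restarting that container:
$ docker restart mailu-antivirus-1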
But I decided to keep the existing configuration and add a swap file instead (swap was set to 0), so that other processes can also avoid the problem if they need a bit more memory for a short period.
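For reference, a typical way to add a swap file on a Linux VPS looks like this (the 2G size is just an example value; on some filesystems fallocate has to be replaced with dd):
$ sudo fallocate -l 2G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
$ echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab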