Problem with starting Bacula components if open file limit is large
Summary
Reproducibility | Platform | OS | OS Version | Product Version |
---|---|---|---|---|
always | AMD64 | Alpine | 3.18.3 | 13.0.2 |
Description
Hello Everybody,
I would like to report a strange issue: Bacula components take a very long time to start (a couple of minutes). It happens when the system limit on open files is set to a high value. Normally I don't need to set it to a high value, but Docker by default sets it in the container to 1073741816, and for Bacula that means a very slow start.
Steps to Reproduce
Set the open files limit to 1024
99011c89f7c2:/# ulimit -n 1024
99011c89f7c2:/# ulimit -n -H 1024
99011c89f7c2:/# ulimit -a
real-time non-blocking time (microseconds, -R) unlimited
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 15436
max locked memory (kbytes, -l) 8192
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
99011c89f7c2:/# /etc/bacula/scripts/bacula-ctl-dir start
The Director starts immediately and it is possible to connect to it with bconsole.
Set the open files limit to 1073741816
99011c89f7c2:/# ulimit -n -H 1073741816
99011c89f7c2:/# ulimit -n 1073741816
99011c89f7c2:/# ulimit -a
real-time non-blocking time (microseconds, -R) unlimited
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 15436
max locked memory (kbytes, -l) 8192
max memory size (kbytes, -m) unlimited
open files (-n) 1073741816
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
99011c89f7c2:/# /etc/bacula/scripts/bacula-ctl-dir start
The Director takes a couple of minutes to start, and until then it is not possible to connect to it with bconsole. The Director process is shown in the process list:
9672cbb6147b:/# ps auxfwww | grep bacula-dir
52 root 1:23 /usr/sbin/bacula-dir -u bacula -g bacula -v -c /etc/bacula/bacula-dir.conf
but it does not listen on port 9101:
9672cbb6147b:/# netstat -tlnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 1/nginx: master pro
tcp 0 0 0.0.0.0:9097 0.0.0.0:* LISTEN 1/nginx: master pro
tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN -
tcp 0 0 :::80 :::* LISTEN 1/nginx: master pro
tcp 0 0 ::1:5432 :::* LISTEN -
When I attach strace to the process to see what happens, I see millions of lines about a bad file descriptor:
99011c89f7c2:/# pidof bacula-dir
52
99011c89f7c2:/# strace -p 52
close(799827234) = -1 EBADF (Bad file descriptor)
close(799827233) = -1 EBADF (Bad file descriptor)
close(799827232) = -1 EBADF (Bad file descriptor)
close(799827231) = -1 EBADF (Bad file descriptor)
close(799827230) = -1 EBADF (Bad file descriptor)
close(799827229) = -1 EBADF (Bad file descriptor)
close(799827228) = -1 EBADF (Bad file descriptor)
close(799827227) = -1 EBADF (Bad file descriptor)
close(799827226) = -1 EBADF (Bad file descriptor)
close(799827225) = -1 EBADF (Bad file descriptor)
close(799827224) = -1 EBADF (Bad file descriptor)
close(799827223) = -1 EBADF (Bad file descriptor)
close(799827222) = -1 EBADF (Bad file descriptor)
close(799827221) = -1 EBADF (Bad file descriptor)
close(799827220) = -1 EBADF (Bad file descriptor)
close(799827219) = -1 EBADF (Bad file descriptor)
close(799827218) = -1 EBADF (Bad file descriptor)
close(799827217) = -1 EBADF (Bad file descriptor)
close(799827216) = -1 EBADF (Bad file descriptor)
close(799827215) = -1 EBADF (Bad file descriptor)
close(799827214) = -1 EBADF (Bad file descriptor)
close(799827213) = -1 EBADF (Bad file descriptor)
close(799827212) = -1 EBADF (Bad file descriptor)
close(799827211) = -1 EBADF (Bad file descriptor)
close(799827210) = -1 EBADF (Bad file descriptor)
close(799827209) = -1 EBADF (Bad file descriptor)
close(799827208) = -1 EBADF (Bad file descriptor)
close(799827207) = -1 EBADF (Bad file descriptor)
close(799827206) = -1 EBADF (Bad file descriptor)
close(799827205) = -1 EBADF (Bad file descriptor)
close(799827204) = -1 EBADF (Bad file descriptor)
close(799827203) = -1 EBADF (Bad file descriptor)
close(799827202) = -1 EBADF (Bad file descriptor)
close(799827201) = -1 EBADF (Bad file descriptor)
close(799827200) = -1 EBADF (Bad file descriptor)
close(799827199) = -1 EBADF (Bad file descriptor)
close(799827198) = -1 EBADF (Bad file descriptor)
close(799827197) = -1 EBADF (Bad file descriptor)
close(799827196) = -1 EBADF (Bad file descriptor)
close(799827195) = -1 EBADF (Bad file descriptor)
It looks like these bad file descriptor lines keep appearing until the whole range of descriptor numbers up to the open files limit has been gone through, and only then does the Director start.
The same happens for other Bacula components (FD, SD).
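To give a rough sense of scale, the arithmetic can be sketched as below. This assumes the daemon tries to close every descriptor number below the limit except 0, 1 and 2, which is only what the strace output suggests and has not been checked against the Bacula source. The function names and the rate of 10 million close() calls per second are made up for illustration.

```shell
# Hypothetical helper: number of close() calls issued by a loop that
# walks every descriptor number below the limit and keeps 0, 1 and 2.
close_calls_for_limit() {
    echo $(( $1 - 3 ))
}

# Hypothetical helper: rough duration in whole seconds, given an ASSUMED
# rate of close() calls per second (second argument).
estimated_seconds() {
    echo $(( ($1 - 3) / $2 ))
}

# Count the descriptors a process really has open (Linux /proc only).
open_fd_count() {
    ls "/proc/$1/fd" | wc -l
}

close_calls_for_limit 1024               # prints 1021
close_calls_for_limit 1073741816         # prints 1073741813
estimated_seconds 1073741816 10000000    # prints 107
```

With the assumed rate this comes out at well over a minute, in line with the observed startup delay. Running open_fd_count with the Director PID would show how few descriptors are really open compared with the range being walked.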
Thanks in advance for checking and fixing it.
Info for Docker users
It is possible to set the maximum number of open files for dockerd by adding the following parameter to its command line in the Docker systemd unit (here a soft/hard open files limit of 1024):
--default-ulimit nofile=1024:1024
In my environment it looks like this:
ExecStart=/usr/bin/dockerd --default-ulimit nofile=1024:1024 -H fd:// --containerd=/run/containerd/containerd.soc
After that, the limit is set to 1024 and the Bacula daemons start properly.
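Where changing the dockerd unit is not an option, the limit could also be lowered inside the container just before the daemons are started, the same way as in the reproduction steps. The following is a minimal sketch; cap_nofile is a hypothetical helper name, and it assumes a shell whose ulimit supports -S, -H and -n (bash, dash and BusyBox ash do). Docker also accepts a per-container --ulimit nofile=1024:1024 option on docker run, which was not tested in this report.

```shell
# cap_nofile MAX: lower the soft and hard open files limits of the
# current shell to MAX when they are higher (or unlimited). Never raises.
cap_nofile() {
    max=$1
    soft=$(ulimit -S -n)
    hard=$(ulimit -H -n)
    # Lower the soft limit first so it never exceeds the new hard limit.
    if [ "$soft" = "unlimited" ] || [ "$soft" -gt "$max" ]; then
        ulimit -S -n "$max"
    fi
    if [ "$hard" = "unlimited" ] || [ "$hard" -gt "$max" ]; then
        ulimit -H -n "$max"
    fi
}
```

A possible use would be: cap_nofile 1024 && /etc/bacula/scripts/bacula-ctl-dir start. This relies on the start script inheriting the limits of the calling shell, as it did in the steps above.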
I am reporting this problem for Bacula 13.0.2, but I have been observing it for a long time with older Bacula versions too.
Best regards, Marcin Haba (gani)