Hi Daniel!
Thanks for your thorough response.
Your reproduction steps and assumptions about my situation are almost entirely correct; let me provide some more information.
I measured the CPU usage with htop, so the 100% refers to a single core, not the CPU as a whole. I also measured it in the various other ways you indicated above, and the picture is the same regardless of method: consistently, one entire core. I also ran docker top specifically for taiga-docker-taiga-async-1, and it showed the same numbers.
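For completeness, one of those cross-checks was along these lines (docker stats measures CPU% per core, so a single pegged core on this host reads as roughly 100% there too):
host@host# docker stats --no-stream taiga-docker-taiga-async-1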
Here’s the output of docker top taiga-docker-taiga-async-1:
UID PID PPID C STIME TTY TIME CMD
999 2919184 2919150 15 08:50 ? 00:00:03 /opt/venv/bin/python /opt/venv/bin/celery -A taiga.celery worker -B --concurrency 4 -l INFO
999 2919902 2919184 0 08:50 ? 00:00:00 /opt/venv/bin/python /opt/venv/bin/celery -A taiga.celery worker -B --concurrency 4 -l INFO
999 2919914 2919184 0 08:50 ? 00:00:00 /opt/venv/bin/python /opt/venv/bin/celery -A taiga.celery worker -B --concurrency 4 -l INFO
999 2919952 2919184 0 08:50 ? 00:00:00 /opt/venv/bin/python /opt/venv/bin/celery -A taiga.celery worker -B --concurrency 4 -l INFO
999 2919953 2919184 0 08:50 ? 00:00:00 /opt/venv/bin/python /opt/venv/bin/celery -A taiga.celery worker -B --concurrency 4 -l INFO
999 2919954 2919184 99 08:50 ? 00:00:22 /opt/venv/bin/python /opt/venv/bin/celery -A taiga.celery worker -B --concurrency 4 -l INFO
You can see the unfortunate 99 in the C (CPU) column for the last worker process.
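Next time I get a window I also plan to ask that worker what it thinks it is doing, roughly like this (using the venv path from the docker top output above; I haven’t verified yet whether inspect responds while the child is pegged):
host@host# docker compose exec taiga-async /opt/venv/bin/celery -A taiga.celery inspect active
host@host# docker compose exec taiga-async /opt/venv/bin/celery -A taiga.celery inspect stats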
As for what I commented out, it was the body of the send_telemetry() task in taiga-back/taiga/telemetry/tasks.py. I had ENABLE_TELEMETRY set to False, and that did not fix the issue. The following commands print “False”, which confirms the value was correctly passed through into the Docker environment:
host@host# docker compose exec taiga-async /bin/bash
root@container:/taiga-back# echo $ENABLE_TELEMETRY
False
host@host# docker compose exec taiga-back /bin/bash
root@container:/taiga-back# echo $ENABLE_TELEMETRY
False
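If it would help, I can also confirm that the value survives into Django settings rather than just the shell environment; a hypothetical one-liner like this should do it (assuming taiga-back exposes the setting under the same ENABLE_TELEMETRY name and uses the same /opt/venv layout as taiga-async, neither of which I have verified):
host@host# docker compose exec taiga-back /opt/venv/bin/python manage.py shell -c "from django.conf import settings; print(getattr(settings, 'ENABLE_TELEMETRY', 'not defined'))"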
You are correct that RabbitMQ is not causing the issue. In an earlier oversight, I followed various troubleshooting steps to bring RabbitMQ’s CPU load down, and it did go down, but obviously that did not fix the issue, since the problem is in the taiga-async container.
Here’s the output of docker top taiga-docker-taiga-async-rabbitmq-1:
UID PID PPID C STIME TTY TIME CMD
100 2919177 2919098 0 08:50 ? 00:00:00 /bin/sh /opt/rabbitmq/sbin/rabbitmq-server
100 2919523 2919177 0 08:50 ? 00:00:00 /usr/local/lib/erlang/erts-12.3.2.1/bin/epmd -daemon
100 2920508 2919177 2 08:50 ? 00:00:07 /usr/local/lib/erlang/erts-12.3.2.1/bin/beam.smp -W w -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 512 -MMmcs 30 -P 1048576 -t 5000000 -stbt db -zdbbl 128000 -sbwt none -sbwtdcpu none -sbwtdio none -B i -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa -noshell -noinput -s rabbit boot -boot start_sasl -lager crash_log false -lager handlers []
100 2920534 2920508 0 08:50 ? 00:00:00 erl_child_setup 1073741816
100 2920836 2920534 0 08:50 ? 00:00:00 inet_gethost 4
100 2920837 2920836 0 08:50 ? 00:00:00 inet_gethost 4
And the same for docker top taiga-docker-taiga-events-rabbitmq-1:
UID PID PPID C STIME TTY TIME CMD
100 2918554 2918482 0 08:50 ? 00:00:00 /bin/sh /opt/rabbitmq/sbin/rabbitmq-server
100 2919456 2918554 0 08:50 ? 00:00:00 /usr/local/lib/erlang/erts-12.3.2.1/bin/epmd -daemon
100 2920405 2918554 1 08:50 ? 00:00:07 /usr/local/lib/erlang/erts-12.3.2.1/bin/beam.smp -W w -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 512 -MMmcs 30 -P 1048576 -t 5000000 -stbt db -zdbbl 128000 -sbwt none -sbwtdcpu none -sbwtdio none -B i -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa -noshell -noinput -s rabbit boot -boot start_sasl -lager crash_log false -lager handlers []
100 2920455 2920405 0 08:50 ? 00:00:00 erl_child_setup 1073741816
100 2920732 2920455 0 08:50 ? 00:00:00 inet_gethost 4
100 2920733 2920732 0 08:50 ? 00:00:00 inet_gethost 4
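For what it’s worth, I can also dump the queue depths next time I’m at the machine; I would expect something like this to show them sitting near zero (assuming the compose service names are taiga-async-rabbitmq and taiga-events-rabbitmq, matching the container names above):
host@host# docker compose exec taiga-async-rabbitmq rabbitmqctl list_queues name messages consumers
host@host# docker compose exec taiga-events-rabbitmq rabbitmqctl list_queues name messages consumers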
I won’t be able to troubleshoot for a while; however, after some digging, I suspect the issue arose when I changed the DNS settings of my host machine, which then got copied into the container on startup. A lot of the services I host struggled with the DNS change, and for several of them telemetry was the culprit, since it’s one of the few outward-facing functions in many of my services.
Brief troubleshooting when I had the issue showed me that just adding public DNS resolvers as follows:
dns:
- 8.8.8.8
- 114.114.114.114
to the docker-compose.yml file under taiga-async did not fix the high CPU issue, so perhaps I needed to add an IPv6 resolver as well, or maybe the task gets stuck in some other way. If it is a connectivity issue, it may be reproducible by setting an iptables rule against the container’s IP. I’ll have to check that another time.
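For the record, the untested sketch I have in mind for reproducing it is: check which resolver config the container actually picked up, then drop its outbound traffic via the DOCKER-USER chain to simulate the broken connectivity (and remove the rule afterwards):
host@host# docker compose exec taiga-async cat /etc/resolv.conf
host@host# CONTAINER_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' taiga-docker-taiga-async-1)
host@host# iptables -I DOCKER-USER -s "$CONTAINER_IP" -j DROP
host@host# iptables -D DOCKER-USER -s "$CONTAINER_IP" -j DROP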
The issue is certainly still present; only, instead of commenting out the code, I now just kill the runaway task when I start the compose project. Perhaps I will create a source volume mount and play around with bad internet. It’s always fun to dive into new codebases.
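To be concrete, “killing the task” currently looks roughly like this; column 4 of docker top is the CPU figure, and the PIDs it reports are host-side, so the kill runs on the host:
host@host# HOT_PID=$(docker top taiga-docker-taiga-async-1 | awk 'NR>1 && $4+0 > 90 {print $2}')
host@host# kill $HOT_PID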
On a side note, what software are you using for this community forum? I quite like it!