Hi Daniel,
Discourse is quite cool; I’m going to look into it for my own purposes.
Most of the RabbitMQ steps I followed are outlined at the link you provided, so there is no new information there.
After a significant amount of troubleshooting, I think I’ve found the issue relating to high CPU usage.
It wasn’t the telemetry after all, which you had already said seemed improbable. Commenting it out only appeared to stop the issue because I neglected to put pass in the emptied function, so the code was failing to run in the first place. Ah well.
As for the actual solution:
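As a small illustration of that mistake, here is a sketch of what happens when a function body is commented out without leaving a pass behind. The function name send_telemetry and its commented-out lines are made up for the example and are not the real Taiga code.

```python
# Hypothetical function whose whole body has been commented out.
BROKEN = (
    "def send_telemetry():\n"
    "    # payload = build_payload()\n"
    "    # post(payload)\n"
)

# Same function with a 'pass' left in place, so the body is not empty.
FIXED = BROKEN + "    pass\n"


def compiles(source):
    """Return True if the source compiles, False on a syntax error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


if __name__ == "__main__":
    print("without pass:", compiles(BROKEN))
    print("with pass:   ", compiles(FIXED))
```

A module that fails to compile like this never gets imported, which would explain why the CPU problem seemed to go away.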
I started by logging all the task executions and monitoring them with Flower. I quickly found that none of the native tasks were to blame for the high CPU usage.
I then used py-spy to record the time spent in each function and found close_open_fds() to be taking up 100% of the CPU time.
See:
[user@host taiga-docker]# py-spy top --pid <pid>
Collecting samples from '/opt/venv/bin/python /opt/venv/bin/celery -A taiga.celery worker -B --concurrency 4 -l DEBUG -E' (python v3.11.4)
Total Samples 1900
GIL: 60.00%, Active: 100.00%, Threads: 1
%Own %Total OwnTime TotalTime Function (filename)
100.00% 100.00% 19.00s 19.00s close_open_fds (billiard/compat.py)
0.00% 100.00% 0.000s 19.00s start (celery/worker/worker.py)
0.00% 100.00% 0.000s 19.00s run (celery/beat.py)
0.00% 100.00% 0.000s 19.00s main (click/core.py)
0.00% 100.00% 0.000s 19.00s _Popen (billiard/context.py)
0.00% 100.00% 0.000s 19.00s __init__ (billiard/popen_fork.py)
0.00% 100.00% 0.000s 19.00s start (celery/bootsteps.py)
0.00% 100.00% 0.000s 19.00s _launch (billiard/popen_fork.py)
0.00% 100.00% 0.000s 19.00s main (celery/bin/celery.py)
0.00% 100.00% 0.000s 19.00s main (celery/__main__.py)
0.00% 100.00% 0.000s 19.00s invoke (click/core.py)
0.00% 100.00% 0.000s 19.00s __call__ (click/core.py)
0.00% 100.00% 0.000s 19.00s caller (celery/bin/base.py)
0.00% 100.00% 0.000s 19.00s <module> (celery)
0.00% 100.00% 0.000s 19.00s start (billiard/process.py)
0.00% 100.00% 0.000s 19.00s worker (celery/bin/worker.py)
0.00% 100.00% 0.000s 19.00s _bootstrap (billiard/process.py)
0.00% 100.00% 0.000s 19.00s new_func (click/decorators.py)
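For context on how a function like close_open_fds() can dominate a profile, here is a rough sketch. It rests on an assumption I have not verified against the billiard version in this image: that the function tries to close every descriptor number up to the process's open-file limit rather than only the descriptors that are actually open. The helper names attempted_closes and open_fd_count are made up for the example.

```python
import os
import resource


def attempted_closes(fd_limit, keep=()):
    """Number of close attempts a loop over range(fd_limit) would make.

    Assumption: every descriptor number below the limit is tried once,
    except the ones listed in 'keep'.
    """
    kept = {fd for fd in keep if 0 <= fd < fd_limit}
    return fd_limit - len(kept)


def open_fd_count():
    """Descriptors actually open in this process (Linux only, via /proc)."""
    return len(os.listdir("/proc/self/fd"))


if __name__ == "__main__":
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft RLIMIT_NOFILE:", soft_limit)
    print("close attempts at that limit:",
          attempted_closes(soft_limit, keep=(0, 1, 2)))
    if os.path.isdir("/proc/self/fd"):
        print("descriptors actually open:", open_fd_count())
```

Running this inside the container and comparing the two numbers gives a feel for the gap: if the limit is very large, a loop over the whole range does a lot of work for a handful of real descriptors.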
That led me to this bug:
The temporary solution I adopted was to alter taiga-back/docker/async_entrypoint.sh as follows:
#!/usr/bin/env bash
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
#
# Copyright (c) 2021-present Kaleidos Ventures SL
set -euo pipefail
# Give permission to taiga:taiga after mounting volumes
echo "Give permission to taiga:taiga"
chown -R taiga:taiga /taiga-back
# Start Celery processes
echo "Starting Celery..."
exec gosu taiga celery -A taiga.celery worker \
    --concurrency 4 \
    -l DEBUG \
    -E \
    "$@" &
exec gosu taiga celery -A taiga.celery beat -l DEBUG
The -l DEBUG and -E options certainly aren’t necessary; the only change that matters is running beat as its own process instead of embedding it in the worker with -B.
I hope this is helpful for you and your team.