High Celery CPU Usage in taiga-back-async

Hi Daniel,

Discourse is quite cool; I’m going to look into it for my own purposes.

Most of the RabbitMQ steps I followed are already outlined at the link you provided, so there’s no new information there.

After a significant amount of troubleshooting, I think I’ve found the issue relating to high CPU usage.

It wasn’t the telemetry after all, which you had indicated seemed improbable. Commenting it out only appeared to stop the issue because I neglected to put a pass statement in the function, so it was failing to run in the first place. Ah well.

As for the actual solution:

I started by logging all task executions and monitoring them with Flower, and quickly found that none of the native tasks were to blame for the high CPU usage.

I then used py-spy to record the time spent in each function and found that close_open_fds() was taking up 100% of the CPU time.

See:

[user@host taiga-docker]# py-spy top --pid <pid>

Collecting samples from '/opt/venv/bin/python /opt/venv/bin/celery -A taiga.celery worker -B --concurrency 4 -l DEBUG -E' (python v3.11.4)
Total Samples 1900
GIL: 60.00%, Active: 100.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename)
100.00% 100.00%   19.00s    19.00s   close_open_fds (billiard/compat.py)
  0.00% 100.00%   0.000s    19.00s   start (celery/worker/worker.py)
  0.00% 100.00%   0.000s    19.00s   run (celery/beat.py)
  0.00% 100.00%   0.000s    19.00s   main (click/core.py)
  0.00% 100.00%   0.000s    19.00s   _Popen (billiard/context.py)
  0.00% 100.00%   0.000s    19.00s   __init__ (billiard/popen_fork.py)
  0.00% 100.00%   0.000s    19.00s   start (celery/bootsteps.py)
  0.00% 100.00%   0.000s    19.00s   _launch (billiard/popen_fork.py)
  0.00% 100.00%   0.000s    19.00s   main (celery/bin/celery.py)
  0.00% 100.00%   0.000s    19.00s   main (celery/__main__.py)
  0.00% 100.00%   0.000s    19.00s   invoke (click/core.py)
  0.00% 100.00%   0.000s    19.00s   __call__ (click/core.py)
  0.00% 100.00%   0.000s    19.00s   caller (celery/bin/base.py)
  0.00% 100.00%   0.000s    19.00s   <module> (celery)
  0.00% 100.00%   0.000s    19.00s   start (billiard/process.py)
  0.00% 100.00%   0.000s    19.00s   worker (celery/bin/worker.py)
  0.00% 100.00%   0.000s    19.00s   _bootstrap (billiard/process.py)
  0.00% 100.00%   0.000s    19.00s   new_func (click/decorators.py)
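For context on why that one function can burn so much CPU: the trace shows it being reached from run (celery/beat.py), i.e. the beat process that -B embeds in the worker. As far as I understand billiard's implementation (not verified across versions), close_open_fds() closes every descriptor number up to the process's open-file limit, so its cost grows with that limit, which can be very high inside some containers. A minimal sketch to check the limit from a shell inside the container; the function name check_nofile_limit and the 65536 threshold are made up for illustration:

```shell
#!/usr/bin/env bash
# Print the open-file limits of this shell; processes it starts (e.g. Celery)
# inherit them. The default warning threshold of 65536 is an arbitrary choice.
set -euo pipefail

check_nofile_limit() {
    local threshold="${1:-65536}"
    local soft hard
    soft="$(ulimit -Sn)"
    hard="$(ulimit -Hn)"
    echo "soft=${soft} hard=${hard}"
    # An "unlimited" or very large soft limit means a long descriptor loop.
    if [[ "$soft" == "unlimited" ]] || (( soft > threshold )); then
        echo "WARNING: open-file soft limit is above ${threshold}"
    fi
}

check_nofile_limit "${1:-65536}"
```

Running it in the taiga-back-async container (for example through docker exec) shows the limit Celery actually inherits there; the host's limit may differ.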

That led me to this bug:

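The temporary solution I adopted was to stop embedding beat in the worker (the -B flag in the command above) and run it as a separate process instead, by altering taiga-back/docker/async_entrypoint.sh as follows: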

#!/usr/bin/env bash

# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
#
# Copyright (c) 2021-present Kaleidos Ventures SL

set -euo pipefail

# Give permission to taiga:taiga after mounting volumes
echo "Give permission to taiga:taiga"
chown -R taiga:taiga /taiga-back

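# Start the Celery worker in the background (beat is no longer embedded via -B)
echo "Starting Celery..."
gosu taiga celery -A taiga.celery worker \
    --concurrency 4 \
    -l DEBUG \
    -E \
    "$@" &

# Run beat as a separate foreground process; exec replaces the shell,
# so beat receives the signals sent to the entrypoint process
exec gosu taiga celery -A taiga.celery beat -l DEBUG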

The -l DEBUG and -E flags certainly aren’t necessary; the part that matters is separating beat from the workers.
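One caveat with this workaround: the worker runs in the background, so if it dies while beat is still alive, the container keeps running without a worker. A hypothetical variant (not part of Taiga's entrypoint; the name run_worker_and_beat is made up) starts both processes and returns as soon as either one exits, so the container runtime can notice and restart it. It is a sketch that assumes bash 4.3 or newer for wait -n:

```shell
#!/usr/bin/env bash
# Hypothetical supervisor: run the worker and beat side by side and return
# as soon as either exits. Requires bash >= 4.3 for wait -n.
set -euo pipefail

run_worker_and_beat() {
    local worker_cmd="$1" beat_cmd="$2" status=0

    # Start both commands as background children of this shell.
    bash -c "$worker_cmd" &
    WORKER_PID=$!
    bash -c "$beat_cmd" &
    BEAT_PID=$!

    # Forward SIGTERM (e.g. from docker stop) and Ctrl-C to both children.
    trap 'kill -TERM "$WORKER_PID" "$BEAT_PID" 2>/dev/null || true' TERM INT

    # Block until the first child exits and keep its exit status.
    wait -n || status=$?

    # Stop whichever process is still running, then reap it.
    kill -TERM "$WORKER_PID" "$BEAT_PID" 2>/dev/null || true
    wait || true
    trap - TERM INT

    return "$status"
}
```

In the entrypoint, the last two commands could then become a single call along the lines of run_worker_and_beat "gosu taiga celery -A taiga.celery worker --concurrency 4" "gosu taiga celery -A taiga.celery beat"; with set -e, a non-zero return ends the script with that status. Because the commands are passed to bash -c as strings, forwarding the extra "$@" arguments to the worker would need careful quoting, which this sketch leaves out.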

I hope this is helpful for you and your team.