Unexpected CPU usage by RabbitMQ (beam.smp) under no load

A Taiga.io installation running in a KVM virtual machine (Ubuntu 22.04 server) using Docker shows abnormal CPU usage even when no one is using Taiga. Two RabbitMQ-related processes (beam.smp) are consuming a significant amount of CPU under no load.


    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                             
 291599 root      20   0 1156220  79852  19164 S  21,6   2,0   0:00.65 /usr/local/lib/erlang/erts-12.3.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -bo+ 
 291609 root      20   0 1156188  78980  19068 S  21,3   2,0   0:00.64 /usr/local/lib/erlang/erts-12.3.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -bo+ 
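
For reference, here is a quick way to confirm which containers own those beam.smp processes (a sketch; container names assume the default taiga-docker compose naming, which may use hyphens or underscores depending on the docker compose version):

$ docker top taiga-docker_taiga-events-rabbitmq_1
$ docker top taiga-docker_taiga-async-rabbitmq_1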

Perhaps related to: erlang - RabbitMQ (beam.smp) and high CPU/memory load issue - Stack Overflow

Reading the link you provided: how big, which, and how many files are in /var/lib/rabbitmq/mnesia/rabbit/?
You can get that with the following command: du -sh /var/lib/rabbitmq/mnesia/rabbit/*

How many connections do you have when idle? You can check with the following command: rabbitmqctl list_connections

What is the status of the queues? You can check with the following command: rabbitmqctl list_queues

I suppose there is no problem with resources like storage and memory, but could you double-check?
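
For convenience, the three checks can also be run from the host in one go (a sketch; the container name assumes the default taiga-docker compose naming):

$ docker exec taiga-docker_taiga-events-rabbitmq_1 sh -c \
    'du -sh /var/lib/rabbitmq/mnesia/rabbit*/* ; rabbitmqctl list_connections ; rabbitmqctl list_queues'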

Hello Pablo.

About the files we have:

bash-5.1# du -hs /var/lib/rabbitmq/mnesia/rabbit\@taiga-events-rabbitmq/*
4.0K	/var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/DECISION_TAB.LOG
4.0K	/var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/LATEST.LOG
4.0K	/var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/cluster_nodes.config
36.0K	/var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/msg_stores
4.0K	/var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/nodes_running_at_shutdown
386.7M	/var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/quorum
4.0K	/var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/rabbit_durable_exchange.DCD
4.0K	/var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/rabbit_durable_queue.DCD
4.0K	/var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/rabbit_durable_route.DCD
4.0K	/var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/rabbit_runtime_parameters.DCD
4.0K	/var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/rabbit_serial
4.0K	/var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/rabbit_topic_permission.DCD
4.0K	/var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/rabbit_user.DCD
4.0K	/var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/rabbit_user_permission.DCD
4.0K	/var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/rabbit_vhost.DCD
40.0K	/var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/schema.DAT
4.0K	/var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/schema_version
bash-5.1# du -hs /var/lib/rabbitmq/mnesia/rabbit\@taiga-events-rabbitmq/quorum/rabbit\@taiga-events-rabbitmq/*
386.7M	/var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/quorum/rabbit@taiga-events-rabbitmq/00000016.wal
8.0K	/var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/quorum/rabbit@taiga-events-rabbitmq/meta.dets
8.0K	/var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/quorum/rabbit@taiga-events-rabbitmq/names.dets
bash-5.1# 

I cannot get any output for either the connections or the queues.

bash-5.1# rabbitmqctl --erlang-cookie "secret-erlang-cookie" list_connections
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Listing connections ...
bash-5.1# rabbitmqctl --erlang-cookie "secret-erlang-cookie" list_queues
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Timeout: 60.0 seconds ...
Listing queues for vhost / ...
bash-5.1#
bash-5.1# rabbitmq-diagnostics status --erlang-cookie "secret-erlang-cookie"
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Status of node rabbit@taiga-events-rabbitmq ...
Runtime

OS PID: 322
OS: Linux
Uptime (seconds): 419985
Is under maintenance?: false
RabbitMQ version: 3.8.34
Node name: rabbit@taiga-events-rabbitmq
Erlang configuration: Erlang/OTP 24 [erts-12.3.2.1] [source] [64-bit] [smp:1:1] [ds:1:1:10] [async-threads:1] [jit:no-native-stack]
Crypto library: OpenSSL 1.1.1o  3 May 2022
Erlang processes: 395 used, 1048576 limit
Scheduler run queue: 1
Cluster heartbeat timeout (net_ticktime): 60

Plugins

Enabled plugin file: /etc/rabbitmq/enabled_plugins
Enabled plugins:

 * rabbitmq_prometheus
 * prometheus
 * rabbitmq_management
 * amqp_client
 * rabbitmq_web_dispatch
 * cowboy
 * cowlib
 * rabbitmq_management_agent

Data directory

Node data directory: /var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq
Raft data directory: /var/lib/rabbitmq/mnesia/rabbit@taiga-events-rabbitmq/quorum/rabbit@taiga-events-rabbitmq

Config files

 * /etc/rabbitmq/rabbitmq.conf

Log file(s)

 * <stdout>

Alarms

(none)

Memory

Total memory used: 0.1607 gb
Calculation strategy: rss
Memory high watermark setting: 0.4 of available memory, computed to: 1.6459 gb

reserved_unallocated: 0.0751 gb (46.7 %)
code: 0.0397 gb (24.73 %)
other_proc: 0.0373 gb (23.22 %)
other_system: 0.0242 gb (15.04 %)
other_ets: 0.0033 gb (2.04 %)
atom: 0.0015 gb (0.93 %)
plugins: 0.0015 gb (0.93 %)
binary: 4.0e-4 gb (0.24 %)
mgmt_db: 2.0e-4 gb (0.14 %)
mnesia: 1.0e-4 gb (0.06 %)
metrics: 1.0e-4 gb (0.04 %)
msg_index: 0.0 gb (0.02 %)
quorum_ets: 0.0 gb (0.01 %)
connection_other: 0.0 gb (0.0 %)
allocated_unused: 0.0 gb (0.0 %)
connection_channels: 0.0 gb (0.0 %)
connection_readers: 0.0 gb (0.0 %)
connection_writers: 0.0 gb (0.0 %)
queue_procs: 0.0 gb (0.0 %)
queue_slave_procs: 0.0 gb (0.0 %)
quorum_queue_procs: 0.0 gb (0.0 %)

File Descriptors

Total: 2, limit: 1048479
Sockets: 0, limit: 943629

Free Disk Space

Low free disk space watermark: 0.05 gb
Free disk space: 38.3276 gb

Totals

Connection count: 0
Queue count: 0
Virtual host count: 1

Listeners

Interface: [::], port: 15672, protocol: http, purpose: HTTP API
Interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
bash-5.1#

And yes, the system has free memory and disk space.

Thank you for your attention :slight_smile:


Since I don't have much experience with Taiga administration and your environment seems to be a production environment, let's wait for the Taiga team to review the information and check whether the solution proposed in your link could fix your problem.

Also, at this moment, I see the same behavior in my Docker-based installation on a Hetzner VM, so I'm very interested in any solution to this kind of situation.

Hi @vcarceler,

There are two services in docker related to the RabbitMQ messaging:

  • taiga-async-rabbitmq to manage the asynchronous tasks in Taiga, like the email delivery or the importing project background processes.
  • taiga-events-rabbitmq to manage the asynchronous user notifications, like the mentions in the comments or the description.

We have monitored the CPU usage of the two rabbitmq services, and it should by no means be a constant 21.6%. Its usual value should be around 0.7-1% for both, with occasional peaks of 20% that last just a second.

The first thing we would suggest is to verify that the two involved pairs of services are properly configured, reviewing the startup logs of taiga-events-rabbitmq/taiga-async-rabbitmq and their consumers taiga-async/taiga-events. They shouldn't show any errors and should connect correctly to rabbitmq.
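
For example, something along these lines (a sketch assuming the default compose service names):

$ docker compose logs --tail 100 taiga-async-rabbitmq taiga-async
$ docker compose logs --tail 100 taiga-events-rabbitmq taiga-events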

In order to have more information about the number of queues and the status of their messages, it could be a good idea to expose the internal management UI’s ports in the docker-compose.yml:

  taiga-async-rabbitmq:
    image: rabbitmq:3.8-management-alpine
    ports:
      - "15673:15672"

  taiga-events-rabbitmq:
    image: rabbitmq:3.8-management-alpine
    ports:
      - "15672:15672"

This would allow access to http://<TAIGA_DOMAIN>:15672 and http://<TAIGA_DOMAIN>:15673 to monitor the two rabbitmq services.

You shouldn't see any high numbers there (for connections, queues, or messages), as they can be the source of high CPU usage (according to the link you provided).
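
As a quick check without the browser, the same totals can be read from the management HTTP API once the ports are exposed (a sketch; replace guest:guest with the credentials configured for your RabbitMQ services):

$ curl -s -u guest:guest http://<TAIGA_DOMAIN>:15672/api/overview | \
    python3 -c 'import json,sys; print(json.load(sys.stdin)["object_totals"])'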

If you prefer, you can get the number of queues, as @Pablohn26 suggested, via the CLI once connected to either of the two containers (taiga-docker-taiga-async-rabbitmq-1 or taiga-docker-taiga-events-rabbitmq-1):

$ docker exec -it taiga-docker-taiga-async-rabbitmq-1 /bin/bash

bash-5.1# rabbitmqctl list_queues -p taiga
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Timeout: 60.0 seconds ...
Listing queues for vhost taiga ...
name	messages
tasks	4

It would also be interesting to monitor the complete execution of an asynchronous task flow to detect any possible problem (see the queue-monitoring sketch after these steps). You could take the importing project task as an example:

  1. Export a project from Taiga using the menu Settings > Project > Export (it will create an asynchronous task to send an email with the link to download the exported .json)

  2. Stop the taiga-async service from docker
    $ docker stop taiga-docker-taiga-async-1

  3. Import the previous .json as a new project via Project > New project > Import project > Taiga (this should create a message in the task queue in a "Ready" status, as the consumer service is stopped)

  4. Re-launch the Celery service to process the importing project task
    $ docker start taiga-docker-taiga-async-1

  5. Any queued messages should have been consumed and there shouldn't be any message left in the "Ready" status.
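
While running the steps above, you can watch the queue depth in near real time with something like this (a sketch; the container name assumes the default taiga-docker naming):

$ watch -n 2 "docker exec taiga-docker-taiga-async-rabbitmq-1 \
    rabbitmqctl list_queues -p taiga name messages messages_ready messages_unacknowledged"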

One last thing to try could be to disable the rabbitmq management plugin itself, as it sometimes causes high CPU usage.

$ docker exec -it taiga-docker-taiga-async-rabbitmq-1 /bin/bash

bash-5.1# rabbitmq-plugins disable rabbitmq_management
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Disabling plugins on node rabbit@taiga-async-rabbitmq:
rabbitmq_management
The following plugins have been configured:
  rabbitmq_management_agent
  rabbitmq_prometheus
  rabbitmq_web_dispatch
Applying plugin configuration to rabbit@taiga-async-rabbitmq...
The following plugins have been disabled:
  rabbitmq_management

stopped 1 plugins.

We really hope this answer helps you, and please, keep us informed if you finally come to a solution or if you need more help.

Hello Daniel.

This is the result of list_queues

root@taiga:/var/log# docker exec -it taiga-docker_taiga-async-rabbitmq_1 /bin/bash
bash-5.1# rabbitmqctl list_queues -p taiga
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Timeout: 60.0 seconds ...
Listing queues for vhost taiga ...
name	messages
celeryev.ede35294-8e9a-4d6f-8db3-3ff18cc752c4	0
celery@95c798490bef.celery.pidbox	0
tasks	0
bash-5.1#

Disabling rabbitmq_management plugin doesn’t help :frowning:

I can see this in syslog

root@taiga:/var/log# tail -f /var/log/syslog
May 24 13:16:36 taiga systemd[1]: run-docker-runtime\x2drunc-moby-d8221166b635e5ebf94cf9aac3e7a277a348814221ad2cba1ccd89bb28290fd9-runc.T5yIcd.mount: Deactivated successfully.
May 24 13:16:39 taiga systemd[1]: run-docker-runtime\x2drunc-moby-d8221166b635e5ebf94cf9aac3e7a277a348814221ad2cba1ccd89bb28290fd9-runc.od3Psz.mount: Deactivated successfully.
May 24 13:16:41 taiga systemd[1]: run-docker-runtime\x2drunc-moby-1a8b1d7c2cb10cbe8bad3f31f1a52b142b9d81a49653bdca52dc90bb106a2ff0-runc.J2w3kv.mount: Deactivated successfully.
May 24 13:16:42 taiga systemd[1]: run-docker-runtime\x2drunc-moby-d8221166b635e5ebf94cf9aac3e7a277a348814221ad2cba1ccd89bb28290fd9-runc.MSmtQe.mount: Deactivated successfully.
May 24 13:16:43 taiga systemd[1]: run-docker-runtime\x2drunc-moby-03b214f41bcaa1636fdd550fd820233fb76838be55f92cdcee6f4c63b648b5b9-runc.Gyyz7h.mount: Deactivated successfully.
May 24 13:16:44 taiga systemd[1]: run-docker-runtime\x2drunc-moby-d8221166b635e5ebf94cf9aac3e7a277a348814221ad2cba1ccd89bb28290fd9-runc.009BGM.mount: Deactivated successfully.
May 24 13:16:47 taiga systemd[1]: run-docker-runtime\x2drunc-moby-d8221166b635e5ebf94cf9aac3e7a277a348814221ad2cba1ccd89bb28290fd9-runc.nehqnR.mount: Deactivated successfully.
May 24 13:16:49 taiga systemd[1]: run-docker-runtime\x2drunc-moby-1a8b1d7c2cb10cbe8bad3f31f1a52b142b9d81a49653bdca52dc90bb106a2ff0-runc.d3xh4Z.mount: Deactivated successfully.

Thank you

It seems to be right.

Could you please start Taiga with docker compose up -d and post here the logs for these services:
$ docker compose logs --follow taiga-back taiga-async taiga-async-rabbitmq taiga-events taiga-events-rabbitmq

Enter some blank lines in the terminal and try to import/export a project in Taiga.
Does it get created? Can you post the new lines?

Please, verify that you don't post any sensitive data.

Thanks in advance

Here's another approach you can take: Runtime Tuning — RabbitMQ. Try defining some of the suggested environment variables to tweak the Erlang virtual machine and see if it reduces your idle CPU usage.
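
For instance, a commonly suggested tweak is to reduce the Erlang schedulers' busy-waiting, which often shows up as idle CPU burn. This is only a sketch; add it to both RabbitMQ services in docker-compose.yml and adjust the flags to your needs:

  taiga-events-rabbitmq:
    environment:
      # Reduce Erlang scheduler busy-waiting (a frequent source of idle CPU usage)
      RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS: "+sbwt none +sbwtdcpu none +sbwtdio none"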

Was this issue ever resolved, @vcarceler @daniel.herrero?

Because we are seeing similar behavior, with beam.smp using a disproportionate amount of system resources.

Hi @viamonster,

I’m not sure if someone has finally solved the issue, but if that’s the case, it would be great to share the solution with the community.

Meanwhile, we've read that sometimes the problem can be fixed by simply updating to the newest version of RabbitMQ, and this process is straightforward.

It just involves updating two images in the docker-compose.yml file, following these steps:

  1. Stop Taiga with docker compose down
  2. Update the two services that use RabbitMQ (in docker-compose.yml) to use the latest stable version:
  taiga-events-rabbitmq:
    image: rabbitmq:3.12.2-management-alpine

  taiga-async-rabbitmq:
    image: rabbitmq:3.12.2-management-alpine
  3. To avoid the complicated process of upgrading RabbitMQ from version 3.8, it's better to start from scratch and delete its previous clusters:
docker rm taiga-docker-taiga-events-rabbitmq-1 taiga-docker-taiga-async-rabbitmq-1
docker volume rm taiga-docker_taiga-async-rabbitmq-data taiga-docker_taiga-events-rabbitmq-data

Then, you should be able to start Taiga normally with docker-compose up.
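
After starting it up, a quick sketch to verify the new broker version and check the idle CPU usage (container names assume the hyphenated compose naming used above):

$ docker exec -it taiga-docker-taiga-async-rabbitmq-1 rabbitmqctl version
$ docker stats --no-stream taiga-docker-taiga-async-rabbitmq-1 taiga-docker-taiga-events-rabbitmq-1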

Since it’s been difficult for us to reproduce the problem, could someone who experienced this issue try updating the images and report back with the new metrics?

Thanks in advance!

Hi @vcarceler, @Pablohn26, @anmcarrow, @viamonster

We've just released a Docker patch that may fix some of these CPU problems.

Take a look at this:

Please, let us know if it helps to reduce the CPU usage.

Unfortunately, beam.smp is still generating CPU usage spikes.

Hi @viamonster

Well, I wouldn't consider that a high CPU usage peak. I remember that the last time I reproduced this issue, prior to the mentioned fix, it was around 80-90% CPU load.

If you still find it high, I recommend you try some of the runtime fine-tuning options in RabbitMQ.

Please, remember to share it with us if you finally manage to reduce those spikes.

In my case, it decreased the CPU usage dramatically. Thanks a lot.
