Userstories/api.py endpoints worker timeouts

After a migration from a docker compose installation into k8s, when creating new tickets on the kanban board, frontend hangs up and timeouts.
The configuration changed a lot between the instances, as the services were renamed during migration.

This is how the log looks like on backend(image docker.io/taigaio/taiga-back:6.7.0):

Executing pending migrations
Operations to perform:
  Apply all migrations: admin, attachments, auth, bitbucket, contact, contenttypes, custom_attributes, djmail, easy_thumbnails, epics, external_apps, feedback, github, gitlab, gogs, history, issues, likes, milestones, notifications, projects, references, sessions, settings, tasks, telemetry, timeline, token_denylist, users, userstorage, userstories, votes, webhooks, wiki
Running migrations:
  No migrations to apply.
Load default templates
Installed 2 object(s) from 1 fixture(s)
Give permission to taiga:taiga
Starting Taiga API...
[2023-08-18 19:45:51 +0000] [1] [INFO] Starting gunicorn 20.1.0
[2023-08-18 19:45:51 +0000] [1] [INFO] Listening at: http://0.0.0.0:8000 (1)
[2023-08-18 19:45:51 +0000] [1] [INFO] Using worker: sync
[2023-08-18 19:45:51 +0000] [31] [INFO] Booting worker with pid: 31
[2023-08-18 19:45:51 +0000] [32] [INFO] Booting worker with pid: 32
[2023-08-18 19:45:51 +0000] [33] [INFO] Booting worker with pid: 33
/opt/venv/lib/python3.11/site-packages/easy_thumbnails/source_generators.py:47: UserWarning: Could not import VIL for SVG image support: No module named 'reportlab'.
  warnings.warn(f"Could not import VIL for SVG image support: {ie}.")
/opt/venv/lib/python3.11/site-packages/easy_thumbnails/source_generators.py:47: UserWarning: Could not import VIL for SVG image support: No module named 'reportlab'.
  warnings.warn(f"Could not import VIL for SVG image support: {ie}.")
[2023-08-18 19:46:55 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:31)
[2023-08-18 19:46:56 +0000] [74] [INFO] Booting worker with pid: 74

The rest of the system seem to work fine, I’m just baffled I could not see any logs that would explain the timeouts.

By means of debugging with print statements I’ve managed to pinpoint that the worker hangs up when the response is being returned:

[2023-08-18 20:02:27 +0000] [1] [INFO] Handling signal: hup
[2023-08-18 20:02:27 +0000] [1] [INFO] Hang up: Master
[2023-08-18 20:02:28 +0000] [703] [INFO] Booting worker with pid: 703
[2023-08-18 20:02:28 +0000] [704] [INFO] Booting worker with pid: 704
[2023-08-18 20:02:28 +0000] [705] [INFO] Booting worker with pid: 705
[2023-08-18 20:02:28 +0000] [1] [WARNING] Worker with pid 32 was terminated due to signal 15
/opt/venv/lib/python3.11/site-packages/easy_thumbnails/source_generators.py:47: UserWarning: Could not import VIL for SVG image support: No module named 'reportlab'.
  warnings.warn(f"Could not import VIL for SVG image support: {ie}.")
  File "/opt/venv/bin/gunicorn", line 8, in <module>
    sys.exit(run())
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py", line 67, in run
    WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/app/base.py", line 231, in run
    super().run()
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/app/base.py", line 72, in run
    Arbiter(self).run()
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/arbiter.py", line 224, in run
    handler()
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/arbiter.py", line 253, in handle_hup
    self.reload()
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/arbiter.py", line 481, in reload
    self.spawn_worker()
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
    worker.init_process()
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/workers/base.py", line 142, in init_process
    self.run()
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/workers/sync.py", line 125, in run
    self.run_for_one(timeout)
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/workers/sync.py", line 69, in run_for_one
    self.accept(listener)
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/workers/sync.py", line 31, in accept
    self.handle(listener, client, addr)
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/workers/sync.py", line 136, in handle
    self.handle_request(listener, req, client, addr)
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/workers/sync.py", line 179, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/opt/venv/lib/python3.11/site-packages/django/core/handlers/wsgi.py", line 133, in __call__
    response = self.get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/core/handlers/base.py", line 130, in get_response
    response = self._middleware_chain(request)
  File "/opt/venv/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/taiga-back/taiga/base/middleware/cors.py", line 32, in __call__
    response = self.get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/taiga-back/taiga/events/middleware.py", line 44, in __call__
    response = self.get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/opt/venv/lib/python3.11/site-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/taiga-back/taiga/base/api/viewsets.py", line 95, in view
    return self.dispatch(request, *args, **kwargs)
  File "/opt/venv/lib/python3.11/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/taiga-back/taiga/base/api/views.py", line 447, in dispatch
    response = handler(request, *args, **kwargs)
  File "/usr/local/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
  File "/taiga-back/taiga/projects/userstories/api.py", line 309, in create
    traceback.print_stack()

The call hangs up between the userstories/api.py return statement and api/views.py handler call.
The handler never returns a response. I guess there should be some django magic I cannot see in between those traces.

I don’t see any spike in CPU activity when this happens.

Can you help me troubleshoot this?
Logs:
events

Starting Taiga events

> taigaio-events@6.7.0 start:production
> NODE_ENV=production node ./src/index.js

{"message":"WS server is started","level":"info","service":"taiga-events","timestamp":"2023-08-18 19:45:46"}
{"message":"Liveness / Readiness server was started in http://localhost:3023/healthz","level":"info","service":"taiga-events","timestamp":"2023-08-18 19:45:46"}
{"message":"ws-connection-open: 3a96965b-72d1-4521-807b-07f7c945e430","level":"info","service":"taiga-events","timestamp":"2023-08-18 19:46:20"}
{"message":"auth: {\"cmd\":\"auth\",\"data\":{\"token\":\"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjkyNDY1MjY1LCJqdGkiOiJmOGJmOWI3YmUxMGQ0MDVhOTBiYWI5MjIyYjFjNjAzZCIsInVzZXJfaWQiOjZ9.jeD2s_3vKhSD5rbqFcuxYhthPsE3_cG6PBh4nU15ouc\",\"sessionId\":\"a06a19ba6548e4c6ee159890557a560e9d3a301e\"}}","level":"info","service":"taiga-events","timestamp":"2023-08-18 19:46:20"}
{"message":"subscribe: {\"cmd\":\"subscribe\",\"routing_key\":\"notifications\"}","level":"info","service":"taiga-events","timestamp":"2023-08-18 19:46:20"}
{"message":"subscribe: {\"cmd\":\"subscribe\",\"routing_key\":\"live_notifications.6\"}","level":"info","service":"taiga-events","timestamp":"2023-08-18 19:46:20"}
{"message":"subscribe: {\"cmd\":\"subscribe\",\"routing_key\":\"web_notifications.6\"}","level":"info","service":"taiga-events","timestamp":"2023-08-18 19:46:20"}
{"message":"subscribe: {\"cmd\":\"subscribe\",\"routing_key\":\"changes.project.1.userstories\"}","level":"info","service":"taiga-events","timestamp":"2023-08-18 19:46:20"}
{"message":"subscribe: {\"cmd\":\"subscribe\",\"routing_key\":\"changes.project.1.projects\"}","level":"info","service":"taiga-events","timestamp":"2023-08-18 19:46:20"}
{"message":"ws-connection-open: b58c8e9f-848a-4c13-9587-520fc673def2","level":"info","service":"taiga-events","timestamp":"2023-08-18 19:48:53"}
{"message":"auth: {\"cmd\":\"auth\",\"data\":{\"token\":\"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjkyNDQ4MjE0LCJqdGkiOiI5YTFlZmVlNmM5MTc0MjNiYWNmNjlhMzA5YTc1MjZmMyIsInVzZXJfaWQiOjh9.bXkWR6CB5lDWbqt0FY4g7RyIpXbAuj7UFK2IUxYAJXM\",\"sessionId\":\"8155d4a302a87612c567af70bc57244620e50f44\"}}","level":"info","service":"taiga-events","timestamp":"2023-08-18 19:48:53"}
{"message":"subscribe: {\"cmd\":\"subscribe\",\"routing_key\":\"notifications\"}","level":"info","service":"taiga-events","timestamp":"2023-08-18 19:48:53"}
{"message":"subscribe: {\"cmd\":\"subscribe\",\"routing_key\":\"web_notifications.8\"}","level":"info","service":"taiga-events","timestamp":"2023-08-18 19:48:53"}

async:

[2023-08-18 20:05:51,583: INFO/Beat] Scheduler: Sending due task send-bulk-emails (taiga.projects.notifications.tasks.send_bulk_email)
[2023-08-18 20:05:51,586: INFO/MainProcess] Task taiga.projects.notifications.tasks.send_bulk_email[a5716085-818d-44d6-867d-de17f4cc1d4a] received
[2023-08-18 20:07:51,583: INFO/Beat] Scheduler: Sending due task send-bulk-emails (taiga.projects.notifications.tasks.send_bulk_email)
[2023-08-18 20:07:51,586: INFO/MainProcess] Task taiga.projects.notifications.tasks.send_bulk_email[7237cef6-0d8a-4117-9eb9-49b4e69adf64] received
[2023-08-18 20:09:51,583: INFO/Beat] Scheduler: Sending due task send-bulk-emails (taiga.projects.notifications.tasks.send_bulk_email)
[2023-08-18 20:09:51,587: INFO/MainProcess] Task taiga.projects.notifications.tasks.send_bulk_email[f7bba4e1-f4e3-4d2a-8f2b-58a37fbd355e] received

rabbitmq:

2023-08-18 19:43:30.162917+00:00 [warning] <0.9621.0> closing AMQP connection <0.9621.0> (10.42.0.86:48342 -> 10.42.2.122:5672, vhost: 'taiga', user: 'taiga'):
2023-08-18 19:43:30.162917+00:00 [warning] <0.9621.0> client unexpectedly closed TCP connection
2023-08-18 19:43:31.179932+00:00 [info] <0.9514.0> closing AMQP connection <0.9514.0> (10.42.0.86:43494 -> 10.42.2.122:5672, vhost: 'taiga', user: 'taiga')
2023-08-18 19:43:31.181251+00:00 [info] <0.9506.0> closing AMQP connection <0.9506.0> (10.42.0.86:43490 -> 10.42.2.122:5672, vhost: 'taiga', user: 'taiga')
2023-08-18 19:43:32.066112+00:00 [warning] <0.9533.0> closing AMQP connection <0.9533.0> (10.42.0.86:43508 -> 10.42.2.122:5672, vhost: 'taiga', user: 'taiga'):
2023-08-18 19:43:32.066112+00:00 [warning] <0.9533.0> client unexpectedly closed TCP connection
2023-08-18 19:43:51.804398+00:00 [info] <0.9736.0> accepting AMQP connection <0.9736.0> (10.42.0.87:59394 -> 10.42.2.122:5672)
2023-08-18 19:43:51.806984+00:00 [info] <0.9736.0> connection <0.9736.0> (10.42.0.87:59394 -> 10.42.2.122:5672): user 'taiga' authenticated and granted access to vhost 'taiga'
2023-08-18 19:43:51.810602+00:00 [info] <0.9744.0> accepting AMQP connection <0.9744.0> (10.42.0.87:59408 -> 10.42.2.122:5672)
2023-08-18 19:43:51.812692+00:00 [info] <0.9744.0> connection <0.9744.0> (10.42.0.87:59408 -> 10.42.2.122:5672): user 'taiga' authenticated and granted access to vhost 'taiga'
2023-08-18 19:43:51.823583+00:00 [info] <0.9768.0> accepting AMQP connection <0.9768.0> (10.42.0.87:59422 -> 10.42.2.122:5672)
2023-08-18 19:43:51.825457+00:00 [info] <0.9768.0> connection <0.9768.0> (10.42.0.87:59422 -> 10.42.2.122:5672): user 'taiga' authenticated and granted access to vhost 'taiga'
2023-08-18 19:45:38.945993+00:00 [info] <0.9744.0> closing AMQP connection <0.9744.0> (10.42.0.87:59408 -> 10.42.2.122:5672, vhost: 'taiga', user: 'taiga')
2023-08-18 19:45:38.947314+00:00 [info] <0.9736.0> closing AMQP connection <0.9736.0> (10.42.0.87:59394 -> 10.42.2.122:5672, vhost: 'taiga', user: 'taiga')
2023-08-18 19:45:39.818711+00:00 [warning] <0.9768.0> closing AMQP connection <0.9768.0> (10.42.0.87:59422 -> 10.42.2.122:5672, vhost: 'taiga', user: 'taiga'):
2023-08-18 19:45:39.818711+00:00 [warning] <0.9768.0> client unexpectedly closed TCP connection
2023-08-18 19:45:50.157593+00:00 [info] <0.9868.0> accepting AMQP connection <0.9868.0> (10.42.0.88:51698 -> 10.42.2.122:5672)
2023-08-18 19:45:50.166227+00:00 [info] <0.9868.0> connection <0.9868.0> (10.42.0.88:51698 -> 10.42.2.122:5672): user 'taiga' authenticated and granted access to vhost 'taiga'
2023-08-18 19:45:50.178386+00:00 [info] <0.9876.0> accepting AMQP connection <0.9876.0> (10.42.0.88:51704 -> 10.42.2.122:5672)
2023-08-18 19:45:50.180721+00:00 [info] <0.9876.0> connection <0.9876.0> (10.42.0.88:51704 -> 10.42.2.122:5672): user 'taiga' authenticated and granted access to vhost 'taiga'
2023-08-18 19:45:50.195572+00:00 [info] <0.9895.0> accepting AMQP connection <0.9895.0> (10.42.0.88:51710 -> 10.42.2.122:5672)
2023-08-18 19:45:50.198777+00:00 [info] <0.9895.0> connection <0.9895.0> (10.42.0.88:51710 -> 10.42.2.122:5672): user 'taiga' authenticated and granted access to vhost 'taiga'
2023-08-18 19:46:20.194089+00:00 [info] <0.9949.0> accepting AMQP connection <0.9949.0> (10.42.0.88:49888 -> 10.42.2.122:5672)
2023-08-18 19:46:20.242401+00:00 [info] <0.9949.0> connection <0.9949.0> (10.42.0.88:49888 -> 10.42.2.122:5672): user 'taiga' authenticated and granted access to vhost 'taiga'
2023-08-18 19:47:51.590034+00:00 [info] <0.10035.0> accepting AMQP connection <0.10035.0> (10.42.0.88:59458 -> 10.42.2.122:5672)
2023-08-18 19:47:51.592479+00:00 [info] <0.10035.0> connection <0.10035.0> (10.42.0.88:59458 -> 10.42.2.122:5672): user 'taiga' authenticated and granted access to vhost 'taiga'

postgresql:

postgresql 17:08:43.04 
postgresql 17:08:43.04 Welcome to the Bitnami postgresql container
postgresql 17:08:43.04 Subscribe to project updates by watching https://github.com/bitnami/containers
postgresql 17:08:43.04 Submit issues and feature requests at https://github.com/bitnami/containers/issues
postgresql 17:08:43.05 
postgresql 17:08:43.05 INFO  ==> ** Starting PostgreSQL setup **
postgresql 17:08:43.07 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql 17:08:43.08 INFO  ==> Loading custom pre-init scripts...
postgresql 17:08:43.08 INFO  ==> Initializing PostgreSQL database...
postgresql 17:08:43.09 INFO  ==> pg_hba.conf file not detected. Generating it...
postgresql 17:08:43.09 INFO  ==> Generating local authentication configuration
postgresql 17:08:43.10 INFO  ==> Deploying PostgreSQL with persisted data...
postgresql 17:08:43.11 INFO  ==> Configuring replication parameters
postgresql 17:08:43.14 INFO  ==> Configuring fsync
postgresql 17:08:43.15 INFO  ==> Configuring synchronous_replication
postgresql 17:08:43.18 INFO  ==> Loading custom scripts...
postgresql 17:08:43.18 INFO  ==> Enabling remote connections
postgresql 17:08:43.19 INFO  ==> ** PostgreSQL setup finished! **

postgresql 17:08:43.22 INFO  ==> ** Starting PostgreSQL **
2023-08-18 17:08:43.340 GMT [1] LOG:  pgaudit extension initialized
2023-08-18 17:08:43.340 GMT [1] LOG:  starting PostgreSQL 12.16 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-08-18 17:08:43.341 GMT [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-08-18 17:08:43.341 GMT [1] LOG:  listening on IPv6 address "::", port 5432
2023-08-18 17:08:43.374 GMT [1] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-08-18 17:08:43.466 GMT [89] LOG:  database system was shut down at 2023-08-18 17:02:32 GMT
2023-08-18 17:08:43.598 GMT [1] LOG:  database system is ready to accept connections

:wave: @decappi

First of all, we are not k8s experts and currently we cannot give good support to k8s topics. That said, I’m taking a look at the logs.

Just to confirm, the docker image worked as expected before migrating to k8s? or you had this error as well?

As far as I see, you may have some requirements missing:
/opt/venv/lib/python3.11/site-packages/easy_thumbnails/source_generators.py:47: UserWarning: Could not import VIL for SVG image support: No module named 'reportlab'.
This is a warning and should not be creating the error. To dismiss this line of troubleshooting: do this only happen when creating issues with images? or it happens also with issues text-only?

How did you add the “print_stack” statement? Did you clone the repo, change some files and build a new image? Or you’re adding manually inside a container?

Solved:
Using the helm chart from here: https://github.com/jmferrer/taiga-docker/tree/taiga6-helm-chart/helm-chart
Values are not documented in the README.md, so I just took the relevant values from the values.yml and modified them.

:wave: @yami
Thanks, I’ve figured it out!

To dismiss this line of troubleshooting: do this only happen when creating issues with images? or it happens also with issues text-only?

This happened on starting the container, no issues were created yet, you can see I needed a minute to create a new issue after starting the container.

Just to confirm, the docker image worked as expected before migrating to k8s? or you had this error as well?

No, this error was new. I assume this was due to some version difference on my previous helm chart.

How did you add the “print_stack” statement? Did you clone the repo, change some files and build a new image? Or you’re adding manually inside a container?

I’ve added the statements by logging into the taiga-back container, then

apt update
apt install neovim
nvim <WHATEVER FILE I WAS DEBUGGING>
kill -HUP 1 #this restarts the workers with the new code

You guys should really consider officially supporting k8s, take the linked helm chart as a basis for your own. Now Taiga looks very nice in my cluster :grinning:

1 Like