Performance Degradation

Incident Report for BlueDolphin

Resolved

What happened?

Between 14:00 and 14:30, a platform issue degraded service performance. Customers experienced problems with the navigation bar and the admin section.

What went wrong and why?

One instance of the user service, which is responsible for authorization, lost communication with the cache and data layer, and its requests ended in timeouts. Authenticated users could not be authorized, and components that require authorization stopped working.
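As an illustration of how a failure of this kind could be surfaced per instance, the following is a minimal sketch of a readiness check that probes each dependency under a deadline. It is not the user service's actual implementation: the names `check_dependencies` and `is_ready`, the probe callables, and the one-second default timeout are all assumptions made for the example.

```python
import concurrent.futures
from typing import Callable, Dict


def check_dependencies(
    probes: Dict[str, Callable[[], None]], timeout_s: float = 1.0
) -> Dict[str, bool]:
    """Run every dependency probe in parallel; True means the probe succeeded in time.

    `probes` maps a hypothetical dependency name (for example "cache") to a
    callable that raises on failure. A probe that raises or exceeds
    `timeout_s` is reported as unhealthy.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(probes)))
    futures = {name: pool.submit(probe) for name, probe in probes.items()}
    results: Dict[str, bool] = {}
    for name, future in futures.items():
        try:
            # Wait at most timeout_s for this probe's result.
            future.result(timeout=timeout_s)
            results[name] = True
        except Exception:
            # Covers both a probe error and a timeout.
            results[name] = False
    # Do not block on probes that are still hanging.
    pool.shutdown(wait=False)
    return results


def is_ready(results: Dict[str, bool]) -> bool:
    """An instance is ready only if every dependency probe succeeded."""
    return all(results.values())
```

An instance that reports not ready could then be taken out of rotation by whatever load balancer or orchestrator is in use, rather than continuing to answer authorization requests with timeouts.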

How did we respond?

The error was corrected by removing the faulted instance. Most resources were restored automatically, and communication was re-established.

14:00 CET on 06-10-2025 – Customer impact began, triggered by the change described above.
14:00 CET on 06-10-2025 – Issue detected by customers and by monitoring.
14:15 CET on 06-10-2025 – Investigation commenced by our DevOps and Back-End teams.
14:20 CET on 06-10-2025 – We performed steps to revert the code change.
14:22 CET on 06-10-2025 – Code release reverted; the issue persisted.
14:27 CET on 06-10-2025 – Issue isolated to a specific service and its communication with Azure Redis.
14:30 CET on 06-10-2025 – Faulty instance of the service removed. Errors dropped.
14:35 CET on 06-10-2025 – Recovery confirmed for the majority of impacted services.

How are we making incidents like this less likely or less impactful?

We will add more alerts to detect similar issues earlier and will automate the recovery process.
We will investigate ways to improve the recovery time for resources affected by such issues.
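One possible shape for automated recovery is sketched below, purely as an assumption about how it might work: count consecutive failed health checks per instance and flag the instance for removal once a threshold is reached. The class name `InstanceMonitor`, the threshold of three, and the instance identifiers are hypothetical, and the actual removal step is left to the hosting platform.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class InstanceMonitor:
    """Tracks consecutive failed health checks per instance (illustrative only)."""

    failure_threshold: int = 3  # assumed value, tune to the check interval
    _failures: Dict[str, int] = field(default_factory=dict)

    def record(self, instance_id: str, healthy: bool) -> bool:
        """Record one check result; return True when the instance should be removed."""
        if healthy:
            # A single successful check resets the counter.
            self._failures[instance_id] = 0
            return False
        self._failures[instance_id] = self._failures.get(instance_id, 0) + 1
        return self._failures[instance_id] >= self.failure_threshold
```

Requiring several consecutive failures is a deliberate trade-off in this sketch: it avoids removing an instance over a single slow response, at the cost of a slightly longer detection time.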
Posted Oct 06, 2025 - 14:00 CEST