Degraded Auth/Statistics/Transaction Service due to high latencies from Cassandra in Rugby environment
Likely affected endpoints:
What did we learn?
Because of the way Cassandra behaves under this kind of load, it is hard to be proactive in such situations.
The issue started at around 04:10 CET on Jan 5th and ended at 08:25 CET on the same day.
Posted Jan 14, 2022 - 10:53 CET
We're back to normal. The culprit was a spike in delete/update operations from some services that depend on Cassandra. The issue started at around 4 AM and caused one of the nodes to go down. Restoring that node resolved the situation.
The Transaction Service was also affected from 04:10 to 08:25 CET, with a partial error rate on the affected APIs; the degradation peaked at up to a 10% failure rate between 07:50 and 08:15 CET.
Posted Jan 05, 2022 - 08:31 CET
The production environment for the Royal Bank of Scotland is the most affected. We have already identified the cause of this issue and taken action to fix it.