Increased latency

Incident Report for Sardine AI

Postmortem

Impact

High latency on the v1/customers and v1/issuing-risk endpoints. Latency was observed at around 3.5 to 4 seconds during the outage.

Timeline

Latency reported around 5:12 PM EDT

Service restored about 2 hours later, at 7:20 PM EDT

Root Cause

Scaling issue: the primary DB instance was at maximum capacity (>90%), although the number of connections was low.
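
The pattern described here (an instance saturated while connection counts stay low) is one that a connection-based alert alone would miss. Below is a minimal sketch of a check that evaluates the two signals independently. It assumes the capacity figure refers to CPU utilization, as the resolution notes suggest. The `DbSample` type, the `saturation_alerts` function, the thresholds, and the sample values are all hypothetical and are not taken from Sardine's monitoring setup.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DbSample:
    """One hypothetical monitoring sample for a database instance."""
    cpu_pct: float        # CPU utilization, 0-100
    connections: int      # currently open connections
    max_connections: int  # configured connection limit


def saturation_alerts(sample: DbSample,
                      cpu_threshold: float = 90.0,
                      conn_ratio_threshold: float = 0.8) -> List[str]:
    """Return the names of the signals that exceed their thresholds.

    CPU and connection usage are checked separately, so a CPU-bound
    instance is flagged even when few connections are open.
    Thresholds are illustrative assumptions, not recommended values.
    """
    alerts = []
    if sample.cpu_pct > cpu_threshold:
        alerts.append("cpu")
    if (sample.max_connections > 0
            and sample.connections / sample.max_connections > conn_ratio_threshold):
        alerts.append("connections")
    return alerts


# Made-up numbers resembling the incident: high CPU, few connections.
incident_like = DbSample(cpu_pct=93.0, connections=40, max_connections=500)
print(saturation_alerts(incident_like))
```

With these illustrative values, only the CPU signal fires, which matches the situation described in the root cause.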

Resolution

Scaled up the primary instance from 8 vCPU, 64 GB to 16 vCPU, 128 GB. After scaling the DB server, CPU utilization dropped and latency came down. Since then, CPU usage on the primary has been below 50%.
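
As a rough sanity check on these figures, the sketch below estimates CPU utilization after the resize. It assumes the workload stays constant and that CPU demand spreads evenly across the added vCPUs, which is a simplification. The function name `projected_utilization` is hypothetical, and 90% is used as an illustrative starting point for the ">90%" reported above.

```python
def projected_utilization(current_pct: float, old_vcpus: int, new_vcpus: int) -> float:
    """Estimate CPU utilization (in percent) after changing the vCPU count.

    Assumes the same absolute CPU demand and linear scaling across cores.
    """
    if old_vcpus <= 0 or new_vcpus <= 0:
        raise ValueError("vCPU counts must be positive")
    return current_pct * old_vcpus / new_vcpus


# 90% of 8 vCPUs is 7.2 vCPUs of demand; on 16 vCPUs that is 45%.
print(projected_utilization(90, 8, 16))  # 45.0
```

Under these assumptions, doubling the vCPU count halves utilization, which is consistent with the observation that CPU usage has stayed below 50% since the change.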

Posted Jul 17, 2024 - 18:29 UTC

Resolved

We experienced degraded performance on one of our databases. Resources were scaled as needed to cope with demand.

Posted Jul 16, 2024 - 23:40 UTC

Identified

We are experiencing increased latency. Our engineering team is already working on it.

Posted Jul 16, 2024 - 21:12 UTC

This incident affected: Customer APIs.