Temporary Latency Degradation
Incident Report for Sardine AI
Postmortem

Incident Overview

  • Duration: Approximately 1 hour, from 2023-12-01 08:01:00 to 2023-12-01 09:05:00 PST
  • Services Affected: Customers API, Issuing API
  • Impact: Elevated latency

Root Cause Analysis

  • Primary Issue: The degradation was caused by increased traffic from multiple jobs, run by multiple clients, that coincidentally all run on the first day of the month
  • Second Factor: While the traffic volume was not significant, it produced an unexpected query pattern that caused slow database queries
  • Third Factor: This coincided with a recent database configuration change that Sardine was rolling out internally for performance optimization. During the rollout, we did not provision enough read replica resources
  • Technical Impact: The combined result was elevated latency on the affected APIs
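
As an illustration of the first factor, jobs that all start at the same moment on the first day of the month can be spread across a window by giving each client a stable offset. The following is a minimal sketch only, not Sardine's implementation or a required client change. The function name `jittered_start`, the `client_id` values, and the 60-minute window are hypothetical and chosen for illustration.

```python
import hashlib
from datetime import datetime, timedelta


def jittered_start(client_id: str, scheduled: datetime, window_minutes: int = 60) -> datetime:
    """Return `scheduled` shifted by a stable, per-client delay inside the window.

    Hypothetical helper: the delay is derived from a hash of the client ID, so the
    same client always gets the same offset and different clients are spread out.
    """
    if window_minutes <= 0:
        # No window configured: run at the originally scheduled time.
        return scheduled
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to an offset in [0, window) seconds.
    offset_seconds = int.from_bytes(digest[:8], "big") % (window_minutes * 60)
    return scheduled + timedelta(seconds=offset_seconds)


if __name__ == "__main__":
    # Example schedule (assumed): a monthly job nominally starting at 08:00.
    monthly_run = datetime(2024, 1, 1, 8, 0)
    for client in ("client-a", "client-b", "client-c"):
        print(client, jittered_start(client, monthly_run))
```

A hash-based offset is used here instead of a random one so that each job's start time stays predictable from month to month. Whether a 60-minute window is acceptable depends on each job's deadline.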

Action Items

  1. Optimize the database configuration, including provisioning more replicas, using appropriate resource configuration, and tuning parameters
  2. Optimize queries and access patterns
Posted Dec 07, 2023 - 17:47 UTC

Resolved
We experienced latency degradation on the Customers API and Issuing API from around 8:01am PT to 9:05am PT. The issue was identified and fixed, and we will share a post-mortem as soon as possible.
Posted Dec 01, 2023 - 17:33 UTC
This incident affected: Customer APIs.