Summary
On May 6, 2026, from approximately 16:48 to 17:57 UTC, customers using Sardine's /v1/customers and /v1/issuing/risks APIs experienced elevated latency and degraded responses. We sincerely apologize for the disruption this caused. This document summarizes what happened, why it happened, and the steps we are taking to prevent recurrence.
What Happened
During this window, requests to the affected endpoints experienced one of two behaviors:
- Some responses included the SITO reason code, indicating that Sardine was unable to compute certain risk signals within the expected timeframe. Customers still received rule evaluation results, but with limited signals.
- For /v1/customers traffic, requests returned HTTP 500 errors.

The incident was resolved at approximately 17:57 UTC after our team rerouted database traffic to a healthy replica.
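For integrators who want to handle both failure modes gracefully, a minimal client-side sketch in Python is shown below. The endpoint path is taken from this report; the reasonCodes response field, the retry policy, and the helper name are illustrative assumptions, not the documented Sardine API schema.

    import time
    import requests

    SARDINE_API = "https://api.sardine.ai/v1/customers"  # endpoint named in this report

    def submit_customer(payload, api_key, max_retries=3):
        """Submit a customer risk request, retrying transient HTTP 500s and
        flagging degraded (SITO) responses so callers can apply fallback logic."""
        for attempt in range(max_retries):
            resp = requests.post(
                SARDINE_API,
                json=payload,
                headers={"Authorization": f"Bearer {api_key}"},
                timeout=5,
            )
            if resp.status_code == 500:
                # Transient server error, as seen during this incident:
                # back off exponentially and retry.
                time.sleep(2 ** attempt)
                continue
            resp.raise_for_status()
            body = resp.json()
            # "reasonCodes" is an assumed field name for illustration;
            # consult the Sardine API docs for the actual response schema.
            degraded = "SITO" in body.get("reasonCodes", [])
            return body, degraded
        raise RuntimeError("Sardine API unavailable after retries")

A caller receiving degraded=True can, for example, fall back to more conservative rules until full signals return.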
Why It Happened
The root cause was an infrastructure failure in our cloud provider's (Google Cloud) database service in the us-central1 region. An internal resource shortage for certain instance types caused a routine automatic update operation on our primary read replica to fail. While the Google Cloud UI and CLI reported the instance as healthy, it was not properly handling incoming queries. Our team performed a manual failover to redirect traffic to a healthy replica, which restored service.
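One lesson here is that provider-reported instance status can disagree with actual query behavior. A query-level health probe, sketched below, surfaces that gap by executing a real statement instead of trusting instance metadata. It assumes a PostgreSQL-compatible instance reachable via a DSN; this report does not state the database engine.

    import psycopg2

    def replica_is_healthy(dsn: str, timeout_s: int = 2) -> bool:
        """Probe a replica by running a real query rather than relying on
        provider-reported status, which stayed green during this incident."""
        try:
            conn = psycopg2.connect(dsn, connect_timeout=timeout_s)
            try:
                with conn.cursor() as cur:
                    cur.execute("SELECT 1")
                    cur.fetchone()
                return True
            finally:
                conn.close()
        except psycopg2.Error:
            return False

Wiring a probe like this into failover automation allows traffic to be rerouted when queries fail, even if the provider's control plane still reports the instance as healthy.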
What We're Doing About It
We are taking the following actions to reduce the likelihood and impact of similar incidents:
We are also requiring a full root cause analysis from Google Cloud within 3 business days.
We take the reliability of our platform seriously and apologize again for the impact this incident had on your operations. Please reach out to your account team or risksupport@sardine.ai if you have questions.