Purpose: This report provides an overview of the recent service disruption impacting users of the Customers API.
Apology: We sincerely apologize for any inconvenience this disruption may have caused. We remain dedicated to maintaining high service availability and reliability.
Incident Overview
Duration: Approximately 30 minutes, from 2025-07-17, 08:35 PM UTC to 2025-07-17, 09:05 PM UTC
Region Affected: All regions
Services Affected: Customers API
Root Cause Analysis
Primary Issue:
A change to the type of three features used by our rules engine when processing Customers API calls inadvertently caused incoming requests to fail.
Detailed Explanation:
The issue occurred because this update was deployed using a canary strategy. As a result, for a period of time, different instances of some internal services processed the same features with different types. This caused unmarshalling errors in cross-service communication, which in turn failed the overall requests associated with them.
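The failure mode described above can be illustrated with a minimal sketch. It is not our actual service code: the feature name transaction_count, the change from integer to string, and the JSON encoding are all hypothetical and chosen only for illustration. It shows an instance still running the old version strictly decoding a payload produced by an instance running the new version.

```python
import json
from dataclasses import dataclass


@dataclass
class FeaturesV1:
    """Schema expected by instances still running the old version."""
    transaction_count: int  # hypothetical feature name


def decode_v1(payload: str) -> FeaturesV1:
    """Strictly decode a payload, rejecting values of an unexpected type."""
    data = json.loads(payload)
    value = data["transaction_count"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(value, int) or isinstance(value, bool):
        raise TypeError(
            f"transaction_count: expected int, got {type(value).__name__}"
        )
    return FeaturesV1(transaction_count=value)


# An old-version instance sends the feature as an integer (assumed old type).
old_payload = json.dumps({"transaction_count": 3})
# A canary instance on the new version sends it as a string (assumed new type).
new_payload = json.dumps({"transaction_count": "3"})

# Old-to-old communication decodes without error.
print(decode_v1(old_payload))

# New-to-old communication fails, which would fail the request that depends on it.
try:
    decode_v1(new_payload)
except TypeError as err:
    print(f"unmarshalling failed: {err}")
```

While both versions are live, whether a given request succeeds depends on which instances happen to handle it. That is why a type change of this kind is usually rolled out in compatible steps, for example by having readers accept both types before any writer starts emitting the new one.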
Impact
Service Accessibility: The majority of requests to the Customers API failed during the incident window.
Detection and Recovery Time
A few clients reached out about a spike in errors from the Customers API. At about the same time, our monitors detected an abnormal error rate in the API and paged the on-call engineer. Once aware of the issue, our engineering team immediately found the root cause and rolled back the faulty commit.
Corrective Actions and Improvements
Immediate Response:
The faulty commit was removed from production as soon as the problem was discovered, promptly restoring the Customers API to normal operation for all partners.
Preventive Measures:
Monitors and alerts will be added to our Sandbox environment to prevent this kind of issue from reaching production again.
Conclusion
Commitment:
Sardine remains firmly committed to delivering reliable and resilient services to our partners. We deeply regret the inconvenience caused by this incident and appreciate your patience and understanding.
Appreciation:
Thank you for your continued trust and partnership. We value your support as we strengthen our systems and processes to ensure greater reliability and stability.
Posted Jul 18, 2025 - 18:39 UTC
Resolved
This incident has been resolved.
Posted Jul 17, 2025 - 21:17 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jul 17, 2025 - 21:09 UTC
Identified
The issue has been identified and a fix is being implemented.