Error evaluating rules in /v1/customers API endpoint

Incident Report for Sardine AI

Postmortem

To: Affected Partners

From: Sardine.ai

Introduction

  • Purpose: This report provides an overview of the recent service disruption impacting users of the Customers API.
  • Apology: We sincerely apologize for any inconvenience this disruption may have caused. We remain dedicated to maintaining high service availability and reliability.

Incident Overview

  • Duration: Approximately 30 minutes, from 2025-07-17, 08:35 PM UTC to 2025-07-17, 09:05 PM UTC
  • Region Affected: All regions
  • Services Affected: Customers API

Root Cause Analysis

  • Primary Issue:

    A change in the type of 3 features used by our rules engine when processing Customers API calls inadvertently caused several incoming requests to fail.

  • Detailed Explanation:

    The issue occurred because this update was deployed using a canary strategy. As a result, for a period of time, different instances of some internal services handled the same features differently. This caused unmarshalling errors in cross-service communication, which in turn caused the associated requests to fail.
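
    As an illustration only, the failure mode can be sketched as follows. This is a simplified model, not our production code: the schema classes, the `transaction_count` feature, and the helper functions are all hypothetical, and a single feature stands in for the three that changed. It shows how a message produced by an instance on one version of a schema fails strict decoding on an instance running the other version.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class FeaturesV1:
    # Hypothetical schema on instances still running the previous build.
    transaction_count: int


@dataclass
class FeaturesV2:
    # Hypothetical schema on canary instances: same feature, new type.
    transaction_count: str


def marshal(message) -> str:
    """Serialize a feature message for cross-service transport."""
    return json.dumps({f.name: getattr(message, f.name) for f in fields(message)})


def unmarshal(payload: str, schema):
    """Strictly decode a payload, rejecting values whose type differs from the schema."""
    data = json.loads(payload)
    for f in fields(schema):
        value = data[f.name]
        if type(value) is not f.type:
            raise TypeError(
                f"cannot unmarshal {type(value).__name__} into "
                f"{schema.__name__}.{f.name} of type {f.type.__name__}"
            )
    return schema(**data)


def request_succeeds(message, receiver_schema) -> bool:
    """Model one cross-service hop: the request fails if decoding fails."""
    try:
        unmarshal(marshal(message), receiver_schema)
        return True
    except TypeError:
        return False


# Same version on both sides: the request goes through.
print(request_succeeds(FeaturesV1(3), FeaturesV1))
# Mixed versions during the canary window: decoding fails in either direction.
print(request_succeeds(FeaturesV2("3"), FeaturesV1))
print(request_succeeds(FeaturesV1(3), FeaturesV2))
```

    In this model, a request fails whenever it crosses from an instance on one version to an instance on the other, whichever side sent it.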

Impact

  • Service Accessibility: The majority of requests to the Customers API failed during the incident window.

Detection and Recovery Time

  • A few clients reached out about a spike in errors from the Customers API. At about the same time, our monitors detected an abnormal error rate in the API and paged the on-call engineer. Once aware of the issue, our engineering team immediately identified the root cause and rolled back the faulty commit.

Corrective Actions and Improvements

  • Immediate Response:

    The faulty commit was removed from production as soon as the problem was discovered, promptly restoring the Customers API to normal operation for all partners.

  • Preventive Measures:

    Monitors and alerts will be put in place in our Sandbox environment to prevent this kind of issue from recurring in production.
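
    As a rough sketch of what such a monitor might check, the snippet below flags a window of requests whose error rate exceeds a threshold. The names, the 5% threshold, and the minimum request count are illustrative assumptions, not the actual configuration of our monitors.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    # Hypothetical per-window counters for one API in one environment.
    total_requests: int
    failed_requests: int


def should_alert(stats: WindowStats, threshold: float = 0.05, min_requests: int = 20) -> bool:
    """Return True when the window's error rate exceeds the threshold.

    Windows with too little traffic are ignored so that a single failed
    request in a quiet environment does not trigger an alert.
    """
    if stats.total_requests < min_requests:
        return False
    error_rate = stats.failed_requests / stats.total_requests
    return error_rate > threshold


# A healthy window stays quiet; a window where most requests fail alerts.
print(should_alert(WindowStats(total_requests=100, failed_requests=1)))
print(should_alert(WindowStats(total_requests=100, failed_requests=60)))
```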

Conclusion

  • Commitment:

    Sardine remains firmly committed to delivering reliable and resilient services to our partners. We deeply regret the inconvenience caused by this incident and appreciate your patience and understanding.

  • Appreciation:

    Thank you for your continued trust and partnership. We value your support as we strengthen our systems and processes to ensure greater reliability and stability.

Posted Jul 18, 2025 - 18:39 UTC

Resolved

This incident has been resolved.
Posted Jul 17, 2025 - 21:17 UTC

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Jul 17, 2025 - 21:09 UTC

Identified

The issue has been identified and a fix is being implemented.
Posted Jul 17, 2025 - 20:59 UTC
This incident affected: Customer APIs.