Intermittent connectivity issue on EU region

Incident Report for Sardine AI

Postmortem

Introduction

  • Purpose: This report provides an overview of the recent service disruption impacting users in the EU region.
  • Apology: We sincerely apologize for the inconvenience this disruption caused and remain committed to maintaining a high level of service reliability.

Incident Overview

  • Duration: 45 minutes, from 2025-04-22 13:45 to 14:30 UTC
  • Region Affected: EU
  • Services Affected: The /v1/customers endpoint and the business-events service

Root Cause Analysis

  • Primary Issue: A misconfigured feature flag initiated the disruption.
  • Secondary Factor: A related configuration change caused service instability.

Impact

  • Service Accessibility: Intermittent connectivity issues were experienced throughout the incident window.
  • Service Downtime: The business-events service was fully unavailable for part of the duration.
  • Summary: Users in the EU region experienced intermittent connectivity issues during the affected window.

Corrective Actions and Improvements

  • Immediate Response: The misconfiguration was reverted and services were promptly restored.
  • Ongoing Improvements: We are implementing additional safeguards around configuration changes and enhancing monitoring across regional environments.

Conclusion

  • Commitment: We remain focused on delivering dependable and resilient services to all partners.
  • Appreciation: Thank you for your understanding and continued trust.
Posted Apr 23, 2025 - 12:56 UTC

Resolved

We noticed an intermittent connectivity issue affecting certain endpoints in the EU region, /v1/customers in particular. Engineers began working on a fix immediately. Everything was back up and running smoothly by 14:30 UTC (4:30 p.m. CEST).
Posted Apr 22, 2025 - 19:30 UTC