Sonar ACH Datapack Issue
Incident Report for Sardine AI
Postmortem

Incident Description

The Sonar API started returning a series of 500 errors to ACH Datapack users. The errors occurred between 3:20 PM and 6:50 PM EST.

Impact

ACH Datapack users were unable to receive a proper response to their requests for about 3.5 hours.

Timeline

(all EST)

  • At 3:20 PM, an engineer performed a manual config change on the Sardine risk service (not the Sonar service).
  • At 3:25 PM, the 5xx monitor started showing a nonzero number of 5xx responses to requests using the ACH Datapack.
  • At 5:06 PM, we received a message from an ACH Datapack user who reported receiving 5xx error messages for some time.
  • At 5:18 PM, the team was notified and started investigating.
  • At 5:30 PM, the most recent deployment was rolled back, but this did not stop the errors.
  • From 5:30 PM to 6:30 PM, further investigation was carried out and showed that a hotfix had to be deployed.
  • At 6:50 PM, the hotfix was in place and fixed the issue.

Root cause

We had an exceptional deployment for all risk-related services at Sardine that Tuesday evening, and some internal configurations were changed at the same time. As a result, SonarAPI started receiving extra data from the bank enrichment service that it was not yet parsing correctly, which caused the application to panic when doing bank enrichment.
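
The failure mode above (a parser that breaks when an upstream service adds data) can be illustrated with a minimal sketch of tolerant parsing. This is not SonarAPI's actual code: the language, the `BankEnrichment` type, the field names `account_status` and `account_age_days`, and the function `parse_enrichment` are all hypothetical, since this report does not describe the real payload. The idea is to read only the fields that are needed, ignore everything else, and degrade to "no enrichment" instead of failing the whole request.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class BankEnrichment:
    # Hypothetical fields; the real enrichment payload is not described here.
    account_status: Optional[str] = None
    account_age_days: Optional[int] = None


def parse_enrichment(raw: str) -> Optional[BankEnrichment]:
    """Parse an enrichment payload, ignoring unknown keys.

    Returns None for a malformed body so the caller can continue
    without enrichment rather than returning a 5xx.
    """
    try:
        payload = json.loads(raw)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        return None
    if not isinstance(payload, dict):
        return None

    result = BankEnrichment()
    status = payload.get("account_status")
    if isinstance(status, str):
        result.account_status = status
    age = payload.get("account_age_days")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(age, int) and not isinstance(age, bool) and age >= 0:
        result.account_age_days = age
    return result
```

With this shape, a payload that gains an unexpected key, or carries a known key with the wrong type, yields a partially filled `BankEnrichment` rather than an error.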

What went wrong

  • Our monitors didn’t cover 5xx responses at a volume this low. At the peak we were returning 18 5xx responses per 5 minutes; our warning threshold was at least double that, and our alert threshold was 4x that volume.
  • Our automated tests didn’t cover this scenario, because it depends on certain production configuration.
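
The threshold arithmetic in the first point can be worked through in a short sketch. The peak of 18 per 5 minutes comes from this report; the exact old thresholds are assumptions (the report only says "at least double" and "4x"), and the `evaluate` function is a hypothetical stand-in for the monitoring tool's logic.

```python
PEAK_5XX_PER_WINDOW = 18  # peak reported in this postmortem (per 5 minutes)

# Assumed values: the report only says "at least double" and "4x".
OLD_WARNING_THRESHOLD = 2 * PEAK_5XX_PER_WINDOW  # 36
OLD_ALERT_THRESHOLD = 4 * PEAK_5XX_PER_WINDOW    # 72


def evaluate(count: int, warning: int, alert: int) -> str:
    """Return the monitor state for a 5xx count observed in one window."""
    if count >= alert:
        return "alert"
    if count >= warning:
        return "warning"
    return "ok"


# Old calibration: the incident peak stays below both thresholds.
print(evaluate(PEAK_5XX_PER_WINDOW, OLD_WARNING_THRESHOLD, OLD_ALERT_THRESHOLD))  # ok

# A threshold of 1 treats any 5xx in the window as actionable.
print(evaluate(PEAK_5XX_PER_WINDOW, warning=1, alert=1))  # alert
```

Under these assumed numbers the incident never left the "ok" state, which matches the timeline: the first notification came from a user rather than from the monitor.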

Actions

  • We fixed the monitoring tool so that it now reacts to any 5xx Sonar response. Previously, it was calibrated to tolerate some 5xx responses before triggering warnings and alerts.
  • We’ll review all the bank enrichment code in SonarAPI so that extra, unwanted, or invalid values won’t affect SonarAPI reliability.
  • We’re continuing to add more tests to ensure that the integration between Sonar and the enrichment vendors is covered.
Posted Oct 11, 2024 - 17:06 UTC

Resolved
ACH Datapack users were unable to receive a proper response to their requests
Posted Oct 11, 2024 - 17:05 UTC