Due to a misconfiguration in an internal system, one of our internal API requests contained a very large payload. To protect against a potential DDoS attack, the receiving service stopped accepting these requests and began returning status code 413 (Payload Too Large). This, in turn, caused our customers’ API calls to fail with status code 500 errors.
The issue was mitigated within a few minutes for most of the clients.
Improve alerts
Improve the process for escalation and communication
Improve the process for pre-prod testing and rollout to prevent issues related to configuration updates
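
As an illustration of the kind of pre-prod check the last item could include, the sketch below measures a serialized request payload against a size limit before it is sent. It is only a minimal example under stated assumptions: the names `MAX_PAYLOAD_BYTES`, `payload_size_bytes`, and `check_payload` are hypothetical, the limit value is a placeholder (the actual limit enforced by the internal service is not stated here), and the payload is assumed to be JSON.

```python
import json

# Hypothetical limit for illustration only; the real limit is not stated in this report.
MAX_PAYLOAD_BYTES = 1_000_000


def payload_size_bytes(payload: dict) -> int:
    """Return the size of the payload once serialized as UTF-8 JSON."""
    return len(json.dumps(payload).encode("utf-8"))


def check_payload(payload: dict, limit: int = MAX_PAYLOAD_BYTES) -> None:
    """Raise ValueError if the serialized payload would exceed the limit."""
    size = payload_size_bytes(payload)
    if size > limit:
        raise ValueError(f"payload is {size} bytes, limit is {limit} bytes")
```

Run against payloads generated from a candidate configuration, a check along these lines could flag an oversized request before rollout rather than when the receiving service starts returning 413.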