API Outage
Incident Report for Omnivore.io
Postmortem

Overview

On November 24, 2023, Olo's Omnivore API experienced a disruption between 21:17 UTC and 22:12 UTC. During this window, all API operations except Add Payment, Open Ticket, and Submit Order failed, and 25% of Omnivore-related webhooks were delivered late.

What Happened

On November 24, 2023, Olo experienced a disruption to the Omnivore API and related webhook delivery, caused by a failure in the automated process for creating new Omnivore API instances. As traffic to the Omnivore API increased, its auto-scaling system could not add capacity to meet demand. As a result, at 21:17 UTC all API operations except Add Payment, Open Ticket, and Submit Order began to fail, and 25% of Omnivore-related webhooks began to experience delayed delivery.

We discovered that some of our package dependencies had been updated by their maintainers to require a newer runtime version than the one available in our deployment pipeline. This caused the bootstrapping process to fail for the new instances needed to handle current traffic levels. With this identified, we implemented and deployed a fix to remove the failing dependencies from the API's critical path, allowing the system to resume scaling out additional API instances and restoring service at 22:12 UTC.
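
To illustrate this class of failure, the sketch below shows one way a build step could flag dependencies whose minimum runtime is newer than the runtime in a deployment pipeline. It is a minimal illustration only: this report does not name the runtime, the package manager, or the affected packages, so the ">=X.Y" specifier format, the function names, and the package names are all assumptions rather than Olo's actual tooling.

```python
import re


def parse_version(text, width=3):
    """Turn '3.11' or '3.11.4' into a zero-padded tuple such as (3, 11, 0)."""
    parts = [int(part) for part in text.strip().split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def minimum_required(specifier):
    """Return the lower bound from a '>=X.Y' clause, or None if there is none.

    Assumption: only '>=' clauses are considered; other operators are ignored.
    """
    match = re.search(r">=\s*(\d+(?:\.\d+)*)", specifier)
    return parse_version(match.group(1)) if match else None


def incompatible_packages(pipeline_runtime, requirements):
    """List packages whose minimum runtime is newer than the pipeline runtime."""
    runtime = parse_version(pipeline_runtime)
    failures = []
    for name, specifier in sorted(requirements.items()):
        minimum = minimum_required(specifier)
        if minimum is not None and runtime < minimum:
            failures.append(name)
    return failures


if __name__ == "__main__":
    # Hypothetical package names and runtime versions, for illustration only.
    requirements = {"example-http-client": ">=3.9", "example-logger": ">=3.6"}
    print(incompatible_packages("3.8", requirements))  # ['example-http-client']
```

Running a check like this when an image is built, rather than when a new instance boots, would surface the incompatibility before auto-scaling depends on it.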

Next Steps

  • We have already made improvements to our alerting to automatically detect and mitigate similar issues before they become critical.
  • We will complete our in-progress migration of all Omnivore services into our newer hosting environment, which removes these dependencies as a failure point.
Posted Jan 05, 2024 - 13:56 PST

Resolved
All systems have been functioning normally, with API and webhook traffic flowing as expected, for several hours. We will follow up with a postmortem by 12/1/2023.
Posted Nov 24, 2023 - 18:31 PST
Monitoring
We have identified the issue and implemented a fix. We are monitoring systems to ensure stability. API and webhook traffic is flowing normally.
Posted Nov 24, 2023 - 14:25 PST
Investigating
We are currently investigating an issue that is affecting the Omnivore API.
Posted Nov 24, 2023 - 13:55 PST
This incident affected: Core Services (API, Webhooks, Control Panel).