Your Merge Queue Shouldn't Go Down When GitHub Does

March 13, 2026 · 3 min read
Matt Matheson

On February 9th, GitHub went down five separate times in about twelve hours. Nearly every major subsystem was hit: pull requests, webhooks, Actions, and Git operations. An independent analysis of GitHub's status page shows roughly 91% uptime over the last 90 days, with 37 incidents in February 2026 alone. If you use a merge queue in your GitHub repo, you've felt this pain.

A merge queue is uniquely sensitive to GitHub instability because it depends on webhooks, the REST API, Actions, status checks, and the merge API all working simultaneously. A webhook that arrives 2.5 hours late (or never arrives) can leave PRs stuck in the queue forever. GitHub's native merge queue is tightly coupled to all of these subsystems with no independent recovery mechanism. We've seen community discussions where teams report their queue stuck for 9+ hours with zero visibility into why.

We built Trunk's merge queue around a different assumption: GitHub will fail, and our queue needs to keep running when it does.

Durable event buffering

GitHub webhooks don't hit our merge service directly. They flow through a durable SQS queue that acts as a buffer. If our workers are down for any reason, messages wait. If GitHub recovers from an outage and sends a backlog of events all at once, the buffer absorbs the spike. Nothing is silently dropped.
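To make the buffering idea concrete, here is a minimal Python sketch. It is not Trunk's implementation: `Buffer` is an in-memory stand-in for a durable queue such as SQS, and `Event`, `MergeService`, and `delivery_id` are hypothetical names chosen for illustration. The sketch shows two properties: an event leaves the buffer only after the handler succeeds, and the handler ignores deliveries it has already seen, so a replayed backlog is harmless.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    delivery_id: str  # hypothetical unique ID for one webhook delivery
    payload: dict


@dataclass
class Buffer:
    """In-memory stand-in for a durable queue (assumption: real storage is durable)."""
    _pending: deque = field(default_factory=deque)

    def enqueue(self, event: Event) -> None:
        self._pending.append(event)

    def drain(self, handler: Callable[[Event], None]) -> int:
        """Hand events to the handler in order.

        An event is removed only after the handler returns without raising,
        so a failing worker leaves the backlog intact for a later retry.
        """
        processed = 0
        while self._pending:
            event = self._pending[0]
            try:
                handler(event)
            except Exception:
                break  # keep the event at the head of the queue
            self._pending.popleft()
            processed += 1
        return processed

    def __len__(self) -> int:
        return len(self._pending)


class MergeService:
    """Hypothetical consumer that applies each delivery at most once."""

    def __init__(self) -> None:
        self.healthy = True
        self.seen: set = set()
        self.applied: list = []

    def handle(self, event: Event) -> None:
        if not self.healthy:
            raise RuntimeError("worker unavailable")
        if event.delivery_id in self.seen:
            return  # duplicate delivery: acknowledge without re-applying
        self.seen.add(event.delivery_id)
        self.applied.append(event.payload)
```

With the service marked unhealthy, `drain` processes nothing and the events stay queued. Once it recovers, the same call works through the whole backlog in order.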

Reconciliation: the real differentiator

The event-driven layer handles the happy path, but the piece that actually keeps the queue reliable is a reconciliation layer that runs completely independently of webhooks.

During a GitHub outage, a separate service monitors every active merge queue and compares our database state against what GitHub's API actually reports. Items stuck waiting to queue for too long get re-checked against the API. Items stuck testing for too long get their check suites re-fetched directly from GitHub. A full state sync periodically updates our status and check run records, using GitHub's API as the source of truth.

This means even if GitHub stops delivering webhooks completely (which has happened), the queue catches up on its own. No human intervention, no manual queue resets.
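The reconciliation pass described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Trunk's code: `QueueItem`, the state names, the ten-minute threshold, and the `fetch_check_status` callback (a stand-in for a direct GitHub API query) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

STUCK_AFTER_SECONDS = 600.0  # assumed threshold; the article does not state one


@dataclass
class QueueItem:
    pr_number: int
    state: str               # hypothetical states: "testing", "passed", "failed"
    entered_state_at: float  # timestamp in seconds


def reconcile(
    items: List[QueueItem],
    fetch_check_status: Callable[[int], str],
    now: float,
    stuck_after: float = STUCK_AFTER_SECONDS,
) -> List[int]:
    """Re-check items that have been testing for too long.

    The remote answer is treated as the source of truth. Returns the PR
    numbers whose local state was corrected.
    """
    corrected = []
    for item in items:
        if item.state != "testing":
            continue
        if now - item.entered_state_at < stuck_after:
            continue  # still within the normal window; wait for the webhook
        remote = fetch_check_status(item.pr_number)
        if remote in ("passed", "failed"):
            # The completion webhook was missed: adopt the remote state.
            item.state = remote
            item.entered_state_at = now
            corrected.append(item.pr_number)
    return corrected
```

Because the function only reads the local records and queries the API on demand, it can run on a timer and makes progress even when no webhooks arrive at all.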

The principle

The reconciliation layer is the piece most merge queues don't have. When we were building it, we kept discovering new ways GitHub could fail us silently. A completed check run whose webhook never arrived. A dependency cycle that would deadlock the whole queue. Each one became a smoke detector, a scheduled check that queries GitHub directly and auto-recovers when it finds drift, without relying on anything GitHub pushes to us.
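As one example of what such a smoke detector might look like, the sketch below checks for a dependency cycle among queued PRs using a depth-first search. The article does not say how Trunk detects cycles, so the approach, the `find_cycle` name, and the `depends_on` mapping (PR number to the PR numbers it depends on) are assumptions for illustration.

```python
from typing import Dict, List, Optional


def find_cycle(depends_on: Dict[int, List[int]]) -> Optional[List[int]]:
    """Return one dependency cycle as a list of PR numbers, or None.

    The returned list starts and ends with the same PR, for example
    [1, 2, 3, 1].
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    status: Dict[int, int] = {}
    path: List[int] = []

    def visit(node: int) -> Optional[List[int]]:
        status[node] = IN_PROGRESS
        path.append(node)
        for dep in depends_on.get(node, []):
            dep_status = status.get(dep, UNVISITED)
            if dep_status == IN_PROGRESS:
                # dep is already on the current path, so we have looped back.
                return path[path.index(dep):] + [dep]
            if dep_status == UNVISITED:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        status[node] = DONE
        return None

    for node in depends_on:
        if status.get(node, UNVISITED) == UNVISITED:
            found = visit(node)
            if found:
                return found
    return None
```

A scheduled job could run a check like this against the queue's current dependency graph and evict or flag one PR in the cycle, rather than waiting for a deadlock to be noticed by a human.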

What this means for your team

GitHub's reliability issues aren't going away. The ongoing Azure migration has introduced instability since October 2025, and it's not done yet. If your queue goes down every time GitHub's status page turns yellow, that's not an inevitability. It's an architectural choice. Give Trunk a try or explore the docs.
