Keeping CI Green with Trunk Merge Queue and Trunk Flaky Tests
Caseware

How a 200-engineer team went from 6-hour merge queues to 90-minute merges on day one

Challenges

6-8 hour median queue times
40% test flakiness rate
Homegrown sequential merge queue

Solution

Intelligent batching tests 4-8 PRs simultaneously
Automatic bisection isolates failures
Anti-flake protection prevents false ejections

Results

6+ hours → 90 minutes median wait
75% queue time reduction on day one
Zero developer complaints about merge delays

"It was a material sore point for us. So much so that we actually had to implement a war room to figure out how to mitigate the issues we were facing."
Amir Toole, VP Platform Engineering, Caseware

The Challenge

Caseware's 200-person engineering organization operates one of the more demanding CI environments in enterprise software: a 900-project NX monorepo with deeply interdependent packages, all merging into a single develop branch. Their internally built merge queue, running on AWS Lambda, had served the team well but was designed for sequential processing - one PR at a time. As the organization scaled, the end of each sprint brought 50-60 PRs per day, pushing median wait times to six to eight hours, with peaks stretching longer. Leadership recognized that the bottleneck required a purpose-built solution.

Native NX Support

Caseware selected Trunk Merge Queue specifically because it natively integrates with NX's affected graph - the dependency map that determines which projects need to rebuild when code changes. Most merge queue solutions treat monorepos as monoliths, running the entire test suite regardless of what changed. Trunk's NX integration understands interdependencies, running only the tests that matter for each PR. This graph-aware approach was a requirement for Caseware; without it, any new queue would face the same throughput constraints at scale.
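To make the idea of an affected graph concrete, here is a minimal Python sketch of how a set of changed projects can be expanded to everything that depends on them. It is an illustration of the general technique only, not NX's or Trunk's implementation; the project names and the `deps` data shape are hypothetical.

```python
from collections import defaultdict, deque


def affected_projects(deps, changed):
    """Return every project that needs re-testing when `changed` is modified.

    `deps` maps a project name to the projects it depends on
    (a hypothetical, simplified stand-in for a real dependency graph).
    """
    # Invert the graph so we can ask "who depends on this project?"
    dependents = defaultdict(set)
    for project, requires in deps.items():
        for dep in requires:
            dependents[dep].add(project)

    # Breadth-first walk from the changed projects through their dependents.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical five-project workspace.
deps = {
    "web-app": ["ui-kit", "api-client"],
    "api-client": ["shared-types"],
    "ui-kit": ["shared-types"],
    "reports": ["api-client"],
    "docs-site": [],
}

print(sorted(affected_projects(deps, ["api-client"])))
# ['api-client', 'reports', 'web-app']
```

In this toy workspace a change to `api-client` triggers tests for three of five projects; in a 900-project monorepo the same pruning is what keeps each queue run from exercising the whole repository.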

Intelligent Batching

The single biggest throughput gain came from abandoning sequential processing. Trunk groups PRs into batches and tests each batch as a unit rather than one PR at a time. Caseware configured batches of four PRs with a maximum wait time of 15 minutes - so no PR waits more than 15 minutes to enter a batch, and each CI run validates up to four PRs at once. Because one run now covers four PRs instead of one, effective throughput rose by as much as 4x without adding any CI infrastructure. The result was immediate: median queue times dropped 75% on day one.
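The batching rule described here (dispatch at four PRs or after 15 minutes, whichever comes first) can be sketched in a few lines of Python. This is a simplified simulation under assumed semantics, not Trunk's scheduler; `form_batches` and the arrival times are hypothetical.

```python
def form_batches(arrivals, batch_size=4, max_wait=15):
    """Group PR arrival times (in minutes) into batches.

    A batch is dispatched as soon as it holds `batch_size` PRs, or once its
    oldest PR has waited `max_wait` minutes, whichever comes first.
    """
    batches = []
    current = []
    for t in sorted(arrivals):
        # The open batch was already dispatched if its oldest PR hit the
        # wait limit before this PR arrived.
        if current and t - current[0] > max_wait:
            batches.append(current)
            current = []
        current.append(t)
        # A full batch is dispatched immediately.
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


# Hypothetical arrival times: a burst, a straggler, then a quiet period.
print(form_batches([0, 2, 5, 9, 12, 40, 41]))
# [[0, 2, 5, 9], [12], [40, 41]]
```

During a burst, four PRs share one CI run, which is where the throughput gain comes from; during quiet periods the wait limit keeps a lone PR from sitting idle until the batch fills.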

Automatic Bisection

When a test fails in a batch, traditional merge queues eject the entire batch, forcing every developer to resubmit regardless of which PR caused the failure. Trunk's automatic bisection uses a binary search to identify the failing PR, removes only that PR, and lets the others merge. For Caseware, this eliminated manual triage and re-queuing: the DevOps team no longer investigates which PR caused each failure, and engineers can focus on shipping code rather than monitoring queue status.
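The binary search behind bisection can be illustrated with a short Python sketch. It assumes exactly one PR in the batch is at fault and that CI results are deterministic; `passes` is a hypothetical callback standing in for a CI run, and the PR identifiers are invented for the example. Trunk's actual bisection logic may differ.

```python
def find_failing_pr(batch, passes):
    """Locate the single failing PR in `batch` by binary search.

    Assumes exactly one PR breaks the build and that `passes(prs)`, a
    hypothetical callback that runs CI on the given PRs, is deterministic.
    """
    candidates = list(batch)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        left = candidates[:mid]
        # If the left half is green, the culprit must be in the right half.
        candidates = candidates[mid:] if passes(left) else left
    return candidates[0]


# Hypothetical batch in which PR-103 introduces the failure.
batch = ["PR-101", "PR-102", "PR-103", "PR-104"]
bad = "PR-103"
runs = []


def passes(prs):
    runs.append(list(prs))  # record each simulated CI run
    return bad not in prs


culprit = find_failing_pr(batch, passes)
survivors = [pr for pr in batch if pr != culprit]
print(culprit, survivors, len(runs))
# PR-103 ['PR-101', 'PR-102', 'PR-104'] 2
```

For a batch of four, two extra CI runs are enough to isolate the culprit, and the three remaining PRs can proceed without being resubmitted.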

The New Normal: 90 Minutes and Forgotten Complaints

Like most large-scale test suites, Caseware's CI included tests with intermittent failures - a common challenge in complex distributed systems. Trunk's anti-flake features addressed this directly. Optimistic Merging allows known-flaky tests to fail without blocking the merge, while Pending Failure Depth prevents transient failures from cascading through the queue. The combination meant engineers could trust the queue to merge their code without manual intervention. Caseware also adopted Trunk Flaky Tests to surface test reliability trends and prioritize fixes - closing the loop between unblocking merges and improving CI health over time.

The Outcome

Median queue time dropped from 6+ hours to just 90 minutes. Even the P95 - the longest 5% of waits - shrank from multiple days to a consistent 2-5 hours during peak sprint loads. The DevOps team shifted from queue operations to strategic infrastructure work. The team now monitors a single health metric: commits reaching main within 2 hours. If that threshold trips, they investigate; otherwise, the queue runs itself. Toole summed up the transformation: "I can't remember the last time someone complained about queue length. It's been great, honestly."