The problem every developer knows
You've just finished implementing a critical feature, pushed your code, and opened a pull request. The CI starts running, you grab coffee, and when you return... your CI is red.
Your heart sinks as you dive into the sea of CI logs, only to discover it's not your fault: some completely unrelated test failed spuriously. Relief that it wasn't your code quickly fades as reality sets in. You're still stuck re-running your CI job and hoping that pesky flaky test decides to pass this time.
As fun as this sounds, there's a better way to build code and keep your CI green.
You've just been the victim of a flaky test.
Now multiply this by every developer on your team, every day. That's hundreds of wasted hours per quarter just managing test noise instead of shipping features.
This isn't just a small-team problem. Flaky tests impact organizations of all sizes. Solo developers lose hours to mysterious test failures. Mid-sized teams see their velocity drop as everyone learns to "just retry the build." And larger organizations? The pain multiplies across every team, every repository, every day.
Google dealt with this exact problem at scale. As Jeff Listfield from Google's engineering team explained:
"You put up your diff for review, and you get a failure. And it's totally unrelated to the change you make...you start looking into it, spend an hour or two, you're digging into it. You realize this test is actually flaky."
When Google, with all its engineering resources and tooling sophistication, calls flaky tests a major productivity drain, you know this is a universal challenge.
After eight months of building and testing with our developer community, Trunk Flaky Tests is now out of beta. What started as a solution to our own CI frustrations has become the tool that companies like Zillow, Metabase, and others rely on to keep their development velocity high.
Why flaky tests matter more than you think
Our analysis of billions (yes, billions, with a “B”) of test uploads revealed some eye-opening patterns:
Teams underestimate the real cost of flaky tests: The real cost isn't CI compute but engineering time. When a single flaky test contaminates your test suite, the entire pipeline needs to rerun, and a developer needs to investigate whether it was their code or just noise. That's 15-30 minutes per incident, multiplied across every PR, every day.
It turns engineers into test babysitters: Developers slowly realize a significant chunk of their day is spent babysitting tests instead of writing code, and team morale erodes. You hired engineers to ship features, not to play detective on flaky infrastructure.
The problem scales with your ambition: As teams write more end-to-end and integration tests, the tests that actually catch real-world bugs, flaky test rates increase dramatically. Better testing practices paradoxically make the problem worse.
Retrying wastes everyone's time: Automatic retries waste CI resources and developer patience. Nobody wants to wait an extra 15 minutes hoping the build goes green this time, especially when it might fail again on the next run.
Disabling tests creates a silent risk: Disabling flaky tests feels pragmatic until you realize you've quietly lost chunks of your test coverage and the bugs those tests would have caught start slipping through to production.
Manual triage creates hidden bottlenecks: Manual triage sounds responsible, but it burns engineering hours because someone has to play detective on every suspicious failure. That cost scales poorly as your test suite grows.
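To put rough numbers on the cost of engineering time, here is a back-of-the-envelope sketch. The 15-30 minutes per incident comes from the estimate above; the incident count per day and the 65 working days per quarter are illustrative assumptions, not measurements from our data.

```python
def quarterly_hours_lost(
    incidents_per_day: float,
    minutes_per_incident: float,
    working_days: int = 65,  # assumption: roughly 13 weeks x 5 days
) -> float:
    """Estimate engineering hours spent on flaky-test incidents per quarter."""
    total_minutes = incidents_per_day * minutes_per_incident * working_days
    return total_minutes / 60


# Hypothetical team that hits 10 flaky failures a day.
low = quarterly_hours_lost(incidents_per_day=10, minutes_per_incident=15)
high = quarterly_hours_lost(incidents_per_day=10, minutes_per_incident=30)
print(f"Estimated loss: {low:.1f} to {high:.1f} hours per quarter")
```

Swap in your own incident rate to see where your team lands. Even modest rates add up to hundreds of hours.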

Our approach: intelligent quarantine
Instead of choosing between unreliable CI and losing test coverage, Trunk Flaky Tests introduces a third option: intelligent quarantine.
Here's how it works:
Statistical detection that actually works
We analyze your test results over time to identify genuine patterns of unreliability:
Our detection algorithm considers context and patterns across multiple test runs
We distinguish between main branch failures (more concerning) and feature or working branch failures (expected during development)
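To make the idea concrete, here is a minimal sketch of one common heuristic. It is not our production algorithm. It assumes a test is suspicious when it both passes and fails on the same commit, and it weights main-branch evidence more heavily than feature-branch evidence. The `RunResult` type, the weights, and the threshold are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """One execution of one test (hypothetical record shape)."""
    test_name: str
    commit_sha: str
    branch: str
    passed: bool


def find_flaky_candidates(runs, main_branch="main", min_score=1.0):
    """Return {test_name: score} for tests with mixed outcomes on one commit."""
    # Collect the set of outcomes seen for each (test, commit, branch).
    outcomes = defaultdict(set)
    for run in runs:
        outcomes[(run.test_name, run.commit_sha, run.branch)].add(run.passed)

    scores = defaultdict(float)
    for (test_name, _sha, branch), seen in outcomes.items():
        # Same code, different results: the test is nondeterministic.
        if seen == {True, False}:
            # Assumed weights: main-branch evidence counts double.
            scores[test_name] += 1.0 if branch == main_branch else 0.5

    return {name: score for name, score in scores.items() if score >= min_score}
```

A test that fails consistently on a feature branch never shows mixed outcomes, so it is treated as a real failure rather than a flake. A real detector would also look at trends over time, which this sketch leaves out.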
Smart quarantine system
Once we detect a flaky test, here's the key insight: quarantined tests keep running, but their failures won't block your CI.
You stay in control. You can quarantine tests manually when you spot patterns or let our detection system flag them for you. Either way, the quarantine accomplishes what matters: keeping your pipeline reliable while maintaining full test coverage.
This means:
✅ Your CI pipeline stays green, PRs can merge and Merge Queues keep flowing
✅ You maintain test coverage and can still catch real bugs
✅ You get visibility into when flaky tests are fixed
✅ No more debates about which tests to disable
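The gating logic behind quarantine can be sketched in a few lines. This is an illustration of the concept under simple assumptions, not the actual implementation: the function name, the inputs, and the exit-code convention are all hypothetical.

```python
from __future__ import annotations


def evaluate_run(
    failed_tests: set[str], quarantined: set[str]
) -> tuple[int, list[str], list[str]]:
    """Decide whether a CI run should fail once quarantine is applied.

    Every test still runs. Only failures of non-quarantined tests block.
    """
    blocking = sorted(failed_tests - quarantined)
    suppressed = sorted(failed_tests & quarantined)  # still recorded, not blocking
    exit_code = 1 if blocking else 0
    return exit_code, blocking, suppressed


# Hypothetical run: one real failure and one known-flaky failure.
code, blocking, suppressed = evaluate_run(
    failed_tests={"test_login", "test_checkout"},
    quarantined={"test_checkout"},
)
print(code, blocking, suppressed)
```

Because suppressed failures are still recorded, you can see when a quarantined test starts passing consistently and is ready to come out of quarantine.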
Works with your existing setup
Whether you're using Jest, pytest, RSpec, or any other testing framework, and whether you're running on GitHub Actions, Jenkins, CircleCI, or another CI provider, Trunk Flaky Tests integrates in minutes. For example, in a GitHub Actions workflow:
```yaml
jobs:
  test:
    name: Upload Tests
    runs-on: ubuntu-latest
    steps:
      - name: Run Tests
        run: ...
      - name: Upload Test Results to Trunk.io
        if: "!cancelled()" # Upload the results even if the tests fail
        continue-on-error: true # don't fail this job if the upload fails
        uses: trunk-io/analytics-uploader@v1
        with:
          junit-paths: "**/junit.xml"
          org-slug: <TRUNK_ORG_SLUG>
          token: ${{ secrets.TRUNK_TOKEN }}
```
Deep investigation tools
When you're ready to fix a flaky test, you get all the context you need:
Summaries of all the different ways the test fails
Stack traces from multiple failure instances
Failure rate trends and environmental correlations
Real results from real teams
Our beta users have seen meaningful improvements in their CI reliability and developer productivity:
“It's really nice to be able to go into the dashboard and see—oh, this has flaked on 15 different PRs today. This is actually impacting the organization and needs to be either quarantined or fixed quickly.”
"Trunk's Flaky Test Solution is so far the best one we've worked with, and we look forward to continuing to work with it."
"I primarily focused on the flaky tests tab and found all the information I sought. The app provided an excellent summary of our E2E pain points."
What's next
Our roadmap includes enhanced failure pattern recognition, more CI integrations, and advanced team collaboration features for distributed engineering organizations.
But today is about this milestone: every engineering team can now have the same sophisticated flaky test management that companies like Google and Meta build internally. Teams at Zillow, Metabase, BetterUp, Field Nation, Descript, and Brex are already using Trunk Flaky Tests to reclaim hundreds of engineering hours every quarter, and now your team can too.

Ready to give your team reliable CI?
🚀 Get started: app.trunk.io
📖 Documentation: docs.trunk.io/flaky-tests
💬 Community support: Slack workspace
We're also launching on Product Hunt today! If reliable CI matters to your team, we'd appreciate your support.
https://www.producthunt.com/products/trunk-io
About Trunk: We're building the developer experience platform that helps engineering teams ship faster while maintaining quality. Learn more at trunk.io.