Almost eight months ago we opened early access to our flaky tests solution to help companies find and neutralize their flaky tests. Since then we’ve processed 20.2 million uploads from our development partners. Today we are excited to announce the next phase of this product as we open our Flaky Tests solution to the public.
After years of building tools targeting developer pain points, we are now tackling the most painful element of the SDLC (software development lifecycle) - flaky tests. Flaky tests are not only painful, they are productivity-sapping vampires. They waste developer time, slow down engineering velocity, and bring misery to your engineering team.
The traditional “solution” to flakiness in an organization’s test suite is to simply re-run the failing test job and pray that the flaky test passes this time around. This approach is classic engineering can-kicking. The problem with this path is that your engineers spend inordinate amounts of time rerunning tests to clear the CI gauntlet instead of working on the next important ticket.
The alternative to re-running flaky tests is to disable the offending test altogether. Of course, this comes with its own drawbacks - that test was written for a reason. Disabling a test creates new blind spots in your codebase; the result of commenting out tests is weaker overall test coverage and worse signal from your suite of tests. Not to mention the endless infighting that the push to disable a test in a repository can trigger. Even the flakiest test will find an internal champion who pushes back against its removal from the test suite.
Our North Star
When setting out to build this solution, we took an engineering-focused approach. That means that front and center is the measurement of the impact of flaky tests on your engineering organization. Flaky tests aren’t abstractly problematic - they are productivity drains precisely when they block your engineers from doing their work. If a test behaves erratically on an unmonitored hourly job, that might be cause for concern, but a flaky test failing a pull request that should otherwise be passing is a five-alarm fire.
Progress in an engineering project is directly tied to the creation and submission of pull requests, and business-critical software requires reliable testing to ensure that those pull requests move the project forward without introducing regressions. Since no pull request can merge without passing tests, even a single flaky test in a system can cause massive disruption. That is why we built our flaky test solution around the core metric of the blocked PR. At the end of the day, the most important thing is that your pull requests are not being throttled by unrelated test failures.
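To make the metric concrete, here is a minimal sketch of how blocked PRs could be counted from a log of CI results. The record shape, the field names, and the notion of a “known flaky” set are illustrative assumptions, not our actual pipeline; the idea is simply that a PR counts as blocked when its only failures come from tests already known to be flaky.

```python
from collections import defaultdict

# Illustrative record shape: one entry per test execution on a PR's CI run.
# These fields are assumptions for the sketch, not a real upload schema.
test_results = [
    {"pr": 101, "test": "test_checkout_total", "status": "fail"},
    {"pr": 101, "test": "test_inventory_sync", "status": "pass"},
    {"pr": 102, "test": "test_flaky_websocket", "status": "fail"},
    {"pr": 102, "test": "test_login", "status": "pass"},
]

# Tests previously classified as flaky (however that classification was made).
known_flaky = {"test_flaky_websocket"}

def blocked_prs(results, flaky):
    """Return PRs whose only failures come from known-flaky tests.

    These are the PRs that would have merged cleanly if the flaky tests
    had not misbehaved - the "blocked PR" count.
    """
    failures = defaultdict(set)
    for r in results:
        if r["status"] == "fail":
            failures[r["pr"]].add(r["test"])
    return [pr for pr, failed in failures.items() if failed and failed <= flaky]

print(blocked_prs(test_results, known_flaky))  # [102]
```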
Flaky Detection
You can’t dig your way out of a problem until you know how deep you are, so the first thing Flaky Tests does is monitor your test runs and aggregate their performance. Using machine learning and advanced heuristics, we can identify flaky tests across all your branches, giving you a high-level view of the state of your codebase and of the impact any individual test has on your engineering velocity.
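The detection itself combines ML with heuristics over every uploaded result, but the simplest signal is easy to sketch: a test that both passes and fails on the same commit - the classic “re-run until green” pattern - is almost certainly flaky. The snippet below is a deliberately naive sketch of that single heuristic, with an assumed data shape; it is not the real detector.

```python
from collections import defaultdict

# Illustrative executions: (commit SHA, test name, outcome).
executions = [
    ("abc123", "test_payment_retry", "fail"),
    ("abc123", "test_payment_retry", "pass"),  # passed on a re-run of the same commit
    ("abc123", "test_user_signup", "pass"),
    ("def456", "test_user_signup", "pass"),
]

def naive_flaky_tests(executions):
    """Flag tests that both pass and fail on the same commit.

    One simple heuristic - real detection combines many signals
    across branches, retries, and time.
    """
    outcomes = defaultdict(set)
    for sha, test, outcome in executions:
        outcomes[(sha, test)].add(outcome)
    return {test for (_, test), seen in outcomes.items() if {"pass", "fail"} <= seen}

print(naive_flaky_tests(executions))  # {'test_payment_retry'}
```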
On the test detail page, every reported execution of your tests is aggregated and categorized by source. That means you can quickly see how often a test fails on pull requests vs. your main branch and how often your engineers retry those flaky tests to overcome their unreliability.
Our flaky test solution is agnostic to whatever CI provider or testing framework you use. Integration is as simple as adding a step to your CI workflows to securely upload your test results to our service for analysis.
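As a rough illustration of what that step can look like, here is a minimal sketch that gathers the JUnit XML reports a CI job produced and posts them to an ingestion endpoint at the end of the run. The endpoint URL, environment variable, and payload format are hypothetical placeholders for the sake of the sketch, not the actual upload mechanism.

```python
import glob
import os
import urllib.request

# Hypothetical placeholders - not a real endpoint or token name.
INGEST_URL = "https://example.com/api/testruns"
API_TOKEN = os.environ["FLAKY_TESTS_TOKEN"]

def upload_junit_reports(pattern="**/junit*.xml"):
    """Send every JUnit XML report produced by this CI run for analysis."""
    for path in glob.glob(pattern, recursive=True):
        with open(path, "rb") as f:
            req = urllib.request.Request(
                INGEST_URL,
                data=f.read(),
                headers={
                    "Authorization": f"Bearer {API_TOKEN}",
                    "Content-Type": "application/xml",
                },
            )
            urllib.request.urlopen(req)

if __name__ == "__main__":
    upload_junit_reports()
```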
Failure Fingerprinting
Using our proprietary ML pipeline, our solution analyzes and condenses the failure conditions for your tests. In simple terms, we can identify the unique classes of failure that a test exhibits. This makes it much easier for your engineers to tell whether a failure they are seeing is a newly introduced bug or a previously seen flaky result that can be safely ignored.
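The real pipeline is ML-driven, but the core idea can be sketched with something much simpler: strip the volatile parts out of a failure message (addresses, line numbers, timings) and hash what remains, so failures with the same root cause collapse to the same fingerprint. The normalization rules below are illustrative assumptions, not how our pipeline actually works.

```python
import hashlib
import re

def fingerprint(failure_message: str) -> str:
    """Collapse a failure message into a stable fingerprint.

    Strips volatile details so that two occurrences of the same underlying
    failure hash to the same value - a toy stand-in for the ML-based
    fingerprinting described above.
    """
    normalized = failure_message
    normalized = re.sub(r"0x[0-9a-fA-F]+", "<addr>", normalized)                 # memory addresses
    normalized = re.sub(r"line \d+", "line <n>", normalized)                     # line numbers
    normalized = re.sub(r"\d{4}-\d{2}-\d{2}[T ][\d:.]+", "<ts>", normalized)     # timestamps
    normalized = re.sub(r"\b\d+(\.\d+)?(ms|s)\b", "<duration>", normalized)      # durations
    return hashlib.sha256(normalized.encode()).hexdigest()[:12]

a = fingerprint("TimeoutError: request took 5012ms, line 42")
b = fingerprint("TimeoutError: request took 4870ms, line 42")
print(a == b)  # True - same failure class despite different timings
```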
Since introducing failure fingerprinting, we have experienced firsthand the power it puts into engineers’ hands. Engineers don’t have to guess, poll their peers over Slack, or comb through endless CI logs to hunt for a pattern in the noise. Our test detail view empowers engineers to see just how often a test has failed with similar conditions and therefore make intelligent decisions when triaging failures.
Quarantining
Once a flaky test is identified, you can choose to quarantine it. Quarantined tests will still run in your testing flows but their failures won’t fail your CI jobs. We like to think of quarantining as allowing you to have your cake and eat it too. Rather than simply disabling a flaky test, quarantining lets you keep it in the test suite and get signal from its output, all without blocking your engineers from moving forward when it flakes on them.
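Mechanically, you can think of quarantining as a post-processing step on your test results: every test still runs, but only failures from non-quarantined tests decide whether the job fails. Here is a toy sketch of that idea - the quarantine set and the exit-code handling are assumptions for illustration, not how the product hooks into your CI.

```python
# Hypothetical post-processing after the test suite has run.
quarantined = {"test_flaky_websocket", "test_payment_retry"}

failures = ["test_flaky_websocket", "test_checkout_total"]

blocking = [t for t in failures if t not in quarantined]
ignored = [t for t in failures if t in quarantined]

print(f"Quarantined failures (reported, non-blocking): {ignored}")
print(f"Blocking failures: {blocking}")

# The job only fails if a non-quarantined test failed.
raise SystemExit(1 if blocking else 0)
```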
Ticketing
One of the most requested features during our private beta was a way to track flaky tests through your existing ticketing system. As of today, you can integrate Trunk Flaky Tests with your Jira instance to track your flaky tests and ensure they get fixed. In the coming months, we’ll also add support for automatic ticket creation, auto-assignment via CODEOWNERS, and GitHub/Linear support.
Integrated into your existing workflow
No engineer is asking for a new dashboard to check - they already have more than enough to ignore. That is why we built Flaky Tests to integrate directly into your existing workflow, with a native flaky test report that embeds a summary of what is broken in a pull request right into your GitHub workflow. No more diving through logs to find what broke in your pull request, and no more guessing whether a test is known to be flaky. It’s like having a magic decoder ring for your CI logs.
Join the Public Beta
More important than processing 20.2 million test uploads, we’ve identified thousands of flaky tests in our development partners’ code bases (and more than a handful in our own codebase). We’d like to take a moment to thank them for everything they contributed to make this offering what it is today.
We made this product because we are engineers first, and we’ve felt the pain of flaky tests first-hand. Our own Slack history used to be full of “Is this test flaky?” messages; now everyone knows what is flaky in our system and we can skip all that Slack chatter. We’re excited to open this product up to a larger audience and help your engineering team conquer your flaky tests.