How to Determine if a Test is Flaky
Flaky tests are frustrating because they pass on some runs and fail on others without any code changes. Knowing how to identify and reproduce a flaky test is crucial for maintaining a reliable test suite. Let's explore how to tell whether a test is flaky and which tools and techniques can help.
Criteria for Identifying Flaky Tests
A test is flaky when it produces inconsistent results. It passes sometimes and fails other times without any changes in the code.
Multiple Instances: You need to see the test fail and pass multiple times under the same conditions.
Inconsistent Results: If rerunning the test without any changes sometimes passes and sometimes fails, it's likely flaky.
Differentiating Between Legitimate Test Failures and Flaky Behavior
Not every test failure means the test is flaky. Sometimes, the test is just identifying a real issue in the code.
Consistent Failures: If a test consistently fails with the same error, the problem is likely in the code.
Random Failures: When a test fails intermittently, or only in some environments, it's probably flaky.
To distinguish between these, rerun the tests several times. Consistent failures point to actual issues, while random failures suggest flakiness.
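The rerun-and-compare rule can be sketched as a small classifier over the results of repeated runs. This is an illustrative assumption, not a standard tool: each run is recorded as `None` for a pass or as the failure's error message. The fourth bucket, "inconsistent failure" (every run fails, but with different errors), is an extra case added here because the rule above does not cover it.

```python
from typing import Optional, Sequence


def classify_reruns(errors: Sequence[Optional[str]]) -> str:
    """Classify repeated runs of one test with no code changes in between.

    Each entry is None for a passing run, or the error message of a failing run.
    """
    if not errors:
        raise ValueError("need at least one run to classify")
    failures = [e for e in errors if e is not None]
    if not failures:
        return "passing"
    if len(failures) < len(errors):
        # Mixed passes and failures under the same conditions.
        return "flaky"
    if len(set(failures)) == 1:
        # Every run failed with the same error: likely a real defect.
        return "consistent failure"
    # Assumed extra bucket: every run failed, but with different errors.
    return "inconsistent failure"


if __name__ == "__main__":
    print(classify_reruns([None, "TimeoutError", None]))
    print(classify_reruns(["KeyError: 'id'"] * 3))
```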
Tools and Techniques for Tracking Test Flakiness Over Time
Tracking test flakiness helps you understand patterns and identify flaky tests more effectively.
CI Tools: Continuous Integration (CI) tools like Jenkins or CircleCI can run tests multiple times automatically.
Test Analytics: Use analytics tools to monitor test results over time. This can help you see which tests fail intermittently.
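Running tests repeatedly in different environments can also help identify flakiness. If a test fails on CI but not locally, that is a possible sign of flakiness, though it can also point to a genuine environment difference, such as a different operating system, dependency version, or resource limit.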
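One way to turn "monitor test results over time" into a number is a flip rate: how often a test's outcome switches between pass and fail across consecutive runs. The metric and the `flip_rates` helper below are illustrative assumptions, not the output of Jenkins, CircleCI, or any particular analytics tool. The sketch assumes you can export results as `(test_name, passed)` pairs in chronological order.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def flip_rates(history: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute a per-test flip rate from chronologically ordered results.

    `history` yields (test_name, passed) pairs, oldest first. The flip rate is
    the number of pass<->fail transitions divided by the number of consecutive
    run pairs: 0.0 means the outcome never changed, 1.0 means it changed on
    every run.
    """
    runs: Dict[str, List[bool]] = defaultdict(list)
    for name, passed in history:
        runs[name].append(passed)

    rates: Dict[str, float] = {}
    for name, results in runs.items():
        if len(results) < 2:
            rates[name] = 0.0  # Not enough data to observe a flip.
            continue
        flips = sum(a != b for a, b in zip(results, results[1:]))
        rates[name] = flips / (len(results) - 1)
    return rates


if __name__ == "__main__":
    history = [
        ("test_login", True), ("test_login", False), ("test_login", True),
        ("test_export", False), ("test_export", False), ("test_export", False),
    ]
    print(flip_rates(history))
```

A test that fails on every run scores 0.0, just like one that always passes, which matches the distinction between consistent and random failures. A real regression followed by a fix also produces flips, so a high rate is a prompt to investigate rather than proof of flakiness.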
The Importance of Documenting Test Failures for Analysis
Documenting when and how tests fail is essential for diagnosing flaky tests.
Error Logs: Keep detailed logs of test failures, including error messages and the environment in which they occurred.
Failure Reports: Create reports summarizing test failures over time. This makes it easier to spot patterns.
Documenting failures helps you go back and analyze why a test failed, providing clues to whether it's flaky or not. This information is invaluable when you try to reproduce a flaky test.
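As a sketch of documenting failures, the example below appends each failure as one JSON line, including the error message and basic environment details, and then summarizes the log per test. The `FailureRecord` fields, the file name, and the helper names are hypothetical choices, not a standard schema; in practice you would call `log_failure` from wherever your test runner reports failures.

```python
import json
import platform
from collections import Counter
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict


@dataclass
class FailureRecord:
    """One documented test failure (illustrative fields, not a standard schema)."""
    test: str
    error: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    # Environment the failure occurred in.
    python_version: str = field(default_factory=platform.python_version)
    os_name: str = field(default_factory=platform.platform)


def log_failure(path: Path, record: FailureRecord) -> None:
    """Append one failure as a JSON line so the log accumulates over time."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")


def failure_report(path: Path) -> Dict[str, Counter]:
    """Summarize the log: for each test, how often each error message occurred."""
    report: Dict[str, Counter] = {}
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            entry = json.loads(line)
            report.setdefault(entry["test"], Counter())[entry["error"]] += 1
    return report


if __name__ == "__main__":
    log_path = Path("test_failures.jsonl")  # hypothetical location
    log_failure(log_path, FailureRecord("test_login", "TimeoutError"))
    print(failure_report(log_path))
```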
Diagnosing Flaky Tests
Understanding how to reproduce a flaky test is just the beginning. Once you identify a flaky test, you need to diagnose the root cause. Let's look at some common tactics and advanced techniques for diagnosing flaky tests.
Common Diagnosis Tactics
Reviewing Error Messages for Initial Clues: Error messages often provide the first hint about what's going wrong. Look for specific details in the error message. For instance, does it mention a timeout? A dependency failure? These clues can point you in the right direction.
Inspecting Test Code for Complexity and Potential Flakiness Causes: Complex test code is more likely to be flaky. Examine the test for long setup procedures, multiple dependencies, or any randomness. Simplify the test where possible to reduce the chances of flakiness.
Checking Application Code for Sources of Non-Determinism: Flaky tests often stem from non-deterministic behavior in the application code. Look for code that relies on timing, external systems, or random number generation. These are common sources of flakiness.
Importance of Understanding the Test and Application Code Context: To diagnose a flaky test, you must understand what the test is doing and how it interacts with the application code. This means reading both the test and the relevant parts of the application code carefully. Knowing the context helps you spot potential issues more easily.
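To illustrate the timing and random-number sources of non-determinism, here is a hypothetical function written two ways. The first reads the wall clock and an unseeded global random generator, so its output depends on when the test runs. The second accepts the clock and the random generator as parameters, which lets a test pin them down. The function and parameter names are assumptions for illustration; dependency injection is one common remedy, not the only one.

```python
import random
from datetime import datetime, timezone
from typing import Callable, List, Optional


# Non-deterministic: the result depends on the current time and on global randomness.
def pick_greeting_flaky(names: List[str]) -> str:
    hour = datetime.now(timezone.utc).hour
    prefix = "Good morning" if hour < 12 else "Good afternoon"
    return f"{prefix}, {random.choice(names)}"


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


# Deterministic when needed: time and randomness are injected by the caller.
def pick_greeting(
    names: List[str],
    now: Callable[[], datetime] = _utc_now,
    rng: Optional[random.Random] = None,
) -> str:
    if rng is None:
        rng = random.Random()
    prefix = "Good morning" if now().hour < 12 else "Good afternoon"
    return f"{prefix}, {rng.choice(names)}"


def test_pick_greeting() -> None:
    def fixed_morning() -> datetime:
        return datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)

    # Same seed and same clock give the same result on every run.
    first = pick_greeting(["Ada", "Linus"], now=fixed_morning, rng=random.Random(42))
    second = pick_greeting(["Ada", "Linus"], now=fixed_morning, rng=random.Random(42))
    assert first == second
    assert first.startswith("Good morning, ")
```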
Advanced Diagnosis Techniques
Adding Diagnostic Information: Sometimes you need more information to understand why a test is flaky. Add print statements or logging calls, or raise exceptions with descriptive messages, in the test or the application code so that each run records additional details. This can help you pinpoint where things go wrong.
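Using Binary Search Debugging to Locate the Source of the Issue: Binary search debugging systematically narrows down the code to find the exact point where the problem occurs. Start by disabling half of the test or code and see if the problem persists. If it does, the cause is in the half that is still enabled, so disable half of that and repeat. If the problem disappears, the cause is in the half you disabled, so re-enable it and bisect that half instead. Continue until you isolate the issue. Because a flaky test can pass by chance, run each step several times before concluding that the problem is gone.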
Investigating Tests That Run Before the Flaky Test for Leaked State: Sometimes a test fails because of state left over from a previous test. This is known as "leaked state." Check the tests that run before the flaky test, and look for shared resources or state (global variables, caches, files, database rows) that might not be reset properly between tests.
These tactics and techniques can help you diagnose even the most elusive flaky tests. By systematically narrowing down the potential causes, you can identify and fix the root issues.
Taking Control of Testing
Taking control of flaky tests starts with reliable detection and prevention. Trunk is building a tool to conquer flaky tests once and for all. You’ll get all of the features of the big guys' internal systems without the headache of managing them. With Trunk Flaky Tests, you’ll be able to:
Autodetect the flaky tests in your build system
See them in a dashboard across all your repos
Quarantine tests with one click or automatically
Get detailed stats to target the root cause of the problem
Get reports weekly, nightly, or instantly sent right to email and Slack
Intelligently file tickets to the right engineer
If you’re interested in getting beta access, sign up here.