What is an example of a Flaky Test?
Time-Dependent Issue
You might wonder, "What is an example of a flaky test?" Flaky tests are tests that sometimes pass and sometimes fail, even when there haven't been any changes to the code. This inconsistency can create problems for developers who rely on these tests to ensure their code works correctly. Let's look at a real-world example involving a time-dependent issue.
Case Study Description: Statistics Computation Failing at the Beginning of Each Month
In one project, developers noticed that their continuous integration (CI) system broke during the first five days of each month. The culprit was a test for computing statistics. Specifically, the test would fail because the statistics were computed based on the current date. This issue persisted for years because the problem was deemed low-priority.
Root Cause: Date-Related Input Causing Inconsistency
The root cause of this flaky test stemmed from date-related inputs. The test supplied multiple dates as input, including some dates relative to the current date, such as a date five days before it. During the first days of each month, that relative date fell in the previous month, so the calculation grouped the data into the wrong month and the test failed.
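To make the failure mode concrete, here is a minimal sketch (the dates and assertion are illustrative, not the project's actual test data) of how a "five days ago" input crosses a month boundary:

import arrow

today = arrow.get('2023-03-02')        # imagine the suite running on March 2nd
five_days_ago = today.shift(days=-5)   # 2023-02-25, which falls in February

# Statistics grouped by the current month put this record in the wrong bucket,
# so any assertion expecting it in March's totals fails on these days.
assert five_days_ago.month != today.month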
Initial Attempts to Fix: Dependency Injection and Sandboxing
To fix this issue, the team first considered dependency injection. This would allow the compute_stats function to ask for a specific month rather than relying on the current date. However, implementing this change proved difficult due to the large amount of code dependent on this feature.
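As a rough sketch of what that dependency injection could look like (the signature and record format here are hypothetical, not the project's actual API), the function would accept the month to compute instead of reading the current date internally:

import arrow

def compute_stats(records, month=None):
    # Hypothetical dependency-injected signature: a caller (or a test) can pin
    # the month explicitly and only fall back to "now" when none is given.
    month = month or arrow.utcnow().floor('month')
    return sum(1 for r in records if arrow.get(r['date']).floor('month') == month)

# A test can now pass a fixed month and get a deterministic result:
records = [{'date': '2023-01-15'}, {'date': '2023-02-03'}]
assert compute_stats(records, month=arrow.get('2023-01-01')) == 1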
Another approach involved sandboxing the test execution to control the current date. This method would ensure that the test always used a fixed date, eliminating the variability caused by the real current date. However, the project used Arrow for date manipulation, which made it hard to apply common sandboxing tools.
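For context, sandboxing the clock usually looks something like the sketch below, which uses the freezegun library to pin "now" for the duration of a test (a generic example, not the project's code); because the project's date handling went through Arrow, this route was hard to apply here:

from freezegun import freeze_time
import datetime

@freeze_time('2023-01-01')
def test_stats_with_frozen_clock():
    # Inside this test, the standard library clock reports the frozen date.
    assert datetime.date.today() == datetime.date(2023, 1, 1)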
Final Solution: Using a Central Method and Patch Decorator to Control the Current Date
Fortunately, the project had a central method for providing the current date, initially designed to manage time zones. By leveraging this method and using Python's patch decorator from the unittest.mock library, the team solved the issue.
import arrow
from unittest.mock import patch

@patch('project.module.get_current_date')
def test_compute_stats(mock_get_date):
    mock_get_date.return_value = arrow.get('2023-01-01')
    # Add data and test as usual
    assert compute_stats() == expected_value
By patching the method that provides the current date, the test could run with a consistent date each time. This ensured the test's reliability, regardless of when it was executed.
What is an example of a Flaky Test: Global State Interference
Flaky tests aren't just about time-dependent issues. They can also arise from interference in the global state. Let's explore another real-world example to better understand this.
Case Study Description: New Feature Causing Unrelated Test Failures
In another project, developers added a new feature, only to find that it broke tests unrelated to their changes. Imagine working hard on a new feature, running your tests, and then seeing failures in parts of the code you didn't touch. Frustrating, right? That's exactly what happened here.
Root Cause: Global Configuration State Being Modified
The root cause lay in a global configuration state. This global state was shared across multiple tests and features. When the new feature modified this global state, it inadvertently caused other tests to fail. The existing tests depended on this global state being in a particular condition, and the new feature disrupted that condition.
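A stripped-down illustration of this kind of interference (the config dictionary and feature code here are hypothetical):

# shared module-level configuration
config = {'currency': 'USD'}

def enable_new_feature():
    # The new feature flips a shared setting and never restores it...
    config['currency'] = 'EUR'

def test_new_feature():
    enable_new_feature()
    assert config['currency'] == 'EUR'   # passes

def test_existing_report():
    # ...so this older test fails whenever it runs after test_new_feature,
    # even though the report code itself never changed.
    assert config['currency'] == 'USD'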
Initial Investigation: Running Tests in Isolation to Detect the Issue
To identify the problem, the team began by running the tests in isolation. This step was crucial. By running each test separately, they could see which tests passed and which failed without interference from other tests. The isolated testing revealed that the existing tests only failed when run after the new feature's tests. This pointed to the global state being altered by the new tests.
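One way to reproduce such an ordering problem deterministically is to build a suite that runs the suspect tests in a fixed order, as in this sketch (NewFeatureTestCase is the class from the fix below; ExistingReportTestCase stands in for one of the older test classes):

import unittest

loader = unittest.TestLoader()
suite = unittest.TestSuite()
# Run the new feature's tests first, then the previously passing ones.
# If the old tests only fail in this order, shared state is the likely culprit.
suite.addTests(loader.loadTestsFromTestCase(NewFeatureTestCase))
suite.addTests(loader.loadTestsFromTestCase(ExistingReportTestCase))
unittest.TextTestRunner(verbosity=2).run(suite)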
Final Solution: Clearing Global State in the Test Suite Superclass to Ensure Clean State
To address this, the team decided to clear the global state in the test suite superclass. This meant that every test would start with a clean slate, eliminating any unintended dependencies on the global state.
Here's how they did it:
SetUp Method: Ensure each test initializes the necessary state.
import unittest

class BaseTest(unittest.TestCase):
    def setUp(self):
        global config
        config = {}

TearDown Method: Clear the global state after each test.

class BaseTest(unittest.TestCase):
    def tearDown(self):
        global config
        config.clear()

Inheritance: Make all test cases inherit from this base test class.

class NewFeatureTestCase(BaseTest):
    def test_new_feature(self):
        self.assertTrue(our_new_feature())
By incorporating these steps, the team ensured that each test would run in a clean environment, free from any residual state left by previous tests. This method fixed the flaky tests and prevented similar issues in the future.
How to Fix Flaky Tests?
Flaky tests can be a developer's nightmare. They disrupt the workflow and make it hard to trust your test results. Fixing them involves several strategies. Let's dive into these methods to ensure your tests run smoothly.
Controlling the Test Environment: Ensuring Reproducible Execution
One of the most effective ways to fix flaky tests is by controlling the test environment. A controlled test environment ensures that tests produce the same results every time they run. This involves:
Consistent Test Data: Use the same dataset for each test run. Avoid relying on live data which can change over time.
Stable Network Conditions: If your tests depend on network requests, simulate network conditions to avoid variability.
Fixed System State: Ensure the system state (like file system or database state) is reset before each test. This can be done using setup and teardown methods, as sketched after this list.
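As an example of resetting file-system state, here is a minimal sketch (the directory layout and test body are illustrative) that gives each test its own temporary directory and removes it afterwards:

import shutil
import tempfile
import unittest

class FileSystemStateTest(unittest.TestCase):
    def setUp(self):
        # Each test gets a fresh scratch directory instead of a shared path.
        self.workdir = tempfile.mkdtemp()

    def tearDown(self):
        # Remove everything the test wrote so the next test starts clean.
        shutil.rmtree(self.workdir)

    def test_writes_report(self):
        path = f"{self.workdir}/report.txt"
        with open(path, 'w') as f:
            f.write("ok")
        with open(path) as f:
            self.assertEqual(f.read(), "ok")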
Avoiding Global States: Minimizing Side Effects from Shared State
Global states can cause tests to interfere with each other. Avoiding global states helps ensure that one test doesn't affect another. Here’s how you can minimize the impact of global states:
Local Variables: Use local variables within your tests instead of global ones.
Dependency Injection: Pass dependencies directly to functions or classes, rather than relying on global variables.
Test Isolation: Ensure each test initializes its own state and cleans up afterward. This can be done using setup and teardown methods in your test framework.
def setUp(self):
    self.local_state = {}

def tearDown(self):
    self.local_state.clear()
Using Mocking and Stubbing: Isolating Tests from External Dependencies
External dependencies can introduce variability in your tests. Mocking and stubbing help isolate tests from these dependencies.
Mocking: Replace external services or components with mock objects that simulate their behavior. For example, use mock libraries like unittest.mock in Python to simulate database calls or API requests.
from unittest.mock import Mock

mock_service = Mock()
mock_service.some_method.return_value = "expected result"

Stubbing: Provide predefined responses for certain calls. This is useful for methods that return complex objects or interact with external systems.
def stub_method():
    return "stubbed response"
Randomizing Test Execution: Detecting Hidden Dependencies Between Tests
Running tests in a random order can help detect hidden dependencies between them. If a test passes when run alone but fails when run after another test, there's likely a hidden dependency.
Random Test Order: Use tools to randomize the order of test execution. For instance, pytest-randomly shuffles your tests automatically once it is installed, and the pytest-random-order plugin adds an explicit flag:
pytest --random-order
Independent Tests: Ensure each test is independent. This means no test should rely on the outcome or state of another test.
By implementing these methods, you can significantly reduce the flakiness in your tests. Each strategy addresses a different aspect of test reliability, ensuring your test suite remains robust and trustworthy.
Taking Control of Testing
Taking control of flaky tests starts with reliable detection and prevention. Trunk is building a tool to conquer flaky tests once and for all. You'll get all of the features of the big guys' internal systems without the headache of managing them. With Trunk Flaky Tests, you'll be able to:
Autodetect the flaky tests in your build system
See them in a dashboard across all your repos
Quarantine tests with one click or automatically
Get detailed stats to target the root cause of the problem
Get reports weekly, nightly, or instantly sent right to email and Slack
Intelligently file tickets to the right engineer
If you’re interested in getting beta access, sign up here.