The Definitive Guide to Eliminating Flaky Tests in Automated Testing

By The Trunk Team
February 21, 2025

Automated testing has become an essential part of modern software development, ensuring that applications function as intended and meet required quality standards. However, the effectiveness of automated testing heavily relies on the quality and reliability of the test cases being executed.

Crafting effective test cases for automation requires a deep understanding of the software under test, as well as the goals and objectives of the testing process. It involves identifying the right scenarios to automate, designing test cases that cover critical functionalities, and ensuring that the automated tests are maintainable and scalable.

In this comprehensive guide, we will explore the best practices and techniques for writing effective automated test cases, with a particular focus on addressing one of the most persistent challenges in testing: flaky tests.

What are Flaky Tests?

In the realm of automated testing, flaky tests are the bane of every developer and QA engineer's existence. Flaky tests are automated tests that exhibit inconsistent behavior—sometimes they pass, and sometimes they fail, without any apparent changes to the code or the testing environment.

Unlike stable tests, which consistently produce the same results when given the same input, flaky tests introduce an element of unpredictability into the testing process. They can fail intermittently, even when the code being tested is functioning correctly, leading to false positives and eroding confidence in the test suite.

Addressing flaky tests is crucial to ensure the effectiveness and reliability of automated testing. Flaky tests can waste valuable time and resources, as developers spend hours investigating and debugging tests that fail due to non-deterministic factors rather than actual defects in the code. They can also mask real issues, allowing bugs to slip through undetected, and undermining the very purpose of automated testing.

Moreover, flaky tests can have a significant impact on the development process as a whole. They can slow down continuous integration and deployment pipelines, as teams may hesitate to merge code or release new features until the flaky tests are resolved. This can lead to delays, reduced productivity, and a general lack of confidence in the software being delivered.

Common Causes of Flaky Tests

Identifying the underlying causes of flaky tests is vital for implementing robust prevention measures. A primary contributor is the mismanagement of asynchronous operations and race conditions, which occur when processes execute in an unpredictable sequence due to insufficient synchronization.

For example, consider this flaky test in JavaScript that doesn't properly handle asynchronous operations:

// Flaky test example with race condition
test('user profile is updated after save', () => {
  const user = new User();
  user.save();
  expect(user.isSaved).toBe(true); // May fail if save() hasn't completed
});

The problem here is that save() might be asynchronous, but the test doesn't wait for it to complete before checking the result. A more reliable approach would be:

// Fixed test with proper async handling
test('user profile is updated after save', async () => {
  const user = new User();
  await user.save();
  expect(user.isSaved).toBe(true); // Will now wait for save() to complete
});

External Dependencies and Resource Management

Another major source of flakiness arises from reliance on external resources, such as third-party services or network connections. These dependencies introduce variables that are outside the control of the testing environment.

Consider this Python test that directly calls an external API:

# Flaky test dependent on external service
import requests

def test_user_data_retrieval():
    response = requests.get('https://api.example.com/users/1')
    assert response.status_code == 200
    assert 'name' in response.json()

This test could fail due to network issues, API downtime, or rate limiting. A more stable approach uses mocking:

# Stable test with mocked external dependency
from unittest.mock import MagicMock, patch

import requests

@patch('requests.get')
def test_user_data_retrieval(mock_get):
    mock_response = MagicMock()
    mock_response.status_code = 200
    mock_response.json.return_value = {'name': 'Test User'}
    mock_get.return_value = mock_response

    response = requests.get('https://api.example.com/users/1')
    assert response.status_code == 200
    assert 'name' in response.json()

Test Isolation and State Management

Test isolation failures often lead to non-deterministic behavior within test suites. Shared states or resources, if not properly managed, create hidden dependencies that can cause flakiness. Common scenarios include:

  • Database Modifications: Inadequate cleanup of shared database entries between test runs

  • Global State: Persistent static variables that overlap across tests

  • File System Usage: Incomplete cleanup of files altered during tests

  • Session Data: Residual browser cookies or cached data impacting subsequent tests

For example, this Java test might be flaky if another test modifies the same database record:

// Potentially flaky test without proper isolation
@Test
public void testUserUpdate() {
    User user = userRepository.findById(1);
    user.setName("New Name");
    userRepository.save(user);

    User updated = userRepository.findById(1);
    assertEquals("New Name", updated.getName());
}

A better approach would create and clean up test data:

// Improved test with better isolation
@Test
public void testUserUpdate() {
    // Create test data
    User user = new User();
    user.setName("Original Name");
    userRepository.save(user);

    // Test logic
    user.setName("New Name");
    userRepository.save(user);

    User updated = userRepository.findById(user.getId());
    assertEquals("New Name", updated.getName());

    // Clean up
    userRepository.delete(user);
}
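
Isolation matters just as much for state outside the database. Tests that write to a shared directory can interfere with one another in exactly the same way. Here's a minimal sketch, assuming pytest: the built-in tmp_path fixture hands each test its own temporary directory, so nothing leaks between runs (parse_config is a hypothetical helper standing in for the real code under test).

# File-system isolation via pytest's tmp_path fixture
def parse_config(path):
    # Hypothetical code under test: read key=value lines into a dict
    with open(path) as f:
        return dict(line.strip().split('=', 1) for line in f if '=' in line)

def test_parse_config_reads_values(tmp_path):
    # tmp_path is a per-test temporary directory, so no manual cleanup is
    # needed and parallel tests never touch the same files
    config_file = tmp_path / "app.cfg"
    config_file.write_text("debug=true\nretries=3\n")

    config = parse_config(config_file)

    assert config["debug"] == "true"
    assert config["retries"] == "3"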

The Impact of Flaky Tests

Flaky tests introduce a series of challenges that can undermine the integrity and reliability of software testing efforts. When developers encounter inconsistent test outcomes, it leads to a gradual decline in confidence in the test suite. This skepticism often results in overlooking test results, potentially compromising the assurance that quality control measures are meant to provide.

Efficiency and Time Management

Flaky tests impose a significant drain on efficiency, as valuable time is diverted from productive development activities to address these erratic failures. Developers find themselves entangled in repeated cycles of diagnosing and rerunning tests that fail sporadically. This time-consuming process detracts from meaningful progress, causing delays in feature delivery and increasing the workload on already stretched development teams.

Risk of Overlooked Issues

The presence of flaky tests can also obscure genuine testing outcomes, presenting a risk of overlooking critical software defects. As teams become desensitized to frequent test failures, there's a tendency to dismiss potentially valid failures as mere anomalies. This creates a window through which undetected defects might pass, posing significant risk in high-stakes environments where software reliability is paramount.

Furthermore, flaky tests can hinder the seamless operation of continuous integration pipelines. Their unpredictability can trigger unnecessary rebuilds and deployments, slowing down the entire development cycle. To maintain a streamlined workflow, it's crucial to implement strategies that address the root causes of flakiness and ensure a smooth, efficient CI/CD process.

How to Identify Flaky Tests

Uncovering flaky tests requires a strategic approach that combines vigilant monitoring with intelligent diagnostic tools. The initial step involves setting up a robust system to track test executions across various environments and configurations.

Recognizing Patterns and Anomalies

A detailed analysis of test results is essential for pinpointing flaky tests. Focus on identifying anomalies such as tests that only fail under specific conditions or those that demonstrate erratic behavior during particular test stages.

Here's an example of how you might use data analysis to detect flaky tests with Python:

# Script to identify potentially flaky tests
import pandas as pd

# Load test results data
results = pd.read_csv('test_execution_history.csv')

# Group by test name and calculate failure rate
test_stats = results.groupby('test_name').agg({
    'result': ['count', lambda x: (x == 'FAIL').sum() / len(x) * 100]
})

test_stats.columns = ['total_runs', 'failure_percentage']

# Keep tests that have run at least 10 times and fail between 1% and 99% of the time
potentially_flaky = test_stats[(test_stats['total_runs'] >= 10) &
                               (test_stats['failure_percentage'] > 1) &
                               (test_stats['failure_percentage'] < 99)]

print("Potentially flaky tests:")
print(potentially_flaky.sort_values('failure_percentage', ascending=False))

Key Indicators and Diagnostic Metrics

To effectively pinpoint flaky tests, several critical metrics should be monitored:

  • Inconsistency Frequency: Monitor how often a test's results deviate from expected outcomes without corresponding code changes.

  • Execution Discrepancies: Look for tests with notable variations in execution times, which may indicate underlying issues.

  • Resource Sensitivity: Identify tests that react differently under varying system loads or resource availability.

  • Conditional Failures: Detect tests that fail only when interacting with specific external systems or under particular network conditions.

Automated testing platforms offer sophisticated tools for detecting flaky tests by continuously evaluating these metrics and providing insights into test performance.
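
The same execution history can also be mined for the execution-discrepancy signal described above. As a rough sketch, assuming the CSV additionally records a duration_seconds column (a hypothetical field name), tests whose run times swing widely between executions are worth a closer look:

# Flag tests with unstable execution times
import pandas as pd

results = pd.read_csv('test_execution_history.csv')

# Mean, standard deviation, and run count of duration per test
duration_stats = results.groupby('test_name')['duration_seconds'].agg(['mean', 'std', 'count'])

# A coefficient of variation above 0.5 means run time regularly deviates by more
# than half its average, a common symptom of contention or timing-dependent waits
duration_stats['variation'] = duration_stats['std'] / duration_stats['mean']
suspects = duration_stats[(duration_stats['count'] >= 10) & (duration_stats['variation'] > 0.5)]

print("Tests with unstable execution times:")
print(suspects.sort_values('variation', ascending=False))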

Techniques for Fixing Flaky Tests

Resolving flaky tests demands a systematic approach that targets the core sources of instability. Start with synchronization: replace fixed delays with condition-based waits that react to actual application events.

For example, in Selenium-based browser testing (shown here with Protractor), avoid fixed-time waits:

// Problematic approach with fixed wait
function clickSubmitButton() {
  // Hard-coded wait that might be too short or unnecessarily long
  browser.sleep(5000);
  element(by.id('submit')).click();
}

Instead, use explicit waits that respond to state changes:

// Better approach with explicit wait
function clickSubmitButton() {
  const EC = protractor.ExpectedConditions;
  const submitButton = element(by.id('submit'));

  // Wait for a specific condition (button is clickable)
  browser.wait(EC.elementToBeClickable(submitButton), 10000);
  submitButton.click();
}

Optimizing Test Environment Configuration

Stabilizing the test environment requires a shift from traditional dependency management to virtualization techniques. Containerizing services and using service virtualization tools can create isolated, consistent test conditions.

With Docker Compose, you might define a test-specific environment:

# docker-compose.test.yml
version: '3'
services:
  app:
    build: .
    depends_on:
      - db
      - redis
    environment:
      - NODE_ENV=test
      - DB_HOST=db
      - REDIS_HOST=redis

  db:
    image: postgres:13
    environment:
      - POSTGRES_USER=test
      - POSTGRES_PASSWORD=test
      - POSTGRES_DB=testdb

  redis:
    image: redis:6

Code Structure and Maintenance

The architecture of test code plays a crucial role in maintaining stability. Consider these structural improvements:

  • Comprehensive Isolation: Utilize namespace isolation to ensure that each test operates within its own environment

  • Resource Allocation: Implement dynamic resource allocation to manage system resources effectively

  • Data Consistency: Utilize version-controlled datasets to guarantee consistent test data

  • Atomic Operations: Structure tests to perform discrete, independent operations

Here's an example of a well-structured test class in Java with JUnit 5:

@TestInstance(TestInstance.Lifecycle.PER_METHOD)
public class UserServiceTest {

    private UserRepository userRepository;
    private UserService userService;
    private TransactionManager txManager;

    @BeforeEach
    void setup() {
        // Fresh instances for each test
        userRepository = mock(UserRepository.class);
        txManager = new TransactionManager();
        userService = new UserService(userRepository, txManager);
    }

    @Test
    void createUser_withValidData_createsNewUser() {
        // Arrange
        UserDto userDto = new UserDto("test@example.com", "Test User");
        when(userRepository.findByEmail("test@example.com")).thenReturn(Optional.empty());

        // Act
        User result = userService.createUser(userDto);

        // Assert
        assertNotNull(result);
        assertEquals("Test User", result.getName());
        verify(userRepository).save(any(User.class));
    }

    @AfterEach
    void cleanup() {
        // Explicit cleanup after each test
        txManager.rollbackAnyActive();
    }
}

Best Practices for Preventing Flaky Tests

Ensuring robustness in automated tests starts with adopting principles that emphasize clarity and precision. Writing test cases that are independent and clearly defined helps in minimizing dependencies and potential points of failure.
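
As a quick illustration of that principle, the pair of tests below shares module-level state, so the outcome depends on execution order; the rewritten version builds exactly the state it needs (a minimal pytest-style sketch):

# Anti-pattern: the second test silently depends on the first having run
shared_cart = []

def test_add_item():
    shared_cart.append("book")
    assert len(shared_cart) == 1

def test_remove_item():
    shared_cart.remove("book")  # Fails if run alone, repeated, or reordered
    assert len(shared_cart) == 0

# Independent version: no ordering assumptions, no shared state
def test_remove_item_independent():
    cart = ["book"]
    cart.remove("book")
    assert len(cart) == 0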

Selecting Robust Frameworks

The choice of a robust automation framework is pivotal in ensuring the reliability of test executions. Frameworks equipped with advanced synchronization and environment control features help manage complexities inherent in modern applications.

Continuous Integration Strategies

A structured continuous integration strategy integrates several practices to enhance test resilience:

# Example GitHub Actions workflow with flaky test handling
name: Test with Flaky Test Detection

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '16'

      - name: Install dependencies
        run: npm ci

      # Jest has no retry flag on the command line; retries for flaky tests are
      # configured with jest.retryTimes() in the Jest setup files (see below).
      - name: Run tests
        run: |
          npx jest --runInBand --detectOpenHandles --forceExit \
            --maxConcurrency 1 --logHeapUsage --testTimeout 10000

      - name: Generate flaky test report
        if: always()
        run: node scripts/generate-flaky-report.js

      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v3
        with:
          name: test-results
          path: test-results/

Routine test suite maintenance is crucial for sustaining test effectiveness. This involves conducting regular audits to eliminate redundant or obsolete tests, updating test scenarios to align with current system states, and refining test data sets for accuracy.

Dealing with Flaky Tests in CI/CD Pipelines

Effectively managing flaky tests within CI/CD pipelines requires strategic interventions that ensure both continuous delivery and software integrity. One approach is to use tools like Trunk to automatically detect and manage flaky tests.

Adaptive Retry Mechanisms

Deploying adaptive retry mechanisms enhances the reliability of CI/CD workflows. Jest enables retries through jest.retryTimes() in a setup file rather than a config option; the configuration below registers a setup file per project (the setup-file paths are placeholders) and separates known-flaky tests into their own group with a larger retry budget:

// jest.config.js
module.exports = {
  // Basic configuration
  testEnvironment: "node",
  testTimeout: 10000,

  // Custom reporters to track flaky tests
  reporters: [
    "default",
    ["./node_modules/jest-junit", {
      outputDirectory: "./test-results",
      outputName: "junit.xml"
    }],
    "./custom-reporters/flaky-test-reporter.js"
  ],

  // Separate potentially flaky tests into their own group; each group's setup
  // file calls jest.retryTimes() with a different retry budget
  projects: [
    {
      displayName: "stable",
      testMatch: ["<rootDir>/tests/**/*.test.js"],
      testPathIgnorePatterns: ["<rootDir>/tests/flaky/"],
      setupFilesAfterEnv: ["<rootDir>/test-setup/retry-stable.js"] // jest.retryTimes(2)
    },
    {
      displayName: "flaky",
      testMatch: ["<rootDir>/tests/flaky/**/*.test.js"],
      setupFilesAfterEnv: ["<rootDir>/test-setup/retry-flaky.js"]  // jest.retryTimes(3) for known flaky tests
    }
  ]
};

Comprehensive Test Oversight

Effective oversight of flaky tests in CI/CD environments involves several integral approaches:

  • Impact Assessment: Regularly evaluate the business impact of flaky tests on critical features

  • Automated Diagnostics: Use diagnostic tooling to automatically classify failure types (a simple first pass is sketched just after this list)

  • Resource Utilization Analysis: Continuously monitor system resources during test execution

  • Simulation of External Dependencies: Implement advanced simulation techniques for consistent test conditions
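
A first pass at that kind of automated classification can be quite small. The sketch below assumes failure messages have been exported to a CSV with test_name and error_message columns (hypothetical file and column names) and sorts failures into rough buckets, separating infrastructure noise from genuine assertion failures:

# Rough failure classifier for triaging flaky-looking failures
import pandas as pd

INFRASTRUCTURE_HINTS = ("timeout", "connection refused", "connection reset", "dns", "503")

def classify(message):
    text = str(message).lower()
    if any(hint in text for hint in INFRASTRUCTURE_HINTS):
        return "infrastructure"  # Likely environment or dependency flakiness
    if "assertionerror" in text or "expected" in text:
        return "assertion"       # More likely a real product or test-logic defect
    return "unclassified"

failures = pd.read_csv('test_failures.csv')  # columns: test_name, error_message
failures['category'] = failures['error_message'].apply(classify)

print(failures.groupby(['category', 'test_name']).size().sort_values(ascending=False))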

To ensure sustained improvement, it's essential to set and track specific goals for reducing flakiness across the CI/CD pipeline.

Conclusion

As you embark on your journey to create effective automated test cases and combat flakiness, remember that the key lies in adopting a proactive and strategic approach. By leveraging the right tools, techniques, and best practices, you can significantly enhance the reliability and efficiency of your testing processes.

If you're looking for a comprehensive solution to streamline your testing efforts and minimize flaky tests, we invite you to check out our docs. Our platform offers advanced features and insights to help you navigate the complexities of automated testing and ensure the highest quality of your software. Join us in revolutionizing the way you test and deliver exceptional products to your users.
