Managing Flaky Tests at Scale: Detection and Quarantining

By The Trunk Team, August 27, 2024
testing


Flaky test detection identifies tests that produce inconsistent results without changes to the code or test environment. Imagine a test that passes one moment and fails the next, even though nothing has changed. This inconsistency makes it hard to trust the test results.

Detecting flaky tests is crucial to maintaining a reliable test suite. When you can't rely on your tests, it undermines the whole testing process. Developers and testers begin to doubt the accuracy of every test, not just the flaky ones. This doubt leads to wasted time, as they must manually check whether failures are due to actual bugs or just test flakiness.

Flaky tests have a significant impact on software development processes. They slow down development because of the extra time spent rerunning tests and troubleshooting failures. Flaky tests can also cause delays in continuous integration (CI) and continuous deployment (CD) pipelines, resulting in slower release cycles. Worse, they can mask real issues, letting bugs slip into production because the test failures are dismissed as flakiness.

Examples of flaky test scenarios include:

  1. Concurrency Issues: Tests that fail when run in parallel due to interference between tests.

  2. External Dependencies: Tests that rely on third-party services or APIs, which may not always behave consistently.

  3. Timing Problems: Tests that fail due to incorrect timing assumptions or inadequate wait conditions.

  4. Non-deterministic Behavior: Tests using random data that change between runs.

  5. Test Environment Instability: Tests that fail in one environment but pass in another due to differences in software versions or configurations.

How to Identify Flaky Tests

Repeat Test Execution

One straightforward method to identify flaky tests involves repeatedly running the same tests under identical conditions. By doing so, you check for variability in test results. Imagine running a test ten times without changing any code or environment settings. If it passes half the time and fails the other half, you have a flaky test on your hands.

Why is this important? Consistency in test results is key for reliable detection. If a test produces different outcomes under the same conditions, it becomes difficult to trust any results from that test. This inconsistency makes it hard to pinpoint whether a failure is due to an actual bug or just the test itself being unreliable.
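As a minimal sketch of the rerun approach, the helper below calls a test function several times and labels it by its outcomes. The names `classify_by_rerun`, `sometimes_fails`, and `always_passes` are hypothetical and exist only for illustration; a real suite would drive its own test runner instead.

```python
from collections import Counter
from typing import Callable


def classify_by_rerun(test: Callable[[], None], runs: int = 10) -> str:
    """Run `test` repeatedly and label it 'stable-pass', 'stable-fail', or 'flaky'."""
    outcomes = Counter()
    for _ in range(runs):
        try:
            test()
            outcomes["pass"] += 1
        except Exception:
            # Any raised exception counts as a failed run.
            outcomes["fail"] += 1
    if outcomes["pass"] == runs:
        return "stable-pass"
    if outcomes["fail"] == runs:
        return "stable-fail"
    return "flaky"


# Hypothetical stand-in for a flaky test: it fails on every other call.
_calls = {"n": 0}


def sometimes_fails() -> None:
    _calls["n"] += 1
    assert _calls["n"] % 2 == 0, "simulated intermittent failure"


def always_passes() -> None:
    assert 1 + 1 == 2
```

A mixed result only means something if nothing else changed between runs, so the code and environment should stay fixed for the whole loop.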

Use Specialized Tools

Specialized tools and plugins can make flaky test detection easier. These tools rerun tests and track their results over time, helping you identify which tests are flaky. 

Examples of Tools:

  • JUnit Flaky Test Handler: Automatically retries failed tests to distinguish between flaky and consistently failing ones.

  • Pytest-rerunfailures: A plugin for Python's pytest that reruns failed tests to spot flakiness.

  • TestNG: Offers built-in support for rerunning failed tests, assisting in identifying flaky tests.

Continuous Integration (CI) Systems:

  • Jenkins: Has plugins like the "Flaky Test Handler" to manage flaky tests within the CI pipeline.

  • GitLab CI/CD: Provides analytics that help identify patterns of test flakiness across multiple runs.

  • Buildkite: Allows automatic retrying of flaky tests with detailed test reports.

Test History Analysis

Reviewing historical data from test executions can also help identify flaky tests. By analyzing past test results, you can spot intermittent failures that may indicate flakiness.

Patterns to Look For:

  • Inconsistent Results: Look for tests that alternate between passing and failing without changes in code or environment.

  • Dependency Failures: Identify tests that fail when external services or APIs are unstable.

  • Timing Issues: Spot tests that fail due to incorrect timing assumptions or inadequate wait conditions.

By systematically reviewing test history, you can pinpoint tests that often fail under certain conditions, helping you address and fix the root causes of flakiness.
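One way to look for the "inconsistent results" pattern in stored results is to flag any test that both passed and failed on the same commit. The sketch below assumes a simple record shape (`RunRecord` with test name, commit, and outcome); that shape and the sample data are hypothetical, and real CI exports will look different.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set


class RunRecord(NamedTuple):
    test: str
    commit: str
    passed: bool


def find_flaky(history: Iterable[RunRecord]) -> Set[str]:
    """Return tests that have both a pass and a fail recorded for the same commit."""
    outcomes = defaultdict(set)
    for run in history:
        outcomes[(run.test, run.commit)].add(run.passed)
    return {test for (test, _commit), seen in outcomes.items() if seen == {True, False}}


# Made-up sample history for illustration.
history = [
    RunRecord("test_login", "abc123", True),
    RunRecord("test_login", "abc123", False),   # same commit, different outcome
    RunRecord("test_search", "abc123", False),
    RunRecord("test_search", "def456", True),   # changed outcome after a new commit
]
```

Here `test_search` is not flagged, because its outcome only changed between commits and may reflect a real fix rather than flakiness.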

Common Causes of Flaky Tests

Identifying and understanding the common causes of flaky tests is crucial for effective flaky test detection. Here are some major reasons why tests might become flaky:

Concurrency Issues

Concurrency issues occur when parallel tests interfere with each other. For instance, two tests might try to access or modify the same resource simultaneously. This can lead to unpredictable outcomes.

Example: If Test A and Test B both need to write to the same database table at the same time, one might succeed while the other fails, leading to flakiness.

Solution: To avoid this, ensure tests run in isolation or implement proper synchronization mechanisms.

External Dependencies

Tests relying on external systems can become flaky due to the variability in those systems. Third-party services, APIs, or databases may not always behave consistently.

Example: A test that depends on a third-party API might fail if the API is down or slow to respond.

Solution: Mock or stub external dependencies during testing to minimize reliance on external systems.
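A small illustration of stubbing, using `unittest.mock` from the Python standard library. The function under test and the `get_user` client method are hypothetical; the point is only that the test never touches the network.

```python
from unittest.mock import Mock


def fetch_display_name(client, user_id: int) -> str:
    """Hypothetical code under test: formats a name returned by an injected API client."""
    payload = client.get_user(user_id)
    return payload["name"].title()


def test_fetch_display_name_uses_stub() -> None:
    # The stub returns a canned response, so the third-party API is never called.
    client = Mock()
    client.get_user.return_value = {"name": "ada lovelace"}

    assert fetch_display_name(client, 42) == "Ada Lovelace"
    client.get_user.assert_called_once_with(42)
```

Passing the client in as a parameter is what makes the substitution easy; code that constructs its own client internally is harder to isolate.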

Timing and Synchronization Problems

Timing and synchronization issues often occur when tests assume certain operations will complete within a fixed time. If the actual time differs, the test might fail.

Example: A test might expect a web page to load in 5 seconds, but due to network delays, it takes 10 seconds, causing the test to fail.

Solution: Use dynamic waits or polling mechanisms instead of fixed wait times to handle timing variability.
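A polling wait can be sketched in a few lines. The helper name `wait_until` and its default values are assumptions for illustration; UI testing frameworks typically ship their own equivalents.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Compared with a fixed sleep, the test proceeds as soon as the condition holds and only pays the full timeout when something is actually wrong.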

Non-deterministic Behavior

Non-deterministic behavior happens when tests use random data or depend on system states that change between runs. This unpredictability can lead to different outcomes each time the test runs.

Example: Using random numbers or dates in tests can lead to inconsistent results if not controlled.

Solution: Use fixed seed values for random data or ensure system states are consistent before running tests.
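For random data, one option is to pass a seeded generator into the code that needs it instead of relying on global random state. The generator function below is a hypothetical example.

```python
import random


def sample_order_total(rng: random.Random) -> int:
    """Hypothetical data generator: sums three item prices between 1 and 100."""
    return sum(rng.randint(1, 100) for _ in range(3))


def test_total_is_reproducible() -> None:
    # Two generators with the same seed produce the same sequence of values.
    first = sample_order_total(random.Random(1234))
    second = sample_order_total(random.Random(1234))
    assert first == second
```

Logging the seed when a test fails makes it possible to replay the exact inputs later.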

Test Environment Instability

Differences in software versions, configurations, or available resources can cause tests to behave differently. This instability can make tests flaky.

Example: A test might pass on a developer's local machine but fail in the CI environment due to different software versions.

Solution: Standardize test environments using containerization tools like Docker to ensure consistency.

By addressing these common causes, you can significantly reduce the number of flaky tests in your test suite, leading to more reliable and robust testing processes.

How to Analyze Test Failures for Flakiness

Proper flaky test detection requires a thorough analysis of test failures. Here are some steps to help you identify and analyze flaky tests:

Isolating the Test

Running tests in isolation can help determine if a test consistently produces the same result. 

  1. Run the test independently: Execute the test alone without other tests running simultaneously.

  2. Multiple runs: Repeat the test multiple times under the same conditions to check for consistent outcomes.

Takeaway: Isolated tests can help identify if failures are due to test flakiness or interference from other tests.

Reviewing Logs and Outputs

Detailed logs and outputs provide valuable insights into why tests fail intermittently.

  • Log examination: Check the logs for error messages and patterns.

  • Output analysis: Look for specific conditions that occur when the test fails.

Takeaway: Consistent patterns or specific error messages in logs can indicate the root cause of flakiness.
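To make patterns in failure messages easier to see, volatile details such as numbers and addresses can be masked before counting. This is a rough sketch; the normalization rules and the sample messages are assumptions and would need tuning for real logs.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def failure_signature(message: str) -> str:
    """Mask hex values and numbers so similar failures share one signature."""
    message = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)
    message = re.sub(r"\d+", "<n>", message)
    return message.strip()


def top_signatures(messages: Iterable[str], limit: int = 3) -> List[Tuple[str, int]]:
    """Return the most common failure signatures with their counts."""
    return Counter(failure_signature(m) for m in messages).most_common(limit)


# Made-up failure messages for illustration.
messages = [
    "TimeoutError: waited 5000 ms for port 8081",
    "TimeoutError: waited 5003 ms for port 8082",
    "AssertionError: expected 3 rows, got 2",
]
```

With the sample data, both timeout messages collapse into a single signature, which points at a timing problem rather than two unrelated failures.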

Checking for External Dependencies

External dependencies can cause tests to fail unpredictably. Verifying their stability is crucial.

  • Dependency verification: Ensure all external systems, APIs, and databases are stable and available.

  • Mock external systems: Use mocks or stubs to isolate tests from external dependencies.

Takeaway: Stable external systems reduce variability in test results.

Evaluating Timing and Synchronization

Timing issues often cause flaky tests. Introducing flexible wait conditions can help stabilize tests.

  • Dynamic waits: Use waits that respond to specific conditions rather than fixed time periods.

  • Synchronization mechanisms: Implement methods to ensure operations complete in the expected order.

Takeaway: Adjusting timing mechanisms can prevent failures due to synchronization issues.

Comparing Environments

Running tests in different environments can reveal environment-specific issues.

  • Multiple environments: Execute the same tests across various environments (e.g., local, staging, production).

  • Environment consistency: Ensure configurations and software versions are consistent across environments.

Takeaway: Identifying environment-specific failures helps in standardizing test environments.
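A lightweight way to compare environments is to record a fingerprint alongside each test run and diff the fingerprints when results disagree. The fields chosen below are an assumption; a real fingerprint would likely include dependency versions and relevant configuration as well.

```python
import platform
from typing import Dict, Optional, Tuple


def environment_fingerprint() -> Dict[str, str]:
    """Collect a few facts about the current environment."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "machine": platform.machine(),
    }


def diff_fingerprints(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return only the keys whose values differ between two fingerprints."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

For example, diffing a fingerprint saved from a developer machine against one saved from CI shows at a glance whether the two runs used different interpreter versions.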

By following these steps, you can effectively analyze test failures and pinpoint the causes of flakiness. This approach leads to a more reliable and robust test suite, enhancing overall software quality.

Preventing Flaky Tests

Preventing flaky tests is essential for maintaining a reliable test suite. Here are some strategies to help you avoid introducing flaky tests into your testing process:

Isolate Tests

Isolating tests ensures that each test does not depend on the output or side effects of another test.

  1. Independent execution: Each test should run independently, without relying on or affecting other tests.

  2. Unique data: Use unique data sets for each test to avoid conflicts.

Takeaway: Isolated tests eliminate interference from other tests, leading to more reliable outcomes.
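As one illustration of unique data per test, the sketch below gives each test its own temporary directory so that tests running in parallel never write to the same file. `write_report` is a hypothetical function under test.

```python
import tempfile
from pathlib import Path


def write_report(directory: Path, text: str) -> Path:
    """Hypothetical code under test: writes a report file into `directory`."""
    path = directory / "report.txt"
    path.write_text(text)
    return path


def test_write_report_in_private_directory() -> None:
    # A fresh directory per test means no other test can touch this report.txt.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_report(Path(tmp), "ok")
        assert path.read_text() == "ok"
```

pytest's built-in `tmp_path` fixture serves the same purpose without the explicit context manager.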

Make Tests Hermetic

Hermetic tests are self-contained and isolated from external influences and environmental variations.

  • Self-contained environments: Create isolated environments for each test.

  • Controlled dependencies: Use mocks and stubs to control external dependencies.

Takeaway: Hermetic tests produce consistent results regardless of external factors.

Avoid Hardcoded Timeouts

Hardcoded timeouts can lead to test flakiness, especially in varying execution environments.

  1. Dynamic waits: Implement waits that respond to specific conditions, such as the presence of an element or the completion of a task.

  2. Condition-based checks: Use condition-based checks instead of fixed time periods to ensure the necessary conditions are met before proceeding.

Takeaway: Dynamic waits adapt to varying conditions, reducing test failures due to timing issues.

Ensure Test Environment Stability

Maintaining consistent test environments is crucial for preventing flaky tests.

  • Containerization: Use containerization tools like Docker to create consistent test environments.

  • Virtualization: Employ virtualization to replicate identical test environments across different machines.

Takeaway: Consistent environments minimize variability, leading to more stable test results.

Employ Deterministic Inputs

Using consistent and predictable input values helps ensure stable test results.

  1. Fixed inputs: Use fixed, known input values for tests to produce predictable outcomes.

  2. Avoid randomness: Avoid using random or variable data that can introduce unpredictability into tests.

Takeaway: Deterministic inputs ensure that tests yield consistent results every time they are executed.

By implementing these strategies, you can significantly reduce the likelihood of flaky tests, leading to a more reliable and efficient testing process.

Taking Control of Testing

Taking control of flaky tests starts with reliable detection and prevention. Trunk is building a tool to conquer flaky tests once and for all. You’ll get all of the features of the big guys' internal systems without the headache of managing them. With Trunk Flaky Tests, you’ll be able to:

  • Autodetect the flaky tests in your build system

  • See them in a dashboard across all your repos

  • Quarantine tests with one click or automatically

  • Get detailed stats to target the root cause of the problem

  • Get reports weekly, nightly, or instantly sent right to email and Slack

  • Intelligently file tickets to the right engineer

If you’re interested in getting beta access, sign up here.

Introduction to Merge Queues: What You Need to Know

In software development, managing code changes can get tricky. Merge queues help with this process. They organize and streamline merging code changes into the main project. Here’s what you need to know about them.

What Are Merge Queues?

Definition and Purpose of Merge Queues

A merge queue is a tool that automates the merging of pull requests into a main branch. Its main purpose is to ensure that the main branch stays stable and passes all tests. Merge queues help avoid breaking changes that could disrupt the entire project.

How Merge Queues Manage Pull Request Order

Merge queues use a first-in-first-out (FIFO) system to manage pull requests. Here's how it works:

  1. Queueing: When you add a pull request, the merge queue places it in line.

  2. Temporary Branches: The queue creates temporary branches to test changes.

  3. Validation: It checks if the pull request can merge without conflicts.

  4. Merging: If all checks pass, the request merges into the main branch.
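The four steps above can be sketched as a toy simulation. This is not how any particular merge queue is implemented: pull requests are plain strings, the "temporary branch" is modelled as the current main branch plus one pull request, and `checks_pass` is a hypothetical callback standing in for CI.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def process_queue(
    pull_requests: List[str],
    checks_pass: Callable[[List[str], str], bool],
) -> Tuple[List[str], List[str]]:
    """Process pull requests in FIFO order and return (merged, rejected)."""
    queue: Deque[str] = deque(pull_requests)   # 1. Queueing
    main: List[str] = []                       # changes already on the main branch
    rejected: List[str] = []
    while queue:
        pr = queue.popleft()                   # oldest entry is handled first
        # 2-3. Validate the PR against the current main branch.
        if checks_pass(main, pr):
            main.append(pr)                    # 4. Merging
        else:
            rejected.append(pr)                # removed from the queue
    return main, rejected
```

For example, `process_queue(["pr-1", "pr-2", "pr-3"], lambda main, pr: pr != "pr-2")` merges `pr-1` and `pr-3` in order and rejects `pr-2`.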

Benefits of Using Merge Queues

Using merge queues offers several advantages:

  • Streamlined Development: Automates the merging process, saving time.

  • Reduced Conflicts: Ensures all code changes are compatible before merging.

  • Consistency: Keeps the main branch stable and functional.

Comparison to Traditional Merging Methods

Traditional merging methods often involve manually updating branches and running tests. This can lead to:

  • More Conflicts: Manual merges may not catch all issues.

  • Time-Consuming: Developers spend extra time resolving conflicts.

  • Instability: Main branch might break if incompatible changes merge.

In contrast, merge queues automate these tasks, offering a more efficient and reliable solution.

How Merge Queues Work

Key Mechanisms of Merge Queues

Merge queues operate using several key mechanisms. These mechanisms ensure that code merges smoothly and without issues.

1. Maintaining First-In-First-Out Order:   Merge queues follow a first-in-first-out (FIFO) system. This means that the first pull request added to the queue will be the first one processed. This order helps keep the process fair and organized.

2. Creation of Temporary Branches for Validation:   When you add a pull request to the merge queue, it creates a temporary branch. This branch includes the changes from the pull request and the latest version of the main branch. The queue uses this branch to run tests and checks, ensuring that the new code doesn't break anything.

3. Grouping of Pull Requests with Merge_group:   Sometimes, the merge queue groups multiple pull requests together into a merge group. On GitHub, CI is notified about each group through the merge_group event. This means the queue tests several pull requests at once to see if they can all merge without issues. If they pass the tests, the merge queue merges them into the main branch as a group.

4. Ensuring Required Checks Are Satisfied Before Merging:   Before merging any pull request, the merge queue ensures that all required checks are satisfied. These checks might include automated tests, code reviews, or other validation steps. If any check fails, the pull request won't merge, and the queue will notify the developer to fix the issues.

Key Takeaways:

  • FIFO System: Keeps the merge process organized.

  • Temporary Branches: Used for testing changes safely.

  • Merge_group: Allows grouping and testing of multiple pull requests.

  • Required Checks: Ensures stable and functional code merges.

These mechanisms work together to maintain a smooth and efficient merging process, reducing the chances of conflicts and keeping the main branch stable.

Managing and Optimizing Merge Queues

To make the most of merge queues, you need to manage and optimize them effectively. This involves several steps and options that ensure your pull requests merge smoothly and efficiently.

Adding Pull Requests to the Merge Queue

1. How to Add a Pull Request:   To add a pull request to the merge queue, someone with write access to the repository clicks the "Add to merge queue" button. This action places the pull request in line, following the FIFO system. The queue then starts the process of creating a temporary branch and running the necessary checks.

Handling Successful and Failing CI Checks

2. Successful CI Checks:   When a pull request passes all required CI checks, the merge queue moves it forward. The changes from the pull request then merge into the main branch. This ensures that only code that has passed all tests gets merged, keeping the main branch stable.

3. Failing CI Checks:   If a pull request fails any required checks, the merge queue removes it from the line. The developer receives a notification explaining the issues. They must then fix the code and re-add the pull request to the queue. This process ensures that only reliable and tested code makes it to the main branch.

Options for Jumping to the Top of the Queue

4. Jumping to the Top:   Sometimes, you might need a pull request to merge quickly. In such cases, you can use the option to jump to the top of the queue. However, be cautious with this feature. Jumping to the top causes all in-progress pull requests to rebuild, potentially slowing down the entire process.

Configuring 'Only Merge Non-Failing Pull Requests' Setting

5. Non-Failing Pull Requests:   The "Only merge non-failing pull requests" setting is crucial for maintaining code quality. When enabled, it ensures that only pull requests passing all required checks get merged. If disabled, even pull requests with failing checks can merge, as long as the last one in the group passes. Disabling it can keep the queue moving despite intermittent test failures, but do so carefully, because it lets pull requests with failing checks reach the main branch.

Key Takeaways:

  • Add to Merge Queue: Simplifies the process of managing pull requests.

  • Handle CI Checks: Ensures only tested and stable code merges.

  • Jump to Top Option: Offers quick merges but use sparingly.

  • Non-Failing Setting: Maintains high code quality standards.

By managing and optimizing merge queues effectively, you can streamline your development process, reduce conflicts, and maintain a stable main branch.

Troubleshooting Merge Queue Issues

Managing merge queues isn't always smooth sailing. Sometimes, issues arise that need quick resolution to keep your workflow efficient. Let’s dive into some common problems and their solutions.

Common Reasons for Pull Request Removal from the Queue

1. Failing Required Checks:   A pull request might fail required status checks. This failure removes it from the merge queue. Reasons include failing tests or conflicts with the base branch. When this happens, developers must fix the issues and re-add the pull request.

2. Timeout:   Merge queues have a set timeout for how long they wait for CI results. If the results don't come in time, the system assumes failure and removes the pull request.

3. Manual Removal:   Sometimes, a user might manually remove a pull request from the queue. This can happen due to changes in project priorities or recognizing issues that need addressing outside of the queue.

4. Branch Protection Failures:   If the merge queue encounters branch protection rule conflicts, it removes the pull request. These rules ensure that all changes meet certain criteria before merging.

Handling Intermittent Test Failures

Intermittent test failures can be frustrating. They cause false negatives that remove pull requests unnecessarily. Here’s how to handle them:

1. Investigate and Stabilize Tests:   Identify flaky tests by running them multiple times. Stabilize these tests to reduce false negatives.

2. Use the 'Only Merge Non-Failing Pull Requests' Setting:   Disabling this setting allows merging of pull requests even if some tests fail, provided the last one in the group passes. This helps manage intermittent failures without blocking the queue.

3. Retry Mechanism:   Implement a retry mechanism for failed tests. This can automatically rerun tests a few times before declaring a failure.

Adjusting CI Configuration for Merge Queues

Proper CI configuration is essential for smooth merge queue operation. Here are some adjustments you might need:

1. Triggering Checks:   Ensure your CI system triggers checks for merge_group events. This ensures that every pull request in the queue gets validated with the latest code changes.

2. Updating Workflows:   Update your CI workflows to include merge_group events as triggers. This involves modifying your CI configuration files to listen for these events, ensuring checks run when needed.

3. Build Concurrency:   Adjust the build concurrency setting to control how many merge_group webhooks dispatch simultaneously. This helps manage the load on your CI system and speeds up the process.

Monitoring and Resolving Branch Protection Failures

Branch protection rules are vital for maintaining code quality. However, they can sometimes cause issues in the merge queue. Here’s how to monitor and resolve them:

1. Regular Monitoring:   Keep an eye on the branch protection status. Use tools or scripts to alert you when there are issues.

2. Resolve Conflicts:   When a branch protection rule fails, investigate the cause. It could be due to failing tests, outdated code, or conflicts. Resolve these issues promptly to maintain queue efficiency.

3. Communication:   Maintain open communication with your team about branch protection rules. Ensure everyone knows the requirements and how to address failures.

Key Takeaways:

  • Common Issues: Understand why pull requests get removed.

  • Intermittent Tests: Implement strategies to handle flaky tests.

  • CI Configuration: Properly configure your CI for merge queues.

  • Branch Protection: Monitor and resolve protection rule failures.

By addressing these troubleshooting steps, you can maintain an efficient and reliable merge queue, reducing downtime and keeping your development process on track.

Best Practices for Using Merge Queues

Using merge queues effectively ensures a smooth and efficient development process. Here are some best practices to follow:

Ensuring All Required Checks Are Defined and Up-to-Date

1. Define Comprehensive Checks:   List all necessary checks that each pull request must pass. These might include unit tests, integration tests, and linting checks. Ensure these checks cover all critical aspects of your codebase.

2. Regular Updates:   Periodically review and update these checks. As your codebase evolves, new checks might become necessary, and old ones might need adjustments. This ensures that your checks remain relevant and effective.

3. Automate Updating Checks:   Use scripts or CI tools to automate the updating process. This helps in keeping checks consistent across various branches and pull requests.

4. Documentation:   Maintain clear documentation of all required checks. This helps team members understand what is expected and reduces misunderstandings.

Regularly Reviewing and Adjusting Merge Queue Settings

1. Schedule Reviews:   Set regular intervals for reviewing merge queue settings. This could be weekly or monthly, depending on the activity of your repository. 

2. Analyze Performance:   Look at the performance and efficiency of your merge queue. Are there frequent bottlenecks or delays? Use this data to make informed adjustments.

3. Adjust Settings:   Modify settings such as build concurrency, merge methods (merge, rebase, squash), and status check timeouts based on your analysis. This helps in optimizing the merge queue for your team’s needs.

4. Feedback Loop:   Create a feedback loop with your team. Encourage them to report issues and suggest improvements. This collaborative approach ensures that the merge queue settings evolve with the team's requirements.

Collaborating with Team Members to Manage the Queue Efficiently

1. Open Communication Channels:   Use tools like Slack or Microsoft Teams to keep communication lines open. Discuss the status of the merge queue regularly in stand-up meetings or dedicated channels.

2. Assign Roles:   Designate roles for managing the merge queue. This could include a merge queue manager who oversees the process and resolves issues promptly.

3. Training and Onboarding:   Provide training sessions for new team members on using the merge queue. Ensure they understand the best practices and the importance of following the queue’s rules.

4. Collaborative Problem Solving:   When issues arise, collaborate to solve them. This might involve pair programming, code reviews, or brainstorming sessions to address conflicts or test failures.

Using Status Check Timeouts to Avoid Long Wait Times

1. Set Reasonable Timeouts:   Define timeouts for status checks to prevent them from running indefinitely. This ensures that the merge queue progresses smoothly without getting stuck.

2. Monitor Timeout Efficiency:   Regularly monitor the effectiveness of your timeout settings. Adjust them if builds often time out before completing or if there are frequent delays in the queue.

3. Notify Developers:   Automatically notify developers when their pull requests time out. This enables them to investigate and resolve the issue quickly, minimizing delays.

4. Balance Speed and Thoroughness:   Find a balance between the speed of merging and the thoroughness of checks. While it's important to avoid long wait times, ensure that the checks are still comprehensive enough to maintain code quality.
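The effect of a status check timeout can be illustrated with a small wrapper that stops waiting after a deadline and reports a separate outcome. The function name and the three outcome strings are assumptions made for this sketch; hosted merge queues expose the timeout as a setting rather than as code.

```python
import concurrent.futures
from typing import Callable


def run_check_with_timeout(check: Callable[[], bool], timeout_seconds: float) -> str:
    """Run `check` in a worker thread and return 'passed', 'failed', or 'timed_out'."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(check)
    try:
        return "passed" if future.result(timeout=timeout_seconds) else "failed"
    except concurrent.futures.TimeoutError:
        # The caller stops waiting; the worker thread itself is not interrupted.
        return "timed_out"
    finally:
        executor.shutdown(wait=False)
```

Reporting a timeout separately from a failure makes it easier to tell slow infrastructure apart from broken code when tuning the timeout value.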

Key Takeaways:

  • Define Checks: Ensure all required checks are comprehensive and up-to-date.

  • Regular Reviews: Periodically review and adjust merge queue settings.

  • Team Collaboration: Collaborate with team members for efficient queue management.

  • Timeout Settings: Use status check timeouts to avoid long wait times.

Implementing these best practices helps maintain an efficient merge queue, ensuring a smooth and reliable development process.

Take Control of Your Merge Process

Trunk has built a better way to manage your merge process, eliminating long wait times and reducing CI costs. With Trunk Merge Queue, you’ll be able to:

  • Autotest your PRs against the latest commit on your main branch

  • Dynamically create parallel queues for faster merges

  • Merge PRs automatically once all requirements are met

  • View detailed stats and performance metrics for your CI pipelines

  • Seamlessly integrate with your existing CI/CD workflows

If you’re ready to streamline your PR process and speed up your development, try it out for free.
