What is a Flaky Test in Cypress?
A flaky test is a test that produces inconsistent results. It may pass during one run and fail on another without any changes to the code. This lack of reliability can cause major headaches for developers. Test results can no longer be trusted and the non-deterministic nature of flaky tests means they can be difficult to reproduce and time-consuming to fix.
Flaky tests are common in integration and end-to-end (E2E) testing and are especially common when E2E testing async web applications.
Cypress is often used for E2E testing of web applications.
You can probably see where we’re going with this: if you work with Cypress for long enough, you will encounter flaky tests at some point. Thankfully, there are ways to avoid and debug test flakes when working with Cypress, and tools like Trunk Flaky Tests or Cypress Cloud Flaky Test Management can help you detect and manage them.
Examples of Flaky Tests
A simple example of a flaky test: you are using Cypress to select and test a button in a web UI. This button only appears after some user action that requires a variable amount of time, so a fixed cy.wait(5000) is used to pause test execution.

cy.get('[data-cy="open-form"]').click();
cy.wait(5000); // wait for form data to load
cy.get('[data-cy="submit-button"]').click();
The test passes when run locally. Test runs also pass in the CI/CD pipeline for the PR and the merge into main. But the next time someone merges into main and the test runs, it fails. The test then passes when it is re-run in the pipeline.
It seems like a flaky test has been introduced into the project’s test suite. The test might pass or fail on any CI run, and the test’s results can no longer be trusted.
Impacts of Flaky Tests on CI/CD Pipelines
Flaky tests in CI/CD pipelines are frustrating for the entire development team. Merging changes into main becomes a game of chance; you never know when your feature or fix will be deployed.
Flaky tests negatively impact development by:
Taking engineering time away from feature work: Developers need to spend time debugging and fixing test flakes instead of building new features for users.
Delaying deployments: Prevents users from getting access to features and fixes.
Causing friction between developers and dev teams: “Why are your flaky tests blocking our feature?!?”
Eroding confidence in your test suite: Devs may ignore legitimate failures or stop writing additional tests because the suite is already flaky.
Increasing CI time due to constant re-runs: Developers default to re-running even if a failure is legitimate.
Because of the non-deterministic nature of flaky tests, it can take a lot of dev time to hunt down the root cause of flaky tests, especially when they are not reproducible locally.
All this leads to the major impact of flaky tests: a lower-quality product.
Why Do Flaky Tests Occur?
Flaky tests in Cypress can be frustrating, but understanding why they occur helps prevent them. Several factors contribute to test flakiness. These include uncontrollable environmental events, race conditions, bugs and anti-patterns in test logic, and the influence of external dependencies and network issues.
Race Conditions
Race conditions occur when the timing of events affects the outcome of the test. In web applications, race condition flakes often happen due to:
Asynchronous Operations: Multiple operations happen at the same time, leading to unpredictable results.
Dynamic Content: Pages that update content dynamically can cause tests to pass or fail based on the timing of these updates.
For example, if your test checks for the presence of a button that appears after an API call, the test might fail if the button appears slightly later than expected.
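A rough sketch of how this race plays out (the /api/form-data endpoint and data-cy values here are hypothetical):

// The button only renders after the response, so this assertion races the network.
// cy.get() retries for defaultCommandTimeout (4 seconds by default), so the test
// passes when the response is fast and fails when it is slow.
cy.get('[data-cy="open-form"]').click();
cy.get('[data-cy="submit-button"]').should('be.visible');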
Bugs and Anti-Patterns in Test Logic
Sometimes, the test code itself is the problem. Common issues include:
Hard-Coded Wait Times: Using fixed wait times (like cy.wait(5000)) can lead to flakiness, as the actual time needed may vary.
Poorly Written Selectors: Selectors that depend on fragile elements like class names or IDs that change frequently can cause failures.
These issues often stem from not following best practices when writing test code.
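Where possible, wait on the event you actually depend on instead of a fixed delay. A minimal rework of the earlier example, assuming the form data arrives via a request you can intercept (the /api/form-data route is an assumption):

cy.intercept('GET', '/api/form-data').as('formData'); // alias the request that loads the form
cy.get('[data-cy="open-form"]').click();
cy.wait('@formData'); // resolves as soon as the response arrives, however long that takes
cy.get('[data-cy="submit-button"]').click();

This removes both failure modes of cy.wait(5000): failing when the response takes longer than five seconds, and wasting five seconds when it is faster.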
Influence of External Dependencies and Network Issues
E2E tests may rely on external systems like databases, APIs, or third-party services. Problems can arise for any of the following reasons:
APIs are Unstable: If an API used in the test is down or slow, the test will fail.
Database State: Inconsistent or incorrect database states can lead to unreliable test results.
Third-Party Services: Dependencies on third-party services can introduce variability, especially if those services have rate limits or downtime.
Network issues, such as slow or unstable connections, can also cause tests to fail intermittently. Cypress has built-in functionality to help mock out some of these services when appropriate.
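For example, cy.intercept() can stub out a third-party call entirely, or simulate the slow and failing responses you want your app to tolerate. The endpoint below is purely illustrative:

// Stub the response so the test no longer depends on the real service.
cy.intercept('GET', 'https://api.example.com/quotes', {
  statusCode: 200,
  body: { quotes: [] },
}).as('quotes');

// Or simulate an outage to check that the app degrades gracefully.
cy.intercept('GET', 'https://api.example.com/quotes', {
  forceNetworkError: true,
}).as('quotesDown');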
Uncontrollable Environmental Events
Flaky tests can also occur when the testing environment is not stable. Some examples of an unstable test environment include:
Server Load: When the server is under heavy load, it can slow down, causing timeouts or delays.
Network Latency: Fluctuations in network speed can lead to inconsistent results.
Resource Availability: Sometimes, the machine running the tests may not have enough resources (CPU, memory) to execute them reliably.
These factors can lead to tests failing sporadically, even if the underlying code is correct.
Understanding these causes helps you take steps to avoid flaky tests in Cypress. By addressing these factors, you can make your tests more reliable and your CI/CD pipeline more robust.
Taking Control of Flaky Tests in Cypress
Reliable detection and prevention are the keys to controlling flaky tests. Large, complex software suites being tested with Cypress will almost inevitably have some flaky tests. Dealing with these flakes efficiently is important to optimize developer time and reduce their impact on your CI/CD pipeline.
To deal with flaky tests in Cypress, you will want to:
Avoid writing flaky tests in the first place.
Detect any flaky tests, automatically.
Quarantine flaky tests to minimize impact.
Fix important test flakes to make them stable and reliable.
How to Avoid Writing Flaky Tests in Cypress
Debug Your Tests Locally
Local debugging is crucial for avoiding flaky tests in Cypress. It allows you to catch many issues early, making your tests more reliable before they reach the CI/CD pipeline.
Importance of Local Debugging
Running tests locally helps you understand how they behave in a controlled environment. It lets you identify issues that might not appear immediately in a CI/CD pipeline. Local debugging allows you to tweak and refine tests until they are stable.
Setting Up a Local Cypress Environment
To start debugging Cypress locally, you can use the following command to open the Cypress App:
npx cypress open
The Cypress App will open a browser that allows you to interact with and run your E2E or component tests.
Running Tests Multiple Times and Enabling Test Retries
You may need to run your tests multiple times to detect flaky tests locally. Flaky tests are inconsistent; they might pass once and fail the next time. Repeated runs help you spot these inconsistencies.
This isn’t perfect, especially when testing manually. Flaky tests with a low flake rate could take many runs to reproduce. Luckily, Cypress has test retries built in, making it easier to detect test flakes.
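Retries are configured in cypress.config.js; the values below are only an example of a typical setup:

const { defineConfig } = require('cypress');

module.exports = defineConfig({
  retries: {
    runMode: 2,  // retry failed tests up to 2 times during cypress run (CI)
    openMode: 0, // no automatic retries in the interactive Cypress App
  },
});

Individual tests can also override this with their own retries value, which is handy when you want extra runs only for a suspected flake.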
Using Cypress App for Detailed Error Logs
The Cypress App provides detailed error logs along with additional built-in tools that make it easier to debug flaky tests:
Command Log: Shows each command executed during the test.
Error Details: Provides information like error name, message, stack trace, and a link to relevant documentation.
Screenshots and Videos: Automatically captures screenshots and video recordings of failed tests, helping you understand what went wrong.
Select Elements Via Custom Data Attributes
Choosing the correct elements to interact with in your tests is key to avoiding flaky tests in E2E testing. Default selection strategies in Cypress often lead to instability.
Problems with Default Selection Strategies
Default methods like selecting elements by tag name, class, or ID can be problematic:
Tag Names: Too generic and likely to change.
Classes: Often tied to styling, which can change frequently.
IDs: Though unique, they can still change during development.
These strategies can cause tests to break when the underlying HTML structure changes.
Best Practices for Writing Resilient Selectors
To write stable selectors in Cypress, follow these best practices:
Avoid Tag and Class Selectors: These are too tightly coupled to the presentation layer.
Minimize Text-Based Selectors: Text content can change, especially in dynamic applications.
Use Custom Data Attributes: Custom data attributes (data-*) are less likely to change and provide a clear, stable way to select elements.
Using Data-* Attributes for Isolation
Custom attributes like data-cy offer a robust way to select elements:
Isolated from Styling: They are independent of CSS and JavaScript changes.
Clear Intent: Using attributes like data-cy="submit-button" makes it clear what the element is for.
For example, if you have a button with the custom data attribute data-cy:

<button data-cy="submit-button">Submit</button>

you can use data-cy to select the button in your Cypress tests:

cy.get('[data-cy="submit-button"]').click();
This approach ensures that your selectors remain stable, even as the application evolves, reducing the likelihood of flaky tests.
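If you select by data-cy everywhere, a small custom command keeps tests terse. This is a common convention rather than anything Cypress requires, and the getBySel name is arbitrary:

// In cypress/support/commands.js
Cypress.Commands.add('getBySel', (selector, ...args) => {
  return cy.get(`[data-cy="${selector}"]`, ...args);
});

// In a test:
cy.getBySel('submit-button').click();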
By focusing on local debugging and using resilient selectors, you can significantly reduce the number of flaky tests in Cypress. This results in more reliable tests and a smoother CI/CD pipeline.
How to Detect Flaky Tests in Cypress
Detecting flaky tests involves running tests multiple times or tracking historical CI runs to see if they produce consistent results. There are several ways to track down flaky tests, whether they appear locally or in a CI/CD pipeline.
Some of the following are common ways to attempt to detect flaky tests:
Run Tests Multiple Times: Similar to debugging flaky tests, you can execute the same test several times in a row. If it fails sometimes and passes other times, it's flaky.
Use Retry Mechanisms: Configure your test runner to retry failed tests automatically. If a test passes on a retry, it might be flaky.
Check Logs and Reports: Look at error logs and test reports to see if the failures are random.
Unfortunately, many of these techniques suffer from drawbacks.
Running tests multiple times locally may not reproduce a flake, because CI machines run in a different environment than your local machine. Flaky tests with a 5-10% flake rate could also take many runs to reproduce.
Retry mechanisms can help CI pass, but don’t do anything to address test flakes. Instead, flaky tests are just ignored.
Finally, logs and reports can be inconsistent depending on the nature of the flaky tests, and it can take time and effort to sort through different failures to collect all the required logs.
Flaky test detection tooling, like Cypress Flaky Test Management or Trunk Flaky Tests, is often a better way to find test flakes. These tools help detect flaky tests in a CI/CD pipeline, collect all flaky test data and stack traces in a single place, and some, like Trunk Flaky Tests, also help manage flaky tests with quarantine mechanisms built-in.
Quarantine Flaky Tests in Cypress
Fixing every single flaky test as soon as it is detected would be nice. It could also be incredibly time-consuming. While it is important to track flaky tests and fix any flakes on important tests, it often isn’t practical to hunt down and fix every flaky test as soon as it is detected.
Quarantining is the best way to deal with flaky tests.
A quarantined flaky test will continue to run as part of a CI/CD pipeline, but failures for quarantined tests will not stop the pipeline from completing successfully. Flaky tests are still tracked and run, but PRs are not blocked from merging into main.
Another advantage of quarantining is that flaky test history can easily be tracked. Initial attempts to fix flaky tests are often unsuccessful. Historical logs and information are important for future debugging efforts. This potentially useful information would be lost if the test is disabled or deleted.
There isn’t an out-of-the-box way to quarantine flaky tests with Cypress Cloud and Cypress Flaky Test Management. Instead, you can use a tool like Trunk Flaky Tests. Automatically quarantining tests at runtime is important so that quarantine status doesn’t need to be manually managed. Flaky tests can automatically be quarantined, and when fixed, can be automatically unquarantined without any external intervention.
Learn more about quarantining with Trunk Flaky Tests.
How to Debug and Fix Flaky Tests in Cypress
Debugging and fixing flaky tests in Cypress involves several steps that help you identify and resolve inconsistencies. Here are some techniques to help you get started.
Use Tooling to Detect and Manage Flaky Tests
A Cypress Cloud subscription will give you access to Cypress Flaky Test Management, a tool that helps you detect and debug flaky tests. With Flaky Test Management, you get:
Flaky test detection: Flaky tests will be flagged and marked by Cypress when test results are inconsistent. From the Cypress docs: “Any failure across multiple test run attempts triggered by test retrying will result in a given test case to be flagged as flaky.”
Flaky test analytics: Get an overview of the flaky status of your test suite, including the total number of flaky tests, a rating of the overall flakiness of your project, and a filterable log of all flaky tests.
Flaky test alerts: Send notifications to a messenger or source control provider to keep the entire team up to date on the status of test flakes in individual CI workflows.
There are two major limitations to Cypress Flaky Test Management:
Flakes are only detected using test retries: CI time is wasted while tests are re-run in the hope that they produce inconsistent results.
No mechanism for dealing with test flakes: Once a flaky test is detected, you still need to manually fix the test to unblock CI. There is nothing built-in to automatically quarantine flaky tests.
Trunk Flaky Tests doesn’t use retries to detect test flakes, and will automatically quarantine flaky tests to speed up CI and unblock developers. This is in addition to providing users with analytics information and enabling custom notifications with flaky test webhooks.
Enabling Detailed Event Logging
Detailed event logging is essential for understanding why a test is flaky. By enabling verbose logging, you gain insight into the sequence of events leading up to a test failure.
Enable Logging: In the browser's console, execute:
localStorage.debug = 'cypress:*'
Reload the Page: With verbose logging turned on, reload the page to see all Cypress event logs in the console.
With detailed logs, you can trace each step of your test and see where things go wrong.
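The same verbose output is available for headless runs by setting the DEBUG environment variable before cypress run:

DEBUG=cypress:* npx cypress run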
Using the debug() and pause() Methods
Cypress offers powerful methods like debug() and pause() to help you pinpoint issues.
debug() Method
The debug() method sets a breakpoint in your test, allowing you to inspect the state of the application at that point.
Usage:
it('button is clicked', () => {
  cy.get('[data-cy="submit-button"]')
    .click({force: true})
    .debug();
});
Benefit: It logs the current state and pauses execution, letting you examine the DOM and variables.
pause() Method
The pause() method stops the test execution, giving you the freedom to manually inspect the application.
Usage:
it('pauses the test', () => {
  cy.get('[data-cy="action-canvas"]')
    .click(80, 75)
    .pause()
    .click(170, 75)
});
Benefit: Allows for manual inspection of the DOM, network requests, and local storage.
Capturing Screenshots and Video Recordings
Screenshots and video recordings provide visual evidence of what went wrong during a test. They are invaluable for debugging flaky tests.
Automatic Capturing: Cypress can automatically capture screenshots of test failures.
Enable in Configuration:
const { defineConfig } = require('cypress');

module.exports = defineConfig({
  screenshotsFolder: 'cypress/screenshots',
  videosFolder: 'cypress/videos',
  video: true,
});
Manual Capturing: Use cy.screenshot() to take screenshots at specific points.
Usage:
cy.screenshot('custom-screenshot');
These visual aids help you see the exact state of the application at the time of failure.
Analyzing Error Messages and Stack Traces
Error messages and stack traces offer clues about why a test failed. Analyzing them can lead you to the root cause of flaky tests.
Error Message Details: Look for key information such as error type, message, and relevant documentation links.
Stack Trace: Examine the stack trace to identify where the error occurred in your code.
By following the stack trace, you can locate the exact line of code causing the issue and address it directly.
Using these methods, you can effectively debug and fix flaky tests in Cypress. Detailed logging, strategic use of debug() and pause(), visual evidence from screenshots and videos, and thorough analysis of error messages and stack traces will help you create more stable and reliable tests.
Best Practices for Maintaining Test Stability
Maintaining test stability in Cypress is crucial for reliable and consistent test results. Implementing the following best practices can help ensure your tests remain stable and trustworthy.
Regularly Review and Update Tests
Regularly reviewing and updating your flaky tests is essential for maintaining their relevance and accuracy. Important tests that are flaky should be fixed! The following can be done to review quarantined flaky tests:
Scheduled Reviews: Set a schedule to review your flaky tests periodically. This could be monthly or quarterly, depending on how frequently your application updates.
Update Assertions: Ensure that all assertions still align with the current state and behavior of your application.
Refactor Code: Refactor test code to improve readability and maintainability. Simplify complex test logic and remove redundant steps.
Isolate Tests from External Dependencies
Tests that rely on external dependencies like third-party APIs or databases can become flaky due to factors beyond your control. Isolating your tests from these dependencies can help maintain stability.
Mock External Services: Use tools to mock external API calls and services. This ensures that your tests do not fail due to issues with third-party services.
Example:
cy.intercept('/api/users', {
  body: [
    { id: 1, name: 'Carl' }
  ]
});
Use Stubs and Spies: Employ stubs and spies to replace real functions and methods with controlled implementations.
Example:
let listenersAdded = false;

cy.stub(util, 'addListeners').callsFake(() => {
  listenersAdded = true;
});

App.start();
expect(listenersAdded).to.be.true;
Simplify Test Logic and Flows
Complex test logic can introduce errors and make tests harder to maintain. Simplifying your test logic and flows can mitigate these issues.
Break Down Tests: Divide large, complex tests into smaller, more manageable units. Each test should focus on a single functionality.
Avoid Conditional Logic: Minimize the use of conditional logic within your tests. Tests should be straightforward and predictable.
Clear Naming Conventions: Use clear and descriptive names for your test cases and steps. This improves readability and makes it easier to understand what each test is verifying.
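As a sketch of what avoiding conditional logic looks like in practice (the data-cy values and the ?banner=1 query parameter are hypothetical):

// Avoid: branching on whatever state the page happens to be in makes the outcome unpredictable.
cy.get('body').then(($body) => {
  if ($body.find('[data-cy="welcome-banner"]').length > 0) {
    cy.get('[data-cy="dismiss-banner"]').click();
  }
});

// Prefer: force a known state, then assert on it deterministically.
it('dismisses the welcome banner', () => {
  cy.visit('/?banner=1');
  cy.get('[data-cy="dismiss-banner"]').click();
  cy.get('[data-cy="welcome-banner"]').should('not.exist');
});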
Continuous Monitoring and Analysis of Test Results
Continuous monitoring and analysis of your current and historical test results help identify patterns and issues that may not be apparent from individual test runs.
Automated Reports: Set up automated reporting to get detailed insights into test runs. Tools like Cypress Cloud or Trunk Flaky Tests can provide valuable data on test performance and flakiness.
Trend Analysis: Regularly analyze test trends to identify recurring issues. Look for patterns in test failures to pinpoint problematic areas in your application or test suite.
Immediate Action: Address flaky tests and failures as soon as they are detected. Automatic quarantining handles this for you. The longer a flaky test remains unquarantined, the more it can undermine the reliability of your test suite.
By implementing these best practices, you can significantly improve the stability of your Cypress tests. Regular reviews, isolation from external dependencies, simplified test logic, and continuous monitoring will help ensure your tests remain robust and reliable.
Manage Flaky Tests the Right Way
Whether you are dealing with flaky tests in Cypress or are fighting them in another test framework, it is important to familiarize yourself with the tooling and methods available to painlessly deal with flakes in your CI pipeline.
Trunk is building a tool to conquer flaky tests once and for all. With Trunk Flaky Tests, you’ll be able to:
Autodetect the flaky tests in your build system
See them in a dashboard across all your repos
Quarantine tests with one click or automatically
Get detailed stats to target the root cause of the problem
Get reports weekly, nightly, or instantly sent right to email and Slack
Intelligently file tickets to the right engineer
Not to mention, Trunk’s approach to flaky tests will work across CI providers and test frameworks, so you can use them on any project from web apps to massive monorepos.