
How to avoid and detect flaky tests in Playwright?

By The Trunk Team, September 10, 2024


Playwright is a great tool for automated testing, but flaky tests can be a big problem. Flaky tests are tests that sometimes pass and sometimes fail without any changes in the code. This can cause headaches, especially in Continuous Integration (CI) pipelines. You need to understand what flaky tests are, why they happen, and how to avoid them to keep your testing smooth.

What Are Flaky Tests in Playwright?

Flaky tests produce different results each time you run them. One day, they pass; the next day, they fail even though you haven't changed anything in the code. This inconsistency can make it hard to trust your test results.

In Playwright, flaky tests are particularly tricky because Playwright handles browser interactions. Browsers are complex and can behave differently based on various factors such as network speed or system load. When a Playwright test passes one time but fails the next, you likely have a flaky test on your hands.

Impact of Flaky Tests on CI Pipelines

Flaky tests cause major problems in CI pipelines. CI systems, like Jenkins or GitHub Actions, rely on consistent test results to verify code changes. When a flaky test fails, it can block your pipeline, delaying deployments:

  • Unreliable Builds: Flaky tests make it hard to know if a build is truly broken.

  • Wasted Time: Developers spend extra time rerunning tests to make sure they pass.

  • Blocked Deployments: A flaky test can prevent important updates from going live.

Identifying Flaky Tests Using Commit SHAs and Test Retries

To identify flaky tests, you can use commit SHAs and test retries. A commit SHA is a unique identifier for a specific commit in your version control system, like Git. Because the code is identical for every run of the same SHA, a test that passes on one run of a commit and fails on another run of that same commit is likely flaky. Here’s how you can detect them:

  • Commit SHAs: Track test results across multiple runs of the same commit SHA.

  • Test Retries: Configure Playwright to retry tests automatically. If a test fails the first time but passes on retry, it’s flagged as flaky.
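The commit SHA approach can be sketched in a few lines of TypeScript. This is a minimal illustration, not part of Playwright: the RunResult shape, the findFlakyTests function, and the sample data are all hypothetical, and it assumes you already export per-run results from your CI system in some form.

```typescript
// One recorded execution of a test. The field names are hypothetical.
interface RunResult {
  sha: string;      // commit the run was executed against
  testName: string; // identifier of the test
  passed: boolean;  // outcome of this run
}

interface Outcomes {
  sawPass: boolean;
  sawFail: boolean;
}

// Returns the names of tests that both passed and failed on the same commit.
function findFlakyTests(runs: RunResult[]): string[] {
  // testName -> sha -> outcomes observed for that pair
  const byTest = new Map<string, Map<string, Outcomes>>();

  for (const run of runs) {
    const bySha = byTest.get(run.testName) ?? new Map<string, Outcomes>();
    const outcomes = bySha.get(run.sha) ?? { sawPass: false, sawFail: false };
    if (run.passed) {
      outcomes.sawPass = true;
    } else {
      outcomes.sawFail = true;
    }
    bySha.set(run.sha, outcomes);
    byTest.set(run.testName, bySha);
  }

  const flaky: string[] = [];
  byTest.forEach((bySha, testName) => {
    let mixed = false;
    bySha.forEach((o) => {
      if (o.sawPass && o.sawFail) mixed = true;
    });
    if (mixed) flaky.push(testName);
  });
  return flaky.sort();
}

// Made-up sample data for illustration only.
const runs: RunResult[] = [
  { sha: 'a1b2c3', testName: 'login', passed: true },
  { sha: 'a1b2c3', testName: 'login', passed: true },
  { sha: 'a1b2c3', testName: 'checkout', passed: true },
  { sha: 'a1b2c3', testName: 'checkout', passed: false },
  { sha: 'd4e5f6', testName: 'search', passed: false },
  { sha: 'd4e5f6', testName: 'search', passed: false },
];

console.log(findFlakyTests(runs)); // [ 'checkout' ]
```

Note that "search" is not reported: it fails on every run of its commit, which points to a real breakage rather than flakiness. Likewise, a test that passes on one commit and fails on a different commit is not flagged, because the code changed between the two runs.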

By understanding flaky tests and their impact, you can start to tackle them effectively.

Why Do Flaky Tests Occur?

Flaky tests are a common issue in automated testing, and understanding why they happen can help you avoid them. Here’s a closer look at the main causes of flaky tests in Playwright.

Race Conditions Caused by Concurrent Operations

Race conditions occur when two or more operations run simultaneously and the result depends on which one finishes first. In Playwright, this usually means the test's actions are racing against asynchronous work in the page, such as network requests, rendering, or client-side scripts. For example, if the test clicks a button before the page has finished attaching the button's click handler, the click may be ignored on some runs and handled on others. This type of error is hard to predict and resolve because it depends on the timing of the operations.

Slowdowns Due to Machine Performance Variability

The performance of the machine running your tests can vary due to many factors, such as CPU load or available memory. These variations can cause tests to behave differently each time they run. If your test expects a page to load within 5 seconds, it might fail if the machine is under heavy load and takes longer than expected. This can lead to flaky test results that are difficult to reproduce.

Bugs in Test Scripts, Such as Non-Robust Node Locators

Bugs in your test scripts can also cause flakiness. One common issue is using non-robust node locators. For instance, if your script uses a CSS selector that is too specific, it might fail if the page layout changes slightly. Playwright offers more reliable locator methods, such as page.getByRole() or page.getByText(), to help you avoid this problem. Using these methods can make your tests more resilient to minor changes in the page structure.

Collective Contribution of Multiple Factors to Test Flakiness

Often, it’s not just one thing that makes a test flaky but a combination of several factors. For example:

  • A race condition might be exacerbated by a slowdown in machine performance.

  • A bug in the test script might only show up when the server responds slower than usual.

These combined factors make it challenging to pinpoint the exact cause of a flaky test. You need to consider all potential issues and address them collectively to improve test reliability.

Understanding these common causes can help you prevent flaky tests in Playwright. By being aware of race conditions, machine performance variability, and script bugs, you can take proactive steps to make your tests more stable.

How to Avoid Writing Flaky Tests in Playwright

Creating stable and reliable tests in Playwright requires specific practices. Here’s how you can avoid writing flaky tests:

Running and Debugging Tests Before Committing Them

Testing your code locally before committing is crucial. Local testing helps catch issues early, reducing the risk of flaky tests. Use a setup that closely matches your CI environment to ensure consistency. This way, you can avoid discrepancies between your local and CI environments.

  • Importance of Local Testing: Running tests locally allows you to identify and fix issues before they reach the CI pipeline. This saves time and reduces the likelihood of random failures.

  • Consistent Environments: Make sure your local test environment mirrors the CI environment as closely as possible. Differences in browser versions, operating systems, or hardware can lead to unexpected test behavior.

  • Playwright’s HTML Report: When the HTML reporter is enabled, Playwright generates an HTML report after each run. This report provides detailed information about each test, making it easier to identify flaky tests. Open the most recent report by running npx playwright show-report.

  • Automatic Test Retries: Configure Playwright to automatically retry tests. This helps identify flaky tests by showing tests that pass on subsequent attempts after failing initially. Use the retries option in playwright.config.ts to set up automatic retries.

Use Locators, Not Selectors

Locators in Playwright offer a more reliable way to identify elements on a web page compared to traditional CSS or XPath selectors. They are designed to be more robust, reducing the chances of flaky tests.

  • Advantages of Locators: Locators are less likely to break when the page structure changes. They are more resilient to minor changes in the DOM, making your tests more stable.

  • Recommended Locator Functions:

    • page.getByRole(): Locates elements based on their roles, such as buttons or links.

    • page.getByText(): Finds elements containing specific text.

    • page.getByLabel(): Targets form controls by their associated labels.

    • page.getByPlaceholder(): Locates input fields by their placeholder text.

    • page.getByAltText(): Finds images by their alt text.

    • page.getByTestId(): Uses custom test IDs to locate elements.

  • Example of Using Locators: Instead of using a CSS selector like .btn-primary.submit, use:

    page.getByRole("button", { name: "Submit" });

    This locator is more robust because it targets the button by its role and visible text, which are less likely to change.

Never Rely on Hard Waits

Fixed time delays, known as hard waits, are a common source of flaky tests. They can make tests unreliable and slow.

  • Problems with Fixed Time Delays: Hard waits cause tests to pause for a set amount of time, regardless of whether the condition being waited for has been met. This can lead to tests passing in some environments but failing in others due to differences in execution speed.

  • Example of a Test Using Hard Waits:

    const { test, expect } = require('@playwright/test');

    test('"Load More" button loads new products', async ({ page }) => {
      await page.goto('https://localhost:3000/products');
      // getByRole returns a Locator synchronously, so no await is needed here
      const loadMoreButton = page.getByRole('button', { name: 'Load More' });
      await loadMoreButton.click();
      await page.waitForTimeout(10000); // Hard wait: always pauses for 10 seconds
      // count() reads the DOM once and does not retry
      const productNodes = await page.locator('.product').count();
      expect(productNodes).toBe(20);
    });

    This test waits 10 seconds regardless of whether the products have loaded.

  • Replacing Hard Waits with Web-First Assertions: Use Playwright’s built-in web-first assertions to wait for conditions dynamically:

    const { test, expect } = require('@playwright/test');

    test('"Load More" button loads new products', async ({ page }) => {
      await page.goto('https://localhost:3000/products');
      // getByRole returns a Locator synchronously, so no await is needed here
      const loadMoreButton = page.getByRole('button', { name: 'Load More' });
      await loadMoreButton.click();
      // Retries until 20 products are present or the 10-second timeout expires
      await expect(page.locator('.product')).toHaveCount(20, { timeout: 10000 });
    });

    This approach waits up to 10 seconds for the products to appear but doesn’t delay the test if they load sooner.

By following these practices, you can minimize the occurrence of flaky tests in Playwright, ensuring more stable and reliable test results.

How to Detect Flaky Tests in Playwright

Detecting flaky tests early is essential for maintaining a reliable test suite. Here’s how you can identify and address flaky tests in Playwright:

Running Tests Multiple Times

Flaky tests often fail randomly. To catch these inconsistencies, run your tests multiple times. By doing so, you can observe which tests pass sometimes and fail other times.

  • Multiple Executions: Execute the same test several times in different environments. Watch for tests that don’t consistently pass.

  • Pattern Recognition: Note any patterns in the failures. Are they related to specific actions or conditions?
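One way to run every test several times is Playwright's repeatEach configuration option. The sketch below assumes a standard playwright.config.ts. The value 10 is an arbitrary choice for illustration, and retries are turned off here so that each intermittent failure is reported rather than absorbed by a retry.

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Run every test 10 times; 10 is an arbitrary value chosen for illustration.
  repeatEach: 10,
  // Disable retries so that every intermittent failure shows up in the results.
  retries: 0,
});
```

The same behavior is available for a single run, without editing the config, through the command-line flag: npx playwright test --repeat-each=10.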

Using Playwright's Built-In Debugging Tools

Playwright offers powerful debugging tools to help you understand why a test might be flaky. Use these tools to investigate and resolve issues.

  • Debug Mode: Launch tests in debug mode using the --debug flag. This runs the tests in a headed browser and opens the Playwright Inspector, where you can step through the test action by action.

    npx playwright test --debug

  • UI Mode: Use Playwright’s UI mode (started with the --ui flag) to visually inspect and interact with your test as it runs. This can help pinpoint where and why a test might be failing.

Analyzing HTML Reports

Playwright generates detailed HTML reports that provide insights into your test results. Use these reports to filter and analyze flaky tests.

  • Filtering Flaky Tests: HTML reports categorize tests based on their outcomes. Tests that fail initially but pass on retry are marked as flaky.

  • Detailed Insights: Examine the logs and screenshots provided in the HTML report. Look for inconsistencies or unexpected behaviors.

Setting Up Automated Test Retries

Automated retries can help categorize and manage flaky tests. By configuring retries, you can ensure that a single failure doesn’t disrupt your CI pipeline while still identifying unreliable tests.

  • Configuring Retries: Enable retries in your playwright.config.ts file:

    import { defineConfig } from '@playwright/test';

    export default defineConfig({
      retries: 2, // Retry failing tests up to twice
    });

  • Interpreting Results: After setting up retries, categorize the test results:

    • Passed: Tests that pass on the first execution.

    • Flaky: Tests that pass on a retry after failing initially.

    • Failed: Tests that fail even after multiple retries.
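The three categories above can be expressed as a small function. This is a minimal sketch of the classification rule only, not Playwright's internal implementation: the Outcome type and the classify function are hypothetical names, and the input is assumed to be the pass/fail result of each attempt in order, with the first entry being the initial run.

```typescript
type Outcome = 'passed' | 'flaky' | 'failed';

// attempts[0] is the first execution; later entries are retries.
// true means the attempt passed, false means it failed.
function classify(attempts: boolean[]): Outcome {
  if (attempts.length === 0) {
    throw new Error('at least one attempt is required');
  }
  if (attempts[0]) {
    return 'passed'; // passed on the first execution
  }
  // The first attempt failed: a later pass makes the test flaky,
  // and no pass at all makes it failed.
  return attempts.some((ok) => ok) ? 'flaky' : 'failed';
}

console.log(classify([true]));                // passed
console.log(classify([false, true]));         // flaky
console.log(classify([false, false, false])); // failed
```

With retries set to 2, a test gets at most three attempts, so the last example corresponds to a test that failed its initial run and both retries.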

By following these steps, you can effectively detect flaky tests and take action to improve the reliability of your Playwright test suite.

How to Fix Flaky Tests in Playwright

Once you’ve detected flaky tests, the next step is to fix them. Here are some practical strategies to stabilize your Playwright tests:

Isolating Tests to Prevent External Dependencies

Tests that rely on external factors can become flaky due to changes or issues outside your control. Isolating tests helps ensure they only fail when there’s a genuine problem with the code being tested.

  • Mocking External Services: Use mocks and stubs to simulate interactions with external services. This way, your tests aren’t affected by network issues or downtime.

  • Independent Test Data: Ensure each test has its own data setup. Avoid shared state or data that can lead to unpredictable results.
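Mocking an external service can be done with Playwright's page.route() and route.fulfill(). The sketch below makes several assumptions for illustration: the page loads its products from a hypothetical /api/products endpoint, it renders one .product element per returned item, and the product fields are made up. The URL reuses the one from the earlier examples in this article.

```typescript
import { test, expect } from '@playwright/test';

test('product list renders mocked products', async ({ page }) => {
  // Register the route before navigating so the first request is intercepted.
  // '**/api/products' is a hypothetical endpoint used for illustration.
  await page.route('**/api/products', async (route) => {
    await route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify([
        { id: 1, name: 'Mock product A' },
        { id: 2, name: 'Mock product B' },
      ]),
    });
  });

  await page.goto('https://localhost:3000/products');

  // Assumes the page renders one .product element per item in the response.
  await expect(page.locator('.product')).toHaveCount(2);
});
```

Because the response is fixed and served locally by Playwright, the test no longer depends on the availability, latency, or current data of the real backend.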

Increasing Robustness by Handling Different Scenarios

Robust tests can handle a variety of scenarios and conditions. This reduces flakiness caused by unexpected changes or edge cases.

  • Dynamic Waits: Use Playwright’s built-in waiting mechanisms instead of fixed waits. For instance, use expect(page.locator('.product')).toHaveCount(20) to wait for elements to appear dynamically.

  • Condition Handling: Write tests to account for different possible states. If an element might or might not be present, handle both cases gracefully.

Simplifying Test Logic for Clarity and Correctness

Complex test logic can introduce errors and make tests harder to debug. Simplify your tests to make them more reliable and easier to maintain.

  • Clear Assertions: Use straightforward and clear assertions. Avoid chaining multiple assertions in a single test, as this can make it harder to pinpoint failures.

  • Modular Tests: Break down complex tests into smaller, more focused tests. This makes it easier to identify which part of the test is causing flakiness.

Configuring Appropriate Timeouts to Avoid Premature Failures

Timeouts are crucial in ensuring tests wait long enough for conditions to be met but not too long to waste time. Properly configured timeouts can help reduce flaky tests.

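  • Test Timeout: Set an appropriate default timeout for each test in playwright.config.ts. Note that the timeout option applies per test; the separate globalTimeout option caps the duration of the whole test run: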

    import { defineConfig } from '@playwright/test';

    export default defineConfig({
      timeout: 60000, // Set test timeout to 60 seconds
    });

  • Locator Timeout: Use specific timeouts for locators and assertions when needed:

    await page.locator('.product').waitFor({ timeout: 10000 });

  • Avoid Overly Long Timeouts: While it might be tempting to set long timeouts to avoid failures, this can mask underlying issues and lead to slower test runs.

By applying these strategies, you can fix flaky tests in Playwright and build a more stable and reliable test suite.

Configuring Playwright for Better Test Stability

Setting the Right Timeouts

Properly configuring timeouts is crucial for ensuring your Playwright tests run smoothly. Timeouts prevent tests from hanging indefinitely while waiting for conditions to be met. Setting the right timeouts helps avoid premature failures and ensures that tests wait just the right amount of time.

  • Importance of Test and Expect Timeouts: Test timeouts set a maximum time for the entire test to run. Expect timeouts define how long Playwright should wait for a specific condition to be met. Both are important for maintaining test reliability.

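  • Configuring Default Timeouts: You can set default test and expect timeouts for the whole project in your playwright.config.ts file. This ensures all tests adhere to a consistent timeout policy. The values below match Playwright's built-in defaults of 30 seconds per test and 5 seconds per assertion: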

    import { defineConfig } from '@playwright/test';

    export default defineConfig({
      timeout: 30000, // 30 seconds for each test
      expect: {
        timeout: 5000, // 5 seconds for each expect
      },
    });

  • Avoiding Overly Long or Zero Timeouts: Setting excessively long timeouts can mask underlying issues, making it harder to detect flaky tests. A test timeout of 0 is not a short timeout: in Playwright it disables the timeout entirely, so a stuck test can hang indefinitely. Finding a balanced timeout value helps in identifying real problems.

Using Playwright's Automatic Retries

Automatic retries do not fix flaky tests, but they make a test suite more tolerant of them. Retries allow a test to run multiple times before being marked as failed, which both surfaces flaky tests in the report and keeps transient failures from breaking CI/CD pipelines.

  • Enabling Test Retries in Playwright Configuration: To enable test retries, add the retries option to your playwright.config.ts file:

    import { defineConfig } from '@playwright/test';

    export default defineConfig({
      retries: 2, // retry failed tests up to 2 times
    });

  • Benefits of Retries for CI Pipelines: Retries ensure that transient issues don't cause your CI pipeline to fail. This can be particularly useful for issues related to network instability or temporary server problems. By retrying tests, you can avoid unnecessary pipeline failures and keep the development process smooth.
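A common pattern is to enable retries only in CI, so that flaky behavior stays visible during local development. The sketch below assumes your CI system sets a CI environment variable, which many systems such as GitHub Actions do; adjust the check if yours does not. The trace setting is an optional addition that records a trace the first time a test is retried, which gives you something to inspect for each flaky failure.

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry up to twice in CI, never locally.
  // Assumes the CI environment sets the CI environment variable.
  retries: process.env.CI ? 2 : 0,
  use: {
    // Record a trace when a failed test is retried for the first time.
    trace: 'on-first-retry',
  },
});
```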

  • Categorizing Tests Based on Retry Outcomes: Playwright categorizes tests based on their retry outcomes:

    • Passed: Tests that pass on the first run.

    • Flaky: Tests that fail initially but pass upon retry.

    • Failed: Tests that fail even after all retries.

This categorization helps you identify and focus on flaky tests, providing insights into which tests need further attention and improvement.

By configuring these settings, you can enhance the stability and reliability of your Playwright tests, ensuring a more robust testing process.

Taking Control of Flaky Tests in Playwright

Taking control of flaky tests starts with reliable detection and prevention. Trunk is building a tool to conquer flaky tests once and for all. You’ll get all of the features of the big players' internal systems without the headache of managing them. With Trunk Flaky Tests, you’ll be able to:

  • Autodetect the flaky tests in your build system

  • See them in a dashboard across all your repos

  • Quarantine tests with one click or automatically

  • Get detailed stats to target the root cause of the problem

  • Get reports weekly, nightly, or instantly sent right to email and Slack

  • Intelligently file tickets to the right engineer

If you’re interested in getting beta access, sign up here.
