Vite is fast becoming one of the most popular and widely adopted bundlers, and Vitest, the Vite-native test runner, is quickly gaining adoption alongside it. Vitest runs your tests through the Vite dev server already configured for your project, and its Jest-compatible API makes it a drop-in replacement. With the increasingly common practice of prioritizing a few high-quality end-to-end (E2E) tests, Vitest also ships a native Browser Mode for E2E testing in combination with Playwright or WebdriverIO. This lets you run all your tests on Vitest conveniently, but E2E testing also means you're likely to face flaky tests.
End-to-end testing and flaky tests
End-to-end tests are incredibly valuable because they give a higher degree of confidence that the code or application under test is working correctly. Since they exercise the entire system rather than isolated snippets of it, they give you confidence that the system, as seen by an end user, will function as expected.
The problem with end-to-end testing is that it's more prone to flakiness. A "flaky test" is a test that returns non-deterministic results: even when no code has changed, running the test multiple times returns different outcomes. Flaky tests are more common among E2E tests, partly because those tests are more complex.
Many studies [1] [2] have pointed to issues such as environment, networking, infrastructure, and asynchronous operations as common causes for flaky tests, all of which are more prevalent in complex tests such as E2E and integration tests. Other studies have found that larger tests, such as E2E or integration tests, in general, tend to be flakier [3].
If you’re writing tests with Vitest, especially if you’re writing integration or E2E tests, flaky tests will appear in your test suites sooner or later, even if you take precautions to avoid them [2].
My tests are flaky, so what?
There’s good reason to be concerned about flaky tests. Flaky tests are rarely the result of bad test code alone. They could indicate flakiness in your infrastructure and production code. For many teams, this is the single biggest reason for investigating and fixing every flaky test. They’re not just an annoyance but could indicate underlying problems that can bubble up to end users.
You might also notice the impact of flaky tests on the reliability of your CI. You rely on CI as guardrails so you can ship code quickly without worrying about introducing regressions or breaking existing features. Flaky tests can both erode your trust in CI results and seriously slow down your team. If your CI jobs fail 20% of the time, you're forced to either rerun your tests, slowing your team down, or ignore flaky failures, poisoning your test culture.
Flaky tests are poison. You can have the best tests in the world, the most sophisticated testing that finds the craziest bugs no one would think of, but if developers can't trust the results, if they can only trust them 99% of the time, it doesn't matter.
Dealing with flaky tests
Flaky tests are inevitable if you write E2E tests at scale [2]. Your software is complex, so you need some complex tests to provide confidence and validate your changes. If your tests are complex, you will have some flakiness in your tests. It’s a matter of economics; you can make your tests complex and reliable, but that will require many engineering hours that you may be unable to justify. Instead, you should focus on reducing the impact of flaky tests and tackle high-impact flaky tests efficiently.
To effectively reduce the impact of flaky tests, you should do the following:
Avoid flaky tests by learning common anti-patterns
Detect flaky tests with automated tools and communicate them to your team
Quarantine known flaky tests to mitigate their impact on the team
Fix high-impact flaky tests first for the best bang for your buck
Let’s walk through each of these steps in more detail.
Avoiding flaky tests in Vitest
Never rely on hard waits
Async and I/O-bound operations are the single most common cause of flaky tests [1]. While handling async operations has become much easier since the introduction of Promises and async/await, you're still likely to run into situations where they're not enough to guarantee that a process has completed.
The most common source of flakes in web UI testing is snapshot testing. Since snapshot tests work by capturing the rendered UI and comparing it to a past "correct" snapshot, they're very sensitive to minor differences in what's rendered. All it takes is for some data, animation, or image to not finish rendering for the test to become flaky.
If you try to use hard waits (fixed sleeps), there's no guarantee the components will finish rendering in time, which means flakiness. And if the component loads faster than expected, you waste CI time by making tests run longer than they need to.
```tsx
it('renders user data correctly', async () => {
  global.fetch = vi.fn().mockResolvedValue({
    json: () => Promise.resolve({ ... })
  });
  const { container } = render(<UserProfile userId={1} />);

  // Don't do this!
  await new Promise(resolve => setTimeout(resolve, 100));
  expect(container).toMatchSnapshot();
});
```
Another good example is a background deletion job. In many cases, such as deleting a user and thousands of rows of associated data or sending out thousands of emails, APIs return immediately with a 202 status code to indicate that the request has been accepted and is being processed. There's no promise to await, so it's very tempting to reach for a hard wait.
```ts
describe('Delete User API', () => {
  it('should delete user and associated posts', async () => {
    // ... Setup
    // Call delete endpoint
    const response = await fetch(`/api/users/${userId}`, {
      method: 'DELETE'
    });

    expect(response.status).toBe(202);

    await new Promise(resolve => setTimeout(resolve, 500));
    // Is the user's data also deleted?
    const userPosts = await getPostsByUserId(userId);
    expect(userPosts.length).toBe(0);
    // ... Other skipped assertions
  });
});
```
To make these tests reliable, use something like waitFor and waitUntil. These methods allow you to poll periodically with a timeout to see if a component has been rendered or a process has been completed. This lets you more reliably await things that don’t resolve cleanly in a promise.
```tsx
describe('UserProfile', () => {
  it('renders user data correctly', async () => {
    // ... Setup
    const { container } = render(<UserProfile userId={1} />);

    // Wait for some element to appear
    await waitFor(async () => {
      const element = await getDOMElementAsync() as HTMLElement | null;
      expect(element).toBeTruthy();
      expect(element?.dataset.initialized).toBeTruthy();
      return element;
    }, {
      timeout: 500, // default is 1000
      interval: 20, // default is 50
    });

    // Take snapshot after we know the data has loaded
    expect(container).toMatchSnapshot();
  });
});
```
Just remember that if multiple components load independently, you might need to wait for each of them to ensure proper setup.
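For instance, here's a minimal sketch of waiting on two independently loading regions before taking a snapshot. The `<Dashboard />` component and the `data-testid` selectors are hypothetical:

```tsx
it('renders the dashboard once all regions have loaded', async () => {
  const { container } = render(<Dashboard />);

  // waitFor returns a promise, so independent regions can be awaited together
  await Promise.all([
    waitFor(() => expect(container.querySelector('[data-testid="user-card"]')).toBeTruthy()),
    waitFor(() => expect(container.querySelector('[data-testid="activity-feed"]')).toBeTruthy()),
  ]);

  // Only snapshot once every region is known to be rendered
  expect(container).toMatchSnapshot();
});
```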
Avoid overly strict assertions
Overly strict assertions can make tests flaky and a nightmare to maintain. The simplest example involves floating point numbers. Because of the way floating point numbers are represented in memory, an expression like `0.2 + 0.1` evaluates to a value very close to `0.3` (`0.30000000000000004`), but not exactly equal to it.
Take the following assertion, for example:
```ts
expect(0.2 + 0.1).toBe(0.3) // 0.2 + 0.1 is 0.30000000000000004
```
This assertion looks correct but will fail. This is a common source of flakiness in apps dealing with money, coordinates, or floating point data.
You should use a less strict assertion instead:
```ts
// 0.2 + 0.1 is 0.30000
expect(0.2 + 0.1).toBeCloseTo(0.3, 5)
```
Similar issues can come from array assertions where ordering might affect test results. For example, you may have a test that adds items to your cart three times: each button click triggers an operation that appends an item to a list. Occasionally the list ends up out of order but is still correct. In cases like this, where order doesn't matter, it's better to match using `arrayContaining`, which ignores ordering, instead of a strict match.
```ts
// Too strict
expect(cart).toEqual([{ id: 1 }, { id: 2 }, { id: 3 }]);

// Better
expect(cart).toEqual(expect.arrayContaining([{ id: 1 }, { id: 2 }, { id: 3 }]));
```
DateTime handling
Date strings and timestamps are also common sources of flakiness if you use strict assertions. Consider this example from "An Empirical Analysis of UI-based Flaky Tests" [1]: since date-times are precise to the millisecond, matching them strictly will lead to flaky failures.
```ts
const { node, onChange } = createFormComponent({
  schema: {
    type: "string",
    format: "date-time",
  },
  uiSchema,
});

Simulate.click(node.querySelector("a.btn-now"));
const formValue = onChange.lastCall.args[0].formData;
const expected = toDateString(parseDateString(new Date().toJSON(), true));
expect(comp.state.formData).toBe(expected);
```
You can avoid most of these date-time problems by using Vitest's date-mocking utilities, which let you control the clock for consistency between test runs.
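For example, here's a minimal sketch using Vitest's fake timers to pin the system clock, so date-dependent assertions see the same "now" on every run:

```ts
import { afterEach, beforeEach, expect, it, vi } from 'vitest';

beforeEach(() => {
  // Freeze the clock at a known instant for every test
  vi.useFakeTimers();
  vi.setSystemTime(new Date('2024-01-15T12:00:00Z'));
});

afterEach(() => {
  vi.useRealTimers();
});

it('formats the current date deterministically', () => {
  // new Date() now always resolves to the mocked instant
  expect(new Date().toISOString()).toBe('2024-01-15T12:00:00.000Z');
});
```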
Control your test environment
If your test environment changes from run to run, your tests are much more likely to be flaky. Your Vitest E2E and integration tests depend on your entire system's state to produce consistent results. If you test against a persistent test environment or a shared dev/staging environment, that environment might differ each time the test is run.
This is particularly problematic because Vitest defaults to running your test files in parallel across multiple CPUs. That's awesome for performance, but it can introduce flakiness. Running tests in parallel and in random order creates opportunities both for inconsistent system state to cause flakiness and for race conditions where concurrently running tests interfere with each other. This gets more complicated if you disable test isolation, which makes your tests run faster but can let mutations from one test bleed into another.
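For reference, here's a sketch of the relevant knobs in a Vitest config. The values shown are only illustrative, and option names and defaults depend on your Vitest version:

```ts
// vitest.config.ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    // Run each test file in an isolated environment (safer, slower).
    // Disabling this speeds things up but lets state leak between tests.
    isolate: true,
    // Run test files in parallel across workers.
    fileParallelism: true,
    sequence: {
      // Shuffling surfaces hidden inter-test dependencies early.
      shuffle: false,
    },
  },
});
```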
The solution to these problems is to guarantee proper setup and teardown for your tests, and to keep tests well separated so they're unlikely to touch the same resources. In Vitest, you can do this at the suite level with `beforeAll` and `afterAll`, or at the test level with `beforeEach` and `afterEach`.
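Here's a minimal sketch of that pattern, assuming a hypothetical `createTestDatabase` helper that provisions isolated state per test:

```ts
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { createTestDatabase } from './test-helpers'; // hypothetical helper

describe('orders API', () => {
  let db: Awaited<ReturnType<typeof createTestDatabase>>;

  beforeEach(async () => {
    // Fresh, private state for every test: nothing shared between parallel runs
    db = await createTestDatabase();
  });

  afterEach(async () => {
    // Tear down everything the test created, even if it failed
    await db.destroy();
  });

  it('creates an order', async () => {
    const order = await db.orders.create({ sku: 'abc-123', qty: 1 });
    expect(order.id).toBeDefined();
  });
});
```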
Setup and teardown in Vitest is a good start, but even with these hooks, complex E2E suites can leave behind artifacts like orphaned processes, files, and caches that still cause flakiness. It's easy to miss the hard-to-reach corners of your test environment. If you're using a persistent testing environment, you must ensure that artifacts created by your tests, such as new database tables and rows, files created in storage, and background jobs started, are appropriately cleaned up. If possible, your CI job should simply run in a fresh environment, because leftover data can affect the results of future test runs.
Similarly, you should be careful when testing against a development or staging environment. Other developers might create and destroy data there as part of their normal workflow. This constantly changing environment can cause flakiness if they accidentally update or delete tables and files used during testing, create resources with IDs that collide with a test's, or use up the environment's compute resources, causing tests to time out.
Limit External Resources
While it's important to test all parts of your app fully, try to limit the number of tests that involve external resources, like external APIs and third-party integrations. These resources may go down, they may change, and you may hit rate limits: all potential sources of flaky failures. Aim to cover each external resource in a few dedicated tests, but avoid involving them in your tests excessively.
Mocking is a decent strategy for testing specific behaviors, but it should not be used as a replacement for end-to-end testing. You can’t entirely avoid external resources, but you should be mindful of which test suites involve these resources. Have some dedicated tests that involve external resources and mock them for other tests.
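A sketch of that split might look like the following, where most tests stub `fetch` and only a small, dedicated suite hits the real API. The `fetchExchangeRates` function and its payload are made up for illustration:

```ts
import { afterEach, expect, it, vi } from 'vitest';
import { fetchExchangeRates } from './rates'; // hypothetical code under test

afterEach(() => {
  vi.unstubAllGlobals();
});

it('parses exchange rates from the (mocked) third-party API', async () => {
  // Stub the external API so its uptime, latency, and rate limits
  // can't fail this suite.
  vi.stubGlobal('fetch', vi.fn().mockResolvedValue({
    ok: true,
    json: async () => ({ USD: 1, EUR: 0.92 }),
  }));

  const rates = await fetchExchangeRates();
  expect(rates.EUR).toBeCloseTo(0.92, 2);
});

// Keep a separate, clearly labeled suite (run less frequently) that calls
// the real API to catch contract changes.
```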
Automatic Reruns
If you only have a few flaky tests, rerunning tests on failure to clear flakiness can be effective. Vitest lets you easily retry failed tests through its config.
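For example, a global retry count can be set in the config (a minimal sketch; keep the number small so retries stay a safety net rather than a crutch):

```ts
// vitest.config.ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    // Re-run a failed test up to 2 more times before reporting it as failed
    retry: 2,
  },
});
```

On recent Vitest versions, individual tests can also opt in, e.g. `it('name', { retry: 3 }, async () => { /* ... */ })`, if you'd rather not retry the whole suite.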
This approach does have tradeoffs. It costs more CI resources and CI time, which, depending on the size of your test suite, can be a negligible or a profound problem. If you have hundreds of thousands of tests and a few thousand of them are flaky, retries become unbelievably expensive. Flaky failures will also still slip through the cracks occasionally by pure chance.
A much bigger problem is that reruns just cover up the problem. When you introduce new flaky tests, you may not even notice. This is bad because some flaky tests indicate real problems with your code, and the silent build-up of flaky tests can cause scaling problems within your CI system.
We recommend using retries in combination with automated detection, quarantining, and a proactive approach to squashing flaky tests as they're discovered. Retries are great as a final safety net for the low-impact flaky tests that aren't worth fixing or that flake so rarely they're never detected and quarantined.
Detecting Flaky Tests in Vitest
If you only have a few flaky tests, a great place to start is by having a central place to report them, such as a simple spreadsheet. To identify flaky tests, you can use commit SHAs and test retries. A commit SHA is a unique identifier for a specific commit in your version control system, like Git. If a test passes for one commit SHA but fails for the same SHA later, it’s likely flaky since nothing has changed.

Running tests multiple times to find flaky tests is a great way to audit your existing repo or new suites of tests. You can automate a system where new tests introduced to your repo will be re-run 50 or 100 times to test for flakiness, but this approach isn’t bulletproof.
It turns out it's incredibly difficult to gain confidence that a few hundred test runs guarantee a test isn't flaky. If you're examining a test with a 1% flake rate, you'd need to run it about 300 times to have 95% confidence that it isn't flaky [4]. It's also been shown on real projects that flaky failures are hard to reproduce by rerunning and that "fixed" tests can become flaky again [6]. Not to mention that tests often become flaky over time as related systems change.
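As a rough sanity check on that number: each run of a 1%-flaky test passes with probability 0.99, so you need enough runs that an all-pass streak becomes unlikely.

```ts
// P(all n runs pass | 1% flake rate) = 0.99^n; require that to drop below 5%
const flakeRate = 0.01;
const confidence = 0.95;
const runsNeeded = Math.ceil(Math.log(1 - confidence) / Math.log(1 - flakeRate));
console.log(runsNeeded); // ≈ 299 runs
```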
Another problem with using a method like this is that it relies on a “periodic cleanup” type workflow instead of monitoring continuously. Relying on finding time to clean up tech debt periodically rarely works. Other higher-priority tasks, typically anything but testing, will displace this workflow, and you’ll end up with a pileup of flaky tests again.
Instead of just preventing flaky tests when new tests are introduced, you also want to monitor your tests continuously. Continuously monitoring flaky tests also helps catch flaky tests due to changes in external resources, tool rot, and other sources of flakiness that happen due to drift and not code change. These problems are often easier to fix when caught early. Continuous monitoring also tells you if your fixes actually worked or not since most flaky tests aren’t fixed completely on the first attempt but will see reduced flakiness.
Collecting test results in CI and comparing results on the same commit is a great starting point for detecting flaky tests continuously and automatically. There are some nuances to consider around the branches the tests run on. You should assume tests don't normally fail on the `main` branch (or any other stable branch) but can be expected to fail more frequently on PR and feature branches. How much you weigh flakiness signals from different branches will depend on your team's circumstances.
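Here's a sketch of what that comparison can look like, assuming you've collected per-run results (test name, commit SHA, status) from CI into some store; the shape of the records is an assumption:

```ts
interface TestRunResult {
  testName: string;
  commitSha: string;
  status: 'passed' | 'failed';
}

// A test that both passed and failed on the same commit is likely flaky:
// the code didn't change, but the outcome did.
function findFlakyTests(results: TestRunResult[]): string[] {
  const outcomes = new Map<string, Set<string>>(); // "test @ sha" -> statuses seen
  for (const r of results) {
    const key = `${r.testName} @ ${r.commitSha}`;
    if (!outcomes.has(key)) outcomes.set(key, new Set());
    outcomes.get(key)!.add(r.status);
  }
  const flaky = new Set<string>();
  for (const [key, statuses] of outcomes) {
    if (statuses.size > 1) flaky.add(key.split(' @ ')[0]);
  }
  return [...flaky];
}
```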
If you’re looking for a tool that follows the same philosophies, Trunk Flaky Tests automatically detects and tracks flaky tests for you. Trunk also aggregates and displays the detected flaky tests, their past results, relevant stack traces, and flaky failure summaries on a single dashboard.
For example, you’ll get a GitHub comment on each PR, calling out if a flaky test caused the CI jobs to fail.

Quarantining Flaky Tests in Vitest
What do you do with flaky tests after you detect them? In an ideal world, you’d fix them immediately. In reality, they’re usually sent to the back of your backlog. You have project deadlines to meet and products to deliver, all of which deliver more business value than fixing a flaky test. What’s most likely is that you’ll always have some known but unfixed flaky tests in your repo, so the goal is to reduce their impact before you can fix them.
We’ve written in a past blog that flaky tests are harmful to most teams because they block PRs and reduce trust in tests. So once you know a test to be flaky, it’s important to stop it from producing noise in CI and blocking PRs. We recommend you do this by quarantining these tests.
Quarantining is the process of continuing to run a flaky test in CI without allowing failures to block PR merge. We recommend this method over disabling or deleting tests because disabled tests are usually forgotten and swept under the rug. We want our tests to produce less noise, not 0 noise.
It's important to note that studies [5] have shown that initial attempted fixes for flaky tests usually don't succeed. You need a historical record to know whether a fix reduced the flake rate or eliminated the flakiness completely. This is another reason we believe you should keep running quarantined tests.
There's no good way to exclude tests by filter in Vitest, which makes naive implementations of test quarantining much harder. You can skip tests using `describe.skip`, but we don't recommend this because there's then no way to keep running them separately to track their results. Skipping a test is usually a death sentence: it goes silent and will be forgotten.
Since it's hard to quarantine at build time anyway, you can skip straight to the better approach: quarantining at runtime, which means handling flaky failures without updating the test code. As tests are labeled flaky or return to a healthy status, they should be quarantined and unquarantined automatically, which is especially important for large repos with many contributors. One way to accomplish this is to host a record of known flaky tests, run all tests, and then check whether every failure came from a known flaky test. If so, override the exit code of the CI job so it passes.
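Here's a sketch of that exit-code override, assuming you emit a JSON report from Vitest and keep a list of known flaky tests alongside your CI scripts. The report shape and file names are assumptions, not a fixed API:

```ts
// ci-quarantine.ts — run after `vitest run --reporter=json --outputFile=results.json`
import { readFileSync } from 'node:fs';

// Hypothetical list of known flaky test names, maintained outside the test code
const quarantined = new Set<string>(
  JSON.parse(readFileSync('quarantined-tests.json', 'utf8')),
);

const report = JSON.parse(readFileSync('results.json', 'utf8'));
const failedNames: string[] = report.testResults
  .flatMap((file: any) => file.assertionResults ?? [])
  .filter((t: any) => t.status === 'failed')
  .map((t: any) => t.fullName);

const blocking = failedNames.filter((name) => !quarantined.has(name));

if (blocking.length > 0) {
  console.error('Non-quarantined failures:', blocking);
  process.exit(1); // real failures still block the PR
}
// Every failure is a known flaky test: let the job pass, but keep the data
process.exit(0);
```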
If you're looking for a way to quarantine flaky tests at runtime, Trunk Flaky Tests can help you here. Trunk will check whether failed tests are known to be flaky and unblock your PRs if all failures can be quarantined.
Learn more about Trunk’s Quarantining.
Fixing Flaky Tests in Vitest
We covered some common anti-patterns earlier to help you avoid flaky tests, but if your flaky test has a more complex cause, how you approach a fix will vary heavily. We can't show you how to fix every way a test can flake. Instead, let's cover prioritizing which tests to fix and reproducing flaky tests.
To decide which tests to fix first, you need a way to rank them by impact. Working with our partners, we've found that a good measure of impact is the number of PRs a flaky test blocks: the tests blocking the most PRs should be fixed first. Ultimately, we want to eliminate flaky tests because they block PRs from being merged when they fail in CI. You can track the number of times a known flaky test fails in CI on PR branches, either manually for smaller projects or automatically with a tool.
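For a smaller project, even a simple tally is enough; here's a sketch, assuming you record which PR each flaky failure blocked:

```ts
interface FlakyFailure {
  testName: string;
  prNumber: number;
}

// Rank flaky tests by how many distinct PRs they have blocked
function rankByBlockedPRs(failures: FlakyFailure[]): Array<[string, number]> {
  const prsPerTest = new Map<string, Set<number>>();
  for (const { testName, prNumber } of failures) {
    if (!prsPerTest.has(testName)) prsPerTest.set(testName, new Set());
    prsPerTest.get(testName)!.add(prNumber);
  }
  return [...prsPerTest.entries()]
    .map(([name, prs]): [string, number] => [name, prs.size])
    .sort((a, b) => b[1] - a[1]);
}
```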
This also helps you justify the engineering effort put towards fixing flaky tests because with tech-debt, justifying the time invested is often a bigger blocker than the fix itself. When you reduce the number of blocked PRs, you save expensive engineering hours. You can further extrapolate the number of engineering hours saved by factoring in context-switching costs, which some studies show to be ~23 minutes per context switch for knowledge workers.
If you’re looking for a straightforward way to report flaky tests for any language or framework, see their impact on your team, find the highest impact tests to fix, and track past failures and stack traces, you can try Trunk Flaky Tests.

Learn more about Trunk Flaky Tests dashboards.
Need help?
Eliminating flaky tests takes a combination of well-written tests, taking full advantage of your test framework's capabilities, and good tooling. If you write E2E tests with Vitest and face flaky tests, know that you're not alone, and you don't need to invent your own tools. Trunk can give you the tools needed to tackle flaky tests:
Autodetect the flaky tests in your build system
See them in a dashboard across all your repos
Quarantine tests manually or automatically
Get detailed stats to target the root cause of the problem
Get reports weekly, nightly, or instantly sent right to email and Slack
Intelligently file tickets to the right engineer
Resources
[1] Romano, A., Song, Z., Grandhi, S., Yang, W., & Wang, W. (2021). An Empirical Analysis of UI-based Flaky Tests. arXiv preprint. https://arxiv.org/pdf/2103.02669
[2] Gao, Z., Liang, Y., Cohen, M. B., Memon, A. M., & Wang, Z. (2015). Making System User Interactive Tests Repeatable: When and What Should We Control? ICSE '15: Proceedings of the 37th International Conference on Software Engineering. https://www.cs.umd.edu/~atif/pubs/gao-icse15.pdf
[3] Listfield, J. (2017, April 17). Where do our flaky tests come from? Google Testing Blog. https://testing.googleblog.com/2017/04/where-do-our-flaky-tests-come-from.html
[4] Haoyi, L. (2025, January 1). How To Manage Flaky Tests in your CI Workflows. Mill Blog. https://mill-build.org/blog/4-flaky-tests.html
[5] Lam, W., Muşlu, K., Sajnani, H., & Thummalapenta, S. (2020). A Study on the Lifecycle of Flaky Tests. ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, 1471-1482. https://doi.org/10.1145/3377811.3381749
[6] Lam, W., Muşlu, K., Sajnani, H., & Thummalapenta, S. (n.d.). A Study on the Lifecycle of Flaky Tests. University of Illinois at Urbana-Champaign and Microsoft. https://mir.cs.illinois.edu/winglam/publications/2020/LamETAL20FaTB.pdf