Testing SvelteKit Image Pipelines: Vitest, Playwright, Visual Regression, Lighthouse CI

Tests Are What Keeps the Audit Green

Thirteen lessons of architecture and code, all of it pointed at the audit you will run in Lesson 15. The numbers in that audit are not static. Every PR that touches a component, every dependency upgrade, every refactor that looks innocuous can quietly drift Performance from 96 back to 78. Most of these regressions are not obvious in code review. They show up later, in production, after the team has moved on to other work.

The defence is tests that run in CI on every PR and refuse to let the audit slip. Not “tests” in the sense of “we have a __tests__ folder”; that floor has been the industry standard for years. The defence specific to an image pipeline has four distinct layers, each catching a class of regression that the others cannot see.

This lesson is the testing strategy for the stack you have built. Vitest for the worker contract - the message types, the encode/decode handlers, the LQIP byte-budget. Playwright for the optimizer flow end-to-end - drop a file, watch the variants appear, verify the database row. Visual regression for the LQIP cross-fade and the layout-shift behaviour, the things that look right by sight but are easy to break invisibly. Lighthouse CI for per-route performance budgets that fail the build when LCP, CLS, or TBT regresses past your threshold.

By the end you will have a four-layer test stack that catches regressions at the cheapest possible level. The unit tests run in milliseconds; Playwright in seconds; visual regression on PRs that touch components; Lighthouse CI as the final gate before merge.

The Four Layers, in Order of Cost

Before any code, the framing. Tests have a cost-per-run and a coverage-per-cost ratio. The cheapest test that catches a regression is the right test. Running everything at the most expensive layer (Lighthouse on every PR) works but is wasteful and slow.

The four layers, ordered from cheapest to most expensive:

Vitest unit tests (~10ms per test). Pure functions, the worker message contract, the LQIP encoding pipeline. Run on every save in watch mode and on every commit in CI.
Playwright integration tests (~2–10s per test). The optimizer’s squash flow, the upload queue’s enforcement gate, the gallery’s rendering. Run on every PR.
Visual regression (~5–15s per snapshot). The LQIP blur-up, the comparison slider, the gallery layout. Run on PRs that touch components or styles, gated by a path filter so unrelated PRs do not pay the cost.
Lighthouse CI performance budgets (~30–90s per route per run). LCP, CLS, TBT against a deployed preview. Run on every PR that produces a deploy, with budget thresholds that fail the build on regression.

The principle is to push every regression to the cheapest layer that can catch it. A bug in sanitizeText belongs in a Vitest unit test; the LQIP cross-fade belongs in visual regression; an LCP regression caused by an accidentally-removed fetchpriority="high" belongs in Lighthouse CI. Catching the LCP regression in a unit test is impossible; catching the sanitizeText bug in Lighthouse is wasteful.
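As a rough illustration of that routing idea, the sketch below maps a PR's changed files to the layers worth running. The function name layersFor and the exact patterns are assumptions for illustration (the patterns mirror the path filters used later in this lesson); in practice the CI workflow's own path filters do this job.

```typescript
type Layer = 'vitest' | 'playwright' | 'visual' | 'lighthouse'

// Hypothetical patterns; adjust to your repository layout.
const VISUAL_PATTERNS = [/^src\/lib\/components\//, /^src\/app\.css$/, /\.svelte$/]
const LIGHTHOUSE_PATTERNS = [/^src\//, /^static\//, /^package\.json$/, /^lighthouserc\.cjs$/]

export function layersFor(changedFiles: string[]): Layer[] {
	const matches = (patterns: RegExp[]) =>
		changedFiles.some((file) => patterns.some((pattern) => pattern.test(file)))

	// The two cheap layers always run; the expensive ones are gated by path.
	const layers: Layer[] = ['vitest', 'playwright']
	if (matches(VISUAL_PATTERNS)) layers.push('visual')
	if (matches(LIGHTHOUSE_PATTERNS)) layers.push('lighthouse')
	return layers
}
```

A README-only PR resolves to the two cheap layers, while a change under src/lib/components/ pulls in all four.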

Layer 1: Unit-Testing the Worker Contract

The worker from Lesson 6 has a typed message contract: inbound messages are load, encode, encode-variants, encode-lqip; outbound messages are ready, encoded, variants, lqip, error. Every consumer in the codebase reads from those types. A typo in a type discriminant or a forgotten case in a switch is the kind of bug that compiles and ships without anyone noticing.

The test setup uses Vitest, which is already the default for SvelteKit projects. Workers themselves are tricky to instantiate inside a unit test (the WASM modules expect a real worker context), so the right move is to test the message handler as a pure function and use a mocked codec layer for the WASM call sites.

// src/lib/workers/image-optimizer.worker.test.ts
import { describe, it, expect, vi, beforeEach } from 'vitest'
import type { InboundMessage, OutboundMessage } from './image-optimizer.worker'

// Mock @jsquash before importing the handler under test. The mocks return
// synthetic ImageData and ArrayBuffers so the test runs in milliseconds
// without touching real WASM.
vi.mock('@jsquash/jpeg', () => ({
	decode: vi.fn(async () => ({
		data: new Uint8ClampedArray(40_000),
		width: 100,
		height: 100,
		colorSpace: 'srgb' as PredefinedColorSpace
	}))
}))

vi.mock('@jsquash/webp', () => ({
	decode: vi.fn(),
	encode: vi.fn(async () => new ArrayBuffer(1024))
}))

vi.mock('@jsquash/avif', () => ({
	encode: vi.fn(async () => new ArrayBuffer(2048))
}))

vi.mock('@jsquash/resize', () => ({
	default: vi.fn(async (data: ImageData, opts: { width: number; height: number }) => ({
		data: new Uint8ClampedArray(opts.width * opts.height * 4),
		width: opts.width,
		height: opts.height,
		colorSpace: 'srgb' as PredefinedColorSpace
	}))
}))

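// Vitest hoists vi.mock calls above imports, so the handler always sees the
// mocked codecs. Refactor the worker so the handler logic is exported as a
// pure function: handleMessage(msg, postMessage).
// Assumption: the worker caches the loaded source in module scope. Each test
// therefore re-imports a fresh copy, so a load in one test cannot leak into
// the "encode before load" tests.
let handleMessage: typeof import('./image-optimizer.worker').handleMessage

describe('image worker - message contract', () => {
	let posted: OutboundMessage[]
	const post = (msg: OutboundMessage) => {
		posted.push(msg)
	}

	beforeEach(async () => {
		posted = []
		vi.resetModules() // clears the module cache; registered mocks stay in place
		const worker = await import('./image-optimizer.worker')
		handleMessage = worker.handleMessage
	})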

	it('emits ready after a successful load', async () => {
		const buffer = new ArrayBuffer(1024)
		// Pretend the buffer starts with the JPEG magic bytes the detector looks for.
		const view = new Uint8Array(buffer)
		view[0] = 0xff
		view[1] = 0xd8
		view[2] = 0xff

		await handleMessage({ type: 'load', buffer } satisfies InboundMessage, post)

		expect(posted).toHaveLength(1)
		expect(posted[0]).toEqual({ type: 'ready' })
	})

	it('rejects encode before load with a token-tagged error', async () => {
		await handleMessage({ type: 'encode', quality: 80, token: 7 }, post)

		expect(posted).toHaveLength(1)
		expect(posted[0]).toMatchObject({ type: 'error', token: 7 })
	})

	it('encode-lqip produces a data URL under 2 KB', async () => {
		// Load first.
		const buffer = new ArrayBuffer(1024)
		const view = new Uint8Array(buffer)
		view[0] = 0xff
		view[1] = 0xd8
		view[2] = 0xff
		await handleMessage({ type: 'load', buffer }, post)
		posted.length = 0

		await handleMessage({ type: 'encode-lqip', token: 1 }, post)

		expect(posted).toHaveLength(1)
		const msg = posted[0]
		expect(msg.type).toBe('lqip')
		if (msg.type !== 'lqip') return // narrow for TS
		expect(msg.dataUrl).toMatch(/^data:image\/webp;base64,/)
		expect(msg.dataUrl.length).toBeLessThan(2000)
	})

	it('encode-variants returns one entry per IMAGE_WIDTHS that fits in the source', async () => {
		const buffer = new ArrayBuffer(1024)
		const view = new Uint8Array(buffer)
		view[0] = 0xff
		view[1] = 0xd8
		view[2] = 0xff
		await handleMessage({ type: 'load', buffer }, post)
		posted.length = 0

		await handleMessage({ type: 'encode-variants', token: 1 }, post)

		const msg = posted[0]
		expect(msg.type).toBe('variants')
		if (msg.type !== 'variants') return
		// The mocked source is 100×100, so even the smallest width (400w) does
		// not fit and the variants array is empty (the worker skips widths >
		// source.width). With a 1600-pixel mock source, all three would fit.
		expect(Array.isArray(msg.variants)).toBe(true)
	})

	it('error responses on encode carry the request token, not -1', async () => {
		await handleMessage({ type: 'encode', quality: 80, token: 99 }, post)

		const msg = posted[0]
		expect(msg.type).toBe('error')
		if (msg.type !== 'error') return
		expect(msg.token).toBe(99) // not -1, which is the load-error sentinel
	})
})

Five tests, none of them longer than twenty lines, covering the parts of the worker contract a refactor is most likely to break: the ready ack, the missing-load error path, the LQIP byte budget, the variant-array shape, and the token discrimination between load errors and encode errors. Add similar coverage for the sanitizer helpers from Lesson 5 (these are pure functions and trivially testable) and the URL builder from Lesson 11.
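One detail of the LQIP budget is worth spelling out: the test measures the length of the data URL string, and base64 inflates the payload by a third. A 2000-character ceiling therefore corresponds to roughly 1.5 KB of actual image bytes. If you would rather assert on decoded bytes, a small helper along these lines works; the name dataUrlByteLength is an assumption, and the sketch expects standard padded base64.

```typescript
// Decoded payload size of a base64 data URL, computed from the string alone.
// Assumes standard padded base64 (length is a multiple of four).
export function dataUrlByteLength(dataUrl: string): number {
	const comma = dataUrl.indexOf(',')
	if (comma === -1 || !dataUrl.slice(0, comma).endsWith(';base64')) {
		throw new Error('expected a base64 data URL')
	}
	const base64 = dataUrl.slice(comma + 1)
	// Each trailing "=" stands for one byte that is not part of the payload.
	const padding = base64.endsWith('==') ? 2 : base64.endsWith('=') ? 1 : 0
	return (base64.length / 4) * 3 - padding
}
```

In the test above, the assertion would then read expect(dataUrlByteLength(msg.dataUrl)).toBeLessThan(2048).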

The discipline that makes this scale: refactor the worker file so the dispatch logic lives in an exported handleMessage function rather than inlined in ctx.onmessage. The Vite worker entry stays a one-liner that wires ctx.onmessage to handleMessage(event.data, ctx.postMessage.bind(ctx)). This separation is the entire reason the unit tests are fast: they call handleMessage directly with no Worker overhead.
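A minimal sketch of that split, under stated assumptions: the message types below are a trimmed-down stand-in for the real contract, the module-scoped source variable stands in for the cached decode, and the "encode" branch echoes the buffer instead of calling a codec.

```typescript
// Hypothetical, trimmed-down message types; the real contract has more variants.
type InboundMessage =
	| { type: 'load'; buffer: ArrayBuffer }
	| { type: 'encode'; quality: number; token: number }

type OutboundMessage =
	| { type: 'ready' }
	| { type: 'encoded'; buffer: ArrayBuffer; token: number }
	| { type: 'error'; message: string; token: number }

type Post = (msg: OutboundMessage) => void

// Module-scoped cache of the loaded source, standing in for decoded pixels.
let source: ArrayBuffer | null = null

export async function handleMessage(msg: InboundMessage, post: Post): Promise<void> {
	switch (msg.type) {
		case 'load':
			source = msg.buffer // the real worker would decode here
			post({ type: 'ready' })
			return
		case 'encode':
			if (source === null) {
				// Errors carry the request token so the caller can match them up.
				post({ type: 'error', message: 'encode before load', token: msg.token })
				return
			}
			// Placeholder for the codec call: echo a copy of the source.
			post({ type: 'encoded', buffer: source.slice(0), token: msg.token })
			return
	}
}

// The worker entry file then reduces to the wiring described above:
// ctx.onmessage = (event) => handleMessage(event.data, ctx.postMessage.bind(ctx))
```

Because handleMessage takes the post callback as a parameter, a test can pass an array-collecting function and inspect the outbound messages without a Worker in sight.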

Layer 2: Playwright Integration Tests

The optimizer integrates many parts that work fine in isolation but can break at the seams. The DropZone passes a File to the optimizer, the optimizer dispatches to the worker, the worker returns variants, the page renders the comparison slider - every one of those handoffs is a place where TypeScript catches the obvious bugs and a Playwright test catches the rest.

// e2e/optimizer.spec.ts
import { test, expect } from '@playwright/test'
import { readFile } from 'fs/promises'
import { resolve } from 'path'

test.describe('Optimizer single-file flow', () => {
	test('drops a JPEG and produces a WebP under 50% of the original size', async ({ page }) => {
		const fixturePath = resolve(__dirname, 'fixtures/sample-photo.jpg')
		const fixtureBytes = await readFile(fixturePath)

		// Record the body size of every outgoing request from the start, so the
		// privacy assertion at the end covers the whole flow. The listener must
		// be registered before the file is dropped.
		const requestBodySizes: number[] = []
		page.on('request', (request) => {
			const body = request.postDataBuffer()
			if (body) requestBodySizes.push(body.length)
		})

		await page.goto('/optimizer')

		// Playwright's setInputFiles works against the hidden <input type=file>
		// inside the DropZone; the optimizer treats this exactly like a
		// drag-and-drop because both paths funnel through the same callback.
		await page.setInputFiles('input[type="file"]', fixturePath)

		// The page advances through statuses; wait for the comparison slider
		// to appear, which only renders when status === 'done'.
		const comparisonSlider = page.locator('[role="slider"]')
		await expect(comparisonSlider).toBeVisible({ timeout: 15_000 })

		// Pull the displayed sizes out of the stats panel and assert ratio.
		const originalSize = await page.locator('.comparison__stat:first-child strong').textContent()
		const optimizedSize = await page.locator('.comparison__stat:last-child strong').textContent()
		const originalKb = parseFloat(originalSize!.replace(' KB', ''))
		const optimizedKb = parseFloat(optimizedSize!.replace(' KB', ''))

		expect(originalKb).toBeGreaterThan(0)
		expect(optimizedKb).toBeLessThan(originalKb * 0.5)

		// Verify the original bytes never left the browser: no request body may
		// be as large as the source file. This is a coarse check; some browsers
		// do not expose bodies built from Blobs, so pair it with a server-side
		// assertion if the guarantee matters.
		for (const size of requestBodySizes) {
			expect(size).toBeLessThan(fixtureBytes.length)
		}
	})

	test('quality slider re-encodes without re-decoding', async ({ page }) => {
		await page.goto('/optimizer')
		await page.setInputFiles('input[type="file"]', resolve(__dirname, 'fixtures/sample-photo.jpg'))
		await page.waitForSelector('[role="slider"]')

		// Time the re-encode triggered by the quality slider change.
		const t0 = Date.now()
		await page.fill('input[type="range"]', '40')
		await page.locator('input[type="range"]').dispatchEvent('change')
		await page.waitForFunction(() => {
			const stat = document.querySelector('.comparison__stat:last-child .comparison__stat-label')
			return stat?.textContent?.includes('q=40')
		})
		const reencodeMs = Date.now() - t0

		// Cached-decode re-encodes consistently land under 500ms even on CI.
		// The first encode (which includes decode) typically takes 800–1500ms.
		// If the slider re-encode regresses past 500ms, the decode-cache has
		// likely been broken.
		expect(reencodeMs).toBeLessThan(500)
	})
})

test.describe('Upload queue alt-text gate', () => {
	test('submit button stays disabled until every queue item has alt text', async ({ page }) => {
		await page.goto('/upload')

		await page.setInputFiles('input[type="file"]', [
			resolve(__dirname, 'fixtures/sample-photo.jpg'),
			resolve(__dirname, 'fixtures/sample-photo.jpg')
		])

		const submit = page.locator('.upload-queue__submit')
		await expect(submit).toBeDisabled()
		await expect(submit).toContainText('Describe every image')

		// Fill alt for first item only.
		await page.locator('.upload-queue__item').first().locator('textarea').fill('A teapot')
		await expect(submit).toBeDisabled()

		// Fill alt for second item; gate releases.
		await page.locator('.upload-queue__item').last().locator('textarea').fill('A second teapot')
		await expect(submit).toBeEnabled()
		await expect(submit).toContainText('Upload 2 images')
	})
})

Three tests cover the highest-value end-to-end paths: that compression actually compresses, that the cached-decode optimisation from Lesson 6 still works (the 500ms re-encode budget sits well below the 800–1500ms that a fresh decode-plus-encode takes, so the test fails loudly if the cache is broken), and that the alt-text enforcement gate from Lesson 12 still gates correctly.
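One fragile spot in the first test is the size parsing, which strips a literal " KB" suffix. If the stats panel ever switches units for larger files (an assumption; the component's formatting is not shown here), the ratio assertion would silently compare mismatched units. A hypothetical helper that normalises to bytes might look like this:

```typescript
// Hypothetical label format: "<number> <unit>" with binary units.
const UNIT_BYTES: Record<string, number> = { B: 1, KB: 1024, MB: 1024 * 1024 }

export function parseDisplayedSize(text: string): number {
	const match = /^\s*([\d.]+)\s*(B|KB|MB)\s*$/i.exec(text)
	if (!match) throw new Error(`unrecognised size label: ${text}`)
	return parseFloat(match[1]) * UNIT_BYTES[match[2].toUpperCase()]
}
```

The test would then compare parseDisplayedSize(optimizedSize) against parseDisplayedSize(originalSize) * 0.5, and an unexpected label format fails loudly instead of producing NaN.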

The fixtures matter. Use real photographs at realistic sizes (~3MB, 4000×3000), not 100×100 synthetic test images. Bugs in the responsive-variant pipeline often only manifest above the smallest-variant threshold, and a 100-pixel fixture can never trigger them. Keep fixtures committed in e2e/fixtures/ with file sizes documented in the test names so a reader can immediately tell what each one exercises.

Layer 3: Visual Regression for the LQIP Cross-Fade

Layouts and image-loading transitions are the kind of code where “looks right” is the only test that matters. CSS that compiles and renders something is almost always going to compile and render something; the question is whether what it renders is what you want. Visual regression tests answer that question by capturing a screenshot, comparing against a baseline, and failing the build on pixel-level diffs.

The pragmatic tool in 2026 is Playwright’s built-in visual comparison, which uses the same browser as the integration tests and stores baseline screenshots in the repository.

// e2e/visual.spec.ts
import { test, expect } from '@playwright/test'

test.describe('LazyImage visual states', () => {
	test('LQIP placeholder renders before the full image loads', async ({ page }) => {
		// Throttle the network so the placeholder is visible long enough to capture.
		// CDP sessions are Chromium-only, so run this test in the Chromium project.
		const cdp = await page.context().newCDPSession(page)
		await cdp.send('Network.enable')
		await cdp.send('Network.emulateNetworkConditions', {
			offline: false,
			latency: 1000,
			downloadThroughput: 100 * 1024, // 100 KB/s
			uploadThroughput: 100 * 1024
		})

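		// Return as soon as the HTML is parsed instead of waiting for the load
		// event, which would only fire after the throttled image has finished.
		await page.goto('/products/sample-product', { waitUntil: 'domcontentloaded' })

		// Capture while the full image is still downloading over the throttled link.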
		await expect(page.locator('.lazy-image').first()).toHaveScreenshot('lqip-placeholder.png', {
			maxDiffPixelRatio: 0.01
		})
	})

	test('full image replaces placeholder after onload', async ({ page }) => {
		await page.goto('/products/sample-product')

		// Wait for the loaded class to land.
		await page.waitForSelector('.lazy-image img.loaded', { timeout: 10_000 })

		await expect(page.locator('.lazy-image').first()).toHaveScreenshot('lqip-loaded.png', {
			maxDiffPixelRatio: 0.01
		})
	})

	test('comparison slider divider is visible at default 50%', async ({ page }) => {
		await page.goto('/optimizer')
		await page.setInputFiles('input[type="file"]', 'e2e/fixtures/sample-photo.jpg')
		await page.waitForSelector('[role="slider"]')

		// The divider is a CSS-driven element with var(--accent-primary-base);
		// regressions where the CSS var resolves to empty cause the divider
		// to vanish. Visual regression catches this; nothing else does.
		await expect(page.locator('.comparison')).toHaveScreenshot('comparison-default.png')
	})
})

The first test catches the most subtle category of bug: the LQIP placeholder failing to render at all. This typically happens when the database column is misconfigured (returning null), the component prop is renamed without updating the call site, or the CSS background-image declaration is stripped by a misconfigured PostCSS transform. None of these cause errors; they just make the placeholder silently disappear.

The third test guards the comparison slider divider, which depends on a CSS variable resolving to a real colour (the var(--accent-primary-base) from Lesson 4). A regression where someone removes the :root declaration in app.css makes the divider invisible without breaking any TypeScript or runtime check. Visual regression catches it on the first PR.

Run visual regression only on PRs that touch component files (src/lib/components/**) or stylesheet files (src/app.css, **/*.svelte). A path filter in the GitHub Actions workflow keeps these tests off PRs that change the README. Baseline updates happen via a maintainer running pnpm playwright test --update-snapshots and committing the new images.

Layer 4: Lighthouse CI Performance Budgets

The audit from Lesson 15 is the artefact your future self will run on every site. The CI version of it is Lighthouse CI: the same audit, run automatically against every PR’s deployment preview, with budget thresholds that fail the build on regression.

The setup is two pieces: a lighthouserc.cjs config that declares the budgets and the routes to audit, and a CI job that runs the audit and posts the results.

// lighthouserc.cjs
module.exports = {
	ci: {
		collect: {
			url: [
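				// Template literals (backticks) are required for interpolation;
				// PREVIEW_HOST is supplied by the CI job's environment.
				`https://${process.env.PREVIEW_HOST}/`,
				`https://${process.env.PREVIEW_HOST}/products/sample-product`,
				`https://${process.env.PREVIEW_HOST}/blog/sample-post`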
			],
			settings: {
				preset: 'desktop', // also run a 'mobile' job; budgets differ
				throttlingMethod: 'simulate',
				screenEmulation: { mobile: false, width: 1350, height: 940 }
			},
			numberOfRuns: 3 // several runs per URL dampen run-to-run noise
		},
		assert: {
			assertions: {
				// Per-metric budgets. The numbers reflect Lesson 15's "after" results
				// with a small grace margin. Tighten these as the site stabilises.
				'categories:performance': ['error', { minScore: 0.9 }],
				'categories:accessibility': ['error', { minScore: 1.0 }],
				'largest-contentful-paint': ['error', { maxNumericValue: 2500 }],
				'cumulative-layout-shift': ['error', { maxNumericValue: 0.1 }],
				'total-blocking-time': ['error', { maxNumericValue: 200 }],
				// Image-specific opportunities should never fire after Lesson 8 ships.
				'uses-responsive-images': 'error',
				'uses-optimized-images': 'error',
				'uses-webp-images': 'off', // we use AVIF first; this audit is misleading
				'image-aspect-ratio': 'error',
				'image-size-responsive': 'error',
				'unsized-images': 'error'
			}
		},
		upload: {
			target: 'temporary-public-storage' // or your LHCI server
		}
	}
}

The CI integration:

# .github/workflows/lighthouse.yml
name: Lighthouse CI
on:
  pull_request:
    paths:
      - 'src/**'
      - 'static/**'
      - 'package.json'
      - 'lighthouserc.cjs'

jobs:
  lhci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v3
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'pnpm'
      - run: pnpm install --frozen-lockfile
      # Wait for the deploy preview to be ready. Replace this step with your
      # platform's preview-ready check (Vercel, Cloudflare Pages, Netlify all
      # publish a deployment-status event).
      - name: Wait for preview
        id: preview
        run: |
          # ...platform-specific preview URL discovery...
          echo "host=${PREVIEW_HOST}" >> $GITHUB_OUTPUT
      - run: pnpm dlx @lhci/cli@0.13.x autorun
        env:
          PREVIEW_HOST: ${{ steps.preview.outputs.host }}
          LHCI_GITHUB_APP_TOKEN: ${{ secrets.LHCI_GITHUB_APP_TOKEN }}

Three properties of this setup are worth pinning down.

Multiple URLs catch route-specific regressions. The home page and the product page have different LCP candidates, different gallery layouts, different priority-hint configurations. A regression on one route does not necessarily affect the others. Audit the three or four representative routes that cover the distinct shapes of your site, not just the homepage.

numberOfRuns: 3 reduces flake. Lighthouse’s headless-Chrome runs are noisy: a single run can vary by 200–400ms on LCP for purely environmental reasons. Aggregating three runs is much more stable than trusting one; five is overkill for most projects.
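To see why a handful of runs helps, consider a median over three LCP samples. This is only an illustration of the statistics: the sample values are made up, and how Lighthouse CI actually combines runs depends on its aggregationMethod setting.

```typescript
export function median(values: number[]): number {
	if (values.length === 0) throw new Error('median of empty list')
	const sorted = [...values].sort((a, b) => a - b)
	const mid = Math.floor(sorted.length / 2)
	return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
}

// Illustrative LCP samples in milliseconds (made-up numbers): two typical
// runs and one noisy outlier that would fail a 2500ms budget on its own.
const lcpRuns = [2100, 2150, 2550]
const passesBudget = median(lcpRuns) <= 2500
```

With a single run, the outlier alone would have failed the build; with three, it is outvoted by the two representative samples.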

The path filter on the workflow gates cost. Lighthouse CI takes 30–90 seconds per route per run; nine runs (three routes × three runs) at 90 seconds each is thirteen and a half minutes of CI time per PR. Without the path filter, every README typo PR triggers this. With the filter, only PRs that touch source code, static assets, or the LHCI config itself pay the cost.
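The cost arithmetic is simple enough to keep next to the config, so that adding a route or a run makes the budget impact obvious. The helper name lhciMinutes is hypothetical, and it assumes runs execute one after another.

```typescript
// Worst-case wall-clock minutes for a Lighthouse CI job,
// assuming every run executes sequentially.
export function lhciMinutes(routes: number, runsPerRoute: number, secondsPerRun: number): number {
	return (routes * runsPerRoute * secondsPerRun) / 60
}

// Three routes, three runs each, 90 seconds per run.
const worstCase = lhciMinutes(3, 3, 90)
```

Adding a fourth route at the same settings pushes the worst case to eighteen minutes, which is the kind of change worth noticing before it lands.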

The uses-webp-images audit is disabled in the config above for a specific reason: the audit was written against WebP as the most modern format and does not understand AVIF. With AVIF as the primary format and WebP as the fallback, the audit fires false positives on every page. Disable it; the uses-optimized-images and image-size-responsive audits cover the underlying intent without the false positives. Lighthouse 7 and later report this audit under the id modern-image-formats, so confirm which id your pinned Lighthouse version emits before relying on the override.

What to Test, What Not To

Coverage is not the goal. The goal is catching regressions before users do. Some tests in image pipelines have a poor cost-to-coverage ratio:

Don’t test that <img> renders. The browser does. Asserting that an <img> tag has a src attribute does not catch any realistic bug.

Don’t unit-test third-party encoders. @jsquash/webp is tested by its maintainers. The tests in your repo should cover your code’s interaction with it (the message contract, the buffer transfer, the error surface), not the codec output itself.

Don’t snapshot-test long Svelte component renders. Svelte’s compiled HTML output is verbose and changes between versions for cosmetic reasons. Snapshot tests of full component HTML produce huge diffs on every Svelte upgrade and add no real coverage. Visual regression on rendered pages is the better tool for the same intent.

Do test the parts that have already broken. Every regression that ships to production is a future test. After every incident, ask “what test would have caught this?” and add it. Over a few quarters, the test suite shapes itself around the actual failure modes of the codebase rather than imagined ones.

Common Mistakes and Anti-Patterns

Running every test on every PR

The four layers exist precisely to avoid this. Vitest unit tests run on every commit. Playwright integration runs on every PR. Visual regression runs only on PRs that touch components or styles. Lighthouse CI runs only on PRs that produce a deploy preview. A flat “run everything always” workflow makes CI slow enough that developers learn to ignore it, which defeats the purpose.

Asserting exact byte counts in tests

expect(optimizedBlob.size).toBe(184_321) is the kind of assertion that fails on every codec upgrade. Compression algorithms tune themselves over minor versions; the exact byte count is rarely stable. Assert ratios (< 50% of original) or thresholds (< 200KB) instead.
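A sketch of the ratio-and-threshold style, using the two example limits from the paragraph above as defaults. The helper name withinBudget is an assumption, not part of the lesson's codebase.

```typescript
// True when the optimized file is both small relative to the original
// and small in absolute terms. Defaults mirror the limits in the text.
export function withinBudget(
	originalBytes: number,
	optimizedBytes: number,
	maxRatio = 0.5,
	maxBytes = 200 * 1024
): boolean {
	if (originalBytes <= 0) throw new Error('original size must be positive')
	return optimizedBytes / originalBytes < maxRatio && optimizedBytes < maxBytes
}
```

A test then asserts expect(withinBudget(original.size, optimized.size)).toBe(true), which survives a codec upgrade that shifts the output by a few hundred bytes.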

Using fixtures the size of icons

A 64×64 fixture can never trigger the responsive-variant logic, can never produce a meaningful LCP measurement, can never verify the worker pool actually parallelises. Use realistic photographs (3000×2000 or larger, ≥ 1MB) so the tests exercise the same code paths as production traffic.

Skipping flake handling on Lighthouse CI

Lighthouse runs are noisy. A 5% flake rate becomes one false failure per 20 PRs, which is enough that developers start ignoring the check. numberOfRuns: 3 and tolerant thresholds (LCP ≤ 2500ms rather than 1500ms) keep the signal-to-noise ratio acceptable. Tighten the thresholds gradually as the site stabilises.

Visual regression baselines in the wrong git workflow

Visual regression baselines change for legitimate reasons all the time - design updates, font upgrades, layout refactors. The workflow that breaks is “the maintainer updates baselines on the PR, the next maintainer’s PR has a stale baseline”. Standardise on main-branch baselines: every PR diffs against the version on main, baselines are updated only on merge by the maintainer, and the --update-snapshots flag is never run on a feature branch.

Treating Playwright as a unit-test framework

Playwright tests are 100–1000× slower than Vitest unit tests. Anything that can be tested as a pure function (validators, sanitizers, URL builders) belongs in Vitest. Reserve Playwright for the parts where the browser context is genuinely required: the DOM behaviour, the upload form, the Worker lifecycle, the <picture>-element negotiation.

Performance and Scaling

The test suite itself has a budget. Aim for the full local suite (Vitest + Playwright) to run in under five minutes; the CI suite (everything including Lighthouse) under fifteen. Past those numbers, developers stop running tests locally and rely on the CI green check alone, which turns a fast local feedback loop into a slow remote one.

The cheapest scaling move is parallelism. Vitest runs tests across multiple workers by default; Playwright supports --workers=N to fan out across CPU cores. Lighthouse CI collects its runs one after another, so numberOfRuns multiplies wall-clock time rather than parallelising it; if that becomes the bottleneck, split the routes across separate CI jobs. None of these are surprising, but they are the difference between a 12-minute CI run and a 40-minute one.

Visual regression baselines grow with the test count. Twenty pages × three viewports × two themes = 120 baseline images, easily 50–100MB in the repo. This is fine for most projects, but at scale move the baselines out of git and into LFS or a dedicated storage backend (Percy, Chromatic, or self-hosted). The trigger to migrate is when git clone of the repo takes more than a minute on a fresh laptop.

The fixture story scales similarly. A handful of test photographs at 1–3MB each is fine; hundreds is not. If your test suite needs many fixtures, generate them programmatically in a beforeAll or commit a small stable set and use synthetic compression-target images for variant testing.
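For the programmatic route, a generator for raw RGBA pixels is enough to feed resize and variant logic at any dimensions. This is a sketch under assumptions: the RawImage shape mirrors the mocked decoder output used in the unit tests, and the result still has to pass through an encoder before it can stand in for an uploaded file. A smooth gradient also compresses far better than a photograph, so keep real photos for any test that asserts on compression ratios.

```typescript
interface RawImage {
	data: Uint8ClampedArray
	width: number
	height: number
}

// Builds an opaque RGBA gradient: red ramps left to right, green top to bottom.
export function makeGradientImage(width: number, height: number): RawImage {
	const data = new Uint8ClampedArray(width * height * 4)
	for (let y = 0; y < height; y++) {
		for (let x = 0; x < width; x++) {
			const i = (y * width + x) * 4
			data[i] = Math.round((x / Math.max(width - 1, 1)) * 255)
			data[i + 1] = Math.round((y / Math.max(height - 1, 1)) * 255)
			data[i + 2] = 128 // constant blue channel
			data[i + 3] = 255 // fully opaque
		}
	}
	return { data, width, height }
}
```

Calling makeGradientImage(1600, 1200) inside a beforeAll gives the variant tests a source wide enough to exercise several widths without committing another multi-megabyte file.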

Conclusion

The image pipeline you have built across this track is a system, and systems decay without active maintenance. The four-layer testing strategy above is the maintenance. Vitest catches contract regressions in milliseconds, Playwright catches integration bugs in seconds, visual regression catches the layout drift that pixel-perfect compositions are vulnerable to, and Lighthouse CI catches the performance erosion that everything else misses.

The single most valuable property of this stack is that it pushes every regression to the cheapest possible layer. A bug in sanitizeText fails a Vitest unit test in 12 milliseconds; an LCP regression caused by an accidentally-removed priority hint fails Lighthouse CI before merge. Catching the first bug in Lighthouse would be wasteful; catching the second in a unit test is impossible. The layered structure is what makes the tests cheap enough to leave running on every PR without slowing the team down.

Lesson 15 is the proof. The audit at the end of that lesson is the artefact this lesson’s tests are designed to keep green over time. The tests are not a substitute for the audit; they are the mechanism that makes the audit’s result hold across the lifetime of the codebase.

Key Takeaways

The four-layer testing strategy: Vitest (unit, milliseconds), Playwright (integration, seconds), visual regression (component renders, gated by path filter), Lighthouse CI (per-route performance budgets, on every preview deploy).
Push regressions to the cheapest layer that can catch them. A sanitizeText bug belongs in Vitest, not Lighthouse. An LCP regression belongs in Lighthouse, not a unit test. Mismatched layer assignment is wasteful.
Refactor the worker into a pure handleMessage function so unit tests can call it without a Worker context. The Vite worker entry stays a one-liner.
Test the parts that have already broken. Every production regression becomes a test; the suite shapes itself around real failure modes over time.
Use realistic fixtures. A 64×64 test image cannot trigger responsive-variant logic. Commit 3MB photographs to e2e/fixtures/ and document file sizes in test names.
numberOfRuns: 3 on Lighthouse CI dampens flake. Five is overkill; one is too noisy. Three is the operational sweet spot.
Path-filter expensive workflows. Visual regression and Lighthouse CI run only on PRs that touch the relevant files. README typos do not pay the CI cost.
Visual regression baselines belong on main. PRs diff against main; baselines update on merge only. Updating baselines on a feature branch produces stale-baseline drift across PRs.
Disable the uses-webp-images Lighthouse audit when AVIF is your primary format. The audit was written against WebP and produces false positives.
Assert ratios and thresholds, not exact byte counts. Codec output drifts across versions; < 50% of original is stable, === 184_321 bytes is not.
