The Image and Its Story Travel Separately From Now On

In previous lessons you built a working optimizer: drop a file, get a compressed WebP. The pipeline is clean and the output is smaller than the input. But something is missing from that output, and it was missing from the moment @jsquash/jpeg finished decoding the file.

Every GPS coordinate, camera model, exposure setting, ISO value, and timestamp that the camera wrote into the original JPEG is gone from the compressed result. That is not a bug: the encoder that produced the WebP never had access to that information in the first place.

This creates a real problem for any application where that context matters. A photo management app needs the timestamp and camera details. A property listing platform wants to know where the photo was taken. A media archive needs to preserve the shooting conditions alongside the image. If you compress first and ask questions later, you have already lost the data you needed to save.

The solution sounds obvious once you state it: extract before you squash. This lesson builds the pipeline that makes that split automatic, types every step end to end, and explains why every value that comes back from the EXIF parser must be sanitized before it touches your database, your storage keys, or your DOM.


Why @jsquash Has No Access to Your Metadata

To understand why the metadata disappears, you need to understand what @jsquash actually receives when you call decode.

In Lesson 3, you learned that @jsquash/jpeg’s decode function takes an ArrayBuffer of raw JPEG bytes and returns an ImageData object. ImageData is a Web API type built around three properties: width, height, and data. (Current browsers also attach a colorSpace label, which says how to interpret the colour values and nothing else.) The data property is a Uint8ClampedArray where every four consecutive values represent one pixel in RGBA order. A 1000 by 800 pixel image produces a data array of exactly 3,200,000 bytes: one unsigned 8-bit integer per channel, four channels per pixel, 800,000 pixels total.
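
The offset arithmetic behind that layout is short enough to write out. The sketch below uses hypothetical helper names (pixelAt, rgbaByteLength) and reads one pixel from a flat RGBA buffer the way any consumer of ImageData.data would. Note that nothing in the structure has a slot where metadata could live.

```typescript
// Hypothetical helper: read one RGBA pixel from an ImageData-style buffer.
// Layout is row-major: each row holds `width` pixels, each pixel is 4 bytes.
export function pixelAt(
	data: Uint8ClampedArray,
	width: number,
	x: number,
	y: number
): [number, number, number, number] {
	const i = (y * width + x) * 4 // index of this pixel's red byte
	return [data[i], data[i + 1], data[i + 2], data[i + 3]]
}

// The buffer size follows from the same arithmetic: width x height x 4 bytes.
export function rgbaByteLength(width: number, height: number): number {
	return width * height * 4
}
```

For the example above, rgbaByteLength(1000, 800) gives 3,200,000.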

That is everything ImageData contains. There is no field for file format, no field for compression settings, and no field for embedded metadata. It is a pure, format-neutral grid of pixel values. The JPEG decoder reads the compressed bytes, reconstructs the pixel colours, and hands the result to the encoder as ImageData. The EXIF block in the original file is simply not part of that reconstruction; the codec reads the image segments it needs and discards the rest.

When @jsquash/webp’s encode function takes that ImageData and produces a WebP ArrayBuffer, it writes a valid WebP file containing exactly those pixel values at the quality you requested. There is nowhere in that process where EXIF data could be carried across, because ImageData never held it.

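This is the correct behaviour. Squoosh was built to reduce file weight, and EXIF tags do not change the decoded pixel values. (The Orientation tag comes closest to being an exception: it tells viewers how to rotate those pixels, so a photo stored sideways will display sideways once the tag is gone.) But it does mean that any information you want to preserve from the original file must be extracted before decode is called. After that point, it is gone.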


What EXIF Actually Contains

Now that you know why the data disappears, it is worth understanding exactly what you stand to lose, because the range is broader than most developers expect.

At the useful end of the spectrum you have:

  • Make and Model: the camera manufacturer and exact model name
  • DateTimeOriginal: the timestamp recorded by the camera at the moment of capture
  • GPSLatitude and GPSLongitude: the coordinates where the photo was taken, to within a few metres on modern smartphones
  • FNumber: the aperture (f/1.8, f/4, etc.)
  • ISO: the sensor sensitivity setting at the time of shooting
  • ExposureTime: the shutter speed in seconds, as a decimal (0.004 for a 1/250 s exposure)

At the more obscure end, EXIF can also contain the camera serial number, the exact lens model, white balance settings, flash status, and on some phones the altitude at the time of capture. A JPEG straight from a modern smartphone can carry tens of kilobytes of metadata (an embedded preview thumbnail, maker notes, and often an XMP block), none of which contributes to the image you see. For small outputs such as thumbnails, the metadata alone can outweigh the entire compressed file that @jsquash produces.

This is also why stripping EXIF improves file size independently of compression. A clean WebP has no metadata overhead at all. Across thousands of uploads, that adds up.


The Privacy Case for Keeping Metadata Out of Public Files

The GPS data deserves a separate discussion because the privacy implications are not always obvious until you think them through.

A photo taken at home with a modern smartphone encodes the latitude and longitude of that home to within a few metres. When that photo is uploaded to a web application and served publicly without processing, any visitor who downloads the image file can extract those coordinates using freely available tools. This is not a theoretical concern. It is a documented failure mode that has affected journalists, abuse survivors, and public figures who did not realise their photos were tagged.

The extract-to-database pattern prevents this at the architecture level. The application receives the raw image, reads the metadata it needs for its own purposes, then serves only the stripped and compressed output. The GPS data is available inside your system if you need it for a map view, for search, or for audit logging; it never travels to other users’ browsers inside the image file. The original file, with its embedded coordinates, never leaves the uploader’s device.

Never Re-inject Metadata Into Public-Facing Files

Once you have extracted the metadata and served a stripped image, do not put the data back. Even if you store GPS coordinates in your database for internal features, the image served to end users should remain clean. Re-injecting metadata into a public file removes the privacy guarantee you just created.

There is an important corollary: if your application surfaces coordinates in any user-visible feature such as a map view, location-based search, or proximity filters, your privacy policy must disclose this explicitly. In many jurisdictions that is a legal requirement rather than good practice. Under the EU’s GDPR, for example, location data is named explicitly as a way of identifying a person, so it counts as personal data and processing it requires a lawful basis.


exifr: Reading EXIF Without Reading the Whole File

exifr is a purpose-built JavaScript EXIF parsing library with a design that fits naturally into a browser-side pipeline. It is fast, it works natively in the browser without build configuration, and it is selective by default.

The key performance characteristic is pointer-based parsing. An image file’s EXIF block has a defined location and internal structure, and exifr follows the offsets in that structure to the tags you request rather than scanning the file sequentially. Because it reads only the chunks of the file that hold metadata, parsing a handful of tags from a 10 MB JPEG typically takes a few milliseconds.
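
To make “defined location” concrete for JPEG: every segment before the pixel data announces its own length, so a parser can hop from marker to marker and land on the EXIF (APP1) block without decoding a single pixel. The sketch below, with a hypothetical findExifSegment helper, shows only that first hop. It is not exifr’s code, handles JPEG only, and ignores the TIFF structure inside the block, which a real parser then navigates through further internal offsets.

```typescript
// Simplified sketch (not exifr's implementation): walk a JPEG's marker
// segments using each segment's length field and locate the EXIF (APP1) payload.
export interface SegmentLocation {
	offset: number // index of the first payload byte (the "Exif\0\0" header)
	length: number // payload length in bytes
}

const EXIF_HEADER = [0x45, 0x78, 0x69, 0x66, 0x00, 0x00] // "Exif\0\0"

export function findExifSegment(bytes: Uint8Array): SegmentLocation | null {
	// Every JPEG starts with the SOI marker 0xFF 0xD8.
	if (bytes.length < 4 || bytes[0] !== 0xff || bytes[1] !== 0xd8) return null
	let pos = 2
	while (pos + 4 <= bytes.length) {
		if (bytes[pos] !== 0xff) return null // not a marker: malformed or unsupported
		const marker = bytes[pos + 1]
		if (marker === 0xff) {
			pos += 1 // fill byte before a marker; skip it
			continue
		}
		// SOS (0xDA) starts the compressed pixel data; metadata segments precede it.
		if (marker === 0xda) return null
		// Big-endian length that counts its own two bytes but not the marker.
		const length = (bytes[pos + 2] << 8) | bytes[pos + 3]
		if (length < 2) return null
		const payload = pos + 4
		if (marker === 0xe1 && EXIF_HEADER.every((b, i) => bytes[payload + i] === b)) {
			return { offset: payload, length: length - 2 }
		}
		pos += 2 + length // jump over the whole segment to the next marker
	}
	return null
}
```

Only the header bytes up to the APP1 payload are inspected, which is why the size of the pixel data barely affects the cost of reading metadata.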

Install it:

pnpm add exifr

The primary function is parse:

import { parse } from 'exifr'

parse accepts a File object, a Blob, an ArrayBuffer, a URL, or an <img> element. It returns a Promise resolving to an object with the requested tag values, or undefined when no EXIF block exists.

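const exif = await parse(file, {
	pick: [
		'Make', 'Model', 'DateTimeOriginal', 'ISO', 'FNumber',
		// latitude/longitude are only derived when the GPS tags are picked too
		'GPSLatitude', 'GPSLatitudeRef', 'GPSLongitude', 'GPSLongitudeRef'
	]
})

// exif.Make             -> 'Apple'
// exif.Model            -> 'iPhone 15 Pro'
// exif.DateTimeOriginal -> a Date instance
// exif.latitude         -> 50.2996  (pre-converted from DMS to signed decimal degrees)
// exif.longitude        -> 14.3987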

One distinction worth knowing: GPSLatitude and GPSLongitude (uppercase, with GPS prefix) are the raw degree-minute-second arrays from the EXIF block. latitude and longitude (lowercase, no prefix) are the decimal-degree values that exifr derives automatically. Use the lowercase versions; they are directly storable in a database without further conversion.

Always Pick the Hemisphere Refs Too

The derived latitude and longitude are only correct if exifr can read all four GPS tags: GPSLatitude, GPSLatitudeRef, GPSLongitude, and GPSLongitudeRef. The DMS magnitudes are unsigned; the Ref tags carry the 'N' | 'S' and 'E' | 'W' hemisphere letters, and exifr uses them to apply the sign. If your pick list omits the refs, photos taken in the southern or western hemispheres come out with the wrong sign and end up hundreds or thousands of kilometres from where they were taken: a New York photo lands in central Asia, and a Sydney photo lands in the north Pacific.

Whenever you pick GPSLatitude / GPSLongitude, pick GPSLatitudeRef / GPSLongitudeRef too.
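
The conversion exifr performs is easy to reproduce, which also makes the role of the Ref tags obvious. The sketch below is an illustration with a hypothetical dmsToDecimal helper, not exifr’s source: the DMS triple yields an unsigned magnitude, and only the hemisphere letter can make it negative.

```typescript
export type Dms = [degrees: number, minutes: number, seconds: number]

// Hypothetical helper: unsigned EXIF DMS triple + hemisphere ref -> signed decimal degrees.
export function dmsToDecimal(dms: Dms, ref: string | undefined): number {
	const [d, m, s] = dms
	const magnitude = d + m / 60 + s / 3600
	const hemisphere = ref?.toUpperCase()
	// With no ref there is no way to know the hemisphere; falling back to
	// positive reproduces exactly the wrong-sign failure described above.
	return hemisphere === 'S' || hemisphere === 'W' ? -magnitude : magnitude
}
```

For New York (40° 42′ 46″ N, 74° 0′ 22″ W) this gives roughly 40.7128 and -74.0061; drop the 'W' and the longitude flips to +74.0061, which is in central Asia.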

When parse encounters a file with no EXIF block (a programmatically generated PNG, many screenshots, an image re-exported by a tool that strips metadata), it returns undefined rather than throwing. Your code needs to treat this as a normal outcome, because a significant proportion of real-world uploads carry no camera context at all. A file exifr cannot read at all can still make the promise reject, which is why the extractor later wraps the call in try/catch.

parse is also loosely typed: even with pick, its TypeScript declaration returns Promise<any>. That is defensible, since the EXIF spec allows almost any value to sit behind any tag, but the looseness becomes our problem in the next section.


EXIF Is User Input: Sanitize Before You Store

The values that come back from parse are not yours. They are whatever the camera, the photo editor, or, in the worst case, a deliberate attacker chose to embed in the file. Before any of that data crosses into your database, your storage layer, your file names, or your DOM, it has to be filtered as carefully as a form submission.

The threats are mundane and routine:

  • Memory and storage pressure. A crafted EXIF block can declare a Make field tens of kilobytes long. Multiplied across a few hundred uploads in one session, those strings bloat reactive state, logs, and database rows for no benefit, and can fail inserts into length-limited columns.
  • Out-of-band characters. EXIF strings can contain NUL bytes, ANSI escape sequences, control codes, and arbitrary Unicode. They will silently truncate logs, break shell-based filename handling, corrupt CSV exports of your database, and inject control codes into terminal-based admin tooling.
  • Script and markup payloads. Nothing in the EXIF spec stops a Make field from containing <script>alert(1)</script>, javascript:fetch('//evil'), or a CSV formula like =cmd|'/c calc'!A1. These intentionally survive the sanitizeText helper below; the defence belongs at every render site, not at the input.
  • Path traversal in derived names. If a value is fed into buildStorageKey or any filename without filtering, an attacker can submit a Make like ../../../etc/passwd and try to influence your storage layout. UUIDs as the actual key prevent this, but the human-readable semantic name needs the same hygiene.
  • Out-of-range coordinates. Buggy firmware emits (0, 0) (“Null Island”), NaN, or values like latitude 91. Storing these without bounds checking corrupts geographic queries, breaks map URLs, and can be weaponised to influence proximity-based features.
  • Wrong types. The EXIF spec is loose enough that a single corrupted file can return a string where you expected a number. Without a typeof check before storage, a strictly typed column rejects the insert, while a JSON column or document store quietly accepts the mixed types and breaks every query and index that assumes a number.

The pattern that solves all of this is the same one you already use for HTTP request bodies: a small set of sanitizer functions that take unknown, validate, and return a known type or null. Any value that fails sanitization becomes null in the output object. Compression continues unaffected.

// src/lib/image/metadata.svelte.ts (sanitizer helpers)

// Cap any text we'll store. Most legitimate Make/Model strings are under
// 32 chars; 128 leaves headroom for unusual lens names without giving an
// attacker room to allocate megabytes of UTF-16 in our state.
const MAX_TEXT_LENGTH = 128

function sanitizeText(value: unknown): string | null {
	if (typeof value !== 'string') return null
	// Strip Unicode "C" category code points (control, format, surrogate,
	// private-use, unassigned). They have no visible representation but break
	// logs, slugs, and database text columns (PostgreSQL rejects NUL outright).
	const cleaned = value.normalize('NFC').replace(/\p{C}/gu, '').trim()
	if (!cleaned) return null
	// Truncate by code point, not UTF-16 unit, so the cut never splits an
	// emoji or other astral character into a lone surrogate.
	return cleaned.length > MAX_TEXT_LENGTH
		? Array.from(cleaned).slice(0, MAX_TEXT_LENGTH).join('').trimEnd()
		: cleaned
}

function sanitizeFiniteNumber(value: unknown, min: number, max: number): number | null {
	if (typeof value !== 'number' || !Number.isFinite(value)) return null
	if (value < min || value > max) return null
	return value
}

function sanitizeCoordinates(lat: unknown, lng: unknown): { lat: number; lng: number } | null {
	const latNum = sanitizeFiniteNumber(lat, -90, 90)
	const lngNum = sanitizeFiniteNumber(lng, -180, 180)
	if (latNum === null || lngNum === null) return null
	// Reject the (0, 0) "Null Island" sentinel; it is almost always a sensor
	// glitch or a default value rather than a real location.
	if (latNum === 0 && lngNum === 0) return null
	return { lat: latNum, lng: lngNum }
}

function sanitizeIsoDate(value: unknown): string | null {
	if (!(value instanceof Date)) return null
	if (!Number.isFinite(value.getTime())) return null // 'Invalid Date'
	const year = value.getUTCFullYear()
	// EXIF is full of 1970-01-01 sentinels written by buggy firmware, and
	// nothing useful predates 1980. The +1 on the upper bound tolerates
	// devices with a slightly fast clock.
	if (year < 1980 || year > new Date().getUTCFullYear() + 1) return null
	return value.toISOString()
}

Three things make this approach scale:

  1. Every helper returns T | null and never throws. The extractor’s caller does not need a try/catch for malformed values; it just sees null for that field and stores null in the database.
  2. The bounds are written down. ISO sensitivity has a practical range (25–409600 here; some bodies go further in extended modes), apertures don’t go below f/0.5 or above f/64, and exposures longer than an hour are physically possible but vanishingly unlikely. Encoding those numbers in the sanitizer makes the policy reviewable instead of hiding it inside ad-hoc checks.
  3. The output is always JSON-serialisable. No Date instances, no BigInt, no class wrappers. Whatever the database driver does with the row, it does not have to special-case EXIF.
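
If you want point 2 taken one step further, the bounds can live in a single table that a reviewer reads as data. The sketch below is a hypothetical refactor (EXIF_NUMERIC_BOUNDS and sanitizeBounded are illustrative names, not part of the lesson’s module) using the same numbers the extractor uses.

```typescript
// Hypothetical declarative version of the extractor's numeric bounds.
export const EXIF_NUMERIC_BOUNDS = {
	FNumber: { min: 0.5, max: 64 },
	ISO: { min: 25, max: 409_600 },
	ExposureTime: { min: 1e-6, max: 3600 } // seconds
} as const

type BoundedTag = keyof typeof EXIF_NUMERIC_BOUNDS

// Same contract as the lesson's helper: unknown in, number or null out, never throws.
function sanitizeFiniteNumber(value: unknown, min: number, max: number): number | null {
	if (typeof value !== 'number' || !Number.isFinite(value)) return null
	if (value < min || value > max) return null
	return value
}

export function sanitizeBounded(tag: BoundedTag, value: unknown): number | null {
	const { min, max } = EXIF_NUMERIC_BOUNDS[tag]
	return sanitizeFiniteNumber(value, min, max)
}
```

A call like sanitizeBounded('ISO', raw.ISO) then reads the same as the inline version, but changing a limit becomes a one-line diff.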

Treat exifr Output Like a Form Body

Whatever schema validator you use for HTTP requests (Valibot, Zod, ArkType, or your own), the same library can replace the helpers above. The point is that the EXIF parser’s return value crosses a trust boundary the moment it hands you the object, exactly like an HTTP body.

Why <script>alert(1)</script> Is Allowed Through sanitizeText

sanitizeText strips Unicode “C” category code points, normalises, trims, and caps length. It deliberately does not strip <, >, &, ", ', or javascript: prefixes. A camera Make of <script>alert(1)</script> survives the sanitizer in full. That is the correct behaviour.

Stripping HTML at the input layer is the wrong defence for two reasons:

  1. It corrupts legitimate values. A lens or software string that happens to contain <, >, or & (say, a hypothetical EF 70-200mm f/2.8L IS USM <II>) gets mangled.
  2. It only protects one output context. The same value also gets written to a database row, a CSV export, a JSON-in-script block, a filename, and a clipboard payload. Stripping HTML characters protects none of those; each has its own escaping rules.

The defence is context-aware escaping at every render site, applied after the value leaves the trust-boundary helpers:

  • DOM interpolation. Svelte’s {value} syntax always renders a string as text, never as markup, so {meta.info.camera} shows <script> literally. {@html meta.info.camera} would parse it as live HTML, where a payload such as <img src=x onerror=...> runs immediately; never use {@html} with EXIF.
  • Anchor href. Build URLs from a fixed scheme plus encodeURIComponent-encoded parameters, like mapsUrl does in the panel. Never assign an EXIF string directly to href; if you do, an attacker can supply a javascript: URL.
  • Storage keys / filenames. slugify strips everything outside [a-z0-9-], so a Make of <script> lands as script in the bucket. The < and > cannot survive into a path.
  • CSV / Excel exports. Prefix any cell that begins with =, +, -, @, or a tab/CR with a single quote so spreadsheet apps don’t evaluate it as a formula.
  • JSON embedded in HTML. When you write JSON.stringify(payload) into a <script> tag yourself, escape </, <!--, and U+2028 / U+2029 so the JSON cannot break out of the tag. SvelteKit’s own data hydration already serializes safely; hand-written script blocks are where this goes wrong.
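
The last two bullets are the ones most often hand-rolled, so here is a minimal sketch of both. escapeCsvCell and jsonForScriptTag are hypothetical names; the formula-trigger list follows the OWASP CSV injection guidance quoted above, and escaping every < as \u003c covers both </ and <!-- in one step.

```typescript
// Leading characters that make spreadsheet apps evaluate a cell as a formula.
const FORMULA_TRIGGERS = new Set(['=', '+', '-', '@', '\t', '\r'])

export function escapeCsvCell(value: string | number | null): string {
	if (value === null) return ''
	let text = String(value)
	// Only untrusted strings get the formula guard; numbers we produced
	// ourselves (a negative longitude, say) must stay numeric.
	if (typeof value === 'string' && FORMULA_TRIGGERS.has(text.charAt(0))) {
		text = `'${text}`
	}
	// RFC 4180 quoting: wrap cells containing commas, quotes, or line breaks,
	// and double any embedded double quotes.
	return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text
}

export function jsonForScriptTag(payload: unknown): string {
	const json: string | undefined = JSON.stringify(payload)
	if (json === undefined) return 'null' // e.g. payload is undefined
	// \u003c is a valid JSON string escape, so parsing gives back the same data,
	// but the HTML parser never sees a "<" that could close the <script> tag.
	return json.replace(/</g, '\\u003c').replace(/\u2028/g, '\\u2028').replace(/\u2029/g, '\\u2029')
}
```

Both functions leave the data itself intact: a spreadsheet shows the leading quote as a plain-text marker, and JSON.parse on the escaped output returns the original object.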

Never Render EXIF With @html or Direct href

The two ways an EXIF script payload escapes the panel are: rendering it with {@html value} instead of {value}, and assigning it to an href without a scheme allow-list. Both are easy to introduce by accident; both are catastrophic. If a designer asks for “clickable camera names”, build a URL from sanitized parts the way mapsUrl is built, never from the raw string.
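
If a feature ever has to accept a URL-shaped string rather than building one, a scheme allow-list is the minimum. The sketch below uses a hypothetical safeHref helper and relies on the WHATWG URL parser built into browsers and Node, which lowercases the scheme and trims leading whitespace, so variants like " JavaScript:..." are still caught.

```typescript
// Only schemes on this list may reach an href. Extend deliberately, never by default.
const ALLOWED_PROTOCOLS = new Set(['https:'])

export function safeHref(candidate: string): string | null {
	let url: URL
	try {
		// No base URL: relative strings throw and are rejected.
		url = new URL(candidate)
	} catch {
		return null
	}
	// url.protocol is normalised to lowercase by the parser.
	return ALLOWED_PROTOCOLS.has(url.protocol) ? url.href : null
}
```

In the panel you would still prefer the mapsUrl approach; safeHref is the fallback for the rare case where the whole URL arrives from outside.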


Designing the Logic Module

The extraction logic belongs in a .svelte.ts file: a module that uses runes for reactive state and exposes a typed factory function. Components call the factory, receive the returned reactive object, and read from it as state changes.

The extractor moves through three phases. Before extract is called it has no data. While parse is running it is loading. After parse settles it holds one of three outcomes: sanitized metadata, confirmation that the file had none, or an error message.
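
The module below models those phases with separate info, loading, and error fields, which matches how the panel reads them. An alternative worth knowing, sketched here with hypothetical names (ExtractorState, describeExtractorState), is a single tagged union in which impossible combinations such as “loading and errored at once” cannot be represented.

```typescript
// Hypothetical alternative shape: one discriminated union instead of several fields.
export type ExtractorState<T> =
	| { status: 'idle' }
	| { status: 'loading' }
	| { status: 'done'; info: T | null } // null: the file had no EXIF block
	| { status: 'error'; message: string }

export function describeExtractorState<T>(state: ExtractorState<T>): string {
	switch (state.status) {
		case 'idle':
			return 'Waiting for a file'
		case 'loading':
			return 'Reading camera data...'
		case 'done':
			return state.info === null ? 'No camera metadata found' : 'Metadata ready'
		case 'error':
			return `Could not read metadata: ${state.message}`
		default: {
			// Compile-time exhaustiveness check: fails if a new status is added.
			const unreachable: never = state
			return unreachable
		}
	}
}
```

The trade-off is ergonomics: components must switch on status instead of reading meta.loading directly, which is why the lesson keeps the separate fields.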

// src/lib/image/metadata.svelte.ts
import { parse } from 'exifr'

// Public shape of every extracted record. Every field is nullable because
// any individual value can fail sanitization or be absent from the source.
export interface ExtractedMetadata {
	camera: string | null
	takenAt: string | null // ISO 8601 (UTC). EXIF times carry no zone; exifr reads them as browser-local time
	coordinates: { lat: number; lng: number } | null
	aperture: number | null
	iso: number | null
	shutterSpeed: number | null // seconds, raw decimal
}

// (sanitizer helpers from the previous section live here)
const MAX_TEXT_LENGTH = 128

function sanitizeText(value: unknown): string | null {
	if (typeof value !== 'string') return null
	const cleaned = value.normalize('NFC').replace(/\p{C}/gu, '').trim()
	if (!cleaned) return null
	// Truncate by code point so the cut never leaves a lone surrogate
	return cleaned.length > MAX_TEXT_LENGTH
		? Array.from(cleaned).slice(0, MAX_TEXT_LENGTH).join('').trimEnd()
		: cleaned
}

function sanitizeFiniteNumber(value: unknown, min: number, max: number): number | null {
	if (typeof value !== 'number' || !Number.isFinite(value)) return null
	if (value < min || value > max) return null
	return value
}

function sanitizeCoordinates(lat: unknown, lng: unknown): { lat: number; lng: number } | null {
	const latNum = sanitizeFiniteNumber(lat, -90, 90)
	const lngNum = sanitizeFiniteNumber(lng, -180, 180)
	if (latNum === null || lngNum === null) return null
	if (latNum === 0 && lngNum === 0) return null
	return { lat: latNum, lng: lngNum }
}

function sanitizeIsoDate(value: unknown): string | null {
	if (!(value instanceof Date)) return null
	if (!Number.isFinite(value.getTime())) return null
	const year = value.getUTCFullYear()
	if (year < 1980 || year > new Date().getUTCFullYear() + 1) return null
	return value.toISOString()
}

/**
 * `raw` is the unfiltered exifr output. We expose it for the UI panel so the
 * reader can see what got dropped or capped during sanitization. It is
 * NEVER part of the upload payload; only `info` crosses the trust boundary.
 */
export interface MetadataExtractor {
	readonly info: ExtractedMetadata | null
	readonly raw: Record<string, unknown> | null
	readonly loading: boolean
	readonly error: string | null
	extract(file: File): Promise<ExtractedMetadata | null>
}

export function createMetadataExtractor(): MetadataExtractor {
	let info = $state<ExtractedMetadata | null>(null)
	let rawState = $state<Record<string, unknown> | null>(null)
	let loading = $state<boolean>(false)
	let error = $state<string | null>(null)

	async function extract(file: File): Promise<ExtractedMetadata | null> {
		loading = true
		error = null
		info = null
		rawState = null

		try {
			// exifr's return type is `any`. Cast to a record of unknowns so the
			// sanitizer helpers receive `unknown` and we can't accidentally
			// read a property without validating it first.
			//
			// GPSLatitudeRef and GPSLongitudeRef MUST be picked alongside
			// GPSLatitude/GPSLongitude. exifr signs the derived `latitude` /
			// `longitude` decimals using the N/S and E/W hemisphere refs;
			// without them, southern and western coordinates land in the wrong
			// hemisphere.
			const raw = (await parse(file, {
				pick: [
					'Make',
					'Model',
					'DateTimeOriginal',
					'ISO',
					'FNumber',
					'ExposureTime',
					'GPSLatitude',
					'GPSLatitudeRef',
					'GPSLongitude',
					'GPSLongitudeRef'
				]
			})) as Record<string, unknown> | undefined

			// undefined means no EXIF block was found; treat as a valid, empty result
			if (!raw) {
				info = null
				rawState = null
				return null
			}

			rawState = raw

			const make = sanitizeText(raw.Make)
			const model = sanitizeText(raw.Model)

			const result: ExtractedMetadata = {
				// If only one of make/model survives sanitization, return that
				// rather than dropping both.
				camera: make && model ? `${make} ${model}` : (make ?? model),
				takenAt: sanitizeIsoDate(raw.DateTimeOriginal),
				coordinates: sanitizeCoordinates(raw.latitude, raw.longitude),
				aperture: sanitizeFiniteNumber(raw.FNumber, 0.5, 64),
				iso: sanitizeFiniteNumber(raw.ISO, 25, 409_600),
				// ExposureTime is in seconds. Long exposures and ultra-fast
				// electronic shutters span eight orders of magnitude.
				shutterSpeed: sanitizeFiniteNumber(raw.ExposureTime, 1e-6, 3600)
			}

			info = result
			return result
		} catch (err) {
			error = err instanceof Error ? err.message : 'Extraction failed'
			info = null
			return null
		} finally {
			// loading stops whether the call succeeded, returned empty, or threw
			loading = false
		}
	}

	return {
		get info() {
			return info
		},
		get raw() {
			return rawState
		},
		get loading() {
			return loading
		},
		get error() {
			return error
		},
		extract
	}
}

Several decisions here are worth explaining.

The finally block is the correct place to reset loading. Putting loading = false only in the success path is a common mistake; if parse throws on a genuinely malformed file, loading stays true and any spinner in the UI spins forever.
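
The same state handling has a second failure mode. If a user drops a second file before the first extraction settles, the slower call can finish last and overwrite the newer result, or clear loading while the newer call is still running. A common fix is a request counter: each call takes a number, and only the newest call may write state. The sketch below uses a plain object instead of $state so it runs outside Svelte; createGuardedExtractor and parseFn are hypothetical names, with parseFn standing in for exifr’s parse and a string standing in for the File.

```typescript
// "Latest call wins" guard, sketched without runes so it runs in plain TypeScript.
export function createGuardedExtractor<T>(parseFn: (file: string) => Promise<T>) {
	const state: { value: T | null; loading: boolean } = { value: null, loading: false }
	let requestId = 0

	async function extract(file: string): Promise<T | null> {
		const id = ++requestId
		state.loading = true
		try {
			const value = await parseFn(file)
			// A newer call started while we were waiting: drop this stale result.
			if (id !== requestId) return null
			state.value = value
			return value
		} finally {
			// Only the newest call may clear the loading flag.
			if (id === requestId) state.loading = false
		}
	}

	return { state, extract }
}
```

In the real module, the same two id checks would wrap the writes to info, rawState, error, and loading.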

ExtractedMetadata is exported so other modules (the upload pipeline, the database adapter, the display panel) can import it as a type without re-deriving the shape. This is the type contract for the boundary between “untrusted EXIF” and “trusted application data”.

The parse result is cast to Record<string, unknown> rather than any. That single decision is what forces every property access to go through a sanitizer: TypeScript will refuse to assign unknown to string | null without a typeof check, which is exactly the discipline we want.
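
One refinement to the camera field: many manufacturers repeat the brand in Model (Canon, for example, reports Make “Canon” and Model “Canon EOS 5D Mark III”), so plain concatenation prints “Canon Canon EOS 5D Mark III”. A minimal sketch of a fix, using a hypothetical formatCamera helper that takes the already-sanitized make and model:

```typescript
// Hypothetical helper: join make and model without repeating the brand.
export function formatCamera(make: string | null, model: string | null): string | null {
	// Same fallback as the extractor: keep whichever half survived sanitization.
	if (!make || !model) return make ?? model
	// Compare against the first word of Make, so "NIKON CORPORATION" still
	// matches a Model that starts with "NIKON".
	const brand = make.split(/\s+/)[0].toLowerCase()
	return model.toLowerCase().startsWith(brand) ? model : `${make} ${model}`
}
```

In the extractor it would replace the inline ternary: camera: formatCamera(make, model).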


Concurrent Extraction and Compression

The extraction and compression pipelines are completely independent. parse reads the EXIF block from the file header; the optimizer decodes the pixel data from a different region of the same file. Neither operation depends on the result of the other, and both can start at the same moment.

Promise.all is the right tool here:

// src/lib/image/upload.svelte.ts
import {
	createMetadataExtractor,
	type ExtractedMetadata,
	type MetadataExtractor
} from './metadata.svelte'
import { createOptimizer } from '$lib/optimizer.svelte'

type Stage = 'idle' | 'processing' | 'ready' | 'failed'

export interface ProcessResult {
	dbPayload: ExtractedMetadata | null
	optimizedBlob: Blob
}

export interface UploadPipeline {
	readonly stage: Stage
	readonly meta: MetadataExtractor
	readonly opt: ReturnType<typeof createOptimizer>
	process(file: File): Promise<ProcessResult | null>
}

export function createUploadPipeline(): UploadPipeline {
	const meta = createMetadataExtractor()
	const opt = createOptimizer()

	let stage = $state<Stage>('idle')

	async function process(file: File): Promise<ProcessResult | null> {
		stage = 'processing'

		try {
			// Both start immediately with the same File reference.
			// Neither awaits the other.
			const [dbPayload, optimizedBlob] = await Promise.all([meta.extract(file), opt.squash(file)])

			if (!optimizedBlob) {
				stage = 'failed'
				return null
			}

			stage = 'ready'

			// dbPayload     -> null or a sanitized ExtractedMetadata, save to your database
			// optimizedBlob -> the compressed WebP, upload to R2/S3/Supabase Storage
			return { dbPayload, optimizedBlob }
		} catch {
			// meta.extract never rejects, but in case squash throws instead of
			// returning null, Promise.all rejects; without this catch, stage
			// would stay 'processing' forever.
			stage = 'failed'
			return null
		}
	}

	return {
		get stage() {
			return stage
		},
		get meta() {
			return meta
		},
		get opt() {
			return opt
		},
		process
	}
}

The two branches share the same File object as input but are otherwise completely separate. Extraction can come back empty in two ways: null when the file had no EXIF block (or exifr could not read it), or an ExtractedMetadata object whose fields are all null when nothing survived sanitization. Neither affects compression. The optimised blob still gets created, the upload proceeds, and the database record stores null in its metadata columns. That is the correct representation of an image that carries no usable camera context; it is not an error state.

This resilience matters because screenshots, web-downloaded images, and exports from design tools typically have no EXIF data. A pipeline that fails on those files is broken for a substantial proportion of real-world uploads.

Extraction Failure Is a Data Condition, Not an Error

A null result from meta.extract means the file had no metadata to extract, or that it could not be read; a record with every field null means nothing survived sanitization. In both cases the image can still be compressed, uploaded, and stored. Design your database schema to expect nullable metadata columns from the start.
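
If you would rather store a single null for both “no EXIF” and “nothing survived”, a small normalisation step before the database write does it. The sketch below uses hypothetical helper names (hasUsableMetadata, toDbPayload) and repeats the ExtractedMetadata shape so it stands alone.

```typescript
// Copy of the ExtractedMetadata shape from metadata.svelte.ts.
interface ExtractedMetadata {
	camera: string | null
	takenAt: string | null
	coordinates: { lat: number; lng: number } | null
	aperture: number | null
	iso: number | null
	shutterSpeed: number | null
}

// True only when at least one field survived sanitization.
export function hasUsableMetadata(info: ExtractedMetadata | null): info is ExtractedMetadata {
	return info !== null && Object.values(info).some((value) => value !== null)
}

// Collapse an all-null record into null, so the database sees one "empty" representation.
export function toDbPayload(info: ExtractedMetadata | null): ExtractedMetadata | null {
	return hasUsableMetadata(info) ? info : null
}
```

Whether to collapse is a schema decision: keeping the all-null record tells you the file did have an EXIF block, which can be useful for auditing.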


Displaying the Metadata

The component that renders the extracted information has three jobs: handle a null metadata object gracefully, format the raw values into strings a user can actually read, and (because this is a teaching exercise about trust boundaries) let the reader see the raw exifr output side-by-side with the sanitized payload.

Three small extras pay off on a tutorial page:

  • A “Show raw EXIF” toggle that renders both meta.raw and meta.info as JSON. The reader sees exactly what the camera embedded versus what your sanitizers chose to keep.
  • A “Copy as JSON” button that writes the sanitized payload to the clipboard. That is the value you would POST to your API; copying it makes the trust boundary tangible.
  • A “Download metadata.json” link, for the same reason: the reader walks away with a file that is exactly what would be persisted.

Because Svelte’s {value} interpolation renders strings as text by default, the sanitized text fields are safe to render directly. The raw JSON goes through the same {value} interpolation inside a <pre>, so even a <script> payload in the raw column shows up as literal text and never becomes HTML.

<!-- src/lib/image/MetadataPanel.svelte -->
<script lang="ts">
	import type { MetadataExtractor } from './metadata.svelte'

	interface Props {
		meta: MetadataExtractor
	}

	let { meta }: Props = $props()

	let copyState = $state<'idle' | 'copied' | 'error'>('idle')
	let showRaw = $state<boolean>(false)

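	const shutterDisplay = $derived.by<string | null>(() => {
		const seconds = meta.info?.shutterSpeed
		if (!seconds) return null
		// Show sub-second speeds as 1/N only when that reading is within ~5% of
		// the real value; otherwise (0.4s, 0.6s, 2s, 30s) show decimal seconds.
		const denominator = Math.round(1 / seconds)
		const isCloseToFraction = denominator > 1 && Math.abs(1 / denominator - seconds) / seconds < 0.05
		return isCloseToFraction ? `1/${denominator}s` : `${seconds}s`
	})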

	const apertureDisplay = $derived<string | null>(
		meta.info?.aperture != null ? `f/${meta.info.aperture}` : null
	)

	const dateDisplay = $derived.by<string | null>(() => {
		if (!meta.info?.takenAt) return null
		return new Intl.DateTimeFormat(undefined, {
			dateStyle: 'long',
			timeStyle: 'short'
		}).format(new Date(meta.info.takenAt))
	})

	const mapsUrl = $derived.by<string | null>(() => {
		const coords = meta.info?.coordinates
		if (!coords) return null
		return `https://maps.google.com/?q=${encodeURIComponent(coords.lat)},${encodeURIComponent(coords.lng)}`
	})

	// Pretty-printed JSON of the sanitized payload: what would actually be
	// stored in a database. Reused for both "Copy" and "Download".
	const sanitizedJson = $derived<string>(meta.info ? JSON.stringify(meta.info, null, 2) : '{}')

	// JSON view of the raw exifr output. Date instances serialise themselves
	// via toJSON before the replacer sees them (an invalid Date becomes null),
	// so the replacer only handles BigInt, which JSON.stringify would otherwise
	// reject, and Uint8Array, which it would dump as one key per byte.
	const rawJson = $derived.by<string>(() => {
		if (!meta.raw) return ''
		try {
			return JSON.stringify(
				meta.raw,
				(_key, value) => {
					if (typeof value === 'bigint') return value.toString()
					if (value instanceof Uint8Array) return `<Uint8Array length=${value.length}>`
					return value
				},
				2
			)
		} catch (err) {
			return `// unable to stringify raw EXIF: ${err instanceof Error ? err.message : 'unknown error'}`
		}
	})

	async function copySanitized(): Promise<void> {
		try {
			await navigator.clipboard.writeText(sanitizedJson)
			copyState = 'copied'
			setTimeout(() => {
				copyState = 'idle'
			}, 1500)
		} catch {
			copyState = 'error'
			setTimeout(() => {
				copyState = 'idle'
			}, 1500)
		}
	}

	function downloadSanitized(): void {
		const blob = new Blob([sanitizedJson], { type: 'application/json' })
		const url = URL.createObjectURL(blob)
		const a = document.createElement('a')
		a.href = url
		a.download = 'metadata.json'
		a.click()
		setTimeout(() => URL.revokeObjectURL(url), 0)
	}
</script>

{#if meta.loading}
	<div class="panel" aria-live="polite">
		<span class="pulse">Reading camera data...</span>
	</div>
{:else if meta.error}
	<div class="panel panel--error" role="alert">
		<p>Could not read metadata: {meta.error}</p>
	</div>
{:else if !meta.info}
	<div class="panel">
		<p>No camera metadata found in this file.</p>
	</div>
{:else}
	<div class="panel">
		<div class="rows">
			{#if meta.info.camera}
				<div class="row">
					<span class="label">Camera</span>
					<span class="value">{meta.info.camera}</span>
				</div>
			{/if}

			{#if dateDisplay}
				<div class="row">
					<span class="label">Taken</span>
					<span class="value">{dateDisplay}</span>
				</div>
			{/if}

			{#if meta.info.coordinates}
				<div class="row">
					<span class="label">Location</span>
					<span class="value">
						<a href={mapsUrl ?? '#'} target="_blank" rel="noopener noreferrer">
							{meta.info.coordinates.lat.toFixed(4)}, {meta.info.coordinates.lng.toFixed(4)}
						</a>
					</span>
				</div>
			{/if}

			{#if apertureDisplay || shutterDisplay || meta.info.iso}
				<div class="row">
					<span class="label">Exposure</span>
					<span class="value">
						<!-- join only the parts that exist, so no stray leading separator -->
						{[apertureDisplay, shutterDisplay, meta.info.iso ? `ISO ${meta.info.iso}` : null]
							.filter(Boolean)
							.join(' · ')}
					</span>
				</div>
			{/if}
		</div>

		<div class="actions">
			<button type="button" class="action" onclick={copySanitized}>
				{copyState === 'copied'
					? 'Copied ✓'
					: copyState === 'error'
						? 'Copy failed'
						: 'Copy as JSON'}
			</button>
			<button type="button" class="action" onclick={downloadSanitized}>
				Download metadata.json
			</button>
			{#if meta.raw}
				<button
					type="button"
					class="action action--ghost"
					aria-expanded={showRaw}
					onclick={() => (showRaw = !showRaw)}
				>
					{showRaw ? 'Hide raw EXIF' : 'Show raw EXIF'}
				</button>
			{/if}
		</div>

		{#if showRaw}
			<div class="json-grid">
				<div class="json-col">
					<span class="json-title">Sanitized payload (saved to DB)</span>
					<pre>{sanitizedJson}</pre>
				</div>
				<div class="json-col">
					<span class="json-title">Raw exifr output (display only)</span>
					<pre>{rawJson}</pre>
				</div>
			</div>
		{/if}
	</div>
{/if}

<style>
	.panel {
		padding: 0.75rem 1rem;
		border: 1px solid var(--border-default, #333);
		border-radius: var(--radius-md, 6px);
		background: var(--surface-1, #1a1a1a);
		color: var(--text, #fff);
		display: flex;
		flex-direction: column;
		gap: 0.75rem;
	}

	.row {
		display: flex;
		gap: 1rem;
		padding-block: 0.25rem;
	}

	.label {
		color: color-mix(in oklch, var(--text, #fff) 60%, transparent);
		min-width: 6rem;
		flex-shrink: 0;
		font-family: var(--font-mono, ui-monospace, monospace);
		text-transform: uppercase;
		letter-spacing: 0.05em;
		font-size: 0.75rem;
	}

	.actions {
		display: flex;
		flex-wrap: wrap;
		gap: 0.5rem;
		padding-top: 0.25rem;
		border-top: 1px solid color-mix(in oklch, var(--border-default, #333) 60%, transparent);
	}

	.action {
		appearance: none;
		border: 1px solid var(--border-default, #333);
		background: transparent;
		color: var(--text, #fff);
		font-family: var(--font-mono, ui-monospace, monospace);
		font-size: 0.75rem;
		text-transform: uppercase;
		letter-spacing: 0.05em;
		padding: 0.4rem 0.75rem;
		border-radius: var(--radius-md, 4px);
		cursor: pointer;
	}

	.action:hover {
		border-color: var(--accent-primary-base, #ff5722);
	}

	.json-grid {
		display: grid;
		grid-template-columns: 1fr 1fr;
		gap: 0.75rem;
	}

	@media (max-width: 720px) {
		.json-grid {
			grid-template-columns: 1fr;
		}
	}

	pre {
		margin: 0;
		padding: 0.5rem 0.75rem;
		background: color-mix(in oklch, var(--bg, #000) 70%, transparent);
		border: 1px solid color-mix(in oklch, var(--border-default, #333) 60%, transparent);
		border-radius: var(--radius-md, 4px);
		font-family: var(--font-mono, ui-monospace, monospace);
		font-size: 0.75rem;
		line-height: 1.4;
		overflow: auto;
		max-height: 18rem;
	}

	.pulse {
		animation: pulse 1.5s ease-in-out infinite;
	}

	@keyframes pulse {
		0%,
		100% {
			opacity: 1;
		}
		50% {
			opacity: 0.4;
		}
	}
</style>

Three details worth calling out:

  • meta.raw is reactive state on the extractor, but the earlier callout still applies: it is never part of the upload payload. Open the toggle and compare the two <pre> blocks. Malformed values (long Make strings padded with NUL bytes, a latitude of exactly 0, a 1970-01-01 date, an ISO of 12800000) appear in the raw column and are absent or null in the sanitized column. That contrast is the entire point of the lesson.
  • sanitizedJson is a $derived value, and both the copy and download paths read it. After the next file drop it recomputes on its own, so both actions pick up the fresh JSON without any manual invalidation.
  • Intl.DateTimeFormat with undefined as the locale formats dates in the browser's default locale automatically. For production, consider whether showing precise GPS decimals in the DOM is appropriate: four decimal places, as rendered above, pin a location to roughly 11 metres. A static map thumbnail is often a better UX for the same data.
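If you decide to keep coordinates in the panel, one option is to round them before rendering. The sketch below uses a hypothetical helper, coarsenForDisplay, which is not part of the lesson's sanitizers. It assumes the same { lat, lng } shape that meta.info.coordinates uses above, and it only changes what the DOM shows, never the stored values. One degree of latitude is about 111 km, so two decimal places is roughly 1.1 km.

```typescript
// Hypothetical display-only helper; the lesson's sanitizers do not include it.
// Assumes the same { lat, lng } shape as meta.info.coordinates.
export interface Coordinates {
	lat: number
	lng: number
}

// Rounds both axes to `decimals` places. For latitude each extra place is ~10x
// finer: 2 places ≈ 1.1 km, 4 places ≈ 11 m. Longitude cells shrink toward the poles.
export function coarsenForDisplay(coords: Coordinates, decimals = 2): Coordinates {
	const factor = 10 ** decimals
	return {
		lat: Math.round(coords.lat * factor) / factor,
		lng: Math.round(coords.lng * factor) / factor
	}
}
```

In the template you would then render coarsenForDisplay(meta.info.coordinates).lat.toFixed(2) (and likewise for lng) instead of the raw values. Remember that mapsUrl carries coordinates too, so coarsen whatever you build it from as well.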

Wiring Metadata into the Optimizer Page

The /optimizer page you built in Lesson 4 already owns the DropZone, the quality slider, and the comparison slider. The change in this lesson is intentionally small: swap createOptimizer() for createUploadPipeline(), render <MetadataPanel> beneath the comparison slider, and leave everything else alone. Because the pipeline exposes the optimizer through pipeline.opt, every reactive read and method call your page already makes (o.status, o.originalUrl, o.optimizedUrl, o.appliedQuality, o.quality, o.setQuality, and the rest) keeps working unchanged.

Update src/routes/optimizer/+page.svelte to look like this:

<!-- src/routes/optimizer/+page.svelte -->
<script lang="ts">
	import { createUploadPipeline } from '$lib/image/upload.svelte'
	import MetadataPanel from '$lib/image/MetadataPanel.svelte'
	import DropZone from '$lib/components/DropZone.svelte'
	import ComparisonSlider from '$lib/components/ComparisonSlider.svelte'

	// The optimizer page now uses the same concurrent metadata+compression
	// pipeline as the rest of the upload flow. The reader sees the extracted
	// metadata directly under the comparison slider on the demo route they
	// already know.
	const pipeline = createUploadPipeline()
	const o = pipeline.opt
	const meta = pipeline.meta

	// Mirror of o.quality used while the range input is being dragged so the
	// displayed % updates live without committing a re-encode on every pixel
	// of motion. We commit on `onchange` (release) by calling o.setQuality,
	// which also brings o.quality back into sync.
	let liveQuality = $state<number>(o.quality)
	$effect(() => {
		liveQuality = o.quality
	})

	async function handleFile(file: File): Promise<void> {
		await pipeline.process(file)
	}
</script>

<svelte:head>
	<title>Image Optimizer</title>
</svelte:head>

<main class="optimizer">
	<header class="optimizer__header">
		<h1>Client-Side Image Optimizer</h1>
		<p>
			Drop a JPEG, PNG, or WebP. Your image is compressed on your device. Nothing leaves your
			browser until you choose to save the result.
		</p>
	</header>

	<DropZone
		onfile={(file: File) => {
			void handleFile(file)
		}}
	/>

	<div class="optimizer__quality">
		<label for="quality" class="optimizer__quality-label">
			<span>WebP quality</span>
			<output for="quality">{liveQuality}%</output>
		</label>
		<input
			id="quality"
			type="range"
			min="1"
			max="100"
			step="1"
			value={liveQuality}
			oninput={(e) => (liveQuality = Number((e.currentTarget as HTMLInputElement).value))}
			onchange={(e) => {
				void o.setQuality(Number((e.currentTarget as HTMLInputElement).value))
			}}
			class="optimizer__quality-range"
		/>
	</div>

	{#if o.status === 'processing'}
		<div class="optimizer__status" role="status" aria-live="polite">
			<span class="optimizer__spinner" aria-hidden="true"></span>
			Compressing...
		</div>
	{/if}

	{#if o.status === 'error'}
		<div class="optimizer__error" role="alert">
			{o.errorMessage ?? 'Compression failed. Try a different file.'}
		</div>
	{/if}

	{#if o.status === 'done' && o.originalUrl && o.optimizedUrl}
		<section class="optimizer__results">
			<ComparisonSlider
				before={o.originalUrl}
				after={o.optimizedUrl}
				originalSize={o.originalSize}
				optimizedSize={o.optimizedSize}
				appliedQuality={o.appliedQuality}
			/>

			<MetadataPanel {meta} />

			<div class="optimizer__actions">
				<a href={o.optimizedUrl} download="optimized.webp" class="optimizer__download-btn">
					Download optimised image
				</a>
			</div>
		</section>
	{/if}
</main>

Three things are worth pointing out about this change:

  • Status comes from o, not pipeline.stage. The pipeline's own stage field exists for callers that don't want to think about the decode/encode lifecycle, but this page already renders against o.status === 'done' from Lesson 4. Keep using it: it tracks the optimizer's actual lifecycle (including the re-encode triggered by quality changes), and your {#if} blocks stay identical.
  • pipeline.process(file) replaces o.squash(file) in the DropZone callback. Internally it runs the metadata extraction and the squash together with Promise.all, so dropping a file kicks off both branches at once.
  • <MetadataPanel {meta} /> is the only new piece of markup. It owns its own loading, empty, and error states, so it needs no condition of its own beyond the existing results block. Because extraction runs in parallel and usually finishes well before compression, meta.info is typically already populated (or correctly null) by the time the comparison slider renders; if it isn't, the panel shows its loading state until it is.
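The lesson's createUploadPipeline is defined earlier; the sketch below is not its code, only a minimal illustration of the same pattern. processConcurrently is a hypothetical name, and its extract and squash parameters stand in for meta.extract and opt.squash. It demonstrates two properties the bullets rely on: both branches start in the same synchronous tick, and an extraction failure is downgraded to null so it cannot reject the combined promise and sink the upload.

```typescript
// Hypothetical signatures standing in for meta.extract and opt.squash.
// Assumes both return promises rather than throwing synchronously.
type Extract<M> = (file: Blob) => Promise<M | null>
type Squash = (file: Blob) => Promise<Blob>

export async function processConcurrently<M>(
	file: Blob,
	extract: Extract<M>,
	squash: Squash
): Promise<{ metadata: M | null; optimized: Blob }> {
	// Both calls run before the first await, so they receive the same File
	// reference in the same synchronous tick. A bad EXIF block rejects only
	// the extraction branch, and .catch turns that into a null result.
	const [metadata, optimized] = await Promise.all([
		extract(file).catch(() => null),
		squash(file)
	])
	return { metadata, optimized }
}
```

A compression failure still rejects the whole call here, which is usually what you want: without an optimized blob there is nothing to upload.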

The styles section from Lesson 4 stays exactly as-is. There is no new CSS to add to this page.


Deriving a Meaningful Filename

A practical bonus from the extracted metadata is a filename that humans can actually read. A file uploaded as IMG_9923.jpg can be presented as apple-iphone-15-pro-2024-09-14.webp, which matters for SEO (covered later) and for any tooling that needs to reason about image provenance.

Because both the camera string and the original filename are user-controlled, the slug logic must apply the same defensive filtering as the EXIF sanitizers; otherwise an attacker can influence object keys, generate confusing display names, or attempt path traversal.

// src/lib/image/upload.svelte.ts (additional helpers)
import type { ExtractedMetadata } from './metadata.svelte'

export interface StorageKey {
	storageKey: string // opaque, written to the bucket, never changes
	semanticName: string // for display, SEO, and tooling
}

const MAX_SLUG_LENGTH = 64

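// Exported so templates can reuse it when building URLs from EXIF strings.
export function slugify(value: string): string {
	return value
		.normalize('NFKD') // split accented letters so the ASCII filter keeps 'e' from 'é'
		.toLowerCase()
		.replace(/\s+/g, '-') // whitespace, including tabs and newlines, becomes a hyphen
		.replace(/\p{C}/gu, '') // then strip remaining control / format / surrogate code points
		.replace(/[^a-z0-9-]/g, '') // hyphens, lowercase letters, digits only
		.replace(/-+/g, '-') // collapse runs of hyphens
		.slice(0, MAX_SLUG_LENGTH) // cap the length first...
		.replace(/^-+|-+$/g, '') // ...then trim, so the cut cannot leave a dangling hyphen
}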

function safeBaseName(name: string): string {
	// Some browsers preserve directory components when files are dragged from
	// a folder. Take only the last path segment, then strip the extension.
	const last = name.split(/[\\/]/).pop() ?? name
	const withoutExt = last.replace(/\.[^.]+$/, '')
	return slugify(withoutExt) || 'image'
}

export function buildStorageKey(
	originalName: string,
	dbPayload: ExtractedMetadata | null
): StorageKey {
	// UUID as the actual bucket key prevents collisions and never changes.
	// The semantic name is stored separately in the database.
	const uuid = crypto.randomUUID()
	const ext = 'webp'

	const parts: string[] = []

	if (dbPayload?.camera) {
		const slug = slugify(dbPayload.camera)
		if (slug) parts.push(slug)
	}

	if (dbPayload?.takenAt) {
		// Already validated as a real ISO 8601 string by sanitizeIsoDate;
		// the first 10 characters are guaranteed to be 'YYYY-MM-DD'.
		parts.push(dbPayload.takenAt.slice(0, 10))
	}

	const semanticName =
		parts.length > 0 ? `${parts.join('-')}.${ext}` : `${safeBaseName(originalName)}.${ext}`

	return {
		storageKey: `${uuid}.${ext}`,
		semanticName
	}
}

The separation between storageKey and semanticName is load-bearing. Storage keys must be opaque and collision-safe; UUIDs satisfy both requirements. Semantic names are for display, SEO, and tooling. Keeping them separate means you can update the display name or regenerate a better one without touching the object in the bucket.
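As one illustration of using semanticName purely at the presentation layer, here is a hypothetical helper for a download endpoint that streams the UUID-keyed object but suggests the readable name to the browser. The endpoint itself is an assumption; the lesson does not build one. slugify output can only contain [a-z0-9-], so the check inside is defense in depth for a name that was edited later or came from somewhere else.

```typescript
// Hypothetical helper for a download endpoint you would write yourself.
// The object stays at `${uuid}.webp`; only the suggested filename changes.
const SAFE_SEMANTIC_NAME = /^[a-z0-9-]+\.webp$/

export function contentDispositionFor(semanticName: string, fallback = 'image.webp'): string {
	// Re-validate before the value enters an HTTP header: quotes, semicolons,
	// or CR/LF in a filename would break the header or inject new ones.
	const name = SAFE_SEMANTIC_NAME.test(semanticName) ? semanticName : fallback
	return `attachment; filename="${name}"`
}
```

Set the result as the Content-Disposition header on the response. Renaming an image then means updating one database column, with no copy or delete in the bucket.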

The slugify function layers several overlapping safety nets: it lowercases, strips Unicode control and format characters, turns whitespace into hyphens, removes anything that is not [a-z0-9-], collapses runs of hyphens, trims hyphens from both ends, and caps total length. Each step is cheap, and each one blocks a different way an attacker could shape your semantic names.


Common Mistakes

Trusting raw exifr output

The default failure mode is to pass parse results straight into the database:

// Avoid: every value is whatever bytes the file had
const raw = await parse(file)
db.insert({ camera_make: raw.Make, taken_at: raw.DateTimeOriginal })

// Preferred: every value passes through a typed sanitizer
const info = await meta.extract(file) // returns ExtractedMetadata | null
db.insert({
	camera: info?.camera ?? null,
	taken_at: info?.takenAt ?? null
})

Even with a strict pick list, exifr cannot guarantee the type, length, or content of any field. The sanitizer layer is the place where untrusted EXIF becomes trusted application data.
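The sketch below shows the general shape of a sanitizer at that boundary. pickString and pickIntInRange are hypothetical, simplified stand-ins for the lesson's helpers (they reject rather than repair most bad values). The point is that each one starts from unknown and returns a value only once every check has passed.

```typescript
// Hypothetical, simplified sanitizers; the lesson's real helpers are richer.
// Raw parser output is typed as unknown values, never `any`.
type RawExif = Record<string, unknown>

const MAX_TEXT_LENGTH = 128

export function pickString(raw: RawExif, key: string): string | null {
	const value = raw[key]
	if (typeof value !== 'string') return null // wrong type: reject outright
	// Strip NUL padding and other control/format code points, then trim.
	const cleaned = value.replace(/\p{C}/gu, '').trim()
	if (cleaned.length === 0 || cleaned.length > MAX_TEXT_LENGTH) return null
	return cleaned
}

export function pickIntInRange(raw: RawExif, key: string, min: number, max: number): number | null {
	const value = raw[key]
	if (typeof value !== 'number' || !Number.isInteger(value)) return null
	return value >= min && value <= max ? value : null
}
```

For instance, pickIntInRange(raw, 'ISO', 50, 1_000_000) turns an ISO of 12800000 into null. The bounds here are illustrative, not the lesson's.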

Extracting after compression

The most fundamental mistake is calling opt.squash(file) first and meta.extract afterwards. In the worst version, extraction is pointed at the optimized blob rather than the original File; that blob never carries EXIF, because @jsquash encodes from ImageData, so every upload comes back with no metadata. Even when the original File is passed, the sequential order is fragile. Squash does not consume or modify the File, so nothing disappears, but the habit makes it easy to clear the input or reassign the reference before extraction starts. Passing the same File to both calls inside one Promise.all closes that window, because both operations receive the reference in the same synchronous tick.

Awaiting extraction before starting compression

Treating extraction as a prerequisite adds avoidable latency:

// Avoid: extraction finishes before compression even starts
const dbPayload = await meta.extract(file)
const optimizedBlob = await opt.squash(file)

// Preferred: both run at the same time
const [dbPayload, optimizedBlob] = await Promise.all([meta.extract(file), opt.squash(file)])

Extraction on a typical JPEG takes 20 to 80 milliseconds; compression takes 100 to 600 milliseconds. In the sequential version, extraction time is pure overhead added to every upload. In the concurrent version, extraction almost always finishes while compression is still running, so total latency is roughly the compression time alone.
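To see the arithmetic rather than take it on trust, the sketch below replaces both operations with hypothetical fakeExtract and fakeSquash functions that only wait (40 ms and 150 ms, values picked from the ranges above). Sequential wall-clock time is the sum of the two; concurrent time is roughly the longer one.

```typescript
// Hypothetical stand-ins: only their durations matter, not real codecs.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

async function fakeExtract(): Promise<{ camera: string }> {
	await sleep(40) // within the 20 to 80 ms extraction range
	return { camera: 'demo' }
}

async function fakeSquash(): Promise<string> {
	await sleep(150) // within the 100 to 600 ms compression range
	return 'webp-bytes'
}

async function elapsed(run: () => Promise<unknown>): Promise<number> {
	const start = Date.now()
	await run()
	return Date.now() - start
}

export async function compareStrategies(): Promise<{ sequential: number; concurrent: number }> {
	// Avoid: extraction finishes before compression even starts (≈ 40 + 150 ms).
	const sequential = await elapsed(async () => {
		await fakeExtract()
		await fakeSquash()
	})
	// Preferred: both start together, so total ≈ max(40, 150) ms.
	const concurrent = await elapsed(() => Promise.all([fakeExtract(), fakeSquash()]))
	return { sequential, concurrent }
}

compareStrategies().then(({ sequential, concurrent }) => {
	console.log(`sequential ≈ ${sequential} ms, concurrent ≈ ${concurrent} ms`)
})
```

Exact numbers vary with timer precision, but the sequential run should land near 190 ms and the concurrent one near 150 ms.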

Picking GPSLatitude/GPSLongitude without the hemisphere refs

This one is silent: the code runs without errors, the coordinates look like plausible numbers, and your map view shows pins. The pins are just in the wrong place.

// Avoid: derived `latitude` / `longitude` come out unsigned
const raw = await parse(file, {
	pick: ['GPSLatitude', 'GPSLongitude']
})
// raw.latitude  ->  40.713  (correct for New York)
// raw.longitude ->  74.006  (WRONG; should be -74.006)

// Preferred: include the hemisphere refs so exifr can apply the sign
const raw = await parse(file, {
	pick: ['GPSLatitude', 'GPSLatitudeRef', 'GPSLongitude', 'GPSLongitudeRef']
})
// raw.latitude  ->  40.713
// raw.longitude -> -74.006

The symptom in the field is photos showing up in completely different parts of the world: a Sydney photo (33.868°S, 151.209°E) lands in the north Pacific, east of Japan, and a Cape Town photo (33.918°S, 18.422°E) lands in the central Mediterranean, north of Libya. The DMS magnitudes are right; only the sign is missing.
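The Ref tags matter because EXIF stores each coordinate as an unsigned degrees/minutes/seconds triple, and the hemisphere lives only in the separate Ref tag. The sketch below shows the conversion in isolation. toSignedDecimal is a hypothetical illustration, not exifr's internal code; when the Ref tags are picked, exifr computes the signed latitude and longitude for you.

```typescript
// Hypothetical illustration of DMS -> signed decimal degrees; not exifr internals.
type Dms = [degrees: number, minutes: number, seconds: number]
type HemisphereRef = 'N' | 'S' | 'E' | 'W'

export function toSignedDecimal([degrees, minutes, seconds]: Dms, ref: HemisphereRef): number {
	const magnitude = degrees + minutes / 60 + seconds / 3600
	// The DMS triple is always positive; only the Ref tag says south or west.
	return ref === 'S' || ref === 'W' ? -magnitude : magnitude
}
```

For Sydney, toSignedDecimal([33, 52, 4.8], 'S') works out to ≈ -33.868 and toSignedDecimal([151, 12, 32.4], 'E') to ≈ 151.209. Without the ref argument there is nothing left to decide the sign, which is exactly the bug described above.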

Storing the raw exifr output in the database

Calling parse without a pick list returns an enormous object. Cameras embed dozens of proprietary tags, and some manufacturer blobs are several kilobytes of binary data encoded as base64. Storing all of this in a jsonb column creates three problems: query performance degrades as the column grows, the schema becomes tightly coupled to whatever exifr returns in the current library version, and filtering by a specific field requires parsing the blob in the database rather than using an indexed column.

Normalise into typed columns at extraction time: camera_make, camera_model, taken_at, gps_lat, gps_lng, iso, aperture, shutter_speed. If you genuinely need the full raw EXIF for archival purposes, put it in a separate raw_exif column that does not participate in queries.

Rendering EXIF strings with {@html} or in a raw href

The sanitizer keeps <script> and javascript: payloads in the output on purpose; the panel relies on Svelte’s auto-escaping to neutralise them. A well-meaning template change can re-arm the payload:

<!-- Avoid: bypasses HTML escaping; <script> tags execute -->
<span class="value">{@html meta.info?.camera ?? ''}</span>

<!-- Avoid: javascript: URLs are accepted by the browser -->
<a href={meta.info?.camera}>{meta.info?.camera}</a>

<!-- Preferred: default interpolation auto-escapes -->
<span class="value">{meta.info?.camera}</span>

<!-- Preferred: build the URL from validated parts, encode user values -->
<a href={`https://example.com/cam/${encodeURIComponent(slugify(meta.info?.camera ?? ''))}`}>
	{meta.info?.camera}
</a>

The rule fits in one line: each render site handles its own escaping, and the sanitizer never tries to guess which contexts a value will end up in.
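The Preferred example builds its URL from a fixed origin, which is the safest shape. If a URL ever has to come from data (an EXIF field, a user profile, a CMS), a scheme allow-list is a common second guard. safeHttpHref below is a hypothetical helper, not part of the lesson. It relies on the standard URL parser, so leading whitespace and mixed-case schemes are normalized before the check runs.

```typescript
// Hypothetical helper: allow-list URL schemes before a data-derived value
// reaches an href. Relative inputs resolve against `base`.
export function safeHttpHref(candidate: string, base = 'https://example.com'): string | null {
	let url: URL
	try {
		url = new URL(candidate, base)
	} catch {
		return null // not parseable as a URL at all
	}
	// javascript:, data:, vbscript: and everything else are refused.
	return url.protocol === 'https:' || url.protocol === 'http:' ? url.href : null
}
```

In a template, href={safeHttpHref(url) ?? '#'} mirrors the existing mapsUrl ?? '#' fallback in the panel.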

Destructuring a potentially null result

parse returns undefined for files with no EXIF block. The extractor normalises that to null. Downstream code that destructures the result without checking will throw:

// Avoid: throws when dbPayload is null
const { camera, takenAt } = dbPayload

// Preferred: null is a valid, expected outcome
const dbRecord = {
	image_url: storageUrl,
	camera_make: dbPayload?.camera ?? null,
	taken_at: dbPayload?.takenAt ?? null,
	gps_lat: dbPayload?.coordinates?.lat ?? null,
	gps_lng: dbPayload?.coordinates?.lng ?? null
}

A record with all metadata columns set to null is correct. It represents an image that was captured without camera context, or processed through a tool that stripped EXIF before upload. Failing the upload to avoid null values would reject screenshots and programmatically generated images.


Performance and Scaling

For single-image uploads, extraction cost is negligible. exifr’s pointer-based parsing means it reads only the bytes it needs; even a 20 MB file has its EXIF extracted in under 100 milliseconds on current hardware.

Batch uploads change the picture slightly but not in the way you might expect. When 50 images are queued simultaneously (the Worker Pool pattern in Lesson 7), running 50 concurrent parse calls is fine. exifr is lightweight and does not hold large buffers. The bottleneck in batch processing is always compression, not extraction. The Worker Pool throttles compression via navigator.hardwareConcurrency; extraction runs freely on the main thread because its footprint is too small to matter.

The sanitizer pass is similarly cheap. The most expensive operation in the entire helper file is the \p{C} Unicode regex, which on a 128-character cap completes in microseconds. Adding this layer does not measurably affect upload latency.

The one situation where client-side extraction is the wrong tool is a SvelteKit form action. Actions run on the server in a Node.js context. exifr works there too, but the reactive factory pattern buys you nothing: on the server, runes provide no reactivity and $effect never runs. The logic belongs in the action function directly, not in a factory that returns reactive state.

exifr Works in Node.js Too

If your upload flow goes through a SvelteKit form action, parse accepts Buffer objects in Node.js exactly as it accepts File objects in the browser. The extraction code is identical; only the module structure changes. The sanitizer helpers are pure TypeScript and run unchanged on either side.


What Comes Next

The pipeline now produces two independent outputs from a single file drop: a clean, stripped, optimised blob ready for your storage bucket, and a typed and sanitised metadata object ready for your database. Neither waits for the other, and a failure in one (including a failure in any individual EXIF field’s sanitization) does not affect the other.

Lesson 6 adds the piece that makes this genuinely production-ready: moving the @jsquash encode step off the main thread and into a Web Worker. Right now, encoding a large file blocks the browser; the spinner stops, interactions freeze, and the user sees nothing for hundreds of milliseconds. The Worker pattern solves this without touching the metadata module or the upload pipeline you have built here.


Key Takeaways

  • @jsquash has no access to EXIF data because decode returns an ImageData object: a flat RGBA pixel grid with no metadata fields. The information is gone before encode is ever called.
  • Extract before you compress. Pass the same File object to both meta.extract and opt.squash at the same time using Promise.all.
  • Treat exifr output as untrusted user input. Run every field through a typed sanitizer that returns T | null and validates type, length, range, character set, and date validity before storing.
  • Defend against script and markup injection at the render site, not the input. sanitizeText keeps <script> and javascript: payloads in the value on purpose. Svelte’s {value} interpolation auto-escapes them in the DOM; slugify strips them in filenames; encodeURIComponent neutralises them in URLs. Never use {@html} or assign EXIF strings directly to href.
  • A null result from extraction is a valid data condition, not an error. Design your database schema to accept nullable metadata columns.
  • Cast the parse result to Record<string, unknown> instead of any. TypeScript will then refuse to assign properties to typed columns without explicit validation, which is exactly the discipline you want at the trust boundary.
  • Keep metadata in your database and serve stripped images publicly. Re-injecting metadata into a public file removes the privacy guarantee the pipeline creates.
  • Normalise EXIF into typed columns at extraction time. Storing the raw exifr output as a blob couples your schema to the library’s output format and makes field-level queries impractical.
  • Slug derivation from Make/Model and from File.name needs the same defensive filtering as the EXIF sanitizers. UUIDs as the real storage key, slugified names for display.
  • If your application exposes coordinates in any user-facing feature, your privacy policy must explicitly disclose this. In the EU and other jurisdictions, extracted GPS data requires a lawful basis for processing.

Further Reading

See Also