ClipClap - Transcribe your media and clip it using the words

Live Demo: travis-seng.fr/clipclap

A tiny proof of concept: drop a video, get a transcript generated locally with Whisper (via transformers.js), click words to define a sub‑segment, and export the trimmed clip. No server: everything happens in your browser.

Core Flow

Load a video (drag & drop / file picker)
Run model → Whisper transcribes locally (word timestamps)
Click words to mark start and end (the UI uses the earliest and latest selected words as boundaries)
Preview the subclip
Export: download trimmed video + transcript
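
The word-click step above can be sketched roughly as follows. This is a minimal illustration, not the code ClipClap actually ships: the WordChunk shape (text plus a [start, end] pair in seconds) is an assumption about what word-level output looks like, and trimRangeFromSelection is a hypothetical helper name.

```typescript
// Assumed shape of one transcribed word: text plus [start, end] in seconds.
interface WordChunk {
  text: string;
  timestamp: [number, number];
}

interface TrimRange {
  start: number; // seconds
  end: number; // seconds
}

// Hypothetical helper: returns the span covered by the selected words,
// or null when nothing valid is selected. Click order does not matter,
// only the lowest and highest selected word indices are used.
function trimRangeFromSelection(
  words: WordChunk[],
  selected: number[],
): TrimRange | null {
  const valid = selected.filter(
    (i) => Number.isInteger(i) && i >= 0 && i < words.length,
  );
  if (valid.length === 0) return null;
  const first = Math.min(...valid);
  const last = Math.max(...valid);
  return { start: words[first].timestamp[0], end: words[last].timestamp[1] };
}

// Usage: clicking the third word and then the first still yields one range.
const sampleWords: WordChunk[] = [
  { text: " drop", timestamp: [0.0, 0.4] },
  { text: " a", timestamp: [0.4, 0.5] },
  { text: " video", timestamp: [0.5, 1.1] },
];
const sampleRange = trimRangeFromSelection(sampleWords, [2, 0]);
```

Taking the min and max index, rather than the first and last click, keeps the range stable no matter which order the words are clicked in.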

Features

In‑browser Whisper (transformers.js) — on‑device inference, privacy friendly
Word level timestamps rendered as selectable chips
Instant boundary selection: the earliest and latest selected words define the trim range
Live preview of the clipped segment before export
Transcript download (raw text)
Video export of just the selected span
Generation time indicator for performance feedback
Language selector (multi‑language capable; defaults to English)

Tech Stack

transformers.js + Whisper (small / tiny model for faster load)
WebAssembly backends for on‑device inference
HTML5 Video + Canvas time mapping for precise trimming
Client side media slicing (no upload) using MediaSource / offscreen processing
Vanilla TS/JS UI (lightweight, experimental)

Why I Built It

I wanted to explore:

Using transformers.js hands-on
Running speech‑to‑text fully client side (no API keys / latency)
Mapping word timestamps to frame‑accurate trim points
Minimal UX for creating short quote clips out of a longer source
Performance tradeoffs of Whisper in the browser vs native / server
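
For the timestamp-to-frame mapping mentioned above, one possible approach is to widen the word range outward to the nearest frame boundaries so no spoken audio is cut. This is only a sketch under assumptions: the frame rate is passed in as a hypothetical fps parameter (the browser does not hand it over directly), a constant frame rate is assumed, and snapToFrames is an illustrative name.

```typescript
// Hypothetical helper: expands [start, end] (seconds) outward to frame
// boundaries for a constant frame rate. The start is rounded down and
// the end is rounded up, so the snapped range always contains the original.
function snapToFrames(
  start: number,
  end: number,
  fps: number,
): { start: number; end: number } {
  if (!(fps > 0)) throw new RangeError("fps must be positive");
  // Small tolerance so values already on a frame boundary are not pushed
  // to the neighbouring frame by floating-point noise.
  const EPS = 1e-9;
  const firstFrame = Math.floor(start * fps + EPS);
  const endFrame = Math.ceil(end * fps - EPS);
  // Guarantee a clip at least one frame long.
  const lastFrame = Math.max(endFrame, firstFrame + 1);
  return { start: firstFrame / fps, end: lastFrame / fps };
}

// Usage: at 25 fps each frame lasts 0.04 s.
const snapped = snapToFrames(1.23, 2.51, 25);
```

Rounding outward trades a few extra milliseconds of footage for not clipping the first or last syllable; rounding to the nearest frame would be the tighter alternative.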

Notes / Limitations

First load requires a model download (cached for subsequent runs)
Uses smaller Whisper model for speed — accuracy is acceptable, not perfect
Long videos: memory + processing time scale with duration
Simple trimming (one contiguous segment) — not an editor

Future Ideas

Multi‑segment selection → concatenated highlight reel
SRT / WebVTT export
Per‑word confidence shading
Option to choose larger model when bandwidth / patience allows
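
As a rough idea of what the WebVTT export could look like (nothing here is implemented yet): the sketch below turns timed text into a WebVTT file string. The Cue shape and the function names formatVttTime and toWebVtt are hypothetical, and grouping words into cues is left out.

```typescript
// Hypothetical cue shape: a piece of text with start/end in seconds.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Left-pads a non-negative integer with zeros to the given width.
function zeroPad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Formats seconds as a WebVTT timestamp: hh:mm:ss.ttt
function formatVttTime(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${zeroPad(h, 2)}:${zeroPad(m, 2)}:${zeroPad(s, 2)}.${zeroPad(ms, 3)}`;
}

// Builds the file body: the "WEBVTT" header, then one block per cue,
// each separated from the previous block by a blank line.
function toWebVtt(cues: Cue[]): string {
  const blocks = cues.map(
    (c) => `${formatVttTime(c.start)} --> ${formatVttTime(c.end)}\n${c.text.trim()}`,
  );
  return ["WEBVTT", ...blocks].join("\n\n") + "\n";
}

// Usage: a single cue covering the first 1.1 seconds.
const vtt = toWebVtt([{ start: 0, end: 1.1, text: " drop a video" }]);
```

SRT would differ mainly in using a comma before the milliseconds and numbering each cue.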

(POC stage. Feedback welcome.)