#ai

Every pillar, one tag. The range, not a skills list.

Musings

Voice that never leaves the device

Transcription, diarization, and speech running entirely on Apple silicon, and why keeping voice local is a product decision before it is a technical one.

One voice note, five diaries

In NutriM8 you can mumble your whole day into your phone once. A background worker untangles it into sleep, weight, exercise, hydration and food, and resolves "a snack after lunch" to a real timestamp.

An LLM agent as an ETL pipeline

Nikis needed a clean dataset of kids' holiday activities across NSW. So I pointed an agent at the open web and made it behave like infrastructure: typed, checkpointed, and capped.

Who, what, when: why I stopped trusting one diarization pipeline

Diarization is three questions wearing one coat. Split them into specialists, never merge across a turn, and the failures stop hiding inside each other.

MLX or CoreML? Both, and here is the table

A 600M-parameter local model beats Whisper Large v3 on a Mac. The backend choice is per-model, decided by latency, memory, and GPU contention, not ideology.

Snippets

Chunk audio with VAD before you transcribe

Feeding long silent audio to a transcriber wastes time and money. Split on speech first using webrtcvad, then send only the chunks that contain voice.
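
A minimal sketch of the idea, not the snippet's actual code. It assumes 16-bit mono PCM and takes the voice detector as a plain callable, so it runs without any dependency. With webrtcvad installed you could pass `webrtcvad.Vad(2).is_speech`, which accepts a 10, 20 or 30 ms frame and a sample rate. The function name `chunk_speech` and the `max_gap_frames` tolerance are illustrative choices.

```python
from typing import Callable, List, Tuple

# (start_seconds, end_seconds, pcm_bytes) for one voiced region
Chunk = Tuple[float, float, bytes]


def chunk_speech(
    pcm: bytes,
    sample_rate: int,
    is_speech: Callable[[bytes, int], bool],
    frame_ms: int = 30,
    max_gap_frames: int = 10,
) -> List[Chunk]:
    """Group voiced frames into chunks, bridging short silent gaps.

    Assumes 16-bit mono PCM. A chunk is closed once more than
    `max_gap_frames` consecutive unvoiced frames follow the last voiced one.
    """
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    chunks: List[Chunk] = []
    start = None        # index of the first frame in the open chunk
    last_voiced = None  # index of the most recent voiced frame

    def close(first: int, last: int) -> Chunk:
        return (
            first * frame_ms / 1000,
            (last + 1) * frame_ms / 1000,
            pcm[first * frame_bytes:(last + 1) * frame_bytes],
        )

    # Any trailing partial frame is ignored.
    for i in range(len(pcm) // frame_bytes):
        frame = pcm[i * frame_bytes:(i + 1) * frame_bytes]
        if is_speech(frame, sample_rate):
            if start is None:
                start = i
            last_voiced = i
        elif start is not None and i - last_voiced > max_gap_frames:
            chunks.append(close(start, last_voiced))
            start = None

    if start is not None:
        chunks.append(close(start, last_voiced))
    return chunks
```

Each returned chunk carries its start and end offsets, so transcripts of the individual chunks can be mapped back onto the original recording's timeline.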

Tools

speech-swift

An on-device speech stack for Apple silicon, in Swift.

Lab

whisper_schedule

A recording goes in, a speaker-labelled transcript comes out.

Semantic food search

Search food by meaning, not by exact product name.

Infinite exercises, verified

A model drafts maths questions against the component library, a verifier throws out the junk, and a clean one renders. Forever.