Every pillar, one tag. The range, not a skills list.
Transcription, diarization, and speech running entirely on Apple silicon, and why keeping voice local is a product decision before it is a technical one.
In NutriM8 you can mumble your whole day into your phone once. A background worker untangles it into sleep, weight, exercise, hydration and food, and resolves "a snack after lunch" to a real timestamp.
Diarization is three questions wearing one coat. Split them into specialists, never merge across a turn, and the failures stop hiding inside each other.
Diarization has to estimate how many speakers are in a clip. The eigengap of the graph Laplacian gives you that count, k, without asking for it up front.
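A minimal sketch of the eigengap idea, assuming you already have a symmetric, non-negative affinity matrix between speech segments (for example, cosine similarities of speaker embeddings clipped at zero). The function name `estimate_num_speakers` and the `max_speakers` cap are illustrative, not taken from any particular library.

```python
import numpy as np

def estimate_num_speakers(affinity, max_speakers=8):
    """Pick k at the largest gap among the smallest Laplacian eigenvalues."""
    A = np.asarray(affinity, dtype=float)
    n = len(A)
    if n < 2:
        return n  # zero or one segment: nothing to compare
    A = (A + A.T) / 2.0  # enforce symmetry against numerical noise
    degree = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    # Symmetric normalised Laplacian: L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(L)  # returned in ascending order
    limit = min(max_speakers, n - 1)
    # gaps[i] is the jump between the (i+1)-th and (i+2)-th smallest eigenvalue
    gaps = np.diff(eigvals[: limit + 1])
    # k well-separated clusters give k near-zero eigenvalues, then a jump
    return int(np.argmax(gaps)) + 1
```

With k well-separated speakers, the Laplacian has k eigenvalues near zero followed by a clear jump, so the index of the largest gap is the estimate. On noisy real audio the gap can be ambiguous, which is why the sketch caps the search at `max_speakers`.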
Feeding long silent audio to a transcriber wastes time and money. Split on speech first using webrtcvad, then send only the chunks that contain voice.
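A rough sketch of the split-on-speech step. To keep it runnable without extra dependencies, the per-frame decision is passed in as a predicate; with webrtcvad installed you would typically pass something like `lambda f: webrtcvad.Vad(2).is_speech(f, 16000)`, since `Vad.is_speech` takes a 10, 20 or 30 ms frame of 16-bit mono PCM plus the sample rate. The function name `speech_chunks` and the `max_gap_frames` parameter are assumptions for illustration.

```python
from typing import Callable, List, Tuple

def speech_chunks(
    pcm: bytes,
    is_speech: Callable[[bytes], bool],
    sample_rate: int = 16000,
    frame_ms: int = 30,
    max_gap_frames: int = 10,
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans of 16-bit mono PCM that contain speech."""
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    n_frames = len(pcm) // frame_bytes
    spans = []
    start = None      # index of the first frame of the open chunk
    silence_run = 0   # consecutive non-speech frames inside the open chunk
    for i in range(n_frames):
        frame = pcm[i * frame_bytes:(i + 1) * frame_bytes]
        if is_speech(frame):
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= max_gap_frames:
                # Close the chunk just after its last speech frame
                spans.append((start, i - silence_run + 1))
                start, silence_run = None, 0
    if start is not None:
        spans.append((start, n_frames - silence_run))
    return [(s * frame_ms / 1000, e * frame_ms / 1000) for s, e in spans]
```

Short pauses below `max_gap_frames` stay inside a chunk so sentences are not cut mid-breath; only the returned spans would then be sent to the transcriber.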
An on-device speech stack for Apple silicon, in Swift.
A recording goes in, a speaker-labelled transcript comes out.
whisper_schedule