Timestamping a transcript so a local LLM can cite the podcast.

I have a little project that watches a few podcast RSS feeds, transcribes each new episode with Whisper, and hands the text to a local LLM to pull out any stocks the hosts talked about. It works, but the first version had an annoying gap: the model would tell me a host was bullish on NVIDIA, and I had no quick way to check it without re-listening to the whole episode.

What I wanted was a timestamp on every pick, a 00:14:32 I could scrub to and hear for myself. The problem is the LLM only ever sees text. It has no idea what minute of audio it’s reading. So I had to put the time into the text.

Whisper already knows the times

Whisper doesn’t just give you a wall of text. Under the hood it returns segments, and every segment comes with a start and end in seconds.

const { segments } = await transcribe(audioPath);
for (const seg of segments) {
  console.log(seg.start, seg.end, seg.text);
}
// 12.0  14.5  " I've been adding to my NVIDIA position."
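
For everything that follows, a segment can be treated as a plain object with three fields. The sketch below uses made-up values and assumes the transcribe wrapper passes Whisper's start, end, and text through unchanged. The isUsableSegment guard is a hypothetical helper, not part of Whisper.

```javascript
// Hypothetical sample in the shape the loop above expects.
// Real Whisper output carries more fields; only these three matter here.
const sampleSegments = [
  { start: 12.0, end: 14.5, text: " I've been adding to my NVIDIA position." },
  { start: 14.5, end: 17.2, text: " And I still think Oracle has room to run." },
];

// Hypothetical guard: keep only segments with numeric times and non-empty text.
function isUsableSegment(seg) {
  return (
    Number.isFinite(seg.start) &&
    Number.isFinite(seg.end) &&
    seg.end >= seg.start &&
    typeof seg.text === "string" &&
    seg.text.trim().length > 0
  );
}

const usable = sampleSegments.filter(isUsableSegment);
console.log(usable.length);
```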

A helper to make seconds readable

Whisper hands me a float like 872.4, so I need a helper to convert it to HH:MM:SS.

// Convert a number of seconds into a zero-padded HH:MM:SS string.
function formatTimestamp(seconds) {
  const h = Math.floor(seconds / 3600);
  const m = Math.floor((seconds % 3600) / 60);
  const s = Math.floor(seconds % 60); // fractional seconds are dropped
  return [h, m, s].map((n) => String(n).padStart(2, "0")).join(":");
}
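
A few worked values make the arithmetic concrete. Apart from the 872.4 mentioned above, the inputs are made up for illustration, and the helper is repeated so the snippet runs on its own.

```javascript
// Same helper as above, repeated so this snippet is self-contained.
function formatTimestamp(seconds) {
  const h = Math.floor(seconds / 3600);
  const m = Math.floor((seconds % 3600) / 60);
  const s = Math.floor(seconds % 60);
  return [h, m, s].map((n) => String(n).padStart(2, "0")).join(":");
}

console.log(formatTimestamp(0));      // 00:00:00
console.log(formatTimestamp(872.4));  // 00:14:32
console.log(formatTimestamp(3725.9)); // 01:02:05
```

Fractional seconds are truncated rather than rounded, so 3725.9 shows up as :05, not :06.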

Dropping a marker every thirty seconds

I don’t want a timestamp glued to every single segment. That’s noisy, and every marker eats into the model’s context window. One marker for every thirty seconds of audio is plenty to pin a mention down. So I track the start time of the last segment that got a marker and only print a new one once at least thirty seconds have gone by.

let lastTs = -30; // force a marker on the very first segment
const lines = [];

for (const seg of segments) {
  const text = seg.text.trim();
  if (seg.start - lastTs >= 30) {
    lines.push(`[${formatTimestamp(seg.start)}] ${text}`);
    lastTs = seg.start;
  } else {
    lines.push(text);
  }
}

Printed one entry per line, the transcript now reads like this:

[00:14:00] ...so that's my take on the housing market.
I've been adding to my NVIDIA position every month.
[00:14:30] And I still think Oracle has room to run.
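
To see which segments pick up a marker, here is the same loop run over a handful of made-up segments. The start times and the last line of dialogue are invented for illustration, and formatTimestamp is repeated so the snippet runs on its own.

```javascript
function formatTimestamp(seconds) {
  const h = Math.floor(seconds / 3600);
  const m = Math.floor((seconds % 3600) / 60);
  const s = Math.floor(seconds % 60);
  return [h, m, s].map((n) => String(n).padStart(2, "0")).join(":");
}

// Made-up segments; only `start` and `text` are used by the loop.
const segments = [
  { start: 840.0, text: " ...so that's my take on the housing market." },
  { start: 851.3, text: " I've been adding to my NVIDIA position every month." },
  { start: 870.2, text: " And I still think Oracle has room to run." },
  { start: 884.9, text: " It's cheap next to the rest of the sector." },
];

let lastTs = -30; // force a marker on the very first segment
const lines = [];

for (const seg of segments) {
  const text = seg.text.trim();
  if (seg.start - lastTs >= 30) {
    // At least thirty seconds since the last marker: print a new one.
    lines.push(`[${formatTimestamp(seg.start)}] ${text}`);
    lastTs = seg.start;
  } else {
    lines.push(text);
  }
}

console.log(lines.join("\n"));
```

The third segment starts 30.2 seconds after the first, so it gets the next marker. The second and fourth fall inside the window and stay bare. Markers follow segment starts, so they land at least thirty seconds apart rather than on exact thirty-second boundaries.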

Telling the model they’re there

The markers do nothing on their own. The model has to be told they exist and what to do with them. Two sentences in the system prompt is all it takes.

The transcript includes timestamps in [HH:MM:SS] format.
You MUST include the timestamp of where each mention occurs in the podcast.
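
How those sentences reach the model depends on the runtime. As a rough sketch, assuming a chat-style API that takes role/content messages (an assumption, since nothing above says how the model is called), they could be paired with the transcript like this. buildMessages is a hypothetical helper, and it shows only the timestamp sentences, not the full prompt.

```javascript
// The two sentences from the system prompt above.
const TIMESTAMP_INSTRUCTIONS = [
  "The transcript includes timestamps in [HH:MM:SS] format.",
  "You MUST include the timestamp of where each mention occurs in the podcast.",
].join("\n");

// Hypothetical helper: pairs the instructions with a timestamped transcript.
// The role/content message shape is an assumption about the model runtime.
function buildMessages(timestampedTranscript) {
  return [
    { role: "system", content: TIMESTAMP_INSTRUCTIONS },
    { role: "user", content: timestampedTranscript },
  ];
}

const messages = buildMessages(
  "[00:14:00] ...so that's my take on the housing market."
);
console.log(messages);
```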

Then in the part of the prompt where I describe the JSON I want back, I ask for a timestamp on every pick: “the timestamp closest to where it was discussed so the user can jump to that point in the podcast to verify.” And that’s exactly what comes back:

{
  "ticker": "NVDA",
  "sentiment": "bullish",
  "timestamp": "00:14:32"
}
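
A model can occasionally return a timestamp that never appeared in the transcript, so it can be worth checking a pick against the markers that were actually printed. A minimal sketch, assuming the response parses to an object like the one above. extractMarkers and isKnownTimestamp are hypothetical helpers, not part of the project described here.

```javascript
// Hypothetical helper: collect every [HH:MM:SS] marker present in the transcript.
function extractMarkers(transcript) {
  const markers = new Set();
  for (const match of transcript.matchAll(/\[(\d{2}:\d{2}:\d{2})\]/g)) {
    markers.add(match[1]);
  }
  return markers;
}

// Hypothetical check: true only if the pick cites a marker the model was shown.
function isKnownTimestamp(pick, markers) {
  return typeof pick.timestamp === "string" && markers.has(pick.timestamp);
}

const transcript = [
  "[00:14:00] ...so that's my take on the housing market.",
  "I've been adding to my NVIDIA position every month.",
  "[00:14:30] And I still think Oracle has room to run.",
].join("\n");

const markers = extractMarkers(transcript);
const pick = JSON.parse(
  '{"ticker":"NVDA","sentiment":"bullish","timestamp":"00:14:00"}'
);

console.log(isKnownTimestamp(pick, markers)); // true
console.log(isKnownTimestamp({ timestamp: "00:20:00" }, markers)); // false
```

A pick that fails the check is not necessarily wrong, but it is a good candidate for a manual listen.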

Putting it all together

Stripped down to just the timestamping, the whole thing is short:

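// Build the transcript text, prefixing a segment with [HH:MM:SS]
// whenever at least thirty seconds have passed since the last marker.
function formatSegments(segments) {
  const lines = [];
  let lastTs = -30; // force a marker on the very first segment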

  for (const seg of segments) {
    const text = seg.text.trim();
    if (seg.start - lastTs >= 30) {
      lines.push(`[${formatTimestamp(seg.start)}] ${text}`);
      lastTs = seg.start;
    } else {
      lines.push(text);
    }
  }

  return lines.join(" ");
}

One pass over the segments, a marker dropped in every thirty seconds, and suddenly the model can footnote itself.

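One caveat: the timestamps are approximate. Markers sit at least thirty seconds apart, so a cited time can be off by up to roughly thirty seconds from the actual mention, and Whisper’s timing drifts a little on long episodes, which can shift a pick a few more seconds early or late.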
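
One way to live with that imprecision is to start playback a little before the cited time. A sketch, assuming an HH:MM:SS string like the one in the JSON above. toSeconds, seekPosition, and the ten-second default are hypothetical choices, not something from the project.

```javascript
// Hypothetical helper: turn "HH:MM:SS" back into a number of seconds.
function toSeconds(timestamp) {
  const parts = timestamp.split(":").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`Invalid timestamp: ${timestamp}`);
  }
  const [h, m, s] = parts;
  return h * 3600 + m * 60 + s;
}

// Hypothetical helper: back off by `padSeconds`, never going below zero.
function seekPosition(timestamp, padSeconds = 10) {
  return Math.max(0, toSeconds(timestamp) - padSeconds);
}

console.log(seekPosition("00:14:32")); // 862
console.log(seekPosition("00:00:04")); // 0
```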