Looting The Battlefield

2025-05-16 (Mod: 2025-09-14) | 3 minutes

I’ve been experimenting for months with different ways to convert podcast content into a form I can use in Frankie. I’ve found several solutions, but along the way I’ve left the resulting file tree in absolute chaos. I’ve got files in a dozen directories and just as many formats. Fragments of scripts, clips, and tools are scattered throughout the hierarchy like bodies left to rot on a medieval battlefield. There’s no consistency, little documentation, and, worse, still no definitive process.

Well, it’s time to loot those bodies, gather the useful bits into a proper plan, and bury whatever is left.

◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇

Today, my process for ingesting a Norwegian podcast looks something like this:

1. Download the audio
2. Download the transcript
3. Split the transcript into sentences
4. Determine timecodes in the audio to match each sentence
5. Split the audio into sentence clips
6. Generate slow machine audio for the Norwegian sentences
7. Translate the transcript sentences into English
8. Generate machine audio of the English sentences
9. Pack everything into one of the file archive formats I’ve taught Frankie to understand
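
To show why the timecodes matter so much: once every sentence has a start and end time, cutting the clips is mostly a matter of handing those times to an audio tool. Here is a minimal sketch of that idea, assuming ffmpeg is installed. The `Sentence` record, the `clip_command` helper, and the file names are all hypothetical, invented for illustration; they are not the scripts described in this post.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str     # the sentence from the transcript
    start: float  # start time in the episode audio, in seconds
    end: float    # end time in the episode audio, in seconds


def clip_command(source, sentence, index, out_dir="clips"):
    """Build an ffmpeg argument list that cuts one sentence out of `source`.

    -ss and -to are placed after -i, so ffmpeg decodes the input and cuts
    at the requested times (slower than stream copy, but accurate).
    """
    out_path = f"{out_dir}/{index:04d}.mp3"
    return [
        "ffmpeg", "-y",                   # -y: overwrite existing output
        "-i", source,
        "-ss", f"{sentence.start:.3f}",   # clip start, seconds
        "-to", f"{sentence.end:.3f}",     # clip end, seconds
        out_path,
    ]


if __name__ == "__main__":
    # Made-up example data, not taken from a real episode.
    sentences = [
        Sentence("Hei, og velkommen.", 1.5, 3.25),
        Sentence("I dag skal vi snakke om mat.", 3.4, 6.1),
    ]
    for i, s in enumerate(sentences, start=1):
        cmd = clip_command("episode.mp3", s, i)
        print(" ".join(cmd))
        # To actually cut the clip: subprocess.run(cmd, check=True)
```

The sketch only builds and prints the commands, so it can be checked before anything touches the audio.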

My recent road trip experiments have left me wondering if I should rethink Frankie’s default document format, but regardless of how I end up packaging the results, I need to get a handle on Steps 1 through 8 and then clean up the file battleground.

I have scripts to streamline every step except Step 4, which is still painfully slow and has to be done manually: I use a subtitle editor to listen carefully to the audio and mark the start and end of every sentence. This takes roughly three times the running length of the audio, making it the biggest hurdle to expanding my library of practice documents.

Finding a faster way to do this step has been the biggest reason for the carnage of files scattered around here, so solving it is the linchpin for any real progress in cleaning them up.

Sentence Timing

Since I have the transcripts already split into sentences, I should be able to do what’s called “forced alignment”, an AI process that matches a transcript with the corresponding sections of audio.

I’ve done it before with English audio, but I’m not sure if the AI models for Norwegian are available in a form I can work with. I also remember being disappointed by the accuracy of the English version. The timecodes were in roughly the right spot, but they often clipped off the beginning or end of a passage. So this is going to be more of an exploration than a straightforward coding exercise.
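
If clipped edges turn out to be the main problem again, one cheap thing to try is widening each aligned segment by a small margin, without letting it cross the midpoint of the gap to its neighbours. The sketch below is only an illustration of that idea under assumptions: the `pad_segments` name, the 0.15 second margin, and the `(start, end)` tuple format are all hypothetical, and none of it comes from a particular alignment tool.

```python
def pad_segments(segments, pad=0.15, total=None):
    """Widen each (start, end) segment by up to `pad` seconds on both sides.

    Assumes `segments` is sorted by time and non-overlapping. A segment is
    never extended past the midpoint of the gap to its neighbour, below 0,
    or (if `total` is given) past the end of the audio.
    """
    padded = []
    last = len(segments) - 1
    for i, (start, end) in enumerate(segments):
        # Earliest allowed start: 0 for the first segment, otherwise the
        # midpoint between the previous segment's end and this start.
        low = 0.0 if i == 0 else (segments[i - 1][1] + start) / 2
        # Latest allowed end: the audio length for the last segment (if
        # known), otherwise the midpoint of the gap to the next segment.
        high = total if i == last else (end + segments[i + 1][0]) / 2

        new_start = max(start - pad, low)
        new_end = end + pad if high is None else min(end + pad, high)
        padded.append((new_start, new_end))
    return padded


if __name__ == "__main__":
    # Made-up timings in seconds, not real alignment output.
    aligned = [(0.05, 1.0), (1.1, 2.0), (3.0, 4.0)]
    for start, end in pad_segments(aligned, pad=0.15, total=4.1):
        print(f"{start:.2f} -> {end:.2f}")
```

Padding like this can only rescue clips that are slightly too tight; it does nothing for a segment that landed in the wrong place altogether.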

I don’t have a clear plan yet, but I’ll set off in roughly that direction (points to the horizon) and see what I find. Stay tuned.
