Road Trip Trials

I’ve always enjoyed the peaceful thinking time afforded me by trips across the long, flat (and straight!) expanses of Canada’s prairie highway network, but as the family’s primary driver, my productivity en route has always been limited to “thinking about stuff.” One of the dream use-cases for Frankie is to give me a second way to be productive.

So when plans formed last week to make one of those trips this week, I started scrambling. Frankie itself wasn’t quite ready, but the media files were. Could I whip up a way to test my proposed workflow without Frankie?

◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇

The objective is to be able to listen to some media (a novel, podcast, TV show, etc.) one sentence at a time, replay the sentence at will, and swap in a slowly enunciated version or an English translation as necessary. And since I want to do this while driving, it also has to work hands-free. No distractions from having to read a screen or poke buttons on a GUI.

To solve the hands-free part, I’ve been experimenting recently with this mini Bluetooth game controller. It’s compact, fitting easily in the palm of my hand where I can reach all the buttons with my thumb. By assigning those buttons to the various sentence-replay options, I should be able to control things entirely by feel.

Following from my recent decision to put a full complement of sentence versions into every Frankie document, I had created some examples for testing and was integrating them into the code when I hit a roadblock. I haven’t got that sorted out yet, and there wasn’t time to fix it, but the media files were just sitting there. All I needed was a way to connect the controller to a playback script.

With ChatGPT’s help, I wrote a simple command-line script that responds to the media buttons and plays audio files via shell subprocesses. It was almost too easy. I loaded everything onto my phone… and then watched it fail in exactly the same way Frankie had failed. The short document worked fine, but the novel seemed unable to find half of its audio clips.
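For the curious, the heart of that script is just a key-read loop wrapped around a subprocess call. Here’s a rough sketch of the shape of it - the key characters, clip file names, and the choice of mpv as the player are all stand-ins for whatever a given setup actually uses, and it assumes the controller registers as an ordinary keyboard:

```python
#!/usr/bin/env python3
"""Minimal hands-free playback loop - a sketch, not Frankie itself.

Assumes the Bluetooth controller shows up as a plain keyboard and
that mpv is installed. Key characters and file names are made up.
"""
import subprocess
import sys
import termios
import tty

# Map controller keys to the three versions of the current sentence.
CLIPS = {
    "f": "current_fast.mp3",     # normal-speed Norwegian
    "s": "current_slow.mp3",     # slowly enunciated Norwegian
    "e": "current_english.mp3",  # English translation
}

def read_key() -> str:
    """Read a single keypress from stdin in raw mode."""
    fd = sys.stdin.fileno()
    old = termios.tcgetattr(fd)
    try:
        tty.setraw(fd)
        return sys.stdin.read(1)
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, old)

def play(path: str) -> None:
    """Play one clip in a blocking shell subprocess."""
    subprocess.run(["mpv", "--really-quiet", path], check=False)

while True:
    key = read_key()
    if key == "q":  # quit
        break
    if key in CLIPS:
        play(CLIPS[key])
```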

This time, however, I didn’t have the complication of a full TUI infrastructure to debug - it was just a simple script and a directory full of audio files - so the problem was easy to pinpoint. Anybody want to guess how many sentences there are in a novel? Now multiply that number by three files - one each for fast Norwegian, slow Norwegian, and English. Turns out, my novel datapack was composed of almost 60,000 audio files, which slammed me up against the limit on how many files a single FAT32 directory can hold (roughly 65,000 entries, and fewer in practice, since long filenames consume extra directory entries).

After scratching my head for a minute, I realized that I didn’t need to unpack all the files - I could just unpack the ones needed for the current sentence. Then, each time I advanced to a new sentence, I could unpack the new set of clips and overwrite the old ones.
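Sketched out, the idea looks something like this - assuming the datapack is a zip archive, and with a purely hypothetical clip-naming scheme:

```python
import zipfile
from pathlib import Path

# Hypothetical names - the real datapack layout may differ.
ARCHIVE = "novel_datapack.zip"
VERSIONS = ["fast", "slow", "english"]

def unpack_sentence(n: int, workdir: str = ".") -> None:
    """Extract just the three clips for sentence n, overwriting the
    fixed-name files the player reads. Only three audio files ever
    exist on disk, so the FAT32 directory limit never comes into play."""
    with zipfile.ZipFile(ARCHIVE) as zf:
        for v in VERSIONS:
            data = zf.read(f"{n:05d}_{v}.mp3")  # clip inside the archive
            Path(workdir, f"current_{v}.mp3").write_bytes(data)
```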

I quickly modified my script to manage files that way, and when I transferred it to Android, it worked flawlessly. (This trick should also work for fixing Frankie, but I didn’t have time to implement and test that.)

So I spent 5 hours on the road yesterday testing the workflow, and the time just flew past. It was awesome. It worked as seamlessly as I had envisioned, and I made real progress on my ear training.

One of my documents was created from a Lær Norsk Nå podcast, so the primary audio was clean, clearly enunciated human Norwegian. The other was machine-generated from a novel I’m reading, so that audio was a bit grainy, but that’s a temporary problem. A better Norwegian language model is expected later this year, and if it’s as good as the one I used for the English audio, it will be a huge improvement.

The only real disappointment in my test was the slow audio version. I had generated it by lowering the speed factor when synthesizing the clip, but all that does is stretch the audio out. It didn’t improve the enunciation at all, but it did give me an idea.

When I arrived at my destination, I ran a quick test. Instead of using the speed factor, I tried inserting a comma after every word in the sentence, to force the synthesizer to add a clear pause between words. The results were way better. It turns out that making clear distinctions between the words is much more important than speaking slowly. So I regenerated all the slow sentences for both of my documents, and tomorrow I’ll be able to test that improvement when I make the trip home again.
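The transformation itself is tiny. Something like this naive sketch captures it - real text would need a bit more care around existing punctuation:

```python
import re

def slow_text(sentence: str) -> str:
    """Insert a comma after every word so the synthesizer pauses
    between them. The word-matching pattern is naive: words already
    followed by punctuation end up with doubled marks."""
    return re.sub(r"(\w+)", r"\1,", sentence)

# Illustrative sample sentence only:
print(slow_text("Jeg leser en norsk roman i bilen"))
# -> Jeg, leser, en, norsk, roman, i, bilen,
```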

Follow-up:

This time, the slow audio sentences were much better. Comma separation gives me exactly the kind of clear word boundaries I’m looking for. Unfortunately, another issue interfered and I didn’t find the homeward practice session very helpful.

The sentences are too long.

My Norwegian ear is still at maybe an A2 fluency level, so I guess it’s no surprise that I’m having trouble following the longer, more complex sentence structures of a C1-level novel. Being able to read and understand the sentences offers no real benefit to the process of decoding the audio into words. And even when I could understand the individual words, I wasn’t able to construct the meaning of the sentences fast enough to keep up.

I think having full-complement documents is still the way to go, but my dream of being able to switch back and forth between reading and listening within the same document will have to wait until my hearing fluency catches up with my reading. (Duh!)

And to get me there, I’m going to have to ingest some simpler materials.

