Baby Steps FTW

2025-05-05 (Mod: 2025-09-14) | 3 minutes

It’s funny how hard it can be to defuse old habits. Historically, my approach to writing code has always been to plan out a good architecture and then implement the framework first, knowing that any time spent building decent infrastructure up front can pay off enormously over the life of the project. But as part of my new policy of leaning into my distractible attention span, I’m embracing a more “rapid prototyping” workflow. Unfortunately, somebody forgot to inform my subconscious about the new plan.

◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇

One of the big carrots I’ve been dangling in front of myself is the vision of being able to draw all of my Frankie exercises from the same material. Whether I’m reading Norwegian text to myself, doing audio-only “repeat after me” pronunciation drills, or narrating an English story to an imaginary Norwegian audience on the fly, I want each exercise to continue moving me forward in the same book.

I’m actually pretty close now, but for the last two weeks I’ve been deferring the next step. I think it’s because, in my head, that next step involves setting up a server and API to provide text translations and audio TTS recordings.

I don’t plan to use the API on the fly, because Frankie is all about local-only tools when I’m studying. The goal will be to use that API when I first ingest a new book or podcast. After that, the material would be available on my phone whenever I needed it, no internet required.

But Old Jeff was getting stuck on having to build that API server so I’d be able to call it from my phone when I ingest a new document. And that meant I’d also have to implement the interface tools in Frankie to call that API and validate the responses and… Sigh. Is it any wonder that I haven’t done it yet?

Today I realized that I was doing the equivalent of pouring a concrete pad as step one of putting up a tent. I haven’t ever tried this multi-modal training idea before. What if I spend all that time and it sucks? Or what if I realize it needs a completely different workflow? Or find a way to do it using phone-based services and never end up using the API server at all? If only there were a faster way to just try it…

Like, if I’d already created tools for generating all the necessary files and integrating them into a single inflatable file that already works on the phone?

Oh. Please excuse this minor episode of slow-brain.

Well, today my brain caught up and I actually tried it. I’ve only verified it on a single paragraph’s worth of text, but it worked beautifully. Emboldened by that success, I’m now generating the full chorus of sentence files for the novel I’m studying.

Who knows, by tonight I might be switching seamlessly between reading and listening modes for my evening read.