As discussed in Experiment 1, improving my pronunciation without a native speaker to correct me has been a challenge, but correction is not the only function a native speaker provides. They also provide something I’ll call adaptive modeling.
◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇◆◇
Movies and podcasts offer a never-ending stream of canned examples - fragments of speech I can study and try to emulate - but they’re static. If I want to examine a phrase or expression that never comes up in them, I’m screwed.
For example, during my self-evaluation experiment, my attempts to say “Det regner mye her om høsten,” were transcribed by the dictation tool as “The line of me hot on Houston,” and “The Rain On me Harlem Houston.”
Clearly my pronunciation needs more work - especially the phrase “det regner mye her” - but since that phrase is one I wrote myself, I don’t have an example of a native speaker saying it that I can study for reference.
In the past, I’ve used Google Translate to provide such recordings. It isn’t horrible, but it doesn’t offer much inflection or melody either, and the more text you give it, the more obvious those limitations get. It’s also not designed to save recordings as an audio file, which makes capturing them for repeated study a pain.
What I need is a more adaptive speech modeler - a more flexible and expressive TTS engine that I can use to generate my custom practice pieces. Ideally one that can save the audio files to disk. And it should produce good norsk. Not just correct norsk, but norsk with a bit of character.
Here’s an example of a more expressive paragraph I’ve written to put that troublesome “det regner” passage into some kind of context. I’ll use this to demonstrate the quality of the different speech generation tools I’ve tried.
Stillheten er sulten. Vind og bølger roper fra steinene nedenfor, men skogen bak meg sluker alt, og lar meg være alene i denne hytta, balanserende på en klippe mellom skog og hav, mellom fortid og fremtid, mellom panisk ambisjon og desperat overlevelse. Det regner mye her om høsten, men selv det sammenstøtet mellom rasende himmel og bølge blir oppslukt av trærne, og etterlater ingenting annet enn fravær. Jeg er fullstendig alene. Og nå roper skogen navnet mitt.
English: The silence is hungry. Wind and waves call out from the rocks below, but the forest at my back devours it all, leaving me alone in this cabin, balanced on a cliff between the forest and the sea, between the past and the future, between frantic ambition and desperate survival. It rains a lot here in the fall, but even that clash of angry sky and wave gets swallowed by these trees, leaving nothing in its wake but absence. I am utterly alone. And now the forest is calling my name.
That text is a bit more dramatic than I actually need for study purposes, but in addition to testing basic prosody, I want to see if any speech generators are able to detect a sense of ambience in the text and render it into the audio performance.
Here’s how my usual tools handle it, listed in order of increasing output quality:
DeepL:
Fred: (The name I’ve given to the voice produced by the Piper TTS system using the talesyntese Norwegian model.)
Google Translate:
Eleven Reader:
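For anyone curious how a voice like Fred can write its output straight to disk, here is a minimal sketch, not necessarily the exact setup used for the clip above. It assumes the Piper command-line tool is installed and on the PATH, and that the Norwegian voice file (assumed here to be named no_NO-talesyntese-medium.onnx, with its .json config beside it) has been downloaded into the working directory. The function names and the output file name are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: the voice model (and its .onnx.json config) sits in the
# current directory under this name.
MODEL = Path("no_NO-talesyntese-medium.onnx")


def build_piper_command(model: Path, wav_path: Path) -> list[str]:
    """Return the Piper CLI call that reads text on stdin and writes a WAV file."""
    return ["piper", "--model", str(model), "--output_file", str(wav_path)]


def synthesize(text: str, wav_path: Path, model: Path = MODEL) -> Path:
    """Pipe text into Piper and save the spoken result to wav_path."""
    if shutil.which("piper") is None:
        raise RuntimeError("piper executable not found on PATH")
    subprocess.run(
        build_piper_command(model, wav_path),
        input=text.encode("utf-8"),  # Piper reads the text from stdin
        check=True,                  # raise if Piper exits with an error
    )
    return wav_path
```

Calling synthesize("Det regner mye her om høsten.", Path("det_regner.wav")) would then leave a WAV file on disk that can be replayed as often as needed. Keeping the command builder separate from the call that runs it makes it easy to check the arguments without having Piper installed.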
Conclusion
Of the ones I’ve tested, Eleven Reader is clearly the most nuanced, natural-sounding voice, and I would love to use it for my language training. Unfortunately, it’s the only one in the group that requires a subscription, and it’s also been designed to lock out audio capture tools, so I can’t record the audio for offline study. (I only managed to capture the above example by using three different computers and spending an hour setting everything up.)
Bottom Line: I’m not opposed to paying for decent software, but Eleven Reader has too many usability wrinkles to fit into my daily study workflow. That said, if I wanted to listen to norsk books instead of reading them, I’d seriously consider Eleven Reader, since it’s way cheaper than buying audiobooks.
For now, though, Google Translate is still the best compromise between usability, cost, and quality, which means it will stay on as my daily driver. But I expect the landscape to change frequently, so I’ll revisit this decision from time to time and update this post when I do.
In my continuing quest for creative ways to drill my language skills, the one area I struggle with most is finding ways to evaluate my speech.
I listen to lots of Norwegian audio, and I’ve started reading books and film scripts aloud, so I’m getting plenty of practice at both listening and speaking, but I don’t get any feedback on my pronunciation and diction.
One day soon, AI will be able to converse with me on random topics and gently correct my norsk as we go, the way a native speaker might do, but we’re not there yet. I do intend to make use of live human coaches online, but not until I feel I’ve gone as far as I can with my own crazy methods first. Speaking of which…