brin_bellway | The awkward transitional period of history we are currently living through


In which *some* but not *all* podcasts use auto-transcripts, so you can't immediately disregard something for being a podcast.

I was hoping Just Plain Wrong had a text version. :(

---

Listen Notes informs me that Google is willing to run their auto-transcriber on anything played through Google Chrome (not just YouTube videos), but 1: fuck Google Chrome, and 2: it sounds like it only *captions* rather than transcribing per se, so a 36-minute episode would require 36 minutes of hanging around watching for each new word/line to pop up (as opposed to dumping the audio into a processing queue, going off to do other things, and getting all the text at once later).


I don't entirely endorse Explore FI Canada (though there's some interesting stuff in there), but I *do* entirely endorse their heavily automated transcript system.

...huh, come to think of it, it looks like I probably *could* coax the cloud service Explore FI Canada uses into transcribing Just Plain Wrong for me, although it would be enough of a pain to do so while staying within the limitations of the free tier that I would pretty much have to be motivated by spite.

I think I'll bear that in mind: if enough uses for [transcription where I don't care if the data gets breached] pile up I might occasionally pay Otter USD$13 for a month's access and catch up on them, while I wait for people with more tech skills than me to invent a FLOSS auto-transcriber. (Or, hmm, it might actually be USD$6.50/month: apparently there's a 50% student discount if you sign up under a school email address. Otter might not realise that my school email is an alumni account.)

If you've got the long-term memory needed to keep that workflow organised then that seems like a pretty viable plan, yeah.

... I wonder how much trouble it would be to make a FLOSS auto-transcriber; it feels like (bad) spoken-language models are a well-solved problem these days. Depends how much fidelity you actually need (probably more than a bad model can get you), I guess, plus any bit of software is fairly costly to make in practice because of the need for options and bug-testing and stuff. ... Also, this kind of thing gets easier the narrower the scope, which means the thing that can do specifically "two guys talking" podcasts is probably orders of magnitude simpler than a general-purpose speech parser. You could even train it on podcasts that *do* offer transcriptions!
