If you've got the long-term memory needed to keep that workflow organised then that seems like a pretty viable plan, yeah.
... I wonder how much trouble it would be to make a FLOSS auto-transcriber; it feels like (bad) speech-recognition models are a well-solved problem these days. I guess it depends how much fidelity you actually need (probably more than a bad model can get you), plus any bit of software is fairly costly to make in practice because of the need for options and bug-testing and stuff. ... Also, this kind of thing gets easier the narrower the scope, which means something built specifically for "two guys talking" podcasts is probably orders of magnitude simpler than a general-purpose speech parser. You could even train it on podcasts that *do* offer transcriptions!