no subject

Updates after trying it on a couple of real-world conversations (leaving them to crunch while I went to work):

It seems very prone to pareidolia and hallucination when processing wordless stretches, but, you know, it's v1, and I'm not even running the most accurate tier of the v1 accuracy vs. time-and-hardware tradeoff. It's early days.

It correctly capitalises "Central Fresh" and even hyphenates "my pile of don't-forget stuff", but really struggles with the word "fortnight" (sport night? Fortnite?). That's fair, to be honest: I do get the impression I use the word "fortnight" a lot more than most people do.

I think it's probably over the threshold of "worth having these transcripts so I can index them and search for keywords when I'm trying to figure out which audio file has the thing I'm looking for", but not over the threshold of "usable for...I guess you could call it episodic-memory spaced-repetition" (which is something I tend to naturally do with text-native chat logs).
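A rough sketch of what that keyword search could look like, assuming each audio file has a plain-text transcript with a matching name in one folder. The folder layout, the `search_transcripts` function, and the `ALIASES` table are all made up for illustration; the alias table just lets a search for a word also catch its known mis-hearings, so the transcripts themselves stay untouched.

```python
import re
from pathlib import Path

# Hypothetical alias table: intended word -> ways it tends to get mis-heard.
ALIASES = {"fortnight": ["sport night", "fortnite"]}


def search_transcripts(folder, keyword, aliases=ALIASES):
    """Return {transcript filename: [matching lines]} for a keyword.

    Matching is case-insensitive and on whole words, and also covers any
    mis-hearings listed for the keyword in `aliases`.
    """
    terms = [keyword.lower()] + aliases.get(keyword.lower(), [])
    pattern = re.compile(
        "|".join(r"\b" + re.escape(term) + r"\b" for term in terms),
        re.IGNORECASE,
    )
    hits = {}
    # Assumes one .txt transcript per audio file, all in the same folder.
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        lines = [line.strip() for line in text.splitlines() if pattern.search(line)]
        if lines:
            hits[path.name] = lines
    return hits
```

Calling `search_transcripts("transcripts", "fortnight")` would then list every transcript file that mentions "fortnight", "sport night" or "Fortnite", which is enough to work out which audio file to open.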

My general rule will be *not* to correct any of the text by hand: I *know* that at some point I'm going to replace this set of transcripts with a set made with higher-grade hardware and/or software, so it's not worth polishing up these temporary copies.

(If I had the ability to fine-tune my personal copy of Whisper's small.en model without having to be an AI specialist or even a programmer, though, I *would* do some of that. I'm pleased that it seems to already have a decent corpus of th-fronters, but stuff like, well, "weight the possibility that a word was 'fortnight' higher than you would for the general public" would be handy.)
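This isn't fine-tuning, but the open-source `whisper` Python package does accept an `initial_prompt` argument to `transcribe()`, which gives the decoder some context text before it starts. A minimal sketch of using that to mention the words I overuse, assuming the `openai-whisper` package is installed; the helper names and the wording of the prompt are made up, and this is only a nudge, with no guarantee it fixes "fortnight".

```python
def build_vocab_prompt(words):
    """Build a short context sentence that mentions each word once."""
    return "Words that come up often in these recordings: " + ", ".join(words) + "."


def transcribe_with_hint(audio_path, words, model_name="small.en"):
    """Transcribe one file, passing the vocabulary sentence as context.

    Assumption: the openai-whisper package is installed. It is imported
    here so that build_vocab_prompt() stays usable without it.
    """
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, initial_prompt=build_vocab_prompt(words))
    return result["text"]
```

For example, `transcribe_with_hint("chat.mp3", ["fortnight", "Central Fresh"])`, where `chat.mp3` is a placeholder filename.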
