brin_bellway | Entries tagged with that excuse for communication‚ speech

Comments on my own posts:

[N/A]

---

Comments on other people's posts:

[WordPress (Tumblr)] (OP by tototavros) Recommendations for podcasts (and, more importantly, transcripts) about American history. (Update: AFAICT only the *later* episodes of Presidencies of the United States have transcripts. The earliest episode transcript I could find was "3.28 - The Calming Seas". Still, in medias res is something.) [one or two comments, depending on how you count]

[WordPress (Tumblr)] (OP by moral-autism; partly in response to analytically) Why use Spotify as a music player when you can use Youtube? (Not to even mention Quod Libet or Odyssey.)

[cw: illness] [Dreamwidth; Wayback] (OP by mindstalk) Favourite masks.

[cw: medical, poverty] [Start of WordPress thread (Tumblr part 1; Tumblr part 2)] (OP by rustingbridges) The financial intricacies of dental insurance.

[Blogspot; Wayback] (OP by Michael James) A gentle correction of one of the most important omissions I encountered on my wiki-walk through Canadian personal-finance blogs, RRSP reporting edition.

[Dreamwidth; Wayback] (OP by mindstalk) Hypophantasia and dream accuracy.

[mild cw: food] [WordPress (Tumblr)] (OP by TrinCyboid) From the people who brought you mystery melons, we also have 24 liveblogging. (Both are hilarious.) [one or two comments, depending on how you count]

[cw: food] [Dreamwidth; Wayback] (OP by mindstalk) Egg-boiling techniques.

[cw: food, (arguably) poverty] [Dreamwidth; Wayback] (OP by mindstalk) More on egg-boiling techniques, now with energy conservation.

[cw: poverty] [WordPress (Tumblr)] (OP by moonlit-tulip) Holidays and the value of routine-breaking (or, perhaps, routine-keeping on a higher level).

[Dreamwidth; Wayback] (OP by lunartulip) The wonders of "details" HTML tags.

[cw: poison, poverty] [Start of WordPress thread (Tumblr part 1; Tumblr part 2)] (OP by moral-autism; partly in response to humanfist) The importance of avoiding home radon exposure.

---

Links:

[cw: corporate bullshit] [WordPress (Tumblr)] (by ponett) BreezeWiki: an adversarially interoperable Wikia frontend that is (unlike the official one) *actually usable*.

[strong cw: illness] [BBC; Wayback] (by Zaria Gorvett; h/t cvirtue) Measles: much worse than previously believed.

[Explain Shell] (by Idan Kamara; h/t Ilzolende) Command-line-to-English machine translation.

[Internet Archive] How to archive your tweets with the Wayback Machine. (The ingest-URLs-from-spreadsheet bit is intriguing regardless.)

(part 3)

---

Circa 2022-10-21:
holy shit

apparently there was a bug in Whisper that was causing it to run far more slowly than it had any right to, *especially* on CPU

it's fixed now

(also they've added an option to change how many CPU threads a Whisper instance uses, so if you have 4 threads you can just use 4 and not have to run a pair of Whispers to make use of it all)

I ran a quad-threaded Whisper overnight on 6.5 hours of audio, and nine hours later when I checked on it again *it was already done*

2022-11-19, 10:30 AM:
I haven't run direct head-to-head tests, but it seems as if quad-threaded Whisper is *more* than twice as fast as dual-threaded? It also seems--and *this* makes intuitive sense--that "--condition_on_previous_text=False" runs more slowly (but is more likely to notice when someone starts talking after a long stretch of background noise).
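For reference, the two knobs compared above can be spelled out as one command line. This is a minimal sketch, assuming the whisper CLI's --threads and --condition_on_previous_text options as listed in its help text. The helper name build_whisper_command, the file names, and the output folder are made up for illustration, and the function only assembles the arguments: it doesn't run anything.

```python
from typing import List, Sequence


def build_whisper_command(
    files: Sequence[str],
    model: str = "small.en",
    threads: int = 4,
    condition_on_previous_text: bool = True,
    output_dir: str = ".",
) -> List[str]:
    """Assemble (but do not run) a whisper command line as an argument list."""
    if not files:
        raise ValueError("no audio files given")
    return [
        "whisper",
        *files,
        "--model", model,
        "--threads", str(threads),
        # The CLI takes this flag as the literal text True or False.
        "--condition_on_previous_text", str(condition_on_previous_text),
        "--output_dir", output_dir,
    ]


# Hypothetical inputs: a quad-threaded run that does not condition on previous text.
cmd = build_whisper_command(
    ["a.mp3", "b.mp3"],
    threads=4,
    condition_on_previous_text=False,
    output_dir="example-v1",
)
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run(cmd) on a machine where whisper is installed.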

I made the mistake, 35 hours ago, of putting a second one on because it was nearly bedtime and the first one (which was not conditioning on previous text) only had about forty minutes left. The first one's not done, and the second one (conditioned on previous text) has only made it through about 80 minutes.

OTOH, that's making me wonder if my octa-threaded smartphone would be a better cruncher than I'd expected. Perhaps I'll try it at the end of the workweek.

2022-11-19, 11:00 AM:
Ah, you press *ctrl-Z* in a terminal window to pause its active process, and *fg* to resume. Good: it really seemed like there ought to be a way of doing that, but I was having trouble finding it.

I'll test it on Whisper once I've got a Whisper instance running that *wouldn't* lose hours of work if it went wrong. (Not the whole 35 hours--it writes the transcripts to disk at the end of each file, not at the end of the sequence of files--but still.)
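The same pause-and-resume can be tried safely on a throwaway process first. A minimal sketch, assuming a POSIX system: ctrl-Z makes the terminal send SIGTSTP, and this example uses its uncatchable cousin SIGSTOP instead, with SIGCONT standing in for what *fg* does. The sleeping child is a stand-in for Whisper, and the helper names are made up.

```python
import os
import signal
import subprocess
import sys
from typing import Tuple


def pause(pid: int) -> None:
    """Suspend a process (similar in effect to pressing ctrl-Z on it)."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid: int) -> None:
    """Let a suspended process continue (the signal that fg sends)."""
    os.kill(pid, signal.SIGCONT)


def demo() -> Tuple[bool, bool]:
    # A harmless stand-in for a long-running job.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    try:
        pause(child.pid)
        # WUNTRACED makes waitpid report a stopped (not just exited) child.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        was_stopped = os.WIFSTOPPED(status)
        resume(child.pid)
        still_running = child.poll() is None
    finally:
        child.kill()
        child.wait()
    return was_stopped, still_running


was_stopped, still_running = demo()
print("stopped:", was_stopped, "running again:", still_running)
```

A stopped process keeps its memory allocated, so this frees up CPU time and nothing else.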

2022-11-19, 11:30 AM:
That first one finished a file and moved on to the next. I tried pausing and resuming it. CPU monitor says it's running, but it's not outputting anything to the terminal. It's possible it just hasn't done anything worth outputting yet, though.

A bit of poking at the Termux wiki suggests that Termux isn't very conducive to using Whisper, in any case.

Hmm. It's not the only computer in the house with eight threads. Come to think of it, I wonder if Mom would let me run a password-protected VM inside *her* laptop...

2022-11-19, 11:47 AM:
Oh hey, it's started outputting. I guess I'll try poking Termux to see if it's easier than it looks from the wiki.

liveblog of Termux troubleshooting, click to open (even if you're viewing this post directly and not on a Dreamwidth /read page)

2022-11-19, 12:00 PM:
It can't install because...my version of Python is too *new*? Well, that's not the error message I expected.

Is my August Termux backup an old enough Python version...yes. Let's try reverting to that...

2022-11-19, 12:10 PM:
It still says it can't install torch, but no longer claims that the reason is because of having Python 3.11.

2022-11-19, 12:40 PM:
Turns out that you *can* install torch on Termux, but you have to run pkg install python-torch instead of going through pip. Now the error message is about missing Rust, which the Whisper readme discusses as a common error.

(Another 380 MB of disc space for Rust. Man, this shit is *expensive*. Well, what else was I using my remaining 4.5 GB of internal storage for; as for the external storage, Brother bought me a 512 GB microSD as a birthday present and it's in the mail.)

2022-11-19, 12:55 PM:
"CANNOT LINK EXECUTABLE 'rustc': library 'libLLVM-15.so' not found"

...what if I install libllvm?

2022-11-19, 1:50 PM:
...and aarch64-linux-android-ar...

...no, Termux apt doesn't recognise that one. Hmm. cargo install aarch64-linux-android-ar?

No, not that either. Binutils?

2022-11-19, 2:30 PM:
And once again I have run out of error messages but it's still failing. I wonder what happens if I do something about the warning that wheel isn't installed and it's having to use legacy setup...

2022-11-20, 11:30 AM:
(Meanwhile, on the TV-brain front: suspending the process, understandably enough, doesn't free up RAM. Therefore, pausability does not help.)

more troubleshooting

2022-12-01, 10:30 AM:
Upon closer inspection a few days back, I noticed there *was* another error message.

I got a chance to try the workaround late last night. Half an hour later it was still working at building a wheel for tokenizers, and I went to bed and left it to it.

This morning I checked on it, and...it says everything completed successfully.

Well then. Let's try it out.

2022-12-01, 11:50 AM:
Holy shit, just *unzipping an audio folder* so that Whisper would have something to chew on took like an hour. Not a great sign for how fast this is going to be, though there may have been extenuating circumstances (for one thing, the zip is on the external storage and the unzipped folder is on the internal storage).

Anyway, Material Files has finished that part now, so here we go.

cd /sdcard/2022-09-19 && whisper *.mp3 --model small.en --threads 8 --output_dir /sdcard/2022-09-19-v1

2022-12-01, 12:00 PM:
It...doesn't parse wildcards? Unlike on Linux?

Oh, it's because the unzipping left the folder more nested than I expected, and actually the cd should be /sdcard/2022-09-19/2022-09-19.
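What happened there: when a shell wildcard matches nothing, bash by default hands the literal text *.mp3 to the program, which makes it look as if the wildcard wasn't parsed. A minimal sketch of checking for matches before launching anything, with a fallback that looks inside nested folders. The helper name find_audio and the empty clip.mp3 are made up for illustration.

```python
import glob
import os
import tempfile
from typing import List


def find_audio(folder: str, pattern: str = "*.mp3") -> List[str]:
    """Return matching files in folder, falling back to a recursive search."""
    matches = sorted(glob.glob(os.path.join(folder, pattern)))
    if not matches:
        # Nothing at the top level: look in nested folders too.
        matches = sorted(
            glob.glob(os.path.join(folder, "**", pattern), recursive=True)
        )
    return matches


# Recreate the doubly nested layout in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "2022-09-19", "2022-09-19")
    os.makedirs(nested)
    open(os.path.join(nested, "clip.mp3"), "w").close()

    outer = os.path.join(root, "2022-09-19")
    top_level = glob.glob(os.path.join(outer, "*.mp3"))
    found = find_audio(outer)
    print(len(top_level), [os.path.relpath(p, root) for p in found])
```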

2022-12-01, 12:05 PM:
I checked Material Files and internal-storage folder 2022-09-19-v1 has been created, so that's a good sign.

2022-12-02, 12:30 AM:
We're twelve and a half hours in. The first speech in this clip occurs two and a half minutes in. It hasn't outputted any progress to the terminal yet.

I'll let it run overnight mostly out of curiosity, but currently I expect that this device cannot materially contribute to a grid computer. I think tomorrow I'll revert Termux to that August backup again and update from there, freeing up the large quantities of space currently being spent on Whisper and its prerequisites.

The good news is that the phone is not appreciably warm, so I'm not frying my battery.

2022-12-02, 12:30 PM:
We're now at 24.5 hours real-time, and...huh, 11.5 minutes. It did actually manage something.

Still, it looks like my initial hypothesis (before multi-threading) that the phone's effectiveness would be roughly on par with the TV brain was right, and I agree with yesterday-me that it's not worth including this in my available compute.

Neat proof of concept, though.
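To put a number on "roughly on par": a quick sketch comparing the phone's 24.5 hours for 11.5 minutes of audio against the TV brain's twelve hours for about eight minutes (from the 2022-09-25 entry). The function name slowdown is made up, and both inputs are the rough figures quoted in the entries, not careful measurements.

```python
def slowdown(wall_hours: float, audio_minutes: float) -> float:
    """Hours of wall-clock time spent per hour of audio transcribed."""
    return wall_hours * 60 / audio_minutes


phone = slowdown(24.5, 11.5)  # the eight-thread Termux run above
tv_brain = slowdown(12, 8)    # figures from the 2022-09-25 entry

print(f"phone: about {phone:.0f}x slower than real-time")
print(f"TV brain: about {tv_brain:.0f}x slower than real-time")
```

Both land around two orders of magnitude slower than real-time, which fits the verdict that neither is worth counting as available compute.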

[arguably cw: embarrassment squick]

( Read more... )

(part 1)

---

Computers that I physically possess, for which I do not need permission before executing software on them, and on which nobody outside my household stakes a claim:

* My laptop
* Our TV's prosthetic brain (a thirteen-year-old ThinkPad running Linux Lite, with its A/V wired into the TV)
* My smartphone
* Possibly the other two remaining smartphones from my failed experiments with LineageOS, though they may no longer be usable enough: last I checked one of them had a broken USB port and could only be charged by swapping out its battery (which seems unsuitable for extended crunching projects), and I *think* the other one has started bootlooping though I'm not sure if it managed to pull out of that. (...oh hey, it *did* pull out of that. Interesting. I might be able to wring some more compute out of it if I strangle its fucking bloatware first. If it is to be one of my computers (particularly one crunching sensitive data), it will be *mine*, and AT&T gets no say in it whatsoever.)

((even if AT&T doesn't give a shit, it's the principle of the thing))

2022-09-25, 10:50 AM:
Successfully installed Whisper on the TV brain and got it to start crunching 2020-01-22, the first day. It's started outputting and everything! Looks like past-me is discussing electricity prices with her family.

I admit I did not especially understand the instructions I looked up on how to fix the "/home/netflix/.local/bin is not in PATH" errors, but I did successfully infer how to *circumvent* the problem of it not being in PATH:

/home/netflix/.local/bin/./whisper *.mp3 --model small.en --output_dir 2020-01-22-v1

(I realised pretty quickly that I should be labelling the transcript folders "v1", so that when I'm later upgrading them to higher-quality transcripts I'll know which days have been upgraded by how much.)
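The same circumvention can be written down in a reusable form. A minimal sketch, assuming (as the error message suggests) that pip put the script in ~/.local/bin: look on PATH first, then fall back to that folder. The helper name find_executable is made up for illustration.

```python
import os
import shutil
from typing import Optional, Sequence


def find_executable(
    name: str, extra_dirs: Sequence[str] = ("~/.local/bin",)
) -> Optional[str]:
    """Look for name on PATH, then in a few extra directories."""
    found = shutil.which(name)
    if found:
        return found
    for directory in extra_dirs:
        candidate = os.path.join(os.path.expanduser(directory), name)
        # Only accept regular files that are marked executable.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Prints a full path if whisper is installed somewhere findable, else None.
print(find_executable("whisper"))
```

The usual non-circumventing fix is to add the folder to PATH in the shell profile, e.g. export PATH="$HOME/.local/bin:$PATH", after which plain "whisper" works.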

2022-09-25, 10:30 PM:
Me, last night: I'm pleased that it seems to already have a decent corpus of th-fronters

Whisper: We used to have 12 baddies, now we have a free.

("We used to have twelve bags, now we have three.")

Well, so be it. (And hey, I run into this problem with *people* sometimes too.)

2022-09-25, 10:45 PM:
Checked on the TV brain. Whisper does *run* on a thirteen-year-old laptop, and I appreciate that you're not *completely* locked out of using Whisper if that's all you have, but also it's been crunching for twelve hours and it's made it through about eight minutes. At this rate it's going to take...about four days to get through 2020-01-22, and that day was among the shorter ones.

The smartphones are loosely equal in specs to the TV brain, so I doubt they'd fare much better even if I can figure out an appropriate Termux setup. Definitely doesn't sound feasible to do on my real phone (even considering that I don't touch my phone many if not most days, there would be interruptions multiple times a week). Not sure it's worth trying to get it running on the other one.

...ooh, apparently (for compatibility reasons, I think) Whisper defaults to CPU and you have to explicitly tell it to use a GPU. I'll have to check what happens if I try that on my laptop, though I'll wait for one of the current two to complete rather than trying to throw in a third one. (I have *some* sort of dedicated graphics system, though it wasn't a priority of mine at the time so I don't have a clear sense of what (and, again, six-year-old business laptop, so whatever-it-is probably isn't that good: might be helpful vs CPU-only, though).)

2022-09-26, 8:04 PM:
I didn't leave my laptop running overnight, for...reasons best discussed in a separate post, but (unsurprisingly) the Whispers were undamaged by overnight suspension and one of the initial two has since finished.

I *think*, if I'm making the correct inferences from these error messages and help documents, that my GPU is incompatible because it's *both* old *and* AMD. New AMD (specifically, AMD with ROCm support) is slightly tricky but salvageable; old Nvidia might be salvageable; old AMD doesn't seem to have any salvage options.

Also, Mom says the TV brain is incapable of running Whisper and Netflix at the same time, which means an infeasible number of interruptions on *that* device too. And the AT&T smartphone seems to be locked down hard enough that I can't turn on USB debugging and strangle the bloatware. *God* I hate this phone.

I'll see if I can accumulate any castoff devices for Project House Community Grid, but at this rate I think I'm going to be starved for compute for the next...hmm...127 GB of data...28.8 MB/hour...implies an average of 4.5 hours of new data per day and about 4,400 hours to catch up on...three hours real-time to process one hour of data on my laptop if running two Whispers in parallel...three and a half years.
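The chain of estimates above, written out as a sketch. The inputs are the figures from the paragraph; the assumptions (labelled in the comments) are decimal gigabytes and a laptop crunching 24 hours a day, with new audio continuing to arrive while the backlog is worked through.

```python
# Figures quoted in the entry.
backlog_mb = 127 * 1000           # 127 GB, assuming 1 GB = 1000 MB
mb_per_audio_hour = 28.8
new_audio_hours_per_day = 4.5
wall_hours_per_audio_hour = 3.0   # two Whispers running in parallel

# Assumption: the laptop crunches around the clock.
crunching_hours_per_day = 24

backlog_hours = backlog_mb / mb_per_audio_hour
daily_capacity = crunching_hours_per_day / wall_hours_per_audio_hour
# New recordings eat part of each day's capacity before the backlog gets any.
net_backlog_hours_per_day = daily_capacity - new_audio_hours_per_day

days = backlog_hours / net_backlog_hours_per_day
years = days / 365.25

print(f"backlog: about {backlog_hours:.0f} hours of audio")
print(f"catch-up time: about {years:.1f} years")
```

Without the new audio arriving every day, the backlog alone would take about a year and a half at the same rate; keeping up with 4.5 fresh hours a day is what stretches it out.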

Well, with a figure like that I'll probably need a new laptop before then anyway. I will definitely be on the lookout for GPU specs next time I'm on the hunt for a primary computer.

In the meantime, I think I'll probably compromise by *not* attempting to catch up on the backlog, and just transcribe going forward for the time being. My poor computer deserves a chance to rest sometimes.

(...did Brother upgrade his gaming desktop's GPU a few months ago? He did some sort of desktop hardware upgrade; I think it might have been that. I wonder if he still has the old one, and what it would take to cobble it into a minimum viable desktop...)

---

(edit: part 3)

(well, September 21st was the day, but I found out about it this morning)

Automated! Offline! Open-source! Reputable-author! Transcription! That doesn't make you write your own interface software!

(you *can* write your own interface software, if you want, but you don't *have* to)

It even has multiple languages and the option for foreign-speech-to-English-text, though *currently* it sucks at the languages I'm likely to encounter. I'll stick with English-only for now.

Let's feed it the first minute of my standard test audio: me reading aloud the prologue of Ptolemy's Gate. For comparison, here's the real text:

Alexandria: 125 B.C.

The assassins dropped into the palace grounds at midnight, four fleet shadows dark against the wall. The fall was high, the ground was hard; they made no more sound on impact than the pattering of rain. Three seconds they crouched there, low and motionless, sniffing at the air. Then away they stole, through the dark gardens, among the tamarisks and date palms, toward the quarters where the boy lay at rest. A cheetah on a chain stirred in its sleep; far away in the desert, jackals cried.

They went on pointed toe-tips, leaving no trace in the long wet grass. Their robes flittered at their backs, fragmenting their shadows into wisps and traces. What could be seen? Nothing but leaves shifting in the breeze. What could be heard? Nothing but the wind sighing among the palm fronds. No sight, no noise. A crocodile djinni, standing sentry at the sacred pool, was undisturbed though they passed within a scale's breadth of his tail. For humans, it wasn't bad--

("it wasn't badly done", but the audio is cut off at the one-minute mark)

whisper voice-sample-speaking-age-25-1-minute-clip.mp3 --model small.en

[note: running this command consumed 2 CPU threads and about 2.5GB of RAM]

Output:
Alexandria, 125 B.C. The Assassins dropped into the palace grounds at midnight, four fleet shadows dark against the wall. The fall was high, the ground was hard, they made no more sound and impact in the pattern of rain. Three seconds they crouched there, low and motionless sniffing at the air. Then away they stole, through the dark gardens among the town risks and date palms, towards the quarters where the boy lay at rest. A tree on a chain stirred in its sleep. Far away in the desert, jackals cried. They went on pointed toe-tips, leaving no trace in the long, wet grass. Their robes flittered at their backs, fragmenting their shadows in the wisps and traces. What could be seen? Nothing but leaves, shifting in the breeze. What could be heard? Nothing but the wind sighing among the palm fronds. No sight, no noise. The crocodile genie, standing sentry at the sacred pool, was on the stirrup, though they passed through the scales, breathless his tail. For humans, it wasn't bad.

*Fuck yes*. Oh, it's not perfect, but it's *basically* intact. And, let's face it, my voice is *not* easy-mode, and neither is this vocabulary: I'm especially impressed that it understood "their robes flittered", when honestly if I didn't already know that's what it was I might not have transcribed it correctly myself.

Very slow, though: closer to six times *slower* than real-time, rather than the claimed six times faster (on default settings). (Probably my hardware is underpowered: I'm running it on a six-year-old laptop aimed at businessfolk.) That is...slow enough that it may actually be unable to keep up with the amount of audio I produce, let alone work on the backlog. (Although, at 2 CPU threads and 2.5 GB of RAM, I could probably run two Whispers in parallel overnight, assigning each of them a different series of files. And I wonder if I could rope in the TV's prosthetic brain...)

Hmm...

whisper voice-sample-speaking-age-25-1-minute-clip.mp3 --model tiny.en

Alexander, 125 BC. The assassins dropped into the palace grounds at midnight, four fleet shadows darkened the wall. The fall was high, the ground was hard, they made no more sound on impact in the pattern of rain. Three seconds they crouched there, low and motionless sniffing at the air, then away they stole, through the dark gardens among the cameras and date palms, towards the corners where the boy lay at rest. A cheetah on a chain stirred in sleep, far away and desert jackals cried. They went on pointed toe tips leaving no trace in the long lit grass. The ropes were there at their backs, fragmenting their shadows in the wrists and traces. What could be seen? Nothing but leaves, shifting in the breeze. What could be heard? Nothing but the wind sighing among the palm fronds. No sight? No noise. A crocodile, genie, standing sentry at a quick sacred core, will end the story of the past of the skills best with his tail. For humans, it wasn't bad.

My inner 00's selves are impressed to have even made it *that* far, but it's not quite good enough to do actual work with. (Although it's interesting that *this* one correctly got "cheetah" when small.en didn't.)

whisper voice-sample-speaking-age-25-1-minute-clip.mp3 --model base.en

Alexandria, 125 BC. The assassins dropped into the palace grounds at midnight, four fleet shadows dark against the wall. The fall was high, the ground was hard, they made no more sound on impact in the pattern of rain. Three seconds they crouched there, low and motionless, the thing at the air. Then away they stole, through the dark gardens among the town risks and date poems, towards the quarters where the boy lay at rest. A cheetah on a chain stirred in its sleep, far away and desert jaggles cried. They went on pointed toe tips, leaving no trace in the long wet grass. Their robes swillied at their backs, fragmenting their shadows in the wisps and traces. What could be seen? Nothing but leaves shifting in the breeze. What could be heard? Nothing but the wind sighing among the palm fawns. No sight, no noise. The crocodile genie, standing sentry at the sacred pool, was undister of the late past when his scales pressed his tail. For humans, it wasn't bad.

*Slightly* better overall than tiny.en, but not by much, and not strictly superior (it makes some mistakes that tiny.en didn't).
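One way to make "better overall" less of an eyeball judgement is to count word-level edits against the real text. A minimal sketch using a plain edit distance over lowercased words, run on just the cheetah sentence from each transcript above. The function names are made up, and one sentence is far too small a sample to rank the models.

```python
import re
from typing import List


def words(text: str) -> List[str]:
    """Lowercase the text and drop punctuation, keeping apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, insertions, and deletions."""
    ref, hyp = words(reference), words(hypothesis)
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # word missing from the transcript
                cur[j - 1] + 1,          # extra word in the transcript
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1]


reference = ("A cheetah on a chain stirred in its sleep; "
             "far away in the desert, jackals cried.")
outputs = {
    "small.en": "A tree on a chain stirred in its sleep. "
                "Far away in the desert, jackals cried.",
    "tiny.en": "A cheetah on a chain stirred in sleep, "
               "far away and desert jackals cried.",
    "base.en": "A cheetah on a chain stirred in its sleep, "
               "far away and desert jaggles cried.",
}
errors = {model: word_errors(reference, text) for model, text in outputs.items()}
for model, count in errors.items():
    print(model, count, "word errors out of", len(words(reference)))
```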

---

If anyone has the hardware specs to pull off running "medium.en" or "large", I'd be interested to see what you get out of putting the clip through it. (Though I can't outsource most of my transcription needs to anyone else, for privacy reasons: this would just be for curiosity's sake, and to know what to look forward to when I someday get my hands on higher-grade hardware myself.)

---

Overall verdict: I am so fucking hyped for this development†, and especially about what can be pulled off if you combine it with Recoll indexing.

The transcription software we (I) have been hoping for is here, right now.

---

†my brain has never been good at generating excitement qualia, but intellectually I am so fucking hyped and I'm not *completely* incapable of emotional excitement ↩

---

(edit: part 2)

(previously on)

---

The Youtube auto-transcriber now gives you the option of seeing the entire transcript at once, rather than having to wait for each line of caption to pop up: click on the three-dot menu (next to "Save"), then click "Open transcript". And yes, that podcast I was looking at earlier has a Youtube mirror now.

The main downside is that it doesn't distinguish between different speakers. That's fine for one-person speeches, but gets a bit confusing with a three-host podcast. There are also still some issues with homophones, obscure proper nouns, and occasionally mistaking obscure words for more common ones that aren't *quite* homophones, but it's pretty easy to figure out what it meant on those.

It was interesting to note that the auto-transcriber *does* know how to spell "nanowrimo", and knows that a string of words occurring after "you can also find us on twitter or instagram at" should be run together into one word.

In which *some* but not *all* podcasts use auto-transcripts, so you can't immediately disregard something for being a podcast.

I was hoping Just Plain Wrong had a text version. :(

---

Listen Notes informs me that Google is willing to run their auto-transcriber on anything played through Google Chrome (not just Youtube videos), but 1: fuck Google Chrome, and 2: it sounds like it only *captions* rather than transcribing per se, so a 36-minute episode would require 36 minutes of hanging around watching for each new word/line to pop up (as opposed to dumping the audio into a processing queue, going off to do other things, and getting all the text at once later).

[mild cw: illness, food]

( Read more... )

We'll be formatting this slightly differently today. First, the comments/links that *aren't* COVID-19 related:

Comments on my own posts:

The one about how hard it is to control the exact phrasing of your speech

---

Comments on other people's posts:

[cw: food] [Tumblr; Wayback] (OP by etirabys) Granola bars are a solution to many of life's problems.

[Tumblr; Wayback] (OP by etirabys) Books within dreams.

---

Links:

[GitHub; Wayback] (by gildas-lormeau; h/t Gwern Branwen) Tired of wrangling the file proliferation of saving webpages to HTML? Tired of the broken formatting of saving webpages to PDF? Try SingleFile! I've been testing it out on parts of my archive where I was previously making do with shitty PDFs or multi-file HTMLs, and it's been working great.

Laugh rule:
[Diaryland; Wayback] (by Grizzly and Ninja of the Clan)
Oh, and Jack liked the exercies. He thought they were fun and he likes the color yellow. He wants me to make sure I included that in this review. He likes the color yellow.

There is a blanket [cw: illness] on the rest of this post.

( Read more... )

(inspired by Book Review: The Seven Principles For Making Marriage Work)

It weirds me out when people talk about I-statements and other such speaking techniques.

Is this an autism thing? Like, are allistics actually capable of doing that anywhere close to reliably (assuming they choose to)?

I...basically, I am a Mass Effect character. I can generally control the *gist* of what I am saying (barring the occasional misclick), but for the most part I cannot control the exact words that come out of my mouth: I just have to hope that the phrasing doesn't have connotations or implications that I didn't intend. It's hard enough to make sure I always have the right number of *negations*, and you want me to use *I-statements*??

[might qualify as embarrassment squick?]

( Read more... )

[cw: amnesia, Maze Runner spoilers]

( Read more... )

---

(This post brought to you by watching the CinemaSins sporkings of the Maze Runner movies. I've been finding that CinemaSins are pretty good for jogging, because they come with subtitles so you don't have to be able to clearly hear what he's saying over the sound of the treadmill. I do wish, though, that he'd consistently subtitle what the characters are saying and not just what he's saying: occasionally he does, but mostly he doesn't. Fairly often he reacts to something and I didn't catch the thing he's reacting to.)


Page Summary

Comment and Link Roundup: February 1, 2023
Today's the day, part ~4
Today's the day, part ~3
Today's the day, part ~2: House Community Grid
Today's the day! It's happening!
Still in an awkward transitional period of history, but definitely further along now
The awkward transitional period of history we are currently living through
We have better options than vocal cords
Comment and Link Roundup: March 20, 2020
(no subject)
It's Three Real Estate
(no subject)


Page generated Jun. 7th, 2025 05:05 pm

Brinens and Things

now with 50% more Internet Archive
