[personal profile] brin_bellway
[arguably cw: amnesia]


Okay, what the fuck? Why is --page-requisites not doing what it claims to do?

Embedded images are *not* being preserved, merely hotlinked! That is *not* acceptable!

(and even some of the *stylesheets* are fucked up!)

(*looks at wget-scrape collection* ...in hindsight, I should have noticed that Hyperbole and a Half compressed down to under 7 MB and there was no way in hell that was legitimate.)
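
(A size check like that is pretty crude, though. A more direct sanity check would be to scan the saved HTML for image and stylesheet references that still point at a remote server. Here is a minimal sketch in Python, standard library only, assuming the hotlinked assets show up as absolute `http://`, `https://` or `//` URLs in the saved pages. The names `RemoteAssetFinder`, `find_hotlinks` and `scan_scrape` are made up for illustration and are not part of wget or any other tool.)

```python
from html.parser import HTMLParser
from pathlib import Path


class RemoteAssetFinder(HTMLParser):
    """Collects <img> and stylesheet <link> URLs that still point at a remote host."""

    def __init__(self):
        super().__init__()
        self.remote = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            url = attrs.get("src")
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower():
            url = attrs.get("href")
        else:
            return
        # Assumption: a local copy is referenced by a relative path,
        # so anything absolute is treated as still hotlinked.
        if url and url.lower().startswith(("http://", "https://", "//")):
            self.remote.append(url)


def find_hotlinks(html_text):
    """Return the remote image/stylesheet URLs found in one page of HTML."""
    finder = RemoteAssetFinder()
    finder.feed(html_text)
    finder.close()
    return finder.remote


def scan_scrape(scrape_dir):
    """Map each .html file under scrape_dir to its remote asset URLs (if any)."""
    report = {}
    for page in Path(scrape_dir).rglob("*.html"):
        hits = find_hotlinks(page.read_text(encoding="utf-8", errors="replace"))
        if hits:
            report[str(page)] = hits
    return report
```

(Calling `scan_scrape` on a scrape folder would then list every page that still depends on someone else's server staying up. It only looks at `<img src>` and stylesheet links, so things like `srcset` or CSS background images would slip through.)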

*long sigh* I'm starting to run low on space in my 256 GB external phone storage (and 512s are still very expensive, with 400s not much better), and now it seems that some of these things are going to be much bulkier than I thought. (Although I guess I could--ironically?--drop back down to imageless Wikipedia, which would buy me about 40 GB.)

Also--as for other, more functional scraping methods I could use--WARC files are neither readable on mobile† nor indexable by Recoll†, and I've never been able to get any ZIM scrapers or converters to work. (Not that ZIM would solve the Recoll problem, but it would at least solve the mobile problem. Recoll is only really important for the scrape of my own blog, which I search fairly often: if necessary, I could do a wget scrape of that *just* for indexing purposes, plus a grab-site (or some theoretical functional ZIM scraper) scrape for actual backup.)

(hmm, I might be able to coax something out of SingleFile, which apparently has automatability/mass-download features I haven't experimented with...)

---

†Yet? But I do not remotely have the programming skills to actually make either of those happen myself.

---

(edit: part 2)

Date: 2021-07-19 11:54 am (UTC)
From: [personal profile] contrarianarchon
This is really annoying! Good luck finding a solution!
