(
Edit Aug-05-2021: I've been having trouble lately with getting the --page-requisites flag to work correctly. Feel free to experiment with the method described below, but I personally have switched to
grab-site.)
Every post on this subject that I've found (a weirdly small number) seems to be so outdated as to be useless: none of their methods work anymore. So, I'm going to be the change I want to see in the world.
I think I've figured out how to make wget (a general-purpose web scraper) create a local copy of a Dreamwidth blog.
Here is how I did it (using Chrome on Ubuntu; adapt your approach accordingly if your browser and/or OS is different):
1. Install the "cookies.txt" Chrome extension.
2. Go to dreamwidth.org. (I used the homepage, but I think any Dreamwidth page will do. You must be logged in to Dreamwidth; otherwise, steps 1-3 won't accomplish anything.)
3. Click on the cookies.txt button in your extension bar. Request an export of the cookies related to dreamwidth.org. The export arrives as a file called "cookies.txt" in the Downloads folder.
4. Open a command-line terminal.
(4a. wget was already installed. I don't remember if it comes pre-installed on Ubuntu, or if I grabbed it from Synaptic during a previous archiving experiment. If yours is missing, see the note right after the command below.)
(4b. I read the wget manual: in order to experiment with which promising-looking flags to include, I needed to know what the options were.)
5. After several rounds of trial-and-error, including one instance of accidentally attempting to download the entire Internet from my blog outward, settle on the following command:
wget brin-bellway.dreamwidth.org --adjust-extension --mirror --page-requisites --convert-links --restrict-file-names=windows --load-cookies ~/Downloads/cookies.txt
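(If wget turns out not to be installed on your machine, on Ubuntu a plain "sudo apt-get install wget" in the terminal should take care of it.)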
Explanation of the command:
"brin-bellway.dreamwidth.org": The URL of the blog you're looking to download.
"--adjust-extension": I did not originally think it would matter for this usecase, but when I added it a lot more of the page formatting started to function, including the search-by-tag pages that were previously a bunch of raw code.
"--mirror": Downloads the entire blog, not just that one page.
"--page-requisites": Downloads things like stylesheets and images, not just the main part of the page.
"--convert-links": Lets you navigate within your local copy by clicking its links: for example, if you start at "index.html" (your blog's main page), you can click on a post title to go to the local copy of that post (including its comments!).
"--restrict-file-names=windows": Re-names any files with characters Windows would freak out about (including, but not limited to, such innocuous characters as question marks and double-quotes). I have had too many problems in the past with attempting to access my files on Windows machines to not pre-emptively include this.
"--load-cookies ~/Downloads/cookies.txt": By giving wget that cookie file you exported, you allow it to present itself to Dreamwidth as being you. Without this, it won't be able to include access-locked posts.
(Note: except for "wget" being first, the parts can be written in any order.)
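For the curious: the exported cookies.txt is in the old Netscape cookie-file format, which wget reads natively. Each line is seven tab-separated fields: domain, include-subdomains flag, path, secure flag, expiry timestamp, cookie name, and cookie value. The name and value below are made-up placeholders, not Dreamwidth's actual cookies:
# Netscape HTTP Cookie File
.dreamwidth.org	TRUE	/	TRUE	1570000000	examplesession	v1:u12345:example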
(
Edit Jun-13-2019: I was getting a bunch of variants of each post taking up space, so I've added this flag:
--reject "*edit=*","*mode=reply*","*replyto=*","*style=light*","*style=site*","*thread=*". Of course, feel free to reject or not reject whatever you see fit: perhaps you'd rather keep the ability to switch page styles or filter to a particular comment thread. (Note: rejecting unwanted variants will make your backup smaller in size, but will *not* make it take less time or bandwidth to prepare: wget downloads each page *before* deciding whether to get rid of it. I know, it's a rather wasteful way of doing things: maybe one day they'll make a better version, but I haven't the skills to help them.))
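For convenience, here's the whole thing glued together, base command plus reject list (all one line):
wget brin-bellway.dreamwidth.org --adjust-extension --mirror --page-requisites --convert-links --restrict-file-names=windows --load-cookies ~/Downloads/cookies.txt --reject "*edit=*","*mode=reply*","*replyto=*","*style=light*","*style=site*","*thread=*"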
---
This command results in a folder in your home directory, "brin-bellway.dreamwidth.org", with a bunch of sub-folders and individual HTML files inside. The folder is 6.9 MB in size, and took about five minutes to prepare. (Note that my blog is pretty small, since I only just started actively posting here a few days ago. If you try this method on a well-established blog, feel free to comment and let me know about how long it took and how big the folder was. Even if it takes several hours, though, it's not like you have to actively tend to it: you can go read or play video games or make dinner, and just let the program run in the background.)
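(If you want to check how big your own copy came out, "du -sh brin-bellway.dreamwidth.org" will tell you.)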
I'm not sure how frequently or under which circumstances Dreamwidth login cookies are invalidated, but probably the worst-case scenario on that is having to re-export the cookies every time you do a new backup. Depending on how fragile the cookies are, you might not be able to easily automate the backup process, but at least it's very easy to do manually once you know how.
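If you do feel like trying to automate it anyway, here's a minimal sketch of what a wrapper script might look like. It assumes your cookies.txt is still fresh enough to work, borrows the reject list from the edit above, and the script name and folder layout are my own invention, so adjust to taste and run it by hand or from cron:
#!/bin/sh
# backup-dreamwidth.sh: hypothetical wrapper around the wget command above.
# Puts each run in its own dated folder so old and new backups don't collide.
backup_dir="$HOME/dw-backups/$(date +%Y-%m-%d)"
mkdir -p "$backup_dir"
cd "$backup_dir" || exit 1
wget brin-bellway.dreamwidth.org --adjust-extension --mirror --page-requisites --convert-links --restrict-file-names=windows --load-cookies "$HOME/Downloads/cookies.txt" --reject "*edit=*","*mode=reply*","*replyto=*","*style=light*","*style=site*","*thread=*"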
(
Edit Jan-22-2019: The already-made cookie export worked for a while, but today's backup quietly reverted to not-logged-in mode. (The timing *suggests* "one month" is the cutoff, since my last backup was January 10th.) If the cookie export you're using isn't fresh, check to make sure the login worked properly.)
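One quick way to check: assuming your pages show a "log out" link when you're viewing them logged in (eyeball a page in your browser first to see what yours actually say), something like "grep -il 'log out' brin-bellway.dreamwidth.org/index.html" will print the filename if the login took, and nothing if it didn't.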
I have not experimented with running a new backup while the old backup's folder is still there. If you don't feel like experimenting either but don't want to delete the old backup yet, you can always move it somewhere other than the home directory.
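For example, "mkdir -p ~/old-backups && mv brin-bellway.dreamwidth.org ~/old-backups/dreamwidth-$(date +%Y-%m-%d)" will stash it in a dated folder out of the way. (The folder names are just a suggestion.)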