Cool, good job. How did you do that? Did you use a web-spider on your computer that downloaded all the contents of the forum? How long did it take to complete, if so?
I wrote my own program.
It first downloaded all forum pages from rd.net. Threads are numbered from 1 up to the number of the newest thread. I knew the number X of the newest thread, so I just had to try downloading the first page of every thread from 1 to X. After each download, the program analysed the content to decide whether it was a "missing thread", a "thread I may not access" or a "successfully downloaded page". In the last case, it analysed the content further to see whether the thread had more pages and, if so, downloaded those as well.
This took about three days (not running continuously, only during the day). I deliberately paused briefly between page downloads and did not run multiple downloads in parallel, so as not to stress the server too much and to stay under the radar of any systems that might be in place to prevent DoS attacks.
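For anyone curious, the download loop described above can be sketched roughly like this in Python. This is only an illustration, not the actual program: the marker strings, the function names `classify` and `crawl`, and the 2-second default delay are all assumptions, and the real download function is passed in as `fetch` so the sketch does not guess at the forum's URL scheme.

```python
import time
from typing import Callable, Dict

# Hypothetical marker strings: the real wording of the forum's
# error pages is not given in the post.
MISSING_MARKER = "The requested topic does not exist"
DENIED_MARKER = "You are not authorised to read this forum"


def classify(html: str) -> str:
    """Sort a downloaded first page into one of the three cases."""
    if MISSING_MARKER in html:
        return "missing"
    if DENIED_MARKER in html:
        return "denied"
    return "ok"


def crawl(
    last_id: int,
    fetch: Callable[[int], str],
    delay_seconds: float = 2.0,  # assumed value, not from the post
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[int, str]:
    """Try thread numbers 1..last_id one at a time, pausing after each."""
    results = {}
    for thread_id in range(1, last_id + 1):
        html = fetch(thread_id)      # strictly sequential, no parallel downloads
        results[thread_id] = classify(html)
        sleep(delay_seconds)         # short break to go easy on the server
    return results
```

Passing `fetch` and `sleep` in as parameters also makes it easy to try the loop against a handful of saved pages before pointing it at a live server.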
The second step was to analyse the DOM of the downloaded HTML pages to extract the individual posts, their IDs, the topic title of the thread, the poster of each post, the parent forum of the thread, and the username and avatar details of each user, storing all of this information in a Firebird database.
This ran in parallel with the download, so it was also finished after three days.
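As a rough idea of what the extraction step looks like, here is a small Python sketch using only the standard library's `html.parser`. The markup it expects (`<div class="post" id="...">` containing `<a class="username">`) is made up for illustration, as are the names `PostExtractor` and `extract_posts`. The real pages will differ, and writing the results to Firebird is left out.

```python
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collects the post ID and author of every post on one thread page.

    Assumes hypothetical markup:
        <div class="post" id="p123"> ... <a class="username">name</a> ... </div>
    """

    def __init__(self):
        super().__init__()
        self.posts = []           # one dict per post: {"id": ..., "author": ...}
        self._in_author = False   # True while inside the username link

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "div" and "post" in classes:
            self.posts.append({"id": attr_map.get("id"), "author": ""})
        elif tag == "a" and "username" in classes and self.posts:
            self._in_author = True

    def handle_data(self, data):
        if self._in_author:
            # Text may arrive in several chunks, so append rather than assign.
            self.posts[-1]["author"] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_author = False


def extract_posts(html):
    """Return a list of {"id", "author"} dicts for one page of HTML."""
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    for post in parser.posts:
        post["author"] = post["author"].strip()
    return parser.posts
```

The thread title, parent forum and avatar details could be picked up the same way by watching for their own tags and classes.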
The next step was to use the collected information to create tables of contents with lists of threads, users and forums. This took less than an hour.
The next step was to use the collected information on avatars to download the avatars from rd.net. This took a few hours.
The final step was to process the originally downloaded pages and create new, improved versions by:
- Fixing internal links between pages (using information from the Firebird database)
- Replacing links to user pages with links to my own generated user pages
- Replacing image links to avatars with links to my downloaded copies of the avatars
- Removing all unneeded content such as the banner, footer, post buttons, reply buttons, quote buttons and so forth
This processing took about three days, again only running during the day.
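The link-fixing part of that last step might look something like the Python sketch below. The URL patterns (`viewtopic.php?...t=N`, `memberlist.php?...u=N`) and the output file names (`thread_N.html`, `user_N.html`) are assumptions for the sake of the example, and a plain set of known thread numbers stands in for the lookup against the Firebird database.

```python
import re

# Assumed link formats; the real forum URLs are not given in the post.
THREAD_LINK = re.compile(r'href="[^"]*viewtopic\.php\?[^"]*\bt=(\d+)[^"]*"')
USER_LINK = re.compile(r'href="[^"]*memberlist\.php\?[^"]*\bu=(\d+)[^"]*"')


def rewrite_links(html, known_threads):
    """Point thread and user links at locally generated pages.

    known_threads: set of thread numbers that were successfully archived
    (a stand-in for querying the database).
    """

    def thread_repl(match):
        thread_id = int(match.group(1))
        if thread_id in known_threads:
            return f'href="thread_{thread_id}.html"'
        return match.group(0)  # leave links to unarchived threads untouched

    html = THREAD_LINK.sub(thread_repl, html)
    html = USER_LINK.sub(lambda m: f'href="user_{m.group(1)}.html"', html)
    return html
```

Regular expressions are good enough for a sketch like this. Rewriting through a proper HTML parser would be more robust against unusual attribute quoting.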