Downloading full Website offline
-
So I have been playing around with downloading a site for offline archiving. I have written scripts for the following cases:
For a full website (this downloads the whole site as-is):
wget -mkEpnp https://mangolassi.it
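For reference, `-mkEpnp` bundles five short flags. The sketch below spells them out with wget's long option names so it's clear what each one does. The `mirror_site` function and the `WGET_BIN` override (so you can swap in a stand-in command for a dry run) are hypothetical names of my own, not part of wget.

```shell
#!/bin/bash
# Long-form equivalent of: wget -mkEpnp <url>
#   -m  --mirror            recursion + timestamping, infinite depth
#   -k  --convert-links     rewrite links so the local copy browses offline
#   -E  --adjust-extension  add .html/.css suffixes where the URL lacks them
#   -p  --page-requisites   also fetch the images, CSS and JS each page needs
#   -np --no-parent         never climb above the starting directory
# mirror_site and WGET_BIN are illustrative names, not part of wget itself.
mirror_site() {
  local url=${1:?usage: mirror_site URL}
  "${WGET_BIN:-wget}" --mirror --convert-links --adjust-extension \
    --page-requisites --no-parent "$url"
}
```

Usage: `mirror_site https://mangolassi.it` does the same job as the one-liner above.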
For a group of posts in numerical order (this example downloads all the topics from MangoLassi):
#!/bin/bash
for i in {1..2200000}
do
  wget -mkEpnp https://mangolassi.it/topic/$i
done
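A slightly more general take on that loop, sketched under the assumption that the forum addresses topics as `/topic/<numeric id>` (as both MangoLassi and Spiceworks do in these examples). It takes the base URL and ID range as arguments and uses an arithmetic `for` loop, so bash doesn't have to expand all 2.2 million numbers into a list up front the way `{1..2200000}` does. `mirror_topics` and `WGET_BIN` are hypothetical names.

```shell
#!/bin/bash
# mirror_topics BASE_URL FIRST LAST
# Runs the same wget -mkEpnp mirror for BASE_URL/topic/FIRST .. BASE_URL/topic/LAST.
# Assumes topics are addressed as /topic/<numeric id>.
# The arithmetic loop counts up one ID at a time instead of expanding the whole
# {FIRST..LAST} list in memory first. WGET_BIN lets you swap in a stand-in command.
mirror_topics() {
  local base=${1:?usage: mirror_topics BASE_URL FIRST LAST}
  local first=${2:?missing FIRST} last=${3:?missing LAST} i
  base=${base%/}  # drop a trailing slash so we don't build "//topic"
  for ((i = first; i <= last; i++)); do
    "${WGET_BIN:-wget}" -mkEpnp "$base/topic/$i"
  done
}
```

For example, `mirror_topics https://community.spiceworks.com 1 2200000` covers the Spiceworks loop posted further down.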
-
I wish that we had that many topics, ha.
-
@scottalanmiller said in Downloading full Website offline:
I wish that we had that many topics, ha.
#!/bin/bash
for i in {1..2200000}
do
  wget -mkEpnp https://community.spiceworks.com/topic/$i
done
Example for Spiceworks.
-
Works really well. Testing against ML now. It's not the most elegant way to get the full content, kind of brute force, but it gets it all, and that's the important part. It grabs all the media along with it, like images, so you end up with multiple copies of a lot of that stuff, I would imagine. It takes a while to run because it follows the millions of links needed to get everything related to a page, not just the page itself. But boy is it fast.
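If you want to check that guess about multiple copies of the same media, one way is to hash every file in the mirror and list the ones with identical content. A minimal sketch, assuming GNU coreutils (`sha256sum`, and `uniq` with `-w` and `--all-repeated`); `find_duplicates` is a hypothetical helper name, and the directory is whatever wget created (e.g. `mangolassi.it`).

```shell
#!/bin/bash
# find_duplicates DIR
# Prints groups of files under DIR whose contents are byte-for-byte identical.
# GNU sha256sum prints a 64-char hex digest, two spaces, then the path, so
# uniq -w64 compares only the digest; --all-repeated=separate prints every
# member of each duplicate group, with a blank line between groups.
find_duplicates() {
  local dir=${1:?usage: find_duplicates DIR}
  find "$dir" -type f -exec sha256sum {} + \
    | sort \
    | uniq -w64 --all-repeated=separate
}
```

Usage: `find_duplicates mangolassi.it | less`. Files that appear only once are not printed.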
-
I also found this Windows application that does the same thing:
https://www.cyotek.com/cyotek-webcopy
-
@dbeato I've got a cool script that tells you which threads on ML are popular. Kind of heavy, and only so useful alone, but it is trivial to modify it to track scripts that you care about. I'll make a post for it.
-
You can tell when you run that thing: both the NodeBB platform and Cloudflare clearly show the change in traffic shape. One big reason is that it harvests old threads, so it hits content that is not cached, and the cache hit ratio just takes a beating.
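If the load on the origin is a concern, wget can be told to pace itself. It won't make old threads any more cacheable, but it spreads the misses out. A hedged sketch of one topic fetch with throttling: `--wait` pauses between retrievals, `--random-wait` varies that pause between 0.5 and 1.5 times the `--wait` value, and `--limit-rate` caps bandwidth. The 2-second and `200k` defaults are arbitrary placeholders, and `polite_mirror_topic`, `WAIT_SECONDS`, `RATE_LIMIT` and `WGET_BIN` are hypothetical names.

```shell
#!/bin/bash
# polite_mirror_topic URL
# Same flags as -mkEpnp, plus throttling so the crawl is gentler on the server.
# WAIT_SECONDS and RATE_LIMIT are placeholder knobs (assumed defaults: 2s, 200k).
polite_mirror_topic() {
  local url=${1:?usage: polite_mirror_topic URL}
  "${WGET_BIN:-wget}" -mkEpnp \
    --wait="${WAIT_SECONDS:-2}" --random-wait \
    --limit-rate="${RATE_LIMIT:-200k}" \
    "$url"
}
```

Usage inside the topic loop: `for ((i = 1; i <= 2200000; i++)); do polite_mirror_topic "https://mangolassi.it/topic/$i"; done`. The trade-off is that the run takes even longer.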
-
Save Page WE
https://chrome.google.com/webstore/detail/save-page-we/dhhpefjklgkmgeafimnjhojgjamoafof
Extension for Chrome and Firefox. It saves a single page as one self-contained HTML file and does it well, if you only need a single page.
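For the single-page case from the command line, the wget manual suggests `wget -E -H -k -K -p URL`: `-p` fetches the page's requisites, `-H` lets it pull requisites hosted on other domains, `-k` converts links for local viewing, `-K` keeps a `.orig` backup of each converted file, and `-E` adjusts extensions. Unlike Save Page WE, this produces a small directory tree rather than one file. The sketch uses the long option names; `save_single_page` and `WGET_BIN` are hypothetical names.

```shell
#!/bin/bash
# save_single_page URL
# Long-option spelling of the manual's "wget -E -H -k -K -p URL" recipe:
# one page plus the images/CSS/JS needed to display it, even from other hosts.
# No recursion beyond the page itself.
save_single_page() {
  local url=${1:?usage: save_single_page URL}
  "${WGET_BIN:-wget}" --adjust-extension --span-hosts --convert-links \
    --backup-converted --page-requisites "$url"
}
```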
-
@scottalanmiller said in Downloading full Website offline:
@dbeato I've got a cool script that tells you which threads on ML are popular. Kind of heavy, and only so useful alone, but it is trivial to modify it to track scripts that you care about. I'll make a post for it.
That would be cool to see.
-
@dbeato said in Downloading full Website offline:
@scottalanmiller said in Downloading full Website offline:
@dbeato I've got a cool script that tells you which threads on ML are popular. Kind of heavy, and only so useful alone, but it is trivial to modify it to track scripts that you care about. I'll make a post for it.
That would be cool to see.
It has been posted.
-
@scottalanmiller said in Downloading full Website offline:
@dbeato said in Downloading full Website offline:
@scottalanmiller said in Downloading full Website offline:
@dbeato I've got a cool script that tells you which threads on ML are popular. Kind of heavy, and only so useful alone, but it is trivial to modify it to track scripts that you care about. I'll make a post for it.
That would be cool to see.
It has been posted.
I saw... I am slow
-
Why back it up that way vs the server or files and DB?
-
@Obsolesce said in Downloading full Website offline:
Why back it up that way vs the server or files and DB?
Because I don't have access to them at least on those two examples.
-
@dbeato said in Downloading full Website offline:
@Obsolesce said in Downloading full Website offline:
Why back it up that way vs the server or files and DB?
Because I don't have access to them at least on those two examples.
Oh, why are you supposed to back up ML without having access to the back-end?
And, how would you restore anything with those backups?
-
@Obsolesce said in Downloading full Website offline:
@dbeato said in Downloading full Website offline:
@Obsolesce said in Downloading full Website offline:
Why back it up that way vs the server or files and DB?
Because I don't have access to them at least on those two examples.
Oh, why are you supposed to back up ML without having access to the back-end?
And, how would you restore anything with those backups?
I'm not; I was just downloading an offline version as a test. ML is pretty big, and so are other forums, so this isn't a backup.
-
@Obsolesce said in Downloading full Website offline:
@dbeato said in Downloading full Website offline:
@Obsolesce said in Downloading full Website offline:
Why back it up that way vs the server or files and DB?
Because I don't have access to them at least on those two examples.
Oh, why are you supposed to back up ML without having access to the back-end?
And, how would you restore anything with those backups?
It's an emergency procedure for someone who worries that something might happen to the community and it could disappear. You could programmatically reconstruct the community if you had to.
DB access is way better. Obviously.
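As a rough illustration of the "programmatically reconstruct" idea, a first step is simply inventorying what the mirror contains. This sketch walks the downloaded tree and prints each saved page's path with its `<title>`. It assumes each title opens and closes on a single line (common, but not guaranteed), and `index_titles` is a hypothetical name. Actually rebuilding threads would mean parsing the posts out of the HTML, which depends on the forum's markup and isn't attempted here.

```shell
#!/bin/bash
# index_titles DIR
# Prints "<path><TAB><title>" for every .html file under DIR, sorted by path.
# Assumes each page's <title>...</title> sits on one line; pages whose title
# doesn't match that pattern get an empty title field.
index_titles() {
  local dir=${1:?usage: index_titles DIR} file title
  while IFS= read -r -d '' file; do
    title=$(sed -n 's:.*<title>\(.*\)</title>.*:\1:p' "$file" | head -n 1)
    printf '%s\t%s\n' "$file" "$title"
  done < <(find "$dir" -type f -name '*.html' -print0 | sort -z)
}
```

Usage: `index_titles mangolassi.it > topics.tsv` gives a tab-separated list you could load into a spreadsheet or database.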
-
@Obsolesce said in Downloading full Website offline:
@dbeato said in Downloading full Website offline:
@Obsolesce said in Downloading full Website offline:
Why back it up that way vs the server or files and DB?
Because I don't have access to them at least on those two examples.
Oh, why are you supposed to back up ML without having access to the back-end?
And, how would you restore anything with those backups?
RE: Restore
It builds a static version of the site that you could host.
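To browse that static copy, any plain web server pointed at the mirror directory should do. One minimal option, assuming Python 3.7 or later is installed, is the standard library's `http.server` module; binding to 127.0.0.1 keeps it local. Python's docs say `http.server` is not meant for production, so this is for local browsing only. `serve_mirror` and `PYTHON_BIN` are hypothetical names.

```shell
#!/bin/bash
# serve_mirror DIR [PORT]
# Serves the wget mirror directory (e.g. mangolassi.it) at http://127.0.0.1:PORT/.
# Uses Python's built-in http.server; --directory needs Python 3.7+.
# PYTHON_BIN lets you point at a different interpreter (default: python3).
serve_mirror() {
  local dir=${1:?usage: serve_mirror DIR [PORT]} port=${2:-8000}
  "${PYTHON_BIN:-python3}" -m http.server "$port" --bind 127.0.0.1 --directory "$dir"
}
```

Usage: `serve_mirror mangolassi.it 8080`, then open http://127.0.0.1:8080/ in a browser; Ctrl+C stops it.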
-
So, is someone considering doing that in case another site fails? I wonder how much storage is needed?
-
@Dashrender said in Downloading full Website offline:
I wonder how much storage is needed?
For example, ML took about 24 GB after two days of downloading; I stopped it because I didn't need it.
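To put numbers on a run like that, `du` and `find` on the mirror directory report its size and file count. For scale, 24 GB over two days averages out to about 12 GB per day. A small sketch; `archive_stats` is a hypothetical name, and the exact `du -sh` unit formatting varies slightly between systems.

```shell
#!/bin/bash
# archive_stats DIR
# Prints the total on-disk size of DIR (human-readable) and how many files it holds.
archive_stats() {
  local dir=${1:?usage: archive_stats DIR}
  printf 'size:  %s\n' "$(du -sh "$dir" | cut -f1)"
  # tr strips the leading spaces some wc implementations (e.g. BSD) add
  printf 'files: %s\n' "$(find "$dir" -type f | wc -l | tr -d ' ')"
}
```

Usage: `archive_stats mangolassi.it`, run periodically while the download is going, gives a rough growth rate.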
-
@dbeato said in Downloading full Website offline:
@Dashrender said in Downloading full Website offline:
I wonder how much storage is needed?
For example, ML took about 24 GB after two days of downloading; I stopped it because I didn't need it.
lol, not the site I was talking about