
  On 2/19/2025 at 4:44 PM, Ego said:

I'm just collecting the full page source for each page into an SQLite database. A bit wasteful but want to make sure we don't miss anything. 22GB and counting...

Later on we can consider parsing out all the actual posts and metadata. I also need to do a run to fetch all the images that were uploaded to the board.

So what are your plans after this? I'd first suggest taking the biggest copy of the database, cleaning it down to a minimal dataset, and then setting up a torrent that we can all seed for you. Once the data's lifetime is secured, hosting it somewhere could become simpler, even just as a static HTML archive. It doesn't need any JavaScript; it can just be a static webpage. Additionally, for each version you could provide deterministic hashes of the database's contents, cryptographically signed with PGP. I suggest you post a public PGP key of yours here on WATMM as EgoWATMMArchive so we can be sure that all future correspondence is from you. Just suggestions, not trying to burden you.
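For what it's worth, here is a minimal sketch of what a "deterministic hash of the database's contents" could look like in Python. It hashes the rows rather than the file bytes, so two copies with the same data produce the same digest even if the SQLite files differ on disk. The function name `content_digest` and the idea of sorting every table by all of its columns are assumptions made for illustration, not anything Ego has said he is doing.

```python
import hashlib
import sqlite3


def content_digest(conn: sqlite3.Connection) -> str:
    """Return a SHA-256 hex digest over every user table's rows in a fixed order."""
    digest = hashlib.sha256()
    # Visit tables in name order so the result does not depend on creation order.
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    for table in tables:
        digest.update(table.encode("utf-8") + b"\x00")
        columns = [info[1] for info in conn.execute(f'PRAGMA table_info("{table}")')]
        # Sort by every column so insertion order cannot change the digest.
        order_by = ", ".join(f'"{c}"' for c in columns)
        for row in conn.execute(f'SELECT * FROM "{table}" ORDER BY {order_by}'):
            digest.update(repr(row).encode("utf-8") + b"\x00")
    return digest.hexdigest()
```

The resulting hex string could be written to a small text file and signed with something like `gpg --armor --detach-sign`, so anyone seeding the torrent can check both the data and who published it.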

Edited by Ivy Zemura yvI oo ii oo

just let it go, it's gone, you're just creating some sad carcass that will make you feel all of your life's regrets any time you dig into this data. imagine how sad that will be. nothing of any real value will be lost, just let it go. idm was a mistake

  On 3/3/2025 at 1:46 AM, MisterE said:

just let it go, it's gone, you're just creating some sad carcass that will make you feel all of your life's regrets any time you dig into this data. imagine how sad that will be. nothing of any real value will be lost, just let it go. idm was a mistake

ok why did you log in

im not super sentimental about watmm i just like data hoarding and want to be able to access obscure idm knowledge when i need to

  On 3/3/2025 at 1:03 AM, Ivy Zemura yvI oo ii oo said:

So what are your plans after this? I'd first suggest taking the biggest copy of the database, cleaning it down to a minimal dataset, and then setting up a torrent that we can all seed for you. Once the data's lifetime is secured, hosting it somewhere could become simpler, even just as a static HTML archive. It doesn't need any JavaScript; it can just be a static webpage. Additionally, for each version you could provide deterministic hashes of the database's contents, cryptographically signed with PGP. I suggest you post a public PGP key of yours here on WATMM as EgoWATMMArchive so we can be sure that all future correspondence is from you. Just suggestions, not trying to burden you.


Indexing the last 10% now. If I zip the SQLite database it will be under 1.5GB, so here's what I'll do:

- Host the zipped SQLite database. Better to have multiple copies around.
- Set up an archive site on which you can just browse for now. Can add search later. I have some space and bandwidth available to host this for the foreseeable future.
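
As a rough illustration of the first step, this Python sketch stores raw page source in a SQLite table and then zips the database file. The `pages` table layout and the function names are made up for the example; the actual schema of the scrape is not described in this thread.

```python
import sqlite3
import zipfile
from pathlib import Path


def store_page(conn: sqlite3.Connection, url: str, html: str) -> None:
    """Save the full page source under its URL (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, html TEXT NOT NULL)"
    )
    conn.execute("INSERT OR REPLACE INTO pages (url, html) VALUES (?, ?)", (url, html))
    conn.commit()


def zip_database(db_path: Path, zip_path: Path) -> int:
    """Compress the database file and return the size of the zip in bytes."""
    with zipfile.ZipFile(
        zip_path, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9
    ) as zf:
        zf.write(db_path, arcname=db_path.name)
    return zip_path.stat().st_size
```

Forum pages repeat the same header, footer and styling markup on every page, which is presumably why 22GB of raw HTML shrinks so much once compressed.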

I will still have to download all the images uploaded to the board after that. Not sure yet what kind of storage requirements to expect from that.

After that, I personally think it's best to transfer as much content as possible over to the new board. Anything we can do for continuity, to convince as many WATMM members as possible to follow us to the new board, seems like a good idea to me. I or someone else can write a parser to reformat the raw scraped data into whatever format Invision expects for importing.
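
A parser along those lines could start out as something like the sketch below, which pulls the text of each post out of a stored page using only the Python standard library. It assumes each post body sits in a `<div data-role="commentContent">` element; that marker is an assumption and would need checking against the real page source. Mapping the result to Invision's import format is not covered here.

```python
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collect the text of every <div> carrying the (assumed) post marker."""

    def __init__(self, marker=("data-role", "commentContent")):
        super().__init__()
        self.marker = marker
        self.posts = []
        self._depth = 0  # div nesting depth inside the current post, 0 = outside
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1  # nested div inside a post
        elif self.marker in attrs:
            self._depth = 1  # a new post starts here
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace into single spaces.
                self.posts.append(" ".join("".join(self._chunks).split()))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_posts(html: str) -> list:
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    return parser.posts
```

Author names and timestamps would need similar handling, and a real run should keep the original HTML alongside the extracted text so nothing is lost if the markers turn out to be wrong.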

I don't think we can reach agreement on making any kind of judgement call on what to keep vs. delete. I had to keep an eye on the crawling, as the Chrome instance I use for scraping sometimes locks up, and I saw a lot of interesting old threads show up in all sub-forums.

Would anyone clever be able to automatically compile the whole contents of genre threads like noise, jazz etc. into one simple list of links or a playlist?
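
One possible way to do this, sketched in Python under the assumption that the pages of a thread are available as raw HTML strings (for example, read out of the SQLite scrape): collect every external link in page order and drop duplicates. The names `LinkCollector`, `compile_links` and `skip_hosts` are invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and urlparse(href).scheme in ("http", "https"):
                self.links.append(href)


def compile_links(pages, skip_hosts=()):
    """Return unique links in first-seen order, skipping the listed hosts."""
    seen = set()
    ordered = []
    for html in pages:
        collector = LinkCollector()
        collector.feed(html)
        for link in collector.links:
            host = urlparse(link).netloc.lower()
            if link not in seen and host not in skip_hosts:
                seen.add(link)
                ordered.append(link)
    return ordered
```

Passing the forum's own hostname in `skip_hosts` would filter out profile and quote links, leaving mostly the music links people posted. Turning that list into an actual playlist would depend on the service and is left out.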


  On 3/6/2025 at 12:12 PM, Ego said:

Here is the archive:
https://watmm-archive.com/

Let me know if you see something is missing.

holy shit dude thats amazing

holy shit Ego that's wild! absolutely great work, even preserving the page style? insane. many many thanks.

  On 3/6/2025 at 12:12 PM, Ego said:

Here is the archive:
https://watmm-archive.com/

Let me know if you see something is missing.

This is fantastic!!! Thank you so much!

  • 4 weeks later...
  On 3/6/2025 at 12:12 PM, Ego said:

Here is the archive:
https://watmm-archive.com/

Let me know if you see something is missing.

you are a king among mere men

  On 3/6/2025 at 12:12 PM, Ego said:

Here is the archive:
https://watmm-archive.com/

Let me know if you see something is missing.

holy shit. thanks!

edit: i dumped the link in the new forum w/credit to Ego. hopefully that's coolllllll

Edited by ignatius
