Ivy Zemura yvI oo ii oo (Author), posted March 3, edited March 3:

On 2/19/2025 at 4:44 PM, Ego said:
"I'm just collecting the full page source for each page into an SQLite database. A bit wasteful but want to make sure we don't miss anything. 22GB and counting... Later on we can consider parsing out all the actual posts and metadata. I also need to do a run to fetch all the images that were uploaded to the board."

So what are your plans for after this? I'd first suggest taking the biggest copy of the database and cleaning it down to a minimal dataset, then setting up a torrent that we can all seed for you. Once the data's lifetime is secured, hosting it somewhere could become simpler, even just as a static HTML archive. It doesn't need any JavaScript; it can just be a static webpage.

Additionally, for each version you could provide deterministic hashes of the database's contents, cryptographically signed with PGP. I suggest you post a public PGP key of yours here on WATMM as EgoWATMMArchive so we can be sure that all future correspondence is from you.

Just suggestions, not trying to burden you.

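One way the "deterministic hashes of the database's contents" idea could work is to hash the rows rather than the database file, since the bytes of an SQLite file can change (for example after a VACUUM) even when the stored data has not. The sketch below is only an illustration: the table name `pages` and its columns `url` and `html` are assumptions, not the actual schema of Ego's scrape.

```python
import hashlib
import sqlite3


def content_digest(conn: sqlite3.Connection) -> str:
    """Return a SHA-256 hex digest over every row of the scrape, in URL order.

    Assumed (hypothetical) schema: pages(url TEXT PRIMARY KEY, html TEXT).
    """
    digest = hashlib.sha256()
    # A fixed ORDER BY makes the result independent of insertion order.
    rows = conn.execute("SELECT url, html FROM pages ORDER BY url")
    for url, html in rows:
        for field in (url, html):
            data = field.encode("utf-8")
            # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()
```

The resulting digest could be written to a small text file per release and signed with something like `gpg --armor --detach-sign`, so that anyone seeding the torrent can check their copy against the signed value.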
MisterE, posted March 3:

just let it go, it's gone, you're just creating some sad carcass that will make you feel all of your life's regrets any time you dig into this data. imagine how sad that will be. nothing of any real value will be lost, just let it go.

idm was a mistake

snue apnu, posted March 3:

On 3/3/2025 at 1:46 AM, MisterE said:
"idm was a mistake"

Big if true

Ivy Zemura yvI oo ii oo (Author), posted March 3:

On 3/3/2025 at 1:46 AM, MisterE said:
"just let it go, it's gone, you're just creating some sad carcass that will make you feel all of your life's regrets any time you dig into this data. imagine how sad that will be. nothing of any real value will be lost, just let it go. idm was a mistake"

ok why did you log in

i'm not super sentimental about watmm, i just like data hoarding and want to be able to access obscure idm knowledge when i need to

Ego, posted March 3:

On 3/3/2025 at 1:03 AM, Ivy Zemura yvI oo ii oo said:
"So what are your plans for after this? I'd first suggest taking the biggest copy of the database and cleaning it down to a minimal dataset, then setting up a torrent that we can all seed for you. Once the data's lifetime is secured, hosting it somewhere could become simpler, even just as a static HTML archive. It doesn't need any JavaScript; it can just be a static webpage. Additionally, for each version you could provide deterministic hashes of the database's contents, cryptographically signed with PGP. I suggest you post a public PGP key of yours here on WATMM as EgoWATMMArchive so we can be sure that all future correspondence is from you. Just suggestions, not trying to burden you."

Indexing the last 10% now. If I zip the SQLite database it will be under 1.5GB, so what I'll do:

- Host the zipped SQLite database. Better to have multiple copies around.
- Set up an archive site on which you can just browse for now. Can add search later. I have some space and bandwidth available to host this for the foreseeable future.

I will still have to download all the images uploaded to the board after that. Not sure yet what kind of storage requirements to expect from that.

After that, personally, I think it's best to transfer as much content as possible over to the new board. Whatever we can do for continuity to convince as many WATMM members as possible to follow to the new board seems like a good idea to me. I or someone else can write a parser to reformat the raw scraped data to whatever format Invision expects for importing.

I don't think we can reach agreement on making any kind of judgement call on what to keep vs. delete. I had to keep an eye on the crawling, as the Chrome instance I use for scraping sometimes locks up, and I saw a lot of interesting old threads show up in all sub-forums.

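A small first step toward the parser mentioned above could be an index from comment ID to the scraped page that contains it, which a static archive could use to resolve old permalinks. The sketch relies only on the permalink shape visible in this thread (`.../page/4/#findComment-3006759`); the `pages` table and its `url` and `html` columns are hypothetical, and nothing here attempts Invision's import format.

```python
import re
import sqlite3
from collections.abc import Iterator

# Permalinks on this board end in "#findComment-<numeric id>".
COMMENT_ID = re.compile(r"#findComment-(\d+)")


def comment_ids(html: str) -> list[int]:
    """Return the distinct comment IDs found in one page, in first-seen order."""
    seen: dict[int, None] = {}  # dict keeps insertion order and drops repeats
    for match in COMMENT_ID.finditer(html):
        seen.setdefault(int(match.group(1)), None)
    return list(seen)


def index_comments(conn: sqlite3.Connection) -> Iterator[tuple[int, str]]:
    """Yield (comment_id, page_url) pairs for every stored page.

    Assumed (hypothetical) schema: pages(url TEXT PRIMARY KEY, html TEXT).
    """
    for url, html in conn.execute("SELECT url, html FROM pages ORDER BY url"):
        for comment_id in comment_ids(html):
            yield comment_id, url
```

Because a quoted post links back to its original comment, a page can mention IDs that live on other pages, so a real index would likely need to tell a post's own permalink apart from links inside quotes.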
Ego, posted March 6:

Here is the archive: https://watmm-archive.com/

Let me know if you see something is missing.

hoggy, posted March 6:

Would anyone clever be able to automatically compile the whole contents of genre threads like noise, jazz etc. into one simple list of links or a playlist?

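A rough sketch of how that kind of link list could be compiled from the scraped page source, using only the Python standard library. It assumes you already have the HTML of each page of a thread as a string; the default `skip_hosts` value is a guess at what counts as an "internal" link and would need adjusting.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect distinct external http(s) links from <a href="..."> tags."""

    def __init__(self, skip_hosts=("forum.watmm.com",)):
        super().__init__()
        self.skip_hosts = set(skip_hosts)  # hosts treated as internal
        self.links = []                    # first-seen order
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parts = urlparse(href)
        # Skip relative links, mailto: and the like, and the forum itself.
        if parts.scheme not in ("http", "https"):
            return
        if parts.hostname in self.skip_hosts:
            return
        if href not in self._seen:
            self._seen.add(href)
            self.links.append(href)


def thread_links(pages):
    """Return the de-duplicated external links across all pages of a thread."""
    collector = LinkCollector()
    for html in pages:
        collector.feed(html)
        collector.close()  # flush any buffered partial markup
        collector.reset()  # start the next page with a clean parser state
    return collector.links
```

This only looks at ordinary links; embedded players are usually `<iframe>` elements and would need a similar check on their `src` attribute. Turning the list into a playlist would then be a matter of filtering by host (Bandcamp, YouTube, SoundCloud and so on).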
zazen, posted March 6:

On 3/6/2025 at 12:12 PM, Ego said:
"Here is the archive: https://watmm-archive.com/ Let me know if you see something is missing."

holy shit dude that's amazing

auxien, posted March 6:

holy shit Ego that's wild! absolutely great work, even preserving the page style? insane. many many thanks.

psn, posted March 6:

I wanna buy you a beer.

psn, posted March 6:

Agreeing to or rejecting the cookies dialog seems to lead to problems.

t yst r, posted March 6:

On 3/6/2025 at 12:12 PM, Ego said:
"Here is the archive: https://watmm-archive.com/ Let me know if you see something is missing."

This is fantastic!!! Thank you so much!

th555, posted March 6:

You're a hero, thanks so much.

cruising for burgers, posted March 7:

On 3/6/2025 at 12:12 PM, Ego said:
"Here is the archive: https://watmm-archive.com/ Let me know if you see something is missing."

shit man that's a lotta work, thanks so much!

Lane Visitor, posted March 7:

Noooo!!!! How did I just find out now? Is someone going to create a new watmm? Fuck man.

YEK, posted March 7:

This is great!

Freak of the week, posted April 2:

acid thread

Ivy Zemura yvI oo ii oo (Author), posted April 2:

On 3/6/2025 at 12:12 PM, Ego said:
"Here is the archive: https://watmm-archive.com/ Let me know if you see something is missing."

you are a king among mere men

ignatius, posted April 2, edited April 2:

On 3/6/2025 at 12:12 PM, Ego said:
"Here is the archive: https://watmm-archive.com/ Let me know if you see something is missing."

holy shit. thanks!

edit: i dumped the link in the new forum w/ credit to Ego. hopefully that's coolllllll