
should i scrape the soybooru?

yunglimabean

Guest
my motivation is that i am trans btw the sheer autism must be preserved and i have the storage or something, also the current shimmie scraper doesn't scrape comments from what i've heard so a custom scraper isn't too hard to make because all i gotta do is parse the dom, grab the image url and iterate through all the comments
as of writing there are 74092 submissions on the 'ru, which is a pretty large number but not so large that it can't all be scraped within a day
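A rough sketch of the "parse the dom, grab the image url and iterate through all the comments" step, using only Python's standard library. The markup it looks for (`<img id="main_image">` for the picture and `<div class="comment">` for each comment) is an assumption for illustration, not something confirmed in this thread, so the checks would need adjusting to whatever the real pages contain. `PostParser` and `parse_post` are made-up names.

```python
from html.parser import HTMLParser


class PostParser(HTMLParser):
    """Collects the main image URL and the comment texts from one post page.

    Assumed (hypothetical) markup: <img id="main_image" src="...">
    and <div class="comment">...</div>.
    """

    def __init__(self):
        super().__init__()
        self.image_url = None
        self.comments = []
        self._depth = 0   # > 0 while we are inside a comment div
        self._buf = []    # text pieces of the comment being read

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "img" and a.get("id") == "main_image":
            self.image_url = a.get("src")
        if tag == "div":
            if self._depth > 0:
                # nested div inside a comment: track depth so we close correctly
                self._depth += 1
            elif "comment" in (a.get("class") or "").split():
                self._depth = 1
                self._buf = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # collapse runs of whitespace into single spaces
                self.comments.append(" ".join("".join(self._buf).split()))

    def handle_data(self, data):
        if self._depth > 0:
            self._buf.append(data)


def parse_post(html):
    """Return the image URL and list of comments found in one page of HTML."""
    parser = PostParser()
    parser.feed(html)
    parser.close()
    return {"image_url": parser.image_url, "comments": parser.comments}
```

For the "within a day" estimate: 86400 seconds divided by 74092 posts is about one post page every 1.17 seconds, before counting the image downloads themselves.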
 
once this scrape is completed i'll upload it on archive.org for generations to come to witness the pure unfiltered autism
 
over 70K images? I remember scraping the whole original booru when it had around 9K and thought I was mad. I don't see why not.
it's a shame I lost that folder though, it guaranteed a victory in every 'duel plus it had some jaks that I think are lost on the current booru
 
there could be deleted images too within those 70k images so i will need to handle that in the event i find an id that does not work
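One way the "id that does not work" case could be handled is to walk the id range, record the ids that come back missing, and carry on. This is only a sketch: `fetch_post`, `scrape_range`, the `https://booru.example` base URL and the `/post/view/<id>` path are assumptions for illustration, and a real site may signal a deleted post in some other way than an HTTP 404 or 410.

```python
import urllib.error
import urllib.request


def fetch_post(post_id, base_url="https://booru.example"):
    """Return the page HTML for one post, or None if the post is gone.

    The base URL and path pattern are placeholders, not the real site.
    """
    url = f"{base_url}/post/view/{post_id}"
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        if err.code in (404, 410):
            return None   # treat as a deleted or never-used id
        raise             # anything else is a real error


def scrape_range(first_id, last_id, fetch=fetch_post):
    """Fetch every id in [first_id, last_id]; skip and log the missing ones."""
    saved, missing = {}, []
    for post_id in range(first_id, last_id + 1):
        page = fetch(post_id)
        if page is None:
            missing.append(post_id)
            continue
        saved[post_id] = page
    return saved, missing
```

Keeping the list of missing ids alongside the archive also documents which gaps were deletions rather than scraper failures.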
 
It's hard to scrape the 'ru independently these days because it has captcha. If you can then I might ask you to make a scraper for only my posts on there as well.

Will the scraped data preserve the tags and uploader of the images?
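The thread doesn't say what the scrape actually stores. As a sketch of one way tags and uploader could be kept next to each image, every post could be written as one JSON line. `PostRecord`, `to_jsonl_line` and all the field names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PostRecord:
    """Metadata for one scraped post (field names are illustrative)."""
    post_id: int
    image_url: str
    uploader: str
    tags: list = field(default_factory=list)
    comments: list = field(default_factory=list)


def to_jsonl_line(record):
    """Serialize one record as a single line of JSON."""
    return json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)
```

One line per post keeps the metadata file appendable during a long scrape, and each line can be read back on its own with `json.loads`.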
 
also the scraping process is almost complete; i only need to scrape 2000 more images
 