Show HN: Web Scraping with Your Web Browser: Why Not?

joshdavham

> can you write a web scraper in your browser? The answer is: YES, you can! So why is nobody doing it?

Completely agree with this sentiment.

I just spent the last couple of months developing a Chrome extension, but recently also did an unrelated web scraping project where I looked into all the common tools like Beautiful Soup, Selenium, Playwright, Puppeteer, etc.

All of these tools were needlessly complicated and I was having a ton of trouble with sites that required authentication. I then realized it would be way easier to write some JavaScript and paste it in my browser to do the scraping. Worked like a charm!
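For anyone curious what that looks like, here is a minimal sketch of the paste-into-the-browser approach. The helper name `extractLinks` and the default selector are made up for illustration, not taken from the actual project; the only assumption is that you are already logged in and looking at the page you want to scrape.

```javascript
// Hypothetical helper: collect the text and URL of every matching link.
// `root` is anything with querySelectorAll, normally `document`.
function extractLinks(root, selector = "a[href]") {
  return Array.from(root.querySelectorAll(selector), (a) => ({
    text: (a.textContent || "").trim(),
    href: a.href,
  }));
}

// In the DevTools console of the logged-in page, something like:
// console.table(extractLinks(document));
```

Since the code runs inside the page you are already authenticated on, there is no separate login flow to script.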

deisteve

Is there anything that runs on WASM for scraping? The issue is that you need to enable flags and turn off other security features to scrape in your web browser, which is why it's not popular, but with WASM that might change.

ljw1004

In my web-scraping I've gravitated towards the "cheerio" library for JavaScript.

I kind of don't want to use DOMParser because it's browser-only... my web-scrapers have to evolve every few years as the underlying web pages change, so I really want CI tests, so it's easiest to have something that works in Node.

gabrielsroka

Why do you need a proxy or to worry about CORS? Why not just point your browser to rumble.com and start from there?

I've posted here about scraping, for example, HN with JavaScript. It's certainly not a new idea.

2020: https://news.ycombinator.com/item?id=22788236

welder

Neo already did that in the Matrix:

https://www.youtube.com/watch?v=sjoad6gcRzs

simlan

I also did something similar for my spring project. The idea was to buy a used car and I was frustrated with the BS the listing sites claimed as a fair price, etc.

I went the browser extension route and used Greasemonkey to inject custom JavaScript. I patched window.fetch, and because it was a React page it did most of the work for me, handing me a slightly convoluted JSON doc every time I scrolled. Getting the data extracted was only a question of getting a Flask API with correct CORS settings running.
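Roughly, the fetch patch can be sketched like this. It is an illustration and not the exact script: `patchFetch` and `onJson` are hypothetical names, and depending on the userscript manager the page's own fetch may have to be reached through `unsafeWindow` instead of `window`.

```javascript
// Hypothetical helper: wrap `target.fetch` so that every JSON response is
// also handed to `onJson(url, data)`. In a userscript, `target` would be
// `window` (or `unsafeWindow`, depending on the userscript manager).
function patchFetch(target, onJson) {
  const original = target.fetch;
  target.fetch = async (...args) => {
    const response = await original.apply(target, args);
    const type = response.headers.get("content-type") || "";
    if (type.includes("application/json")) {
      // Read a clone so the page can still consume the original body.
      response
        .clone()
        .json()
        .then((data) => onJson(response.url, data))
        .catch(() => {}); // ignore bodies that fail to parse
    }
    return response;
  };
  // Returns a function that undoes the patch.
  return () => {
    target.fetch = original;
  };
}

// Usage in the page, collecting everything the app loads while scrolling:
// const captured = [];
// patchFetch(window, (url, data) => captured.push({ url, data }));
```

The page gets its response untouched, so the site keeps working while the copies pile up. Shipping the captured data to a local API is a separate step, and is best done with the unpatched fetch so the upload is not captured as well.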

Thanks for posting. Using a local proxy for even more control could be helpful in the future.

dewey

I've read through that (hard to read, because of the bad formatting) but I still don't understand why you would do that instead of using Playwright, Puppeteer, etc. The only reason given seems to be "This technique certainly has its limits.".

datadrivenangel

I've been playing around with this idea lately as well! There are a lot of web interfaces that are hostile to scraping, and I see no reason why we shouldn't be able to use the data we have access to for our own purposes. CUSTOMIZE YOUR INTERFACES

chaosharmonic

> You can find plenty of tutorials on the Internet about the art of web scraping... and the first things you will learn about are Python and Beautiful Soup. There is no tutorial on web scraping with Javascript in a web browser...

Um... [0]

[0] https://bhmt.dev/blog/scraping