Hacker News

gabrielsroka 10/01/2024

Why do you need a proxy or to worry about CORS? Why not just point your browser to rumble.com and start from there?

I've posted here before about scraping sites (HN, for example) with JavaScript. It's certainly not a new idea.

2020: https://news.ycombinator.com/item?id=22788236


Replies

CharlieDigital 10/01/2024

    > Why do you need a proxy or to worry about CORS? 
Not sure about OP, but you might want to go through a proxy depending on the site/content you are scraping and your location. For example, if you are in Canada but want to scrape prices in USD, you might need a proxy located in the US to get US prices.
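
To make the location point concrete, here is a minimal sketch using only Python's standard library. The proxy hostnames and the `PROXIES_BY_COUNTRY` table are made up for illustration; a real setup would use whatever endpoints your proxy provider gives you.

```python
import urllib.request

# Hypothetical proxy endpoints keyed by country code (assumption, not real hosts).
PROXIES_BY_COUNTRY = {
    "US": "http://us.proxy.example:8080",
    "CA": "http://ca.proxy.example:8080",
}


def proxy_map(country: str) -> dict[str, str]:
    """Return a scheme -> proxy URL mapping for the given country code."""
    proxy = PROXIES_BY_COUNTRY[country]
    return {"http": proxy, "https": proxy}


def opener_for(country: str) -> urllib.request.OpenerDirector:
    """Build an opener whose requests exit through that country's proxy."""
    handler = urllib.request.ProxyHandler(proxy_map(country))
    return urllib.request.build_opener(handler)


# Usage (not executed here, since it needs a working proxy):
# html = opener_for("US").open("https://shop.example/item/1", timeout=10).read()
```

The site sees the request coming from the proxy's address, so a US exit should get the US view of the page, assuming the site localizes by IP.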

    > Why not just point your browser to rumble.com and start from there?
Some endpoints use simple web application firewall (WAF) rules that will block IPs. In this case, a rotating proxy can help evade the blocks (and prevent your legitimate traffic from being blocked). Some domains use more sophisticated WAFs like Imperva and will do browser fingerprinting, so you'll need even more advanced techniques to scrape successfully.
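
A rough sketch of the rotation idea, in Python. The names `fetch_with_rotation` and `Blocked` are invented for this example, and the actual HTTP call is passed in as `fetch` so the retry logic stays separate from any particular client. This only addresses simple IP-based blocks; it does nothing against fingerprinting WAFs.

```python
import itertools
from typing import Callable, Sequence


class Blocked(Exception):
    """Raised by a fetcher when the site refuses the request (e.g. HTTP 403)."""


def fetch_with_rotation(
    url: str,
    proxies: Sequence[str],
    fetch: Callable[[str, str], str],
    max_attempts: int = 3,
) -> str:
    """Try the URL through successive proxies until one is not blocked."""
    pool = itertools.cycle(proxies)  # round-robin over the proxy list
    last_error = None
    for _ in range(max_attempts):
        proxy = next(pool)
        try:
            return fetch(url, proxy)
        except Blocked as err:
            last_error = err  # this exit IP is blocked; move on to the next one
    raise Blocked(f"all {max_attempts} attempts were blocked") from last_error
```

In practice `fetch` would wrap a real HTTP client configured to use the given proxy and raise `Blocked` on a 403 or a challenge page.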

Source: work at a startup that does a lot of scraping and these are issues we've run into. Our entire office network is blocked from some sites due to some early testing without a proxy.