Sunday 27 September 2020

Avoiding censorship while web scraping: Testing, Buying and Setting up web proxies on AWS EC2 ft. proxy6.net PART 1

OK!!! So, you develop your amazing scraping bot and test it on your local machine. Everything works well.

Now the next logical step is uploading it somewhere on the cloud. We chose Amazon EC2 in this case. Upload it and try to run it. BAM!!!

The website that you were scraping has blocked your EC2 public IP. Disappointment and depression hit hard.

This is a common scenario: AWS, being one of the most popular cloud services, often gets blocked by websites. You could try to find some cloud provider which is not blocked, but you want your favorite cloud provider, right?

I was trying to fetch a few data points (just 2 fetches in the entire day) using the NSEpy Python module, but the NSE website (https://www.nseindia.com/) had blocked the EC2 IP.

So, let us talk about the obvious solution: setting up a proxy.

Words like proxy, VPN and Tor are popular when it comes to internet censorship.

So, what is a proxy?

A proxy is another computer which sits between your machine and the outside internet. So, when you access the internet, your traffic goes through the proxy.

The proxy accesses the internet and you access the proxy. Even though you may be in India, your proxy can be anywhere in the world.

For example, let's say TikTok is banned in India but legal in the UK. You can set up or buy a proxy located in the UK and connect through it, so that the site sees your requests as coming from the UK.

Read more about proxies here:

https://networkencyclopedia.com/proxy-server/



How to Buy a Private Proxy?

There are many public proxies, i.e. open public computers out there which you can use as a proxy. But if you are developing some decent software, I would recommend going with a private proxy. This will really save you time. Later we will see how to set this up on EC2.

In this example we will use this provider: https://proxy6.net/

They have great chat support and I have personally used them.


 

 

First, we want to test whether the proxy works or not.

 

Steps to test proxy:

 

1) Create an account with proxy6 (sorry for the referral 😊): https://proxy6.net/en/?r=240625

2) Now remember your goal: you need a proxy which is not blocked by your target website. In my case that is https://www.nseindia.com/.

3) Proxy6 has a good chat support system. Contact them on chat and ask for a TEST proxy. They will add a proxy to your account for 15 minutes, so you have 15 minutes to test whether it works for your purpose.



4) Now let us test this on Windows. Go to your proxy settings, enter the proxy's IP and port, and turn the proxy on.



5) Now go to your browser, Chrome in my case, and try to access your target website. It will ask for the proxy's username and password. Then check whether you can access the website: if yes, your proxy works well; if no, you need to try a different one.
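If you prefer a script over the browser for step 5, the same check can be sketched in Python with the standard library. This is only a rough sketch under assumptions: the helper names (`make_proxy_url`, `make_opener`, `check_site`) are made up for this post, the host, port and credentials are placeholders, and I am assuming the proxy is a plain HTTP proxy with username/password login. A site may also block you in ways that a status code alone does not reveal.

```python
import urllib.error
import urllib.request
from urllib.parse import quote


def make_proxy_url(host, port, user, password):
    """Build an HTTP proxy URL, percent-encoding the credentials."""
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


def make_opener(proxy_url):
    """Return an opener that sends http and https requests via the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def check_site(opener, target_url, timeout=15):
    """Try one GET through the opener and return (reachable, detail)."""
    # Some sites reject requests without a browser-like User-Agent header.
    request = urllib.request.Request(target_url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with opener.open(request, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server (or the proxy) answered, but with an error such as 403 or 407.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # No usable answer at all: wrong IP/port, expired proxy, timeout, etc.
        return False, f"connection failed: {exc}"
```

With the test proxy's details filled in (the values below are placeholders), usage would look roughly like this:

```python
opener = make_opener(make_proxy_url("203.0.113.10", 8080, "user", "secret"))
print(check_site(opener, "https://www.nseindia.com/"))
```

A `(True, ...)` result means the proxy got through; anything else means you should ask for a different proxy.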



Once you have verified this, you can use this proxy for your EC2 instance. But remember, you had this proxy for only 15 minutes, so you need to renew it. Please read PART 2 for how to set this up on EC2.


 

 

 

 
