OK! So you develop your amazing scraping bot and test it on your local machine. Everything works well.
The next logical step is to upload it somewhere on the cloud. We chose Amazon EC2 in this case.
Upload it, try to run it, and BAM!
The website you were scraping has blocked your EC2 public IP. Disappointment and depression hit hard ☹.
This is a common scenario: AWS, being one of the most popular cloud services, often gets blocked by many websites. You could look for a cloud provider that is not blocked, but you want your favorite cloud provider, right?
I was trying to fetch a few data points (just 2 fetches in an entire day) using the NSEpy Python module, but the NSE website (https://www.nseindia.com/) had blocked the EC2 IP.
So, let us talk about the obvious solution: setting up a proxy.
Words like proxy, VPN, and Tor are popular when it comes to internet censorship.
So, what is a Proxy?
A proxy is another computer that sits between your machine and the outside internet. When you access the internet, your traffic goes through the proxy: the proxy accesses the internet and you access the proxy. Even though you may be in India, your proxy can be anywhere in the world.
For example, say TikTok is banned in India but legal in the UK. You can set up or buy a proxy located in the UK, connect through it, and everything will work.
Read more about proxy here:
https://networkencyclopedia.com/proxy-server/
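In code, "going through a proxy" usually comes down to handing your HTTP client the proxy's address. Here is a minimal sketch of that idea, not the only way to do it. The helper name `build_proxies`, the address `203.0.113.10`, the port, and the credentials are all placeholders I made up for illustration; use the values your own proxy provider gives you.

```python
from urllib.parse import quote


def build_proxies(host, port, username=None, password=None):
    """Return a scheme-to-proxy-URL mapping for one HTTP proxy (hypothetical helper)."""
    if username and password:
        # Percent-encode credentials so characters like '@' or ':' stay valid.
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    else:
        auth = ""
    proxy_url = f"http://{auth}{host}:{port}"
    # Both plain HTTP and HTTPS requests are sent to the same proxy.
    return {"http": proxy_url, "https": proxy_url}


# 203.0.113.10 is a documentation-only address, not a real proxy.
proxies = build_proxies("203.0.113.10", 8080, "user", "p@ss")
```

A mapping of this shape is what the `requests` library takes in its `proxies=` argument and what the standard library's `urllib.request.ProxyHandler` takes in its constructor. Whether your scraping library lets you pass it through is something to check in that library's documentation.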
How to Buy a Private Proxy?
There are many public proxies, i.e. open public computers that you can use as a proxy, but if you are developing some decent software I would recommend going with a private proxy. This will really save you time. Later we will see how to set this up on EC2.
In this example we will use this provider: https://proxy6.net/
They have great chat support and I have personally used them.
First, we want to test whether the proxy works.
Steps to test proxy:
1) Create an account with proxy6 (sorry for the referral 😊): https://proxy6.net/en/?r=240625
2) Now remember your goal: you need a proxy which is not blocked by your target website. In my case, https://www.nseindia.com/.
3) Proxy6 has a good chat support system. Contact them on chat and ask for a TEST proxy. They will add a proxy to your account for 15 minutes, so you have 15 minutes to test whether it works for your purpose.
4) Now let us test this on Windows. Go to your proxy settings, enter the proxy IP and port, and turn the proxy on.
5) Now go to your browser, Chrome in my case, and try to access your target website. It will ask for your username and password. Then check whether you can access the website: if yes, your proxy works well; if no, you need to try a different one.
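The browser check in step 5 can also be scripted, which is closer to how the bot will actually talk to the site. This is a rough sketch using only the Python standard library. The function names `make_opener` and `site_reachable`, the `User-Agent` value, and treating "HTTP 200" as "not blocked" are my assumptions, so adjust them to whatever your target website really returns.

```python
import urllib.error
import urllib.request


def make_opener(proxy_url):
    """Build an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def site_reachable(opener, target_url, timeout=15):
    """Return True if target_url answers with HTTP 200 through the opener."""
    # Assumption: a browser-like User-Agent; some sites reject the default one.
    request = urllib.request.Request(target_url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError (e.g. 403) are OSError subclasses, as are
        # refused connections and timeouts.
        return False


# Example call (placeholder proxy address and credentials, not a real proxy):
# opener = make_opener("http://user:password@203.0.113.10:8080")
# print(site_reachable(opener, "https://www.nseindia.com/"))
```

If this prints `True` while the test proxy is active, the proxy is probably good enough for your bot. If it prints `False`, try the browser check as well before giving up, since a site may treat a script differently from a browser.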
Once you have verified this, you can use this proxy for your EC2 instance. But remember, you had the test proxy for only 15 minutes, so you need to renew it. Please read PART 2 for how to set this up on EC2.