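Web Scraping Study Notes: Hiding Your IP with Tor

When scraping, sending every request from your default IP can get that IP banned.

So you need to hide your own IP.

A word up front: scrape in moderation, and keep the load on the server in mind.

This post is based on Windows 10.

How Tor works: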

http://www.cnblogs.com/likeli/p/5719230.html

http://blog.csdn.net/whiup/article/details/52317779

https://www.deepdotweb.com/2014/05/23/use-tor-socks5-proxy/

1. Install Tor Browser

http://www.theonionrouter.com/projects/torbrowser.html.en

If the page will not load, you will have to find a workaround yourself.

2. For the Tor configuration, see this article

https://jingyan.baidu.com/article/adc815137654fbf723bf73b1.html

With that, Tor is set up.

Python needs the following libraries installed:

pip install pysocks

pip install stem

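import socket

import requests
import socks

# Send every new socket through Tor Browser's SOCKS5 proxy on port 9150
socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9150)
socket.socket = socks.socksocket

# Ask an IP-echo service which address it sees
a = requests.get("http://checkip.amazonaws.com").text
print(a)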

Requesting

http://checkip.amazonaws.com

returns an IP address, and you will see that it is now the hidden (Tor) IP rather than your own.
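The socket patch above reroutes every socket in the process, and with that approach hostname lookups may still be resolved locally. As an alternative, here is a minimal sketch that passes the proxy to requests per session instead, assuming requests can use the PySocks package installed earlier. The `socks5h` scheme asks the proxy to resolve hostnames as well. The helper names `tor_proxies`, `make_tor_session` and `current_ip` are made up for this example.

```python
def tor_proxies(host="127.0.0.1", port=9150):
    """Build a requests-style proxy mapping for a local Tor SOCKS5 port.

    The socks5h scheme asks the proxy to resolve hostnames too.
    """
    url = "socks5h://{}:{}".format(host, port)
    return {"http": url, "https": url}


def make_tor_session(host="127.0.0.1", port=9150):
    """Return a requests.Session whose traffic goes through the proxy."""
    import requests  # imported lazily so tor_proxies() works without it

    session = requests.Session()
    session.proxies.update(tor_proxies(host, port))
    return session


def current_ip(session):
    """Ask the IP-echo service which address it sees for this session."""
    return session.get("http://checkip.amazonaws.com").text.strip()
```

With Tor Browser running, `current_ip(make_tor_session())` should return the Tor exit address, while other code in the same process keeps using ordinary sockets.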

To switch IPs, send Tor the NEWNYM signal through a stem controller:

controller.signal(Signal.NEWNYM)
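Tor rate-limits NEWNYM, so a signal sent too soon after the previous one may not take effect straight away. stem's controller exposes `get_newnym_wait()` for this. The sketch below waits for that long before signalling, instead of sleeping for a fixed time. The function name `switch_identity` and the `demo` wrapper are hypothetical, and `demo` assumes Tor Browser is running with its control port on 9151.

```python
import time


def switch_identity(controller, signal, sleep=time.sleep):
    """Wait out Tor's NEWNYM rate limit, then send the signal.

    `controller` is expected to be a stem Controller (or anything with
    get_newnym_wait() and signal() methods); `signal` would normally be
    stem.Signal.NEWNYM. Returns the number of seconds waited.
    """
    wait = controller.get_newnym_wait()
    if wait > 0:
        sleep(wait)
    controller.signal(signal)
    return wait


def demo():
    """Usage with a running Tor Browser (control port 9151)."""
    from stem import Signal
    from stem.control import Controller

    with Controller.from_port(port=9151) as controller:
        controller.authenticate()
        waited = switch_identity(controller, Signal.NEWNYM)
        print("Waited {:.1f}s before requesting a new identity".format(waited))
```

Taking `sleep` as a parameter keeps the helper easy to try out without a live Tor process.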

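The complete script fetches a page ten times and asks Tor for a new IP after each fetch (written for Python 3):

# Python 3: the Python 2 reload(sys) / sys.setdefaultencoding('utf-8') workaround is not needed
import socket
import time

import requests
import socks
from stem import Signal
from stem.control import Controller

# Connect to Tor Browser's control port (9151) and authenticate
controller = Controller.from_port(port=9151)
controller.authenticate()

# Send every new socket through Tor Browser's SOCKS5 proxy (9150)
socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9150)
socket.socket = socks.socksocket

total_scrappy_time = 0
total_changeIP_time = 0

for x in range(0, 10):
    # Show the IP the outside world currently sees
    a = requests.get("http://checkip.amazonaws.com").text
    print("Request " + str(x + 1) + " IP: " + a)

    # Time one page fetch
    time1 = time.time()
    a = requests.get("http://www.santostang.com/").text
    time2 = time.time()
    total_scrappy_time = total_scrappy_time + time2 - time1
    print("Request " + str(x + 1) + " fetch time: " + str(time2 - time1))

    # Ask Tor for a new identity, then pause 5 seconds before the next request
    time3 = time.time()
    controller.signal(Signal.NEWNYM)
    time.sleep(5)
    time4 = time.time()
    # Subtract the fixed 5-second sleep so only the signalling time is counted
    total_changeIP_time = total_changeIP_time + time4 - time3 - 5
    print("Request " + str(x + 1) + " IP switch time: " + str(time4 - time3 - 5))

print("Average fetch time: " + str(total_scrappy_time / 10))
print("Average IP switch time: " + str(total_changeIP_time / 10))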
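NEWNYM asks Tor for fresh circuits, but a fresh circuit is not guaranteed to leave through a different exit address, so it can be worth checking how often the printed IP really changed. A small sketch, assuming the IPs returned by checkip.amazonaws.com in the loop above were collected into a list. The function name `count_ip_changes` is made up, and the addresses are documentation placeholders, not real results.

```python
def count_ip_changes(ips):
    """Count how many consecutive pairs of observed IPs differ."""
    # The echo service ends its reply with a newline, so strip before comparing
    cleaned = [ip.strip() for ip in ips]
    return sum(1 for prev, cur in zip(cleaned, cleaned[1:]) if prev != cur)


# Placeholder addresses from the documentation ranges, for illustration only
observed = ["192.0.2.1\n", "192.0.2.1\n", "198.51.100.7\n", "203.0.113.9\n"]
print(count_ip_changes(observed))  # 2
```

If the count is well below the number of switches requested, a longer pause between NEWNYM signals may help.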

Note: this post draws on Tang Song's book 《python网络爬虫从入门到实践》 (Python Web Scraping: From Beginner to Practice). Please buy a legitimate copy.

