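When scraping, sending requests from your own default IP can get that IP banned,
so you need to hide it.
A word up front: scrape in moderation and keep the load on the target server in mind.
This article is based on Windows 10.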
How Tor works:
http://www.cnblogs.com/likeli/p/5719230.html
http://blog.csdn.net/whiup/article/details/52317779
https://www.deepdotweb.com/2014/05/23/use-tor-socks5-proxy/
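1. Install Tor Browser
http://www.theonionrouter.com/projects/torbrowser.html.en
If the page will not load for you, you will need to find your own way around that.
2. For configuring Tor, see this article:
https://jingyan.baidu.com/article/adc815137654fbf723bf73b1.html
With that, Tor is set up. Keep Tor Browser open while using it from Python, because its bundled Tor process is what provides the proxy.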
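Before moving on to Python, it can help to confirm that Tor Browser is actually running and listening locally. The code in this article uses Tor Browser's defaults, a SOCKS proxy on port 9150 and a control port on 9151 (a standalone tor service usually uses 9050 and 9051 instead). A minimal sketch that simply tries a TCP connection to each port; `port_open` and `TOR_PORTS` are hypothetical names for this example.

```python
import socket

# Assumption: Tor Browser defaults. A standalone tor service usually uses 9050/9051.
TOR_PORTS = {"socks": 9150, "control": 9151}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


if __name__ == "__main__":
    for name, port in TOR_PORTS.items():
        state = "open" if port_open("127.0.0.1", port) else "closed"
        print(f"{name} port {port}: {state}")
```

If both ports report "closed", Tor Browser is most likely not running yet.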
Install the required Python libraries:
pip install pysocks
pip install stem
pip install requests
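The pip package names do not all match the import names: PySocks, for example, is imported as `socks`. A minimal sketch that reports which of the libraries used in this article cannot be imported, based on `importlib.util.find_spec`; the `REQUIRED` mapping and `missing_packages` are hypothetical names.

```python
import importlib.util

# pip package name -> module name used in `import` (PySocks is imported as `socks`)
REQUIRED = {"pysocks": "socks", "stem": "stem", "requests": "requests"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing, run: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```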
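import socket

import requests
import socks  # provided by the PySocks package

# Route every new socket through Tor Browser's SOCKS5 proxy (default port 9150)
socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9150)
socket.socket = socks.socksocket

# checkip.amazonaws.com replies with the caller's public IP
a = requests.get("http://checkip.amazonaws.com").text
print(a)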
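Replacing `socket.socket` reroutes every socket in the process through Tor. If only some requests should use Tor, requests can be given the proxy directly: with PySocks installed, requests accepts `socks5://` and `socks5h://` proxy URLs, and the `h` variant lets the proxy resolve hostnames, so DNS lookups also go through Tor (with the socket patch, names are typically resolved locally first). A minimal sketch, assuming Tor Browser's default SOCKS port 9150; `TOR_PROXY` and `tor_session` are names made up for this example.

```python
import requests

# Assumption: Tor Browser's default SOCKS port. "socks5h" = DNS resolved by the proxy.
TOR_PROXY = "socks5h://127.0.0.1:9150"


def tor_session(proxy=TOR_PROXY):
    """Return a requests.Session that sends HTTP and HTTPS traffic through Tor."""
    session = requests.Session()
    session.proxies = {"http": proxy, "https": proxy}
    # Ignore HTTP_PROXY/HTTPS_PROXY environment variables, which would
    # otherwise take precedence over session.proxies.
    session.trust_env = False
    return session


if __name__ == "__main__":
    with tor_session() as s:
        print(s.get("http://checkip.amazonaws.com", timeout=30).text.strip())
```

Using `tor_session().get(...)` in place of `requests.get(...)` sends just those requests through Tor. Note that `trust_env = False` also makes requests ignore other environment settings such as `.netrc` credentials.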
Visiting
http://checkip.amazonaws.com
returns an IP address, and you will find that it is no longer your own IP but that of a Tor exit node.
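To confirm that traffic really goes through Tor, the answer from checkip.amazonaws.com can be fetched twice, once directly and once through the proxy, and the two compared. A minimal sketch; it should run in a fresh interpreter without the `socket.socket` patch (otherwise the "direct" request also goes through Tor), and it assumes the same default port 9150. `parse_ip` and `current_ip` are hypothetical helper names.

```python
import ipaddress

import requests

CHECK_URL = "http://checkip.amazonaws.com"
TOR_PROXY = "socks5h://127.0.0.1:9150"  # assumption: Tor Browser's default SOCKS port


def parse_ip(body):
    """Validate the plain-text reply (strip whitespace such as the trailing newline)."""
    return ipaddress.ip_address(body.strip())  # raises ValueError if not an IP


def current_ip(proxy=None, timeout=30):
    """Public IP as seen by checkip.amazonaws.com, optionally through a proxy."""
    proxies = {"http": proxy, "https": proxy} if proxy else None
    return parse_ip(requests.get(CHECK_URL, proxies=proxies, timeout=timeout).text)


if __name__ == "__main__":
    direct, via_tor = current_ip(), current_ip(TOR_PROXY)
    print("direct:", direct, "| via Tor:", via_tor)
    print("IP hidden" if direct != via_tor else "Same IP: Tor is not being used")
```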
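To switch the IP, send Tor the NEWNYM signal through stem's control connection:
controller.signal(Signal.NEWNYM)
Tor then builds fresh circuits for new connections, so later requests usually (though not always) leave through a different exit node.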
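Tor rate-limits NEWNYM, and stem's `Controller` keeps track of this for the signals it has sent: `get_newnym_wait()` returns the number of seconds until another NEWNYM can be sent (0.0 if it can be sent now). A minimal sketch of a helper that waits out that delay before signalling; `renew_identity` is a hypothetical name, and the signal is passed in as a parameter (normally `stem.Signal.NEWNYM`) so the helper itself does not import stem.

```python
import time


def renew_identity(controller, newnym_signal, sleep=time.sleep):
    """Send NEWNYM once the rate limit allows it; return the seconds waited.

    controller: an authenticated stem.control.Controller (assumed).
    newnym_signal: normally stem.Signal.NEWNYM.
    """
    wait = controller.get_newnym_wait()  # 0.0 when NEWNYM may be sent right away
    if wait > 0:
        sleep(wait)
    controller.signal(newnym_signal)
    return wait


if __name__ == "__main__":
    from stem import Signal
    from stem.control import Controller

    with Controller.from_port(port=9151) as controller:  # Tor Browser control port
        controller.authenticate()
        waited = renew_identity(controller, Signal.NEWNYM)
        print(f"NEWNYM sent after waiting {waited:.1f} s")
```

Waiting first keeps Tor from postponing the signal, but NEWNYM still only promises new circuits; the exit IP can occasionally repeat.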
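import socket
import time

import requests
import socks  # provided by the PySocks package
from stem import Signal
from stem.control import Controller

ROUNDS = 10
PAUSE = 5  # seconds to wait after NEWNYM so Tor can build new circuits

# Tor Browser's control port is 9151 and its SOCKS port is 9150
controller = Controller.from_port(port=9151)
controller.authenticate()

# Route all new sockets (and therefore requests) through Tor
socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9150)
socket.socket = socks.socksocket

total_scrape_time = 0.0
total_change_ip_time = 0.0
for x in range(ROUNDS):
    ip = requests.get("http://checkip.amazonaws.com").text.strip()
    print(f"Round {x + 1} IP: {ip}")

    # Time one page download through Tor
    time1 = time.time()
    requests.get("http://www.santostang.com/")
    time2 = time.time()
    total_scrape_time += time2 - time1
    print(f"Round {x + 1} scrape time: {time2 - time1:.2f} s")

    # Request a new identity. Tor rate-limits NEWNYM (about once every 10 s),
    # so with a 5 s pause the IP will not necessarily change every round.
    time3 = time.time()
    controller.signal(Signal.NEWNYM)
    time.sleep(PAUSE)
    time4 = time.time()
    # Subtract the fixed pause so only the signalling overhead is counted
    total_change_ip_time += time4 - time3 - PAUSE
    print(f"Round {x + 1} IP change time: {time4 - time3 - PAUSE:.4f} s")

print(f"Average scrape time: {total_scrape_time / ROUNDS:.2f} s")
print(f"Average IP change time: {total_change_ip_time / ROUNDS:.4f} s")
controller.close()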
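The script times each step with pairs of `time.time()` calls and divides by a fixed round count. A small helper can keep that bookkeeping in one place; `time.perf_counter()` is the standard-library clock intended for measuring short intervals, and unlike `time.time()` it does not jump when the system clock is adjusted. A minimal sketch; `timed` and `summarize` are hypothetical names, and the workload in the demo is only a stand-in for a real request.

```python
import statistics
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def summarize(durations):
    """Return (mean, max) of a list of durations in seconds."""
    return statistics.mean(durations), max(durations)


if __name__ == "__main__":
    # Stand-in workload; in the scraper this would be e.g. requests.get(url)
    durations = [timed(sum, range(100_000))[1] for _ in range(10)]
    mean, worst = summarize(durations)
    print(f"mean {mean:.6f} s, max {worst:.6f} s")
```

In the script above, `_, t = timed(requests.get, "http://www.santostang.com/")` could replace the `time1`/`time2` pair, and collecting each `t` in a list avoids tracking the round count separately.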
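Note: this article draws on Tang Song's book 《python网络爬虫从入门到实践》 (Python Web Scraping: From Beginner to Practice). Please buy a legitimate copy.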