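Have you ever imagined having one of two abilities: going back to the past, or seeing the future? Turning back time mostly belongs to fiction. Foreseeing the future, on the other hand, is something we are actively working on: we want to predict more details, more accurately, over longer horizons. In the fast-moving stock market, for example, we might use NLP to gauge market sentiment, use machine learning to forecast overall market trends, or mine large datasets for hidden correlations between stocks that can inform an investment strategy. All of this needs plenty of data to support our experiments. Here I will show how to fetch historical stock trading data efficiently, so that anyone can quickly get the data for their own experiments.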
First, here are the codes used below and the exchanges they stand for:
Code | Stock Exchange |
---|---|
SHA | Shanghai Stock Exchange |
SHE | Shenzhen Stock Exchange |
HKG | Hong Kong Stock Exchange |
LON | London Stock Exchange |
NASDAQ | NASDAQ Stock Exchange |
NYSE | New York Stock Exchange |
AMEX | American Stock Exchange |
ASX | Australian Securities Exchange |
BVMF | Bovespa Stock Exchange |
CVE | Toronto TSX Ventures Stock Exchange |
TSE | Toronto Stock Exchange |
KSE | Korea Stock Exchange |
NSE | National Stock Exchange of India |
NZE | New Zealand Stock Exchange |
SGX | Singapore Exchange |
STO | NASDAQ OMX Stockholm |
TPE | Taiwan Stock Exchange |
TYO | Tokyo Stock Exchange |
Now download StockData and change into its directory. It contains a folder named symbol and three Python files: data.py, run.py and stock.py. Let's give run.py a trial run.
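As the trial run shows, fetching the trading data of any exchange takes a single command. For example, we can fetch the complete trading history of eight stocks on the Shenzhen Stock Exchange (SHE) and store it in the directory SHE_8.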
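All downloaded files are saved as .csv files named after the corresponding ticker. For example, 000001 in 000001.csv is the code of Ping An Bank on the Shenzhen Stock Exchange. Viewing the file with `$ cat SHE_8/000001.csv | less` shows that the Ping An Bank data starts in 1991.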
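Besides paging through it with `less`, the file can be checked programmatically. Below is a minimal sketch using only the standard library. It assumes the CSV has a header row with a `Date` column in ISO `YYYY-MM-DD` form (typical of pandas' `to_csv()` for a date index); the helper `date_range` is hypothetical and not part of StockData.

```python
import csv


def date_range(path):
    """Return (first_date, last_date, n_rows) for a downloaded price CSV.

    Assumption: the header contains a 'Date' column with ISO dates.
    Rows are not assumed to be sorted.
    """
    with open(path, newline='') as fh:
        dates = [row['Date'] for row in csv.DictReader(fh)]
    if not dates:
        return None, None, 0
    # ISO dates (YYYY-MM-DD) order correctly as plain strings
    return min(dates), max(dates), len(dates)
```

For instance, `date_range('SHE_8/000001.csv')` would report the first and last trading days in the file and the number of rows.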
Now let's download the trading data of every stock on an exchange. For example, to store the London Stock Exchange (LON) in the LON_ALL directory, run `$ python run.py LON_ALL LON`.
Inside LON_ALL we find data for 4663 stocks in total, downloaded from the London Stock Exchange (LON).
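To verify such a count yourself, a small sketch that counts the `.csv` files directly inside a download folder (one file per successfully downloaded stock). `count_csv` is a hypothetical helper, not part of StockData.

```python
import os


def count_csv(folder):
    """Count regular .csv files directly inside `folder` (subfolders are ignored)."""
    return sum(
        1
        for name in os.listdir(folder)
        if name.endswith('.csv') and os.path.isfile(os.path.join(folder, name))
    )
```

Calling `count_csv('LON_ALL')` after the run above should match the number of stocks reported as downloaded.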
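How it works: the symbol folder holds the ticker lists of each exchange, data.py does the actual downloading, stock.py iterates over the stocks of a given exchange, and run.py parses the command-line arguments. Here is stock.py: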
```python
# stock.py (Python 3)
import json
import os

from data import DataReader  # project-local data.py, which performs the actual download

# Symbol list (one JSON array per line, ticker first) for each exchange code
exchanges = {code: f'symbol/{code}.txt' for code in [
    'SHE', 'AMEX', 'NASDAQ', 'NYSE', 'SHA', 'KSE', 'TPE', 'LON', 'SGX', 'TYO',
    'TSE', 'CNSX', 'CVE', 'NZE', 'ASX', 'BVMF', 'HKG', 'NSE', 'BOM', 'STO']}

# Yahoo Finance ticker suffix per exchange ('' for US listings).
# Yahoo uses .T for Tokyo, .V for TSX Venture, .CN for CNSX, .NS for NSE and .BO for BSE (BOM).
suffixes = {
    'SHE': '.SZ', 'SHA': '.SS', 'TPE': '.TW', 'KSE': '.KS', 'LON': '.L',
    'SGX': '.SI', 'NZE': '.NZ', 'ASX': '.AX', 'HKG': '.HK', 'BVMF': '.SA',
    'STO': '.ST', 'TYO': '.T', 'TSE': '.TO', 'CNSX': '.CN', 'CVE': '.V',
    'NSE': '.NS', 'BOM': '.BO', 'AMEX': '', 'NASDAQ': '', 'NYSE': '',
}


def stock_data(path='stock_file', exchange='SHE', num=10000):
    """Save the full price history of up to `num` stocks on `exchange` as CSV files in `path`."""
    with open(exchanges[exchange]) as fh:
        lines = fh.readlines()

    if not os.path.isdir(path):
        os.makedirs(path)
        print(f'\nCreate a folder: {path}\n')

    com_num = 1  # number of the next successful download; failures do not advance it
    for line in lines:
        if com_num > num:
            break
        if not line.strip():
            continue  # ignore blank lines in the symbol file
        tmp = json.loads(line)[0]        # bare ticker, also used as the file name
        sym = tmp + suffixes[exchange]   # ticker in Yahoo's format
        try:
            data = DataReader(sym, 'yahoo', start='5/20/1900').to_csv()
        except Exception:
            print(f'{com_num}: \033[0;31mNot available\033[0m {tmp}.csv\n')
            continue
        with open(os.path.join(path, tmp + '.csv'), 'w') as fh:
            fh.write(data)
        print(f'{com_num}: \033[0;32mDownloaded\033[0m {tmp}.csv\n')
        com_num += 1
    print(f'\033[0;33mCongratulations! Downloaded {com_num - 1} files!\033[0m\n')
```
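In `stock_data`, `com_num` only advances after a successful download, so `num` caps the number of files written rather than the number of symbols tried: a ticker that fails is skipped without using up the quota. The sketch below isolates that loop so the behaviour can be checked offline. `download_first_n` and its `fetch` argument are hypothetical stand-ins for the loop and the `DataReader` call.

```python
def download_first_n(symbols, fetch, num):
    """Mimic stock_data's loop: return the first `num` symbols whose fetch succeeds.

    `fetch` is any callable that returns data or raises on failure.
    """
    downloaded = []
    for sym in symbols:
        if len(downloaded) >= num:  # same condition as `com_num > num`
            break
        try:
            fetch(sym)
        except Exception:
            continue  # failed symbols do not count toward num
        downloaded.append(sym)
    return downloaded
```

This is why a run limited to eight stocks, such as SHE_8 above, still ends up with eight files as long as the symbol list contains at least eight downloadable tickers.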
In summary, the command that downloads all stock data in one step is:
$ python run.py folder_path trading_market_code
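The source of run.py is not shown here. As a rough illustration, a hypothetical sketch of the argument handling it needs, turning the command line into the arguments of `stock_data`. It assumes the order `folder_path trading_market_code` from the command above plus an optional third argument for the number of stocks; that third argument is an assumption suggested by `stock_data`'s `num` parameter, and `parse_args` is an illustrative name.

```python
# Hypothetical: mirrors the keys of the `exchanges` dict in stock.py
EXCHANGES = {'SHE', 'SHA', 'HKG', 'LON', 'NASDAQ', 'NYSE', 'AMEX', 'ASX', 'BVMF', 'CVE',
             'TSE', 'CNSX', 'KSE', 'NSE', 'BOM', 'NZE', 'SGX', 'STO', 'TPE', 'TYO'}


def parse_args(argv):
    """Turn ['run.py', folder_path, market_code, (num)] into (path, exchange, num)."""
    if len(argv) not in (3, 4):
        raise SystemExit('usage: python run.py folder_path trading_market_code [num]')
    path, exchange = argv[1], argv[2]
    if exchange not in EXCHANGES:
        raise SystemExit(f'unknown exchange code: {exchange}')
    # Assumed optional limit; 10000 is stock_data's default
    num = int(argv[3]) if len(argv) == 4 else 10000
    return path, exchange, num
```

With `sys.argv` equal to `['run.py', 'LON_ALL', 'LON']`, this returns `('LON_ALL', 'LON', 10000)`, which would then be passed on as `stock_data(path, exchange, num)`.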