CS224N: Natural Language Processing with Deep Learning
Lecture 6: Dependency Parsing
Lecture 7: Tensorflow Tutorial
Lecture 8: RNN
vanishing gradient: initialization + ReLU
exploding gradient: the gradient clipping trick
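A minimal numpy sketch of the clipping trick mentioned above: rescale all gradients when their global L2 norm exceeds a threshold (the threshold 5.0 is just an example value).

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their global L2 norm
    does not exceed max_norm (the exploding-gradient clipping trick)."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads
```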
beam search
bidirectional RNN
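A rough numpy sketch of the bidirectional idea: run one vanilla RNN left-to-right and another right-to-left, then concatenate the two hidden states at each time step (the weight names and parameter layout are illustrative, not taken from the slides).

```python
import numpy as np

def rnn_step(x, h, W_x, W_h, b):
    # Vanilla RNN cell: h_t = tanh(W_x x_t + W_h h_{t-1} + b)
    return np.tanh(W_x @ x + W_h @ h + b)

def bidirectional_rnn(xs, params_fwd, params_bwd, hidden_size):
    """xs: list of input vectors. Returns one concatenated state per step."""
    h_f = np.zeros(hidden_size)
    h_b = np.zeros(hidden_size)
    fwd, bwd = [], []
    for x in xs:                     # left-to-right pass
        h_f = rnn_step(x, h_f, *params_fwd)
        fwd.append(h_f)
    for x in reversed(xs):           # right-to-left pass
        h_b = rnn_step(x, h_b, *params_bwd)
        bwd.append(h_b)
    bwd.reverse()                    # re-align with the forward pass
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```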
Lecture 9: Fancy Recurrent Neural Networks for Machine Translation
MT: machine translation
The original (vanilla) MT model:
Improvements:
- Encoder and decoder use different weight matrices W, no longer shared
- During decoding, the context vector c from the last encoder time step is fed to the decoder's h_t and y_t at every time step; in addition, the output y_t at the current step is also fed to h_{t+1} and y_{t+1} at the next step. So the decoder's h_t depends on c, h_{t-1}, and y_{t-1} (see the sketch after this list)
- deep RNN with multiple layers
- bidirectional encoder
- reverse the input sequence
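A toy numpy sketch of the decoder recurrence described above, where each decoder state depends on the context vector c, the previous state h_{t-1}, and the previous output y_{t-1} (matrix names are illustrative, not from the lecture).

```python
import numpy as np

def decoder_step(h_prev, y_prev, c, W_h, W_y, W_c, b):
    """One decoder step: h_t = tanh(W_h h_{t-1} + W_y y_{t-1} + W_c c + b)."""
    return np.tanh(W_h @ h_prev + W_y @ y_prev + W_c @ c + b)
```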
Lecture 10: Midterm recap
SGD:
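A one-function reminder of the vanilla SGD update as a small Python sketch (the learning rate value is arbitrary): step each parameter against its gradient.

```python
def sgd_update(params, grads, lr=0.1):
    """Vanilla SGD: theta <- theta - lr * gradient."""
    return [p - lr * g for p, g in zip(params, grads)]
```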
Lecture 11: machine translation and models with attention
Advantages of NMT over traditional machine translation:
Attention
Global vs. Local:
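A minimal numpy sketch of (global) attention over the encoder states: score every encoder state against the current decoder state, softmax the scores, and take the weighted sum as the context vector (dot-product scoring is chosen here for simplicity; the lecture also covers other score functions).

```python
import numpy as np

def attention(decoder_state, encoder_states):
    """encoder_states: (T, d) array; decoder_state: (d,) vector.
    Global attention: attend to every source position."""
    scores = encoder_states @ decoder_state          # (T,) alignment scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax over source positions
    context = weights @ encoder_states               # (d,) weighted sum
    return context, weights
```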
Beam Search
– Used only at test time, not during training or validation;
– Used in the decoding stage of seq2seq generation models: to maximize the probability of the whole generated sequence, simply taking the most probable word at each time step does not work;
– A greedy-style idea:
At each time step, for every partial sequence kept so far, compute the probability of every word in the vocabulary and keep the k sequences with the highest probability;
– The result is not necessarily the global optimum;
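A simplified beam-search sketch matching the bullets above: at each step, extend every kept hypothesis with every word, then keep only the k highest-scoring sequences. `step_log_probs` is a hypothetical stand-in for the model's per-step output distribution.

```python
def beam_search(step_log_probs, k=3, max_len=10, eos=0):
    """step_log_probs(prefix) -> log-probabilities over the vocabulary
    for the next word given the current prefix (a model stand-in).
    Keeps the k best partial sequences at every time step."""
    beams = [([], 0.0)]                       # (sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == eos:        # finished hypotheses are kept as-is
                candidates.append((seq, score))
                continue
            for w, lp in enumerate(step_log_probs(seq)):
                candidates.append((seq + [w], score + lp))
        # keep only the top-k sequences by total log-probability
        beams = sorted(candidates, key=lambda x: x[1], reverse=True)[:k]
    return beams[0][0]
```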
Greedy search:
Greedily pick the single most probable word at each time step
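For contrast, greedy decoding just takes the argmax word at every step (same hypothetical `step_log_probs` stand-in as in the beam-search sketch above):

```python
import numpy as np

def greedy_search(step_log_probs, max_len=10, eos=0):
    """Pick the single most probable word at each step."""
    seq = []
    for _ in range(max_len):
        next_word = int(np.argmax(step_log_probs(seq)))
        seq.append(next_word)
        if next_word == eos:
            break
    return seq
```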
Handling the large-vocabulary problem
(end of the next lecture)
When the vocabulary is very large, computing a softmax over it for every prediction is too expensive
Solutions:
1. Segment the data:
2. Select candidate words
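A rough sketch of the candidate-word idea: instead of a softmax over the full vocabulary, score only a small candidate subset (how the candidates are chosen, e.g. frequent words plus words suggested by the source sentence, is left to the lecture; the function and argument names here are illustrative).

```python
import numpy as np

def candidate_softmax(hidden, output_embeddings, candidate_ids):
    """Softmax restricted to a candidate subset of the vocabulary.
    output_embeddings: (V, d); hidden: (d,); candidate_ids: list of word ids."""
    logits = output_embeddings[candidate_ids] @ hidden   # score candidates only
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return dict(zip(candidate_ids, probs))
```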
Lecture 12: Further topics in Neural Machine Translation and Recurrent Models
LSTM
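The standard LSTM cell as a numpy sketch (the i/f/o/g gate naming is the usual convention; the dict-based parameter layout is just for illustration, not from the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b each hold the four gate parameters
    as dicts with keys 'i', 'f', 'o', 'g'."""
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + b['i'])   # input gate
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + b['f'])   # forget gate
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + b['o'])   # output gate
    g = np.tanh(W['g'] @ x + U['g'] @ h_prev + b['g'])   # candidate cell update
    c = f * c_prev + i * g                               # new cell state
    h = o * np.tanh(c)                                   # new hidden state
    return h, c
```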
Lecture 13: End-to-end models for Speech Processing
Lecture 14:Convolutional Neural Networks (for NLP)
Lecture 15:Tree Recursive Neural Networks and Constituency Parsing
treeRNN
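A minimal sketch of the tree RNN composition over a binary constituency tree: each parent vector is built from its two children with a shared weight matrix via tanh(W [c1; c2] + b). The nested-tuple tree representation is an ad-hoc choice for illustration.

```python
import numpy as np

def compose(left, right, W, b):
    """Parent vector from two child vectors: tanh(W [left; right] + b)."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

def tree_rnn(node, leaf_vectors, W, b):
    """node: either a word id (leaf) or a (left, right) tuple (internal node)."""
    if isinstance(node, tuple):
        left = tree_rnn(node[0], leaf_vectors, W, b)
        right = tree_rnn(node[1], leaf_vectors, W, b)
        return compose(left, right, W, b)
    return leaf_vectors[node]       # leaf: look up the word vector
```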
Lecture 16:Coreference Resolution
Lecture 17:Dynamic Neural Networks for Question Answering