## Highway Networks For Sentence Classification

September 30, 2017 | HIT-SCIR (哈工大SCIR) | Liu Zonglin (刘宗林)

1. Notation

The (·) operation denotes element-wise multiplication of matrices (the Hadamard product).

The sigmoid function:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

2. Highway Networks formula
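As given in Srivastava et al. [1], a highway layer combines a nonlinear transform H with a transform gate T and a carry gate C = 1 − T:

$$y = H(x, W_H) \cdot T(x, W_T) + x \cdot \left(1 - T(x, W_T)\right)$$

where (·) is the element-wise product defined above and the gate $T(x, W_T) = \sigma(W_T^{\top} x + b_T)$ uses the sigmoid function. When T is close to 0 the layer simply carries its input through unchanged, which is what makes very deep stacks trainable.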

3. Highway BiLSTM Networks Structure Diagram

4. Highway BiLSTM Networks Demo

To build a neural network in PyTorch, you generally subclass nn.Module and implement its forward() function. To build the Highway BiLSTM Networks we wrote two classes and linked them with nn.ModuleList:

    class HBiLSTM(nn.Module):
        def __init__(self, args):
            super(HBiLSTM, self).__init__()
            ......

        def forward(self, x, hidden):
            # implement the Highway BiLSTM Networks formula
            ......

    class HBiLSTM_model(nn.Module):
        def __init__(self, args):
            super(HBiLSTM_model, self).__init__()
            ......
            # args.layer_num_highway is the number of Highway BiLSTM layers
            self.highway = nn.ModuleList(
                [HBiLSTM(args) for _ in range(args.layer_num_highway)])
            ......

        def forward(self, x):
            ......
            # call each HBiLSTM layer's forward() in turn
            for current_layer in self.highway:
                x, self.hidden = current_layer(x, self.hidden)
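The nn.ModuleList loop above simply threads x and the hidden state through each layer in turn. A framework-free sketch of that stacking pattern (the toy layer below is a hypothetical stand-in for HBiLSTM, not the real class):

```python
# Minimal sketch of the stacking loop in HBiLSTM_model.forward():
# each layer maps (x, hidden) -> (x, hidden), applied in order.
def apply_highway_stack(layers, x, hidden):
    for layer in layers:
        x, hidden = layer(x, hidden)
    return x, hidden

# Toy stand-in for a layer: doubles x and counts how many layers ran.
toy_layer = lambda x, h: (2 * x, h + 1)

y, h = apply_highway_stack([toy_layer] * 3, 1, 0)
print(y, h)  # 8 3
```

The key point is that each layer receives the previous layer's output as its input, so depth is controlled by a single config value (args.layer_num_highway) rather than by hand-written wiring.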

In the forward() function of the HBiLSTM class we implement the Highway BiLSTM Networks formula. First we compute H; as noted above, H can be a convolution or an LSTM. Here, normal_fc is our H.

    x, hidden = self.bilstm(x, hidden)
    # torch.transpose swaps the two given dimensions
    normal_fc = torch.transpose(x, 0, 1)

    # compute the gate pre-activation: project the input x (source_x)
    # through the linear gate_layer, flattening batch and time dims first
    source_x = source_x.contiguous()
    information_source = source_x.view(
        source_x.size(0) * source_x.size(1), source_x.size(2))
    information_source = self.gate_layer(information_source)
    information_source = information_source.view(
        source_x.size(0), source_x.size(1), information_source.size(1))

    # alternatively, you can choose the zero-padding strategy to match dimensions
    zeros = torch.zeros(source_x.size(0), source_x.size(1),
                        carry_layer.size(2) - source_x.size(2))
    source_x = Variable(torch.cat((zeros, source_x.data), 2))
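The zero-padding alternative just widens the last dimension of x so it matches the carry path before the two are combined. In plain Python terms (zero_pad_front is a hypothetical helper name, illustrating only the shape logic):

```python
def zero_pad_front(vec, target_len):
    # prepend zeros so len(vec) == target_len,
    # as torch.cat((zeros, x), 2) does along the feature dimension
    return [0.0] * (target_len - len(vec)) + vec

print(zero_pad_front([1.0, 2.0], 4))  # [0.0, 0.0, 1.0, 2.0]
```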

    # the transformation gate in the formula is T
    transformation_layer = F.sigmoid(information_source)
    # the carry gate in the formula is C = 1 - T
    carry_layer = 1 - transformation_layer
    # formula: y = H * T + x * C
    allow_transformation = torch.mul(normal_fc, transformation_layer)
    allow_carry = torch.mul(information_source, carry_layer)
    information_flow = torch.add(allow_transformation, allow_carry)
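The combination step can be checked numerically. A dependency-free sketch of the same computation (note that, as in the snippet above, the carry path multiplies information_source, the gated projection of x, rather than the raw input):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def highway_combine(normal_fc, information_source):
    """Element-wise y = H * T + x * C with C = 1 - T, mirroring the snippet above."""
    out = []
    for h, x in zip(normal_fc, information_source):
        t = sigmoid(x)          # transformation gate T
        c = 1.0 - t             # carry gate C
        out.append(h * t + x * c)
    return out

# sanity check: with gate input 0, T = sigmoid(0) = 0.5, so y = 0.5 * H
print(highway_combine([1.0, 2.0], [0.0, 0.0]))  # [0.5, 1.0]
```

When the gate saturates low (T → 0) the output reduces to the carry term, and when it saturates high (T → 1) the output is pure H, which is exactly the trade-off the highway formula encodes.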

References

[1] R. K. Srivastava, K. Greff, and J. Schmidhuber. Highway networks. arXiv:1505.00387, 2015.

[2] R. K. Srivastava, K. Greff, and J. Schmidhuber. Training very deep networks. arXiv:1507.06228, 2015.

[3] J. G. Zilly, R. K. Srivastava, J. Koutník, and J. Schmidhuber. Recurrent highway networks. arXiv:1607.03474, 2016.

[4] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In AISTATS, 2010.

