Figure 1. Deep Feed-Forward NN and RNN (left & middle figure source: Fjodor van Veen, asimovinstitute.org)
The intrinsic property of an RNN is that its hidden unit can capture previous information as memory, which is implemented by adding a loop to the original hidden unit (shown in the bottom half of Figure 1). More details are shown in the left part of Figure 2: memories of previous information are passed through that loop, and the black square indicates a delay of one time step. The other way to view it is to unfold the operation into multiple copies of a single unit, where the number of copied units equals the length of the input sequence. The first view is very succinct, while the second makes explicit that the model applies the same transition function and shares parameters at every time step, which decreases the number of parameters the model needs to learn.
Figure 2. Folded and Unfolded RNN Hidden Unit Structure Source: Nature
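To make the unfolded view concrete, here is a minimal sketch of a vanilla RNN forward pass in NumPy; the weight names (W_xh, W_hh, W_hy) and shapes are purely illustrative, not taken from any particular framework.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Unfolded vanilla RNN: the same parameters are reused at every time step."""
    h = np.zeros(W_hh.shape[0])                  # initial hidden state (the "memory")
    outputs = []
    for x in xs:                                 # one loop iteration = one time step
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)   # memory is updated from input + previous state
        outputs.append(W_hy @ h + b_y)           # per-step output
    return outputs, h
```

Note that W_xh, W_hh, and W_hy never change inside the loop: that is exactly the parameter sharing that keeps the number of learned parameters independent of the sequence length.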
RNN Application Scenarios
So what is an RNN capable of? What kinds of scenarios can it be applied to? The answer is any task that depends heavily on context information, e.g. NLP and time-series tasks. To be more specific, the detailed structure of an RNN can be customized into different types according to each task's input and output requirements. For example, to describe the content of a single image, a one-to-many structure would be a proper choice. For language translation, each input word needs to be translated, so the suitable structure would be many-to-many.
Figure 3. Different Types of RNN and Related Application Scenarios
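Using the same toy notation as the sketch above, the layouts in Figure 3 differ only in where inputs are fed and where outputs are read. Here is a rough, purely illustrative fragment (random weights, no training):

```python
import numpy as np

rng = np.random.default_rng(0)
H, D = 8, 4                                    # hidden size and input/output size (illustrative)
W_xh = rng.normal(scale=0.1, size=(H, D))
W_hh = rng.normal(scale=0.1, size=(H, H))
W_hy = rng.normal(scale=0.1, size=(D, H))

# Many-to-many: read one output at every input step (e.g. per-word translation/tagging)
h, ys = np.zeros(H), []
for x in rng.normal(size=(5, D)):              # a sequence of 5 inputs
    h = np.tanh(W_xh @ x + W_hh @ h)
    ys.append(W_hy @ h)

# One-to-many: a single input (e.g. an image feature vector) seeds the state,
# then the loop generates a whole output sequence from the state alone
h = np.tanh(W_xh @ rng.normal(size=D))
ys = [W_hy @ h]
for _ in range(4):
    h = np.tanh(W_hh @ h)
    ys.append(W_hy @ h)
```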
Attention: though an RNN is designed to carry context information, in practice it can only look back a limited number of steps. This is caused by vanishing gradients: as the information (and its gradient) passes back through the units, it is multiplied by a factor with magnitude smaller than 1 over and over again, and the signal is eventually lost.
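A toy numerical illustration of the effect (the factor 0.5 is arbitrary, standing in for a recurrent weight times a saturated activation derivative):

```python
grad = 1.0
factor = 0.5                  # |factor| < 1 at every backward step
for step in range(30):        # propagate back through 30 time steps
    grad *= factor
print(grad)                   # ~9.3e-10: the signal from 30 steps back is effectively gone
```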
Long Short-Term Memory Neural Network, LSTM
To fix this, the LSTM was invented. Compared to a vanilla RNN, three gates are introduced to control the flow of all information passing through the unit. To explain clearly why the LSTM can overcome this long-term dependency problem, the best approach is to take a close look at its hidden unit structure.
Figure 4. LSTM Hidden Unit Structure (please forgive my terrible handwriting)
Forget Gate: decides which parts of the previous cell state to discard.
Input Gate: decides what new information will be added to update the cell state, with a tanh limiting the input value range.
Output Gate: decides what information will be passed on to the next hidden state, with a tanh limiting the output value range.
For more details about the LSTM, please see the marvelous post written by Christopher Olah. Here I will talk about how the LSTM enables longer dependencies than a vanilla RNN. The LSTM keeps two main flows carrying information through all hidden units, instead of one. One flow is processed through multiple nonlinear functions (sigmoid, tanh) for removing or keeping information, while the other is only processed by simple element-wise linear operations, so it is much less likely to be lost. (The two horizontal lines in the left part of Figure 4 represent the two flows, carrying the cell state and the hidden state.)
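As a minimal sketch of one LSTM step (the standard equations in NumPy; the stacked parameter layout and variable names are my own choice, not a reference implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W (4H x D), U (4H x H) and b (4H,) stack the
    forget / input / output / candidate parameters, each of hidden size H."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b        # all gate pre-activations at once
    f = sigmoid(z[0:H])               # forget gate: what to drop from the old cell state
    i = sigmoid(z[H:2*H])             # input gate: what new info to write
    o = sigmoid(z[2*H:3*H])           # output gate: what to expose as the hidden state
    g = np.tanh(z[3*H:4*H])           # candidate values, range-limited by tanh
    c = f * c_prev + i * g            # cell state flow: only element-wise linear ops
    h = o * np.tanh(c)                # hidden state flow: passed through nonlinearities
    return h, c
```

The line `c = f * c_prev + i * g` is the nearly-linear "top line": gradients flowing along it are only scaled element-wise by the forget gate, which is what makes long-range dependencies easier to keep.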
Gated Recurrent Unit, GRU
There are several LSTM variants containing various computational components (e.g. peepholes, full gate recurrence; see details in [6]), offering different roles and utilities. Here I'll talk about a very famous variant, the Gated Recurrent Unit neural network.
The GRU removes the separate cell state flow and keeps only two gates: the reset gate and the update gate. The reset gate functions like the forget gate in the LSTM, throwing away past information. The update gate decides how much of the previous hidden state to throw away and how much of the candidate hidden state to take in. Details are shown in the following figure.
Figure 5. GRU Hidden Unit Structure
Though there is no separate cell state flow in the GRU, in my opinion the top horizontal line carrying the hidden state works similarly to the top cell state line in the LSTM, because only simple linear operations are applied on this top line in the GRU. Another difference is that the GRU does not squash its final output with a tanh the way the LSTM does before passing the hidden state on. As for which of LSTM and GRU is better, it's hard to tell; after all, they work in a very similar way. Maybe the GRU converges faster due to fewer operations. I think the answer depends on the specific situation in which these models are applied.
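For comparison, a sketch of one GRU step in the same style (standard formulation with explicit per-gate parameters; again just illustrative, not any library's API):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, Wr, Ur, br, Wu, Uu, bu, Wn, Un, bn):
    """One GRU time step with explicit reset (r), update (u) and candidate (n) parameters."""
    r = sigmoid(Wr @ x + Ur @ h_prev + br)         # reset gate: how much past info to drop
    u = sigmoid(Wu @ x + Uu @ h_prev + bu)         # update gate: old state vs. candidate
    n = np.tanh(Wn @ x + Un @ (r * h_prev) + bn)   # candidate hidden state
    h = (1.0 - u) * h_prev + u * n                 # element-wise interpolation on the "top line"
    return h
```

The final interpolation is the GRU's analogue of the LSTM cell-state update, and note that h is returned without a final tanh.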
For the past four months, I have been working on cardiovascular disease risk prediction. Through this, I came up with an idea to utilize GANs to learn in a progressive way and decided to write a paper on this topic (sorry, I can't talk about my idea in much detail). I then began doing background research and found three related topics. In this post, I will give summaries of these topics.
NLP algorithms are designed to learn from language, which is usually unstructured and of arbitrary length. Even worse, different language families follow different rules, and applying different sentence segmentation methods may cause ambiguity. So it is necessary to transform this information into an appropriate, computer-readable representation. To enable such a transformation, multiple tokenization and embedding strategies have been invented. This post mainly gives a brief summary of these terms. (For readers, I assume you already know some basic concepts, like tokenization, n-grams, etc. I will mainly talk about word embedding methods in this blog.)
Recently, I have been working on NER projects. As a newcomer, I have spent a lot of time researching current NER methods and made a summary. In this post, I will list my summaries (NER in DL); I hope this is helpful for readers who are interested and also new to this area. Before reading, I assume you already know some basic concepts (e.g. sequential neural networks, POS tagging, IOB tagging, word embeddings, conditional random fields).
This post explains some basic statistical concepts used in disease morbidity risk prediction. As a CS student, I had a hard time figuring out these statistical concepts; I hope my summary is helpful for you.