By Yinggan XU Dibbla
This is generated from a previous course (not included in Lee's 2022 series); the video can be found here: RNN
The RNN aims to deal with sequential inputs. We first focus on the problem of slot filling:
Time:______ Destination:_____
Here, Time and Destination are the slots. We would like to automatically fill in the slots given a sentence: *I would like to fly to Taipei on Nov 2nd.* The model has to know that "Taipei" is the destination and "Nov 2nd" is the time.
Of course, we could use a plain NN to accomplish the task:
- Convert each word to a vector (1-of-N encoding, word hashing, …)
- Feed the word vector into the network
- Output a distribution indicating which slot the word belongs to
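As a rough illustration (not code from the lecture), here is a minimal numpy sketch of this word-by-word classifier, assuming a toy vocabulary and three slot classes; the weights are random rather than trained:

```python
import numpy as np

# Toy vocabulary and slot classes (illustrative, not from the lecture)
vocab = ["i", "would", "like", "to", "fly", "taipei", "on", "nov", "2nd"]
slots = ["other", "destination", "time"]

def one_hot(word):
    """1-of-N encoding: a vector with a single 1 at the word's index."""
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# A single-layer classifier: each word is mapped independently to a
# distribution over slots. Weights are random here; in practice they
# would be trained on labeled sentences.
rng = np.random.default_rng(0)
W = rng.normal(size=(len(slots), len(vocab)))
b = np.zeros(len(slots))

for word in ["taipei", "nov"]:
    p = softmax(W @ one_hot(word) + b)
    print(word, dict(zip(slots, p.round(2))))
```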
But this is not enough. Suppose we add a third slot:
Time:______ Destination:_____ Place of Departure:_____
Given: *I'd like to leave New York for Taipei on Nov 2nd*, whether a city fills the Destination or the Place of Departure slot depends on the surrounding words (e.g. "leave" before "New York"), so the model has to use/memorize the context. Here we use an RNN, an NN with "memory".
```mermaid
graph TB
id1[input1]
id2[input2]
id3[Neuron1 with bias]
id4[Neuron2 with bias]
id5[output Neuron1]
id6[output Neuron2]
id7[memory a1]
id8[memory a2]
id1-->id3
id1-->id4
id2-->id3
id2-->id4
id3-->id5
id3-->id6
id4-->id5
id4-->id6
id7-->id3-->id7
id8-->id4-->id8
id7-->id4
id8-->id3
```
a1 and a2 are initialized to some value, say 0, and all weights are 1. For the first input [1, 1], Neuron 1 and Neuron 2 (with bias) both output $1+1+a_1+a_2 = 1+1+0+0 = 2$, so a1 becomes 2 and a2 becomes 2. For the second input [1, 1], Neuron 1 outputs $1+1+a_1+a_2 = 1+1+2+2 = 6$, and so does Neuron 2.
So, $$N_1 = w_1^T x + b_1 + w_a^T a = w_1^T x + b_1 + w_{a1} a_1 + w_{a2} a_2$$ $$N_2 = w_2^T x + b_2 + w_a^T a$$
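A small numpy sketch that reproduces the numbers above. It assumes, as in the example, a linear activation, zero biases, and all weights equal to 1; the all-ones output-layer weights are also an assumption:

```python
import numpy as np

# All weights are 1, biases are 0, and the memory a = [a1, a2] starts at 0.
W_x = np.ones((2, 2))   # input  -> hidden (Neuron 1, Neuron 2)
W_a = np.ones((2, 2))   # memory -> hidden
W_o = np.ones((2, 2))   # hidden -> output (assumed all-ones as well)
a = np.zeros(2)

for x in [np.array([1.0, 1.0]), np.array([1.0, 1.0])]:
    h = W_x @ x + W_a @ a   # N1, N2 (no activation, as in the example)
    y = W_o @ h             # output neurons
    a = h                   # memory stores the hidden values (Elman)
    print("hidden:", h, "output:", y)

# First step:  hidden [2. 2.], output [4. 4.]
# Second step: hidden [6. 6.], output [12. 12.]
```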
And the whole process works like this:
This is called an Elman network; if the memory instead stores the final output of the network, it is a Jordan network.
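To make the distinction concrete, here is a single-step sketch of my own (not the lecture's code), assuming a tanh activation and equal hidden/output sizes so that either kind of memory fits the same recurrent weights:

```python
import numpy as np

def rnn_step(x, a, W_x, W_a, W_o, kind="elman"):
    """One time step of a simple recurrent layer.
    Elman:  the memory stores the hidden values h.
    Jordan: the memory stores the network's previous output y."""
    h = np.tanh(W_x @ x + W_a @ a)   # hidden neurons read input and memory
    y = W_o @ h                      # output neurons
    new_a = h if kind == "elman" else y
    return y, new_a

# Tiny usage example with 2 inputs, 2 hidden units, 2 outputs
# (sizes chosen so both kinds of memory fit the same W_a).
rng = np.random.default_rng(0)
W_x, W_a, W_o = (rng.normal(size=(2, 2)) for _ in range(3))
a = np.zeros(2)
for x in ([1.0, 1.0], [0.0, 1.0]):
    y, a = rnn_step(np.array(x), a, W_x, W_a, W_o, kind="jordan")
```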
The network can also be bidirectional:
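One possible sketch of the bidirectional idea (again my own illustration): run one RNN left-to-right and another right-to-left over the same sequence, then concatenate the two hidden states at each position so every word's representation sees both its left and right context:

```python
import numpy as np

def run_rnn(xs, W_x, W_a, a0):
    """Run a simple Elman RNN over a sequence and return all hidden states."""
    a, hs = a0, []
    for x in xs:
        a = np.tanh(W_x @ x + W_a @ a)
        hs.append(a)
    return hs

def bidirectional(xs, params_fwd, params_bwd):
    """Forward pass over xs and a second pass over reversed(xs);
    the two hidden states at each position are concatenated."""
    h_f = run_rnn(xs, *params_fwd)
    h_b = run_rnn(xs[::-1], *params_bwd)[::-1]
    return [np.concatenate([f, b]) for f, b in zip(h_f, h_b)]
```

The concatenated state at position t then feeds the slot classifier, so the prediction for a word can depend on the whole sentence rather than only on the words before it.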